Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: merging aggregate and survey data with different state codes


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: merging aggregate and survey data with different state codes
Date   Tue, 20 Nov 2012 21:10:04 -0500

OP contacted me off list saying "thanks but..." and said that only
codes 1 to 50 were shown, whereas he needed codes from 11 to 95.

The example below produces this output, which is clearly a crosswalk
from one ordering 1-51 to Census state codes:

                      st   code   n2
                 Alabama     63    1
                  Alaska     94    2
                 Arizona     86    3
                Arkansas     71    4
              California     93    5
                Colorado     84    6
             Connecticut     16    7
                Delaware     51    8
    District of Columbia     53    9
                 Florida     59   10
                 Georgia     58   11
                  Hawaii     95   12
                   Idaho     82   13
                Illinois     33   14
                 Indiana     32   15
                    Iowa     42   16
                  Kansas     47   17
                Kentucky     61   18
               Louisiana     72   19
                   Maine     11   20
                Maryland     52   21
           Massachusetts     14   22
                Michigan     34   23
               Minnesota     41   24
             Mississippi     64   25
                Missouri     43   26
                 Montana     81   27
                Nebraska     46   28
                  Nevada     88   29
           New Hampshire     12   30
              New Jersey     22   31
              New Mexico     85   32
                New York     21   33
          North Carolina     56   34
            North Dakota     44   35
                    Ohio     31   36
                Oklahoma     73   37
                  Oregon     92   38
            Pennsylvania     23   39
            Rhode Island     15   40
          South Carolina     57   41
            South Dakota     45   42
               Tennessee     62   43
                   Texas     74   44
                    Utah     87   45
                 Vermont     13   46
                Virginia     54   47
              Washington     91   48
           West Virginia     55   49
               Wisconsin     35   50
                 Wyoming     83   51

Normally I ignore any off-list traffic, but can anyone confirm or deny
any problem with this generated crosswalk?
Or any problem with the general approach?
OP says states are 1 to 50, so perhaps DC should be deleted from the
list, but that is easy to do:

ssc inst ddf2dct, replace
 clear
 tempfile do dct
 loc d http://www.nber.org/cps/cpsmar92.ddf
 ddf2dct using `d', dct(`dct') do(`do')
 run `do'
 drop _all
 set obs 95
 g cps=_n
 la val cps hg_st60
 decode cps, gen(st)
 keep if !mi(st)
replace st=trim(st)
 g code=cps
 drop if code==53
 sort st
 g n2=_n
 keep st code n2
 list, noo clean


On Tue, Nov 20, 2012 at 3:51 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> Eric M. Uslaner <euslaner@umd.edu>:
> You should be able to find a crosswalk, or you can make one yourself
> from data that has states defined using those Census codes. Try e.g.
>
> ssc inst ddf2dct, replace
> clear
> tempfile do dct
> loc d http://www.nber.org/cps/cpsmar92.ddf
> ddf2dct using `d', dct(`dct') do(`do')
> run `do'
> drop _all
> set obs 95
> g cps=_n
> la val cps hg_st60
> decode cps, gen(st)
> keep if !mi(st)
> replace st=trim(st)
> g code=cps
> sort st
> g n2=_n
> keep st code n2
> list, noo clean
>
>
> On Tue, Nov 20, 2012 at 3:05 PM, Eric M. Uslaner <euslaner@umd.edu> wrote:
>> Greetings,
>>
>> I have two data sets and want to merge the aggregate data set into the survey data set.  Ultimately I want to merge the two data sets by Congressional district (which is how the aggregate data set is organized).  But the major problem is that the two data sets use different codes for states.  For the aggregate data, the states are numbered from 1-50 alphabetically.  But for the survey data the state codes are:
>>
>> tabl C3_PPSTATEN
>>
>> C3_PPSTATEN (State)     code  |   freq
>> ------------------------------+--------
>>                  me      11   |      4
>>                  nh      12   |      3
>>                  vt      13   |      4
>>                  ma      14   |     34
>>                  ri      15   |      2
>>                  ct      16   |     20
>>                  ny      21   |     76
>>                  nj      22   |     42
>>                  pa      23   |     60
>>                  oh      31   |     56
>>                  in      32   |     24
>>                  il      33   |     49
>>                  mi      34   |     36
>>                  wi      35   |     23
>>                  mn      41   |     35
>>                  ia      42   |     18
>>                  mo      43   |     31
>>                  nd      44   |      1
>>                  sd      45   |      8
>>                  ne      46   |     12
>>                  ks      47   |     14
>>                  de      51   |      5
>>                  md      52   |     33
>>                  dc      53   |      4
>>                  va      54   |     29
>>                  wv      55   |     13
>>                  nc      56   |     41
>>                  sc      57   |     18
>>                  ga      58   |     41
>>                  fl      59   |     89
>>                  ky      61   |     16
>>                  tn      62   |     29
>>                  al      63   |     19
>>                  ms      64   |      8
>>                  ar      71   |     12
>>                  la      72   |     15
>>                  ok      73   |     15
>>                  tx      74   |     92
>>                  mt      81   |      6
>>                  id      82   |      3
>>                  co      84   |     17
>>                  nm      85   |      7
>>                  az      86   |     33
>>                  ut      87   |     10
>>                  nv      88   |     20
>>                  wa      91   |     34
>>                  or      92   |     20
>>                  ca      93   |    127
>>                  ak      94   |      1
>>                  hi      95   |      6
>> ------------------------------+--------
>>                       Total   |   1315
>>
>> Clearly there is no one-to-one function I can think of that would transform the state numbers in the aggregate data set.  I could, of course, create a new variable step by step:
>>
>> gen int statenew=.
>> replace statenew= 1 if statenum == 1  (where statenum is the numeric equivalent of state; or alternatively "if state=="AR")
>> Etc. through replace statenew=55 if statenum == 49 (statenum = 50 is WY and there are no respondents in the survey from WY)
>>
>> This seems rather cumbersome.  If necessary I'll do it.  Does anyone have any easier way of doing this?
>>
>> Thanks much,
>>
>> Ric Uslaner
>> euslaner@umd.edu

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index