Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: RE: merging aggregate and survey data with different state codes


From   Rebecca Pope <rebecca.a.pope@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: merging aggregate and survey data with different state codes
Date   Tue, 20 Nov 2012 22:13:16 -0600

Austin, If by "problem with the crosswalk" you mean an inaccuracy, no,
the crosswalk appears fine. All state names line up w/ their
abbreviations & states are in alphabetical order 1 to 50 (this is w/
the modification to drop DC).

I also think the general approach is good. It is very easy to merge
the generated crosswalk (on "code") to the survey data. This can be
verified using the codes and state abbreviations that Ric posted. From
that it should also be easy to merge the "aggregate" data referred to
in the original post (on n2, which Ric will presumably give the same
name as the ID variable in his data). I don't see how to accomplish
Ric's request without this additional -merge- (crosswalk with survey)
but that is surely better than typing 50 -replace- statements!

-Rebecca
Rebecca



         __o                __o
      _`\ <,_            _`\ <,_
     (_)/   (_)          (_)/   (_)
=========================


On Tue, Nov 20, 2012 at 8:10 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> OP contacted me off list saying "thanks but..." and said that only
> codes 1 to 50 were shown, whereas he needed codes from 11 to 95.
>
> The example below produces this output, which is clearly a crosswalk
> from one ordering 1-51 to Census state codes:
>
>                       st   code   n2
>                  Alabama     63    1
>                   Alaska     94    2
>                  Arizona     86    3
>                 Arkansas     71    4
>               California     93    5
>                 Colorado     84    6
>              Connecticut     16    7
>                 Delaware     51    8
>     District of Columbia     53    9
>                  Florida     59   10
>                  Georgia     58   11
>                   Hawaii     95   12
>                    Idaho     82   13
>                 Illinois     33   14
>                  Indiana     32   15
>                     Iowa     42   16
>                   Kansas     47   17
>                 Kentucky     61   18
>                Louisiana     72   19
>                    Maine     11   20
>                 Maryland     52   21
>            Massachusetts     14   22
>                 Michigan     34   23
>                Minnesota     41   24
>              Mississippi     64   25
>                 Missouri     43   26
>                  Montana     81   27
>                 Nebraska     46   28
>                   Nevada     88   29
>            New Hampshire     12   30
>               New Jersey     22   31
>               New Mexico     85   32
>                 New York     21   33
>           North Carolina     56   34
>             North Dakota     44   35
>                     Ohio     31   36
>                 Oklahoma     73   37
>                   Oregon     92   38
>             Pennsylvania     23   39
>             Rhode Island     15   40
>           South Carolina     57   41
>             South Dakota     45   42
>                Tennessee     62   43
>                    Texas     74   44
>                     Utah     87   45
>                  Vermont     13   46
>                 Virginia     54   47
>               Washington     91   48
>            West Virginia     55   49
>                Wisconsin     35   50
>                  Wyoming     83   51
>
> Normally I ignore any off-list traffic, but can anyone confirm or deny
> any problem with this generated crosswalk?
> Or any problem with the general approach?
> OP says states are 1 to 50, so perhaps DC should be deleted from the
> list, but that is easy to do:
>
> ssc inst ddf2dct, replace
>  clear
>  tempfile do dct
>  loc d http://www.nber.org/cps/cpsmar92.ddf
>  ddf2dct using `d', dct(`dct') do(`do')
>  run `do'
>  drop _all
>  set obs 95
>  g cps=_n
>  la val cps hg_st60
>  decode cps, gen(st)
>  keep if !mi(st)
> replace st=trim(st)
>  g code=cps
>  drop if code==53
>  sort st
>  g n2=_n
>  keep st code n2
>  list, noo clean
>
>
> On Tue, Nov 20, 2012 at 3:51 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Eric M. Uslaner <euslaner@umd.edu>:
>> You should be able to find a crosswalk, or you can make one yourself
>> from data that has states defined using those Census codes. Try e.g.
>>
>> ssc inst ddf2dct, replace
>> clear
>> tempfile do dct
>> loc d http://www.nber.org/cps/cpsmar92.ddf
>> ddf2dct using `d', dct(`dct') do(`do')
>> run `do'
>> drop _all
>> set obs 95
>> g cps=_n
>> la val cps hg_st60
>> decode cps, gen(st)
>> keep if !mi(st)
>> replace st=trim(st)
>> g code=cps
>> sort st
>> g n2=_n
>> keep st code n2
>> list, noo clean
>>
>>
>> On Tue, Nov 20, 2012 at 3:05 PM, Eric M. Uslaner <euslaner@umd.edu> wrote:
>>> Greetings,
>>>
>>> I have two data sets and want to merge the aggregate data set into the survey data set.  Ultimately I want to merge the two data sets by Congressional district (which is how the aggregate data set is organized).  But the major problem is that the two data sets use different codes for states.  For the aggregate data, the states are numbered from 1-50 alphabetically.  But for the survey data the state codes are:
>>>
>>> tabl C3_PPSTATEN
>>>
>>> C3_PPSTATEN (State)     code  |   freq
>>> ------------------------------+--------
>>>                  me      11   |      4
>>>                  nh      12   |      3
>>>                  vt      13   |      4
>>>                  ma      14   |     34
>>>                  ri      15   |      2
>>>                  ct      16   |     20
>>>                  ny      21   |     76
>>>                  nj      22   |     42
>>>                  pa      23   |     60
>>>                  oh      31   |     56
>>>                  in      32   |     24
>>>                  il      33   |     49
>>>                  mi      34   |     36
>>>                  wi      35   |     23
>>>                  mn      41   |     35
>>>                  ia      42   |     18
>>>                  mo      43   |     31
>>>                  nd      44   |      1
>>>                  sd      45   |      8
>>>                  ne      46   |     12
>>>                  ks      47   |     14
>>>                  de      51   |      5
>>>                  md      52   |     33
>>>                  dc      53   |      4
>>>                  va      54   |     29
>>>                  wv      55   |     13
>>>                  nc      56   |     41
>>>                  sc      57   |     18
>>>                  ga      58   |     41
>>>                  fl      59   |     89
>>>                  ky      61   |     16
>>>                  tn      62   |     29
>>>                  al      63   |     19
>>>                  ms      64   |      8
>>>                  ar      71   |     12
>>>                  la      72   |     15
>>>                  ok      73   |     15
>>>                  tx      74   |     92
>>>                  mt      81   |      6
>>>                  id      82   |      3
>>>                  co      84   |     17
>>>                  nm      85   |      7
>>>                  az      86   |     33
>>>                  ut      87   |     10
>>>                  nv      88   |     20
>>>                  wa      91   |     34
>>>                  or      92   |     20
>>>                  ca      93   |    127
>>>                  ak      94   |      1
>>>                  hi      95   |      6
>>> ------------------------------+--------
>>>                       Total   |   1315
>>>
>>> Clearly there is no one-to-one function I can think of that would transform the state numbers in the aggregate data set.  I could, of course, create a new variable step by step:
>>>
>>> gen int statenew=.
>>> replace statenew= 1 if statenum == 1  (where statenum is the numeric equivalent of state; or alternatively "if state=="AR")
>>> Etc. through replace statenew=55 if statenum == 49 (statenum = 50 is WY and there are no respondents in the survey from WY)
>>>
>>> This seems rather cumbersome.  If necessary I'll do it.  Does anyone have any easier way of doing this?
>>>
>>> Thanks much,
>>>
>>> Ric Uslaner
>>> euslaner@umd.edu
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index