Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: RE: merging aggregate and survey data with different state codes
From 
 
Rebecca Pope <[email protected]> 
To 
 
[email protected] 
Subject 
 
Re: st: RE: merging aggregate and survey data with different state codes 
Date 
 
Fri, 23 Nov 2012 08:33:49 -0600 
Ric wrote in response to a digest mailing
(http://www.stata.com/statalist/archive/2012-11/msg00874.html):
"Thanks for your efforts Austin and Rebecca.  Problem still not
solved, but I appreciate your efforts." I'm responding to the original
thread in case anyone is following this in future.
Ric,
Can you tell us where in the process Austin's code/proposed approach
is failing? Please be as detailed as possible. I was able to match his
generated crosswalk to the data you pasted into your original post.You
should be able to use the new sequential numbers to match to your
aggregate data without a problem, so if that's not working, there is
clearly some little detail that is off.
Regards,
Rebecca
On Tue, Nov 20, 2012 at 10:13 PM, Rebecca Pope <[email protected]>
wrote:
>
> Austin, If by "problem with the crosswalk" you mean an inaccuracy, no,
> the crosswalk appears fine. All state names line up w/ their
> abbreviations & states are in alphabetical order 1 to 50 (this is w/
> the modification to drop DC).
>
> I also think the general approach is good. It is very easy to merge
> the generated crosswalk (on "code") to the survey data. This can be
> verified using the codes and state abbreviations that Ric posted. From
> that it should also be easy to merge the "aggregate" data referred to
> in the original post (on n2, which Ric will presumably give the same
> name as the ID variable in his data). I don't see how to accomplish
> Ric's request without this additional -merge- (crosswalk with survey)
> but that is surely better than typing 50 -replace- statements!
>
> -Rebecca
> Rebecca
>
>
>
>          __o                __o
>       _`\ <,_            _`\ <,_
>      (_)/   (_)          (_)/   (_)
> =========================
>
>
> On Tue, Nov 20, 2012 at 8:10 PM, Austin Nichols <[email protected]>
> wrote:
> > OP contacted me off list saying "thanks but..." and said that only
> > codes 1 to 50 were shown, whereas he needed codes from 11 to 95.
> >
> > The example below produces this output, which is clearly a crosswalk
> > from one ordering 1-51 to Census state codes:
> >
> >                       st   code   n2
> >                  Alabama     63    1
> >                   Alaska     94    2
> >                  Arizona     86    3
> >                 Arkansas     71    4
> >               California     93    5
> >                 Colorado     84    6
> >              Connecticut     16    7
> >                 Delaware     51    8
> >     District of Columbia     53    9
> >                  Florida     59   10
> >                  Georgia     58   11
> >                   Hawaii     95   12
> >                    Idaho     82   13
> >                 Illinois     33   14
> >                  Indiana     32   15
> >                     Iowa     42   16
> >                   Kansas     47   17
> >                 Kentucky     61   18
> >                Louisiana     72   19
> >                    Maine     11   20
> >                 Maryland     52   21
> >            Massachusetts     14   22
> >                 Michigan     34   23
> >                Minnesota     41   24
> >              Mississippi     64   25
> >                 Missouri     43   26
> >                  Montana     81   27
> >                 Nebraska     46   28
> >                   Nevada     88   29
> >            New Hampshire     12   30
> >               New Jersey     22   31
> >               New Mexico     85   32
> >                 New York     21   33
> >           North Carolina     56   34
> >             North Dakota     44   35
> >                     Ohio     31   36
> >                 Oklahoma     73   37
> >                   Oregon     92   38
> >             Pennsylvania     23   39
> >             Rhode Island     15   40
> >           South Carolina     57   41
> >             South Dakota     45   42
> >                Tennessee     62   43
> >                    Texas     74   44
> >                     Utah     87   45
> >                  Vermont     13   46
> >                 Virginia     54   47
> >               Washington     91   48
> >            West Virginia     55   49
> >                Wisconsin     35   50
> >                  Wyoming     83   51
> >
> > Normally I ignore any off-list traffic, but can anyone confirm or deny
> > any problem with this generated crosswalk?
> > Or any problem with the general approach?
> > OP says states are 1 to 50, so perhaps DC should be deleted from the
> > list, but that is easy to do:
> >
> > ssc inst ddf2dct, replace
> >  clear
> >  tempfile do dct
> >  loc d http://www.nber.org/cps/cpsmar92.ddf
> >  ddf2dct using `d', dct(`dct') do(`do')
> >  run `do'
> >  drop _all
> >  set obs 95
> >  g cps=_n
> >  la val cps hg_st60
> >  decode cps, gen(st)
> >  keep if !mi(st)
> > replace st=trim(st)
> >  g code=cps
> >  drop if code==53
> >  sort st
> >  g n2=_n
> >  keep st code n2
> >  list, noo clean
> >
> >
> > On Tue, Nov 20, 2012 at 3:51 PM, Austin Nichols
> > <[email protected]> wrote:
> >> Eric M. Uslaner <[email protected]>:
> >> You should be able to find a crosswalk, or you can make one yourself
> >> from data that has states defined using those Census codes. Try e.g.
> >>
> >> ssc inst ddf2dct, replace
> >> clear
> >> tempfile do dct
> >> loc d http://www.nber.org/cps/cpsmar92.ddf
> >> ddf2dct using `d', dct(`dct') do(`do')
> >> run `do'
> >> drop _all
> >> set obs 95
> >> g cps=_n
> >> la val cps hg_st60
> >> decode cps, gen(st)
> >> keep if !mi(st)
> >> replace st=trim(st)
> >> g code=cps
> >> sort st
> >> g n2=_n
> >> keep st code n2
> >> list, noo clean
> >>
> >>
> >> On Tue, Nov 20, 2012 at 3:05 PM, Eric M. Uslaner <[email protected]>
> >> wrote:
> >>> Greetings,
> >>>
> >>> I have two data sets and want to merge the aggregate data set into the
> >>> survey data set.  Ultimately I want to merge the two data sets by
> >>> Congressional district (which is how the aggregate data set is organized).
> >>> But the major problem is that the two data sets use different codes for
> >>> states.  For the aggregate data, the states are numbered from 1-50
> >>> alphabetically.  But for the survey data the state codes are:
> >>>
> >>> tabl C3_PPSTATEN
> >>>
> >>> C3_PPSTATEN (State)     code  |   freq
> >>> ------------------------------+--------
> >>>                  me      11   |      4
> >>>                  nh      12   |      3
> >>>                  vt      13   |      4
> >>>                  ma      14   |     34
> >>>                  ri      15   |      2
> >>>                  ct      16   |     20
> >>>                  ny      21   |     76
> >>>                  nj      22   |     42
> >>>                  pa      23   |     60
> >>>                  oh      31   |     56
> >>>                  in      32   |     24
> >>>                  il      33   |     49
> >>>                  mi      34   |     36
> >>>                  wi      35   |     23
> >>>                  mn      41   |     35
> >>>                  ia      42   |     18
> >>>                  mo      43   |     31
> >>>                  nd      44   |      1
> >>>                  sd      45   |      8
> >>>                  ne      46   |     12
> >>>                  ks      47   |     14
> >>>                  de      51   |      5
> >>>                  md      52   |     33
> >>>                  dc      53   |      4
> >>>                  va      54   |     29
> >>>                  wv      55   |     13
> >>>                  nc      56   |     41
> >>>                  sc      57   |     18
> >>>                  ga      58   |     41
> >>>                  fl      59   |     89
> >>>                  ky      61   |     16
> >>>                  tn      62   |     29
> >>>                  al      63   |     19
> >>>                  ms      64   |      8
> >>>                  ar      71   |     12
> >>>                  la      72   |     15
> >>>                  ok      73   |     15
> >>>                  tx      74   |     92
> >>>                  mt      81   |      6
> >>>                  id      82   |      3
> >>>                  co      84   |     17
> >>>                  nm      85   |      7
> >>>                  az      86   |     33
> >>>                  ut      87   |     10
> >>>                  nv      88   |     20
> >>>                  wa      91   |     34
> >>>                  or      92   |     20
> >>>                  ca      93   |    127
> >>>                  ak      94   |      1
> >>>                  hi      95   |      6
> >>> ------------------------------+--------
> >>>                       Total   |   1315
> >>>
> >>> Clearly there is no one-to-one function I can think of that would
> >>> transform the state numbers in the aggregate data set.  I could, of course,
> >>> create a new variable step by step:
> >>>
> >>> gen int statenew=.
> >>> replace statenew= 1 if statenum == 1  (where statenum is the numeric
> >>> equivalent of state; or alternatively "if state=="AR")
> >>> Etc. through replace statenew=55 if statenum == 49 (statenum = 50 is
> >>> WY and there are no respondents in the survey from WY)
> >>>
> >>> This seems rather cumbersome.  If necessary I'll do it.  Does anyone
> >>> have any easier way of doing this?
> >>>
> >>> Thanks much,
> >>>
> >>> Ric Uslaner
> >>> [email protected]
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/faqs/resources/statalist-faq/
> > *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/