



Re: st: RE: merging aggregate and survey data with different state codes


From   Rebecca Pope <rebecca.a.pope@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: merging aggregate and survey data with different state codes
Date   Mon, 26 Nov 2012 14:36:36 -0600

I received a private e-mail, with datasets attached, that said the following:
"Thanks much for your interest.  The situation is more complicated
[...] At a more fundamental level I don’t understand what Austin
suggested—it is nice that people give answers but neither he nor you
gave me any idea as to how to use this solution with my data rather
than data from NBER).  This solution may just as well have been
written in Japanese.  But as you can see the problem is more
complicated especially since encode seems to have failed to give the
correct numerical values of a string variable."

I thought sufficient detail on how to use the solution with his data
was supplied. In my previous post, I noted that 2 merges would be
required: crosswalk-to-data and data-to-data. Austin may have had
another strategy in mind, but to my knowledge, a crosswalk implies a
merge. I am not sure what was unclear about Austin's crosswalk or my
follow-up verifying it. If you do not understand a suggestion or are
still having trouble, the appropriate response is to post _exactly_
where you run into a problem. This sort of public dialog helps those
who might have a problem similar to yours. However, here it is again,
as clear as I can make it. I am going to confine my remarks here to
the original question: How do you merge datasets when the IDs are
different? Separate questions, such as issues with -encode-, should be
handled in a separate post.

Step 1: Get a crosswalk. Austin previously posted code to do this
(http://www.stata.com/statalist/archive/2012-11/msg00819.html). Its
output begins:
             st   code   n2
        Alabama     63    1
         Alaska     94    2
        Arizona     86    3
       Arkansas     71    4
     California     93    5
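
To try the remaining steps without running the linked code, a minimal
sketch is to type in the five rows listed above by hand. This is not
Austin's method, it covers only those five states, and the file name
"crosswalk" is made up for illustration.
***
clear
input str20 st code n2
"Alabama"    63 1
"Alaska"     94 2
"Arizona"    86 3
"Arkansas"   71 4
"California" 93 5
end
isid code /* stops with an error if a survey code repeats */
isid n2 /* stops with an error if an aggregate code repeats */
save crosswalk, replace /* hypothetical file name */
***
The two -isid- checks matter because both merges below assume that
each code identifies exactly one state in the crosswalk.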

Step 2: Rename variables in crosswalk to match study datasets.
***
rename code C3_PPSTATEN /* State code for survey data */
rename n2 statenum /* State code for aggregate data */
***

Step 3: Merge crosswalk to aggregate data, adding "C3_PPSTATEN" to the
aggregate data. You can visually compare the state names (st) from
Austin's crosswalk to the state names from the aggregate data (State)
and see that this crosswalk is accurate. For that matter, you could
have merged on the text field and skipped the whole business of
generating n2 in this case.
***
merge 1:1 statenum using "Statalist\teapartyfactions2010.dta"
list st State C3_PPSTATEN statenum in 1/5, noobs clean
***
            st        State   C3_PPS~N   statenum
       Alabama      Alabama         63          1
        Alaska       Alaska         94          2
       Arizona      Arizona         86          3
      Arkansas     Arkansas         71          4
    California   California         93          5
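
The merge on the text field mentioned in step 3 might look like the
sketch below, run in place of the -merge- on statenum. It assumes that
State in the aggregate data is a string variable spelled exactly like
st in the crosswalk (same case, no stray spaces), which the listing
above confirms only for the first five states.
***
rename st State /* match the name variable in the aggregate data */
merge 1:1 State using "Statalist\teapartyfactions2010.dta"
tab _merge /* 3 = matched; 1 or 2 = name found in only one file */
***
If you go this route, later commands that refer to st need State
instead.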

Step 4: Merge survey data to aggregate data using C3_PPSTATEN.
***
merge 1:m C3_PPSTATEN using "Statalist\anes2010egss3small.dta", gen(_merge2)
drop if _merge2!=3 /* Gets rid of WY (no survey) & DC (no aggregate
                      data); modify at will */
preserve
bys st: keep if _n==1
list st C3_PPSTATEN statenum statenew in 1/5, noobs clean
restore
***
            st   C3_PPS~N   statenum   statenew
       Alabama         63          1         al
        Alaska         94          2         ak
       Arizona         86          3         az
      Arkansas         71          4         ar
    California         93          5         ca
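
Before dropping the unmatched observations in step 4, it may be worth
looking at them. A sketch, to be run just before the -drop- line,
using only the variables created above:
***
tab _merge2 /* 1 = aggregate only, 2 = survey only, 3 = matched */
list st C3_PPSTATEN statenum if _merge2!=3, noobs clean
***
Survey-only rows will show st as empty, because the state name comes
from the crosswalk.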

If a step in this particular process fails, please let us know what
the error is. If it produces results different from what you want,
post a _short_ example of the ideal end result and your input data.
Specifics will get you better help.

Regards,
Rebecca

<snip>
