



Re: st: How can I combine datasets


From   Teresio Poggio <terlist@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How can I combine datasets
Date   Fri, 16 Sep 2011 00:20:20 +0200

Hello Kevin,

it seems to me that this is more an issue of the rationale behind your data
management problem, or of data cleaning, than an issue
of Stata commands.

If I've understood correctly, you have two datasets containing different
variables for five years (I assume the same five years in both datasets,
coded in the same way). Counties vary across years (but I
assume they are coded in the same way in both datasets for the
same year; you should check this carefully).
I also assume you don't have a perfect correspondence of
year-county pairs between the datasets; otherwise you wouldn't be having
problems. So this seems to me not just a problem of time-variant
county definitions but also a problem of either missing information
for some year-counties or different codings in the two datasets.
I'd suggest checking this carefully.
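A minimal sketch of these checks in Stata, to be run on each dataset in turn. The toy data and the variable names year, county and x1 are hypothetical stand-ins for your own data:

```stata
* Hypothetical toy data standing in for one of the two datasets
* (variable names year, county, x1 are assumptions)
clear
input int year str10 county double x1
2001 "Alpha"  1.5
2001 "Beta"   2.0
2002 "Alpha"  1.7
2002 "Beta"   2.1
end

* Are years and counties coded consistently? List the distinct values
tabulate year
tabulate county

* Does each year-county pair identify exactly one record?
duplicates report year county
isid year county   // stops with an error if year-county is not a unique key
```

Comparing the tabulate output of the two datasets side by side is a quick way to spot counties that are spelled or coded differently.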

Two possible strategies:

a) keeping only time-invariant counties (your idea; is this OK for
your purposes?):
just merge 1:1 the two datasets using both year and county as key
variables. Then inspect the (automatically produced) variable _merge
for matched (the OK ones) and unmatched cases
(help merge for details).
Before dropping unmatched cases I would check for possible coding
errors and assess the missing data.
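A sketch of strategy a) in Stata, again with hypothetical toy data and variable names (year, county, x1, x2); with real data you would open one dataset and merge in the other saved file instead of typing the data with input:

```stata
* Hypothetical second dataset, saved to a temporary file
clear
input int year str10 county double x2
2001 "Alpha"  10
2001 "Gamma"  20
2002 "Alpha"  11
end
tempfile data2
save `data2'

* Hypothetical first dataset (the master)
clear
input int year str10 county double x1
2001 "Alpha"  1.5
2001 "Beta"   2.0
2002 "Alpha"  1.7
end

* 1:1 merge on the year-county key; merge creates _merge automatically
merge 1:1 year county using `data2'

* _merge: 1 = master only, 2 = using only, 3 = matched in both
tabulate _merge
list year county _merge if _merge != 3

* Only after checking the unmatched cases for coding errors:
* keep if _merge == 3
```

With these toy data, two year-county pairs match, Beta 2001 is only in the master data (_merge==1) and Gamma 2001 is only in the using data (_merge==2). Listing the unmatched cases is where coding mismatches usually show up.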

b) dropping cases wastes available information. If it makes
sense for your purposes and given the economy of your research, you may
wish to find a way to reconcile the different county definitions.
(I'm not considering possible missing data or differences in
coding here.)

Switching to geographical areas rather than counties (the same logic
applies), suppose you have (either in the same dataset or in the two
datasets)

a record like this (purely fictional):

Area      Year   Population
UK        2001   1,000

and a few records like this

Area      Year   Population
England   1991   987
Scotland  1991   456
Wales     1991   345

To avoid dropping cases, you may wish to transform the latter
records into

Area      Year   Population
UK        1991   1,788

and then manage (merge) them consistently with the data for UK 2001.
In this case you'd have to do some extra work (find a consistent way
to reconcile the county definitions) and use the collapse command in
Stata (help collapse for details). This also requires that the
collapsing function you use (sum in my example; it may be another
one) is meaningful for your data and your purposes.
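The UK example could be sketched in Stata as follows. The crosswalk variable area2001 is a hypothetical name, and hard-coding the mapping with replace is just one way to build it; with many counties, a separate crosswalk file merged m:1 on the old area code would be more practical:

```stata
* The fictional records from the example above
clear
input str10 area int year double population
"England"  1991  987
"Scotland" 1991  456
"Wales"    1991  345
"UK"       2001 1000
end

* Hypothetical crosswalk: map each 1991 area onto the 2001 definition.
* Building this mapping is the "extra work" mentioned above.
generate str10 area2001 = area
replace area2001 = "UK" if inlist(area, "England", "Scotland", "Wales")

* Aggregate to the consistent definition; sum is only one possible statistic
collapse (sum) population, by(area2001 year)
list
```

After collapse the data hold two UK records, 1991 with population 1788 and 2001 with 1000, which can then be merged with the other dataset on area2001 and year.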

HTH

Teresio

-- 
____________________________________________________
dr. Teresio Poggio
LaboR - Dipartimento di Sociologia e ricerca sociale
Università degli studi di Trento
Via Verdi, 26
38100 Trento, Italy
Tel   +39 0461/881406
fax:  +39 0461/881348


