


Re: st: Merging longitudinal data set

From   Maarten Buis
Subject   Re: st: Merging longitudinal data set
Date   Fri, 16 Jul 2010 22:16:47 +0000 (GMT)

--- On Fri, 16/7/10, Andreas Jensen wrote:
> What I'm troubled about is that there are people from wave
> 1 who had dropped out when wave 2 was conducted (their ID
> does not exist in the wave 2 data file), and additional
> people have been added in wave 2 who aren't present in
> wave 1 (their ID does not exist in the wave 1 data file).
> I have sorted each data file according to the ID variable
> and then executed a merge 1:1 on the ID with wave 1 as
> master. I get the following output.
>     Result          # of obs.
>     -----------------------------------------
>     not matched     28,046
>     from master     12,373  (_merge==1)
>     from using      15,673  (_merge==2)
>     matched         18,742  (_merge==3)
>     -----------------------------------------
> So assuming that my command is correct, is it then true
> that there are 18742 individuals in both waves, 12373
> individuals who have dropped out after wave 1 and 15673
> individuals who have been added in wave 2?

That interpretation is correct. One thing that might be
going on is a precision problem. By default Stata stores
variables as floats, i.e. with about 7 digits of accuracy
(integers are stored exactly only up to 16,777,216).
I think that is a good default: the typical variable in a
dataset is a measurement of some sort, and we are often
happy if such measurements have 2 digits of accuracy, so 7
digits is more than enough. However, it can cause problems
with id variables: it is not uncommon for ids to be
generated such that they contain more than 7 digits, and
in order for them to match between datasets they need to
be stored exactly. So when this is the case, you need to
make sure that you import your dataset so that this
variable is stored as a long, a double, or a
string. See for example:
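
A minimal sketch of the precision problem, written in Python
rather than Stata and not part of the original post. It rounds
some made-up 9-digit ids to a 4-byte IEEE float, which is the
format behind Stata's float storage type, and shows that
distinct ids end up as the same stored value. The ids and the
helper as_float32 are hypothetical, chosen only for illustration.

```python
import struct


def as_float32(x):
    """Round x to the nearest 4-byte IEEE float and widen it back."""
    return struct.unpack("f", struct.pack("f", float(x)))[0]


# Hypothetical 9-digit ids, made up for illustration.
ids = [123456789, 123456790, 123456791]

# What each id becomes once it has been squeezed into a 4-byte float.
stored = [int(as_float32(i)) for i in ids]
print(stored)

# The three distinct ids are no longer distinguishable.
print(len(set(ids)), "distinct ids before,", len(set(stored)), "after")

# Integers survive a 4-byte float exactly only up to 2**24 = 16,777,216.
print(as_float32(2**24) == 2**24)
print(as_float32(2**24 + 1) == 2**24 + 1)
```

With ids stored like this, a merge 1:1 can no longer tell
these individuals apart, which is why such an id should be
read in as a long, a double, or a string.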

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

