Statalist



st: RE: AW: Merging observations


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: AW: Merging observations
Date   Thu, 11 Dec 2008 20:22:02 -0000

Whether a variable, or a set of variables, uniquely identifies the observations can be checked directly using -isid-, without the need for installing user-written add-ons. That's not quite the main issue here, as there is no problem working with -Case Person- together. 
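
As a minimal sketch of that check, assuming the data in memory contain the variables -Case- and -Person- as in Jim's example: -isid- exits with an error when the variables do not jointly identify the observations, and -duplicates report- (also official Stata) shows how many observations share each combination. -capture noisily- is used here only so that a do-file continues after the expected error. 

	* error expected here if Case and Person do not jointly identify observations 
	capture noisily isid Case Person 

	* tabulate how many observations share each Case-Person combination 
	duplicates report Case Person 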

That said, let's spell out Jochen's good solution in a little more detail. 

In terms of Jim's original question, the implication is that 

var1 var2 var3 

each have at most one non-missing value for each distinct combination of -Case- and -Person-. Otherwise what Jim wants to do is a bad idea. 

That's directly checkable: 

foreach v in var1 var2 var3 { 
	bysort Case Person (`v') : assert missing(`v'[2]) 
} 

Consider -var1- as an example. 

After -sort Case Person var1-, any single non-missing value of -var1- will be first within groups defined by -Case Person-. The second value should be missing. If it's not, the assumption is incorrect, i.e. there are at least 2 non-missing values. 
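
To see the check catch a violation, here is a small sketch with made-up data (not Jim's): the values of -var1- are invented for illustration, and -Person- is assumed to be a string variable so that the leading zeros survive. The group 345/01 has two non-missing values, so the assertion fails for that group. The indicator -bad- is a hypothetical name, added only to list the offending groups. 

	clear 
	input Case str2 Person var1 
	345 "01" 1 
	345 "01" 2 
	345 "02" . 
	345 "02" 3 
	end 

	* fails: after sorting, the second value in group 345/01 is not missing 
	capture noisily bysort Case Person (var1) : assert missing(var1[2]) 

	* flag and list the groups with two or more non-missing values 
	bysort Case Person (var1) : generate byte bad = !missing(var1[2]) 
	list Case Person var1 if bad, sepby(Case Person) 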

Assuming good health in this sense, Jochen's solution is then to -collapse-. 
	
collapse var1 var2 var3, by(Case Person)

which will work fine as the mean of one non-missing value is always the same value. 
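
Putting the pieces together on Jim's example data, as a sketch: the data are typed in with -input-, and -Person- is assumed to be a string variable (it could equally be numeric). 

	clear 
	input Case str2 Person var1 var2 var3 
	345 "01" 1 . . 
	345 "01" . 1 . 
	345 "02" . 1 . 
	345 "02" . . 1 
	end 

	* check: at most one non-missing value per Case-Person group 
	foreach v of varlist var1 var2 var3 { 
		bysort Case Person (`v') : assert missing(`v'[2]) 
	} 

	* the default statistic is the mean, which ignores missing values 
	collapse var1 var2 var3, by(Case Person) 
	list, sepby(Case) 

A variable that is missing throughout a group stays missing under the default mean, as it would under -(max)- or -(firstnm)-. Note that -(sum)- would return 0 for such a group, which is not what is wanted here. 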

Nick 
n.j.cox@durham.ac.uk 

Jochen Späth

I am wondering whether Case and Person are actually the only identifiers in your data set, because they do not uniquely identify your observations. You may want to check this with the -unique- command first (if it is not installed, type -ssc install unique-). Anyway, if they are the only identifiers, as you stated in your question, you could try -collapse var1 var2 var3, by(Case Person)-. This, however, will only help if, per case and person, you really have only one non-missing observation in each variable to be collapsed, as shown in your example. Otherwise, the collapsed mean would be misleading.

Jim O'Grady

I have a dataset like this:

Case Person var1 var2 var3
345   01         1     .      .
345   01         .      1     .
345   02         .      1     .
345   02         .      .      1

I want to merge the observations which have the same identifiers (case
and person) to get

Case Person var1 var2 var3
345   01         1     1      .
345   02         .      1     1
