Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: Collapsing with strings


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Collapsing with strings
Date   Mon, 16 Jan 2006 13:04:35 -0000

L'esprit de l'escalier:

When observations come in blocks of 2 another
approach is possible that removes the need for 
sorting on each variable, although one prior

sort id 

is needed. 

foreach var of varlist var1-var3 {
	by id: replace `var' = `var'[3 - _n] if missing(`var') 
}

When the missing values are in the first observation, _n 
is 1 and 3 - _n is 2. When the second, _n is 2 and 3 - _n
is 1. Hence 3 - _n indexes the other observation in blocks 
of 2. This works regardless of variable type. 

Nick 
[email protected] 

Nick Cox 
 
> You're correct. I misread the question. Sorry. 
 
Gary Longton
  
> > Daphna Bassok wrote:
> > 
> > > I have several duplicate observations in my data set.  
> > However, they are 
> > > not perfect duplicates.  Only the id # is the same. So 
> > there might be 
> > > two observations with id#16 for instance, the first will 
> > have values for 
> > > some variables, and missing values for others. The second 
> > also have some 
> > > values filled and some missing.  There are no cases in 
> > which both have 
> > > values- that is... either the first in the pair has the 
> > value OR the 
> > > second has a value (or neither).
> > > 
> > > For example: suppose I have two observations with id# 16... 
> >  The first 
> > > has values for var1 and  2 and not 3.   The second ONLY has 
> > values for 
> > > var 3.   What i would like to do is simply collapse these 
> > into a single 
> > > observation with all the relevant info. meaning, 1 
> observation with 
> > > id#16 that has values for all three variables.
> > > 
> > > I am trying to do this with the collapse command with no success.
> > > 
> > > My code is:
> > > 
> > > collapse (min) var1-var3, by(id)
> > > 
> > > I thought this would create a new observation that has all 
> > the data in it.
> > > 
> > > I am getting a "type mismatch" error.
> > > 
> > > Is this because some of my variables are string variables?
> > 
> > Nick Cox suggested:
> > 
> > > What you can do is -- if your description is correct -- 
> > > 
> > > egen nmiss = rowmiss(<insert variable names>) 
> > > bysort id (nmiss) : keep if _n == 1
> > > 
> > > as the sort will sort the observation with 
> > > more missings to second place. 
> > 
> > and Austin Nichols suggested:
> > 
> > > foreach v of varlist put all the relevant varnames here {
> > >  bys id (`v'): qui replace `v'=`v'[_n-1] if mi(`v')
> > > }
> > > bys id: drop if _n>1
> > 
> > It is a rare day when one can make a correction to a 
> > typically accurate and 
> > elegant Nick Cox solution, so I make this one fearing that 
> > I've probably missed 
> > somthing obvious.
> > 
> > If I understand the problem correctly, I think this solution 
> > will discard 
> > non-missing data for some variables.
> > 
> > Eg. a simplified dataset like this seems consistent with 
> > Daphna's description:
> > 
> > obs   id   var1   var2  var3  nmiss
> > 
> > 1     16    a      .      3     1
> > 2     16    .      2      .     2
> > 3     17    .      .      7     2
> > 4     17    c      3      .     1
> > 
> > sorting on nmiss will discard observations 2 & 3, throwing 
> > away non-missing data 
> > for var2
> > 
> > As Austin's solution suggests, one needs to sort separately 
> > for each variable in 
> > the list and carry out the replace for that variable.  
> > However missings sort 
> > differently for string and numeric variables, taking first 
> > place for strings and 
> > last for numerics, so need to be handled differently in a 
> > sort solution. 
> > Austin's solution won't sort correctly for string variables.
> > 
> > There is probably a shorter approach, but I think this will do it:
> > 
> > ********
> > foreach var of varlist var1-var3 {
> >       if substr("`:type `var''",1,3) == "str" {
> > 		bysort id (`var') replace `var' = `var'[_N]
> > 	}
> > 	else {
> > 		bysort id (`var') replace `var' = `var'[1]
> > 	}
> > }
> > bysort id: drop if _n>1
> > ************
>  
> 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index