When observations come in blocks of 1
then missings are replaced by missings
with this approach. So, a little
unnecessary, but no harm done. Some
might prefer to use the code below
and add an extra condition
... & _N == 2
Nick
[email protected]
Nick Cox
> L'esprit de l'escalier:
>
> When observations come in blocks of 2 another
> approach is possible that removes the need for
> sorting on each variable, although one prior
>
> sort id
>
> is needed.
>
> foreach var of varlist var1-var3 {
> by id: replace `var' = `var'[3 - _n] if missing(`var')
> }
>
> When the missing values are in the first observation, _n
> is 1 and 3 - _n is 2. When the second, _n is 2 and 3 - _n
> is 1. Hence 3 - _n indexes the other observation in blocks
> of 2. This works regardless of variable type.
Nick Cox
> > You're correct. I misread the question. Sorry.
>
> Gary Longton
>
> > > Daphna Bassok wrote:
> > >
> > > > I have several duplicate observations in my data set.
> > > However, they are
> > > > not perfect duplicates. Only the id # is the same. So
> > > there might be
> > > > two observations with id#16 for instance, the first will
> > > have values for
> > > > some variables, and missing values for others. The second
> > > also have some
> > > > values filled and some missing. There are no cases in
> > > which both have
> > > > values- that is... either the first in the pair has the
> > > value OR the
> > > > second has a value (or neither).
> > > >
> > > > For example: suppose I have two observations with id# 16...
> > > The first
> > > > has values for var1 and 2 and not 3. The second ONLY has
> > > values for
> > > > var 3. What i would like to do is simply collapse these
> > > into a single
> > > > observation with all the relevant info. meaning, 1
> > observation with
> > > > id#16 that has values for all three variables.
> > > >
> > > > I am trying to do this with the collapse command with
> no success.
> > > >
> > > > My code is:
> > > >
> > > > collapse (min) var1-var3, by(id)
> > > >
> > > > I thought this would create a new observation that has all
> > > the data in it.
> > > >
> > > > I am getting a "type mismatch" error.
> > > >
> > > > Is this because some of my variables are string variables?
> > >
> > > Nick Cox suggested:
> > >
> > > > What you can do is -- if your description is correct --
> > > >
> > > > egen nmiss = rowmiss(<insert variable names>)
> > > > bysort id (nmiss) : keep if _n == 1
> > > >
> > > > as the sort will sort the observation with
> > > > more missings to second place.
> > >
> > > and Austin Nichols suggested:
> > >
> > > > foreach v of varlist put all the relevant varnames here {
> > > > bys id (`v'): qui replace `v'=`v'[_n-1] if mi(`v')
> > > > }
> > > > bys id: drop if _n>1
> > >
> > > It is a rare day when one can make a correction to a
> > > typically accurate and
> > > elegant Nick Cox solution, so I make this one fearing that
> > > I've probably missed
> > > somthing obvious.
> > >
> > > If I understand the problem correctly, I think this solution
> > > will discard
> > > non-missing data for some variables.
> > >
> > > Eg. a simplified dataset like this seems consistent with
> > > Daphna's description:
> > >
> > > obs id var1 var2 var3 nmiss
> > >
> > > 1 16 a . 3 1
> > > 2 16 . 2 . 2
> > > 3 17 . . 7 2
> > > 4 17 c 3 . 1
> > >
> > > sorting on nmiss will discard observations 2 & 3, throwing
> > > away non-missing data
> > > for var2
> > >
> > > As Austin's solution suggests, one needs to sort separately
> > > for each variable in
> > > the list and carry out the replace for that variable.
> > > However missings sort
> > > differently for string and numeric variables, taking first
> > > place for strings and
> > > last for numerics, so need to be handled differently in a
> > > sort solution.
> > > Austin's solution won't sort correctly for string variables.
> > >
> > > There is probably a shorter approach, but I think this will do it:
> > >
> > > ********
> > > foreach var of varlist var1-var3 {
> > > if substr("`:type `var''",1,3) == "str" {
> > > bysort id (`var') replace `var' = `var'[_N]
> > > }
> > > else {
> > > bysort id (`var') replace `var' = `var'[1]
> > > }
> > > }
> > > bysort id: drop if _n>1
> > > ************
> >
> >
>
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/