
RE: st: Deleting all observations for individuals with anomalous data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Deleting all observations for individuals with anomalous data
Date   Tue, 9 Aug 2005 23:06:55 +0100

The sort in -bysort ID (Cost)- puts missing values at the end
of each panel, because Stata treats numeric missing as larger
than any nonmissing value. So afterwards, if any values in the
panel are missing, then the last one will be too. That
is necessary and sufficient information for a -drop-. 

The -drop- then drops all observations in the panel 
if and only if the last one, Cost[_N], is missing. 
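
As a minimal sketch of that logic, here is a toy example. The data
are made up for illustration (they are not from the original post):
ID 1 has a missing Cost in year 2 while its year 5 value is present,
which is the case raised below. The variable name anybad and the
9999 cutoff in the variant are assumptions; the cutoff simply echoes
the "cost>9999" example quoted further down.

```stata
* Toy data (hypothetical): ID 1 is missing Cost in year 2, not in year 5
clear
input ID Year Cost
1 1 100
1 2 .
1 3 500
1 4 150
1 5 300
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
end

* Within each ID, sort on Cost. Missing sorts last, so Cost[_N] is
* missing exactly when the panel has at least one missing value.
bysort ID (Cost) : drop if missing(Cost[_N])

* Put the data back in ID, Year order and inspect what is left
sort ID Year
list, sepby(ID)

* Variant for "erroneous" as well as missing values (assumed rule:
* Cost above 9999 is erroneous). Flag any panel with a bad value,
* drop the whole panel, then remove the helper variable anybad.
bysort ID : egen byte anybad = max(missing(Cost) | Cost > 9999)
drop if anybad
drop anybad
```

After the first -drop-, only the five observations for ID 2 should
remain. The variant does need one new variable, but it handles any
rule that can be written as a true-or-false expression.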

Nick 
n.j.cox@durham.ac.uk 

Christian Holz
 
> I think, however, that Nick's approach does not work if a value for 
> year 5 is there and another year has a missing value, as 
> Nick's command 
> only checks the last observation of each ID group.
> I might be wrong, but in case I am not, it's worth mentioning...
 
Nick Cox wrote:
> > Another way of doing this, without any new 
> > variables: 
> > 
> > bysort ID (Cost) : drop if missing(Cost[_N]) 
> > 
> > Nick 
> > n.j.cox@durham.ac.uk 
> > 
> > Antoine Terracol
> > 
> > 
> >>I would try something like :
> >>
> >>generate tag=(cost==.)
> >>egen toberemoved=sum(tag), by(ID)
> >>drop if toberemoved>0
> >>drop tag toberemoved
> >>
> >>
> >>You will need to replace the "cost==." in the first line by a more 
> >>general way to tag your erroneous values (such as "cost==. | 
> >>cost>9999")
> > 
> >  
> > Murray Lowe 
> > 
> > 
> > >>>I am working with a large dataset and have discovered that some of the data
> > >>>are missing values or have erroneous values. The data is panel data with
> > >>>observations per individual over a 5 year period. For example:
> > >>>
> > >>>ID	Year	Cost
> > >>>
> > >>>1	1	100
> > >>>1	2	200
> > >>>1	3	500
> > >>>1	4	150
> > >>>1	5	x
> > >>>2	1	100
> > >>>2	2	200
> > >>>2	3	500
> > >>>2	4	600
> > >>>2	5	100
> > >>>
> > >>>The problem is this: If an individual has a missing / erroneous value for a
> > >>>particular year, I want to exclude ALL of their observations from the
> > >>>dataset. In the example patient 1 would be removed from the dataset
> > >>>entirely. How can this be done through an automated-type process?
> > >>>Essentially I need a code / method that looks for the anomalous data;
> > >>>identifies the patient and then removes all of their observations from the dataset.
