
Re: st: Deleting all observations for individuals with anomalous data


From   Seb Buechte <[email protected]>
To   [email protected]
Subject   Re: st: Deleting all observations for individuals with anomalous data
Date   Wed, 10 Aug 2005 11:46:42 +0200

Nick,

you are right, -sort- sorts missing _numeric_ values to the end.
Still, from what I observe, in the case of a string variable -sort- sorts
missings, i.e. empty strings, to the top, which certainly makes sense.
So if "cost" were a string variable, the command you presented
would not work as intended.
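
A minimal sketch of one way to handle the string case, assuming Cost is
stored as a string and that anything which does not convert to a number
(an empty string, or an entry like "x") counts as anomalous. The helper
variable name "bad" is hypothetical. real() returns missing for strings
that cannot be converted, so the flag is 1 for exactly those observations,
and sorting on the flag puts any flagged observation last within each ID.

```stata
* Sketch only: assumes Cost is a string variable; "bad" is a hypothetical helper
generate byte bad = missing(real(Cost))
* Within each ID, sort on the flag so any flagged observation comes last
bysort ID (bad) : drop if bad[_N]
drop bad
```

If only empty strings matter, checking the first observation instead of
the last, as in -bysort ID (Cost) : drop if missing(Cost[1])-, should
also work, since empty strings sort to the top.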

Kind regards,
sebastian


On 8/10/05, Nick Cox <[email protected]> wrote:
> The -sort- sorts missing values to the end
> of each panel. So afterwards if any values in the panel
> are missing, then the last one will be too. That
> is necessary and sufficient information for a -drop-.
> 
> The -drop- then drops all observations in the panel
> if (iff) the last one is missing.
> 
> Nick
> [email protected]
> 
> Christian Holz
> 
> > I think, however, that Nick's approach does not work, if a value for
> > year 5 is there and another year has a missing value, as
> > Nick's command
> > only checks the last observation of each ID group.
> > I might be wrong, but in case I am not, it's worth mentioning...
> 
> Nick Cox wrote:
> > > Another way of doing this, without any new
> > > variables:
> > >
> > > bysort ID (Cost) : drop if missing(Cost[_N])
> > >
> > > Nick
> > > [email protected]
> > >
> > > Antoine Terracol
> > >
> > >
> > >>I would try something like :
> > >>
> > >>generate tag=(cost==.)
> > >>egen toberemoved=sum(tag), by(ID)
> > >>drop if toberemoved>0
> > >>drop tag toberemoved
> > >>
> > >>
> > >>You will need to replace the "cost==." in the first line by a more
> > >>general way to tag your erroneous values (such as "cost==. |
> > >>cost>9999")
> > >
> > >
> > > Murray Lowe
> > >
> > >
> > >>>I am working with a large dataset and have discovered that
> > >>>some of the data are missing values or have erroneous values.
> > >>>The data is panel data with observations per individual over
> > >>>a 5 year period. For example:
> > >>>
> > >>>ID Year    Cost
> > >>>
> > >>>1  1       100
> > >>>1  2       200
> > >>>1  3       500
> > >>>1  4       150
> > >>>1  5       x
> > >>>2  1       100
> > >>>2  2       200
> > >>>2  3       500
> > >>>2  4       600
> > >>>2  5       100
> > >>>
> > >>>The problem is this: If an individual has a missing /
> > >>>erroneous value for a particular year, I want to exclude ALL
> > >>>of their observations from the dataset. In the example patient 1
> > >>>would be removed from the dataset entirely. How can this be done
> > >>>through an automated-type process? Essentially I need a code /
> > >>>method that looks for the anomalous data; identifies the patient
> > >>>and then removes all of their observations from the dataset.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-- 
- Seb F Buechte
-
- Stay tuned!




© Copyright 1996–2024 StataCorp LLC