
Re: st: Deleteing all observations for individuals with anomalous data

From	Christian Holz <[email protected]>
To	[email protected]
Subject	Re: st: Deleteing all observations for individuals with anomalous data
Date	Wed, 10 Aug 2005 15:02:42 +0100

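Sebastian,
good remark! However, in that case it is even more straightforward. Since empty strings are sorted to the top, it is the first observation of each panel that needs checking, so Nick's command would have to be altered to

bysort ID (Cost) : drop if missing(Cost[1])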
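
A minimal sketch of the string case on the example data from the original question. It assumes Cost was read in as a string variable and that the placeholder "x" is recoded to an empty string first; both the recoding step and the str8 width are assumptions made for illustration.

```stata
* Sketch only: Cost held as a string, with "x" standing in for the bad entry
clear
input ID Year str8 Cost
1 1 100
1 2 200
1 3 500
1 4 150
1 5 x
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
end

* Assumption: treat the placeholder as a missing (empty) string
replace Cost = "" if Cost == "x"

* Empty strings sort first within each ID, so test the first observation
bysort ID (Cost) : drop if missing(Cost[1])

* Restore the original order and inspect what remains
sort ID Year
list, sepby(ID)
```

Only the observations for ID 2 should remain. Without the recoding step, a non-empty entry such as "x" would not count as missing, so it would have to be converted or tagged in some other way first.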

Best regards,
Christian Holz

Seb Buechte wrote:

Nick,

you are right, -sort- sorts missing _numeric_ values to the end.
Still, from what I observe, in the case of a string variable -sort- sorts
missings, i.e. empty strings, to the top, which certainly makes sense.
So if "cost" were a string variable, the command you have
presented would not work as wanted.

Kind regards,
sebastian


On 8/10/05, Nick Cox <[email protected]> wrote:

The -sort- sorts missing values to the end
of each panel. So afterwards if any values in the panel
are missing, then the last one will be too. That
is necessary and sufficient information for a -drop-.

The -drop- then drops all observations in the panel
if (iff) the last one is missing.
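
A minimal sketch of how this plays out on the example data from the original question, assuming Cost is numeric and the "x" entry has been read in as numeric missing (.):

```stata
* Sketch only: rebuild the example data with "." in place of "x"
clear
input ID Year Cost
1 1 100
1 2 200
1 3 500
1 4 150
1 5 .
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
end

* Within each ID, sort by Cost so that any missing value ends up last,
* then drop the whole panel if that last value is missing
bysort ID (Cost) : drop if missing(Cost[_N])

* Restore the original order and inspect what remains
sort ID Year
list, sepby(ID)
```

Only the five observations for ID 2 remain. Because the sort within each panel is on Cost rather than Year, it does not matter which year holds the missing value.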

Nick
[email protected]

Christian Holz

I think, however, that Nick's approach does not work if a value for
year 5 is there and another year has a missing value, as Nick's command
only checks the last observation of each ID group.
I might be wrong, but in case I am not, it's worth mentioning...

Nick Cox wrote:
Another way of doing this, without any new variables:

bysort ID (Cost) : drop if missing(Cost[_N])

Nick
[email protected]

Antoine Terracol

I would try something like :

generate tag=(cost==.)
egen toberemoved=sum(tag), by(ID)
drop if toberemoved>0
drop tag toberemoved


You will need to replace the "cost==." in the first line with a more
general way to tag your erroneous values (such as "cost==. |
cost>9999")
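
A minimal sketch of that more general tagging, assuming the variable is numeric and named Cost as in the data listing (Stata variable names are case-sensitive), and using 9999 purely as the illustrative threshold mentioned above. The names bad and anybad are hypothetical.

```stata
* Sketch only: flag any observation that is missing or implausibly large
generate byte bad = missing(Cost) | Cost > 9999

* Carry the flag to every observation of the same ID
bysort ID : egen byte anybad = max(bad)

* Drop all observations of any flagged individual, then tidy up
drop if anybad
drop bad anybad
```

Note that numeric missing values compare as larger than any number in Stata, so Cost > 9999 would already catch them; the explicit missing() just makes the intent clearer.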


Murray Lowe

I am working with a large dataset and have discovered that some of the data
are missing values or have erroneous values. The data is panel data with
observations per individual over a 5 year period. For example:

ID Year    Cost

1  1       100
1  2       200
1  3       500
1  4       150
1  5       x
2  1       100
2  2       200
2  3       500
2  4       600
2  5       100

The problem is this: If an individual has a missing / erroneous value for a
particular year, I want to exclude ALL of their observations from the
dataset. In the example patient 1 would be removed from the dataset
entirely. How can this be done through an automated-type process?
Essentially I need a code / method that looks for the anomalous data;
identifies the patient and then removes all of their observations from the
dataset.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


