You are right, -sort- sorts missing _numeric_ values to the end.
Still, from what I observe, in the case of a string variable -sort- sorts
missings, i.e. empty strings, to the top, which certainly makes sense.
However, if "cost" were a string variable, the command you have
presented would not work as wanted.
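A minimal sketch of one way to handle the string case, assuming Cost is stored as a string and that anything that is not a plain number (an empty string, or text such as "x") counts as bad. The flag variable name "bad" is made up for illustration. Because the flag is numeric 0/1, sorting on it puts any flagged observation last within each ID, whatever -sort- does with empty strings:

```stata
* Sketch only: assumes Cost is a string variable holding numbers as text
clear
input ID Year str5 Cost
1 1 "100"
1 2 "200"
1 3 "500"
1 4 "150"
1 5 "x"
2 1 "100"
2 2 "200"
2 3 "500"
2 4 "600"
2 5 "100"
end

* real() returns numeric missing for "" and for non-numeric text such as "x"
generate byte bad = missing(real(Cost))

* within each ID, sort on the flag so that any 1 comes last,
* then drop the whole panel if the last flag is 1
bysort ID (bad): drop if bad[_N]
drop bad

list, sepby(ID)
```

With the example data this should leave only the five observations of ID 2.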
On 8/10/05, Nick Cox <firstname.lastname@example.org> wrote:
The -sort- sorts missing values to the end
of each panel. So afterwards if any values in the panel
are missing, then the last one will be too. That
is necessary and sufficient information for a -drop-.
The -drop- then drops all observations in the panel
if (iff) the last one is missing.
I think, however, that Nick's approach does not work if a value for
year 5 is there and another year has a missing value, as it
only checks the last observation of each ID group.
I might be wrong, but in case I am not, it's worth mentioning...
Nick Cox wrote:
> Another way of doing this, without any new
bysort ID (Cost) : drop if missing(Cost[_N])
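A small check of the concern about year 5, assuming Cost is numeric. The command sorts on Cost within ID, not on Year, so a missing value from any year is moved to the last position of its panel before the -drop- looks at Cost[_N]. Here the missing value is deliberately placed in year 3 (made-up data for illustration):

```stata
* Sketch only: numeric Cost, with the missing value in year 3 rather than year 5
clear
input ID Year Cost
1 1 100
1 2 200
1 3 .
1 4 150
1 5 300
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
end

* within each ID, observations are ordered by Cost, so numeric missings come last
bysort ID (Cost): drop if missing(Cost[_N])

* restore the panel order for viewing
sort ID Year
list, sepby(ID)
```

All five observations of ID 1 should be dropped even though its year 5 value is present.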
I would try something like:
gen tag=(cost==.)
egen toberemoved=sum(tag), by(ID)
drop if toberemoved>0
drop tag toberemoved
You will need to replace the "cost==." in the first line by a more
general way to tag your erroneous values (such as "cost==. |
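A sketch of the tagging approach with a broader condition. The rule "Cost < 0 is erroneous" is purely hypothetical and stands in for whatever defines a bad value in the real data; the variable is written as Cost to match the example dataset. The sketch uses egen's total(), which as far as I know is the current name for what older versions called sum():

```stata
* Sketch only: the condition Cost < 0 is a hypothetical rule for "erroneous"
clear
input ID Year Cost
1 1 100
1 2 -5
1 3 500
2 1 100
2 2 200
2 3 500
3 1 100
3 2 .
3 3 400
end

* 1 if the observation is missing or breaks the (hypothetical) rule, else 0
generate byte tag = missing(Cost) | Cost < 0

* number of tagged observations per individual
egen toberemoved = total(tag), by(ID)

* drop every observation of any individual with at least one tagged value
drop if toberemoved > 0
drop tag toberemoved

list, sepby(ID)
```

Only ID 2 should remain, since ID 1 has a negative value and ID 3 a missing one.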
I am working with a large dataset and have discovered that some of the
data are missing values or have erroneous values. The data is panel data
with observations per individual over a 5-year period. For example:
ID Year Cost
1 1 100
1 2 200
1 3 500
1 4 150
1 5 x
2 1 100
2 2 200
2 3 500
2 4 600
2 5 100
The problem is this: If an individual has a missing / erroneous value for
a particular year, I want to exclude ALL of their observations from the
dataset. In the example patient 1 would be removed from the dataset
entirely. How can this be done through an automated-type process?
Essentially I need a code / method that looks for the missing / erroneous
value, identifies the patient and then removes all of their observations
from the dataset.