Gauri:
Michael pointed this out already, but our discussion might
have been a bit too terse.
My answer was colored by the typical reason why I would
use such -keep- or -drop- commands: in combination with
graphs. There I keep just one case for each combination
of x and y, since multiple observations would just be
plotted on top of one another, so they don't show but do
make your graph file a lot larger. This is probably not
the case for you, so Michael's solution is just fine.
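A minimal sketch of that thinning idea, written in Python rather than Stata purely for illustration. The function name keep_one_per_xy and the sample points are hypothetical. Only the first observation of each distinct (x, y) pair survives. In Stata itself, something like -bysort x y: keep if _n==1- should retain the same pairs, although -bysort- also sorts the data.

```python
# Illustration only (Python, not Stata): keep the first observation of
# each distinct (x, y) pair, because stacked duplicate points add file
# size to a scatter plot without being visible.

def keep_one_per_xy(points):
    """Return points with repeated (x, y) pairs removed, preserving order."""
    seen = set()
    kept = []
    for x, y in points:
        if (x, y) not in seen:
            seen.add((x, y))
            kept.append((x, y))
    return kept


if __name__ == "__main__":
    # Hypothetical data: (1, 2) occurs three times but is kept once.
    print(keep_one_per_xy([(1, 2), (1, 2), (3, 4), (1, 2), (3, 5)]))
```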
The command I used that dropped cases was:
-by rep78: keep if _n==_N & _N>1 & rep78 <. -
So my command contains three conditions that a case must
meet in order to be retained in the dataset:
_n==_N: _n is the current observation number within the
by group, and _N is the total number of observations in
the by group. So if a group consists of 5 cases, only
the last case (with observation number 5) will be retained.
_N>1: only groups with more than one case are retained.
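rep78 < .: only retain cases with non-missing values on
the grouping variable. (Stata treats missing values as
larger than any number, so this condition excludes them,
including the extended missing values .a to .z.)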
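To make the three conditions concrete, here is a hypothetical Python mock-up (not Stata) of -by g: keep if _n==_N & _N>1 & g < .-. It assumes the values are already sorted by the grouping variable, as -by- requires. It uses None to stand in for Stata's missing value. The function name and the toy values are made up for this sketch.

```python
# Illustration only: mimic  by g: keep if _n==_N & _N>1 & g < .
# Assumes `values` is already sorted by the grouping variable (as -by-
# requires); None plays the role of Stata's missing value.
from itertools import groupby


def keep_last_of_multi_groups(values):
    kept = []
    for _, grp in groupby(values):       # one pass per by-group
        members = list(grp)
        big_n = len(members)             # _N: size of the by-group
        for small_n, v in enumerate(members, start=1):  # _n: 1, 2, ..., _N
            if small_n == big_n and big_n > 1 and v is not None:
                kept.append(v)           # only the last case survives
    return kept


if __name__ == "__main__":
    toy = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
    print(keep_last_of_multi_groups(toy))   # [2, 3, 4]
```

Every surviving group is reduced to a single observation. A group of 5 cases keeps only its fifth case, and a single-case group disappears entirely.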
In my example code I needed a group with only one case
to show that the command would drop that group. The
grouping variable (rep78) in the example dataset did not
contain such a group, so I created one by replacing one
value with a unique number, using the command
-replace rep78 = 10 in 10-. In your case you should not
include this line, since it overwrites a real value in
your data.
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Gauri Khanna
Sent: Tuesday 25 April 2006 18:15
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Dropping/Keeping observations-25 April
I tried Michael's approach and typed
bysort fid: drop if _N==1
113 farmers remained in the dataset, and the 49 plots whose farmer id
appeared only once were dropped.
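For comparison, here is a hypothetical Python mock-up (not Stata) of Michael's -bysort fid: drop if _N==1-. Observations in single-member groups are removed, but every observation of the larger groups stays. Maarten's -keep if _n==_N ...- line differs: it keeps only the last observation of each group. The function name and the toy data are assumptions for this sketch. It also assumes that no ids are missing.

```python
# Illustration only: mimic  bysort fid: drop if _N==1
# Assumes non-missing ids; sorted() stands in for the sort that -bysort-
# performs before the by-groups are processed.
from collections import Counter


def drop_singletons(ids):
    sizes = Counter(ids)                 # _N for every fid value
    return [v for v in sorted(ids) if sizes[v] > 1]


if __name__ == "__main__":
    toy = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
    kept = drop_singletons(toy)
    print(kept)                          # [2, 2, 3, 3, 3, 4, 4]
    print(len(toy) - len(kept), "observations deleted")   # 3 observations deleted
```

On the same toy data, the -keep if _n==_N & _N>1- rule would leave only three observations, one per multi-member group. That is why it deletes more observations than there are singleton ids.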
I decided to try Maarten's method but admit that I don't completely
understand what I am doing, so I followed the instructions exactly
(pseudopanel is the name of my dataset):
. sysuse pseudopanel, clear
. drop if stype==2
(164 observations deleted) (I needed to do this for another reason: to
eliminate plots on which another crop was being grown. So now I am down to
the crop of interest and have data on plots where some farmers' ids appear
only once, which I need to get rid of.)
. replace fid = 10 in 10 /*create a unique value in fid*/
(1 real change made)
Gauri: what does this mean?
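On the question about "(1 real change made)": -replace- reports how many values it actually altered, and -in 10- restricts it to observation 10. The hypothetical Python helper below (not Stata, and not part of any library) mimics that count. It overwrites whatever id observation 10 had. That is why this line only makes sense for the constructed rep78 example and not for real fid data.

```python
# Illustration only: mimic  replace var = new_value in obs
# `obs` is a 1-based observation number, as in Stata's -in- qualifier.
# Returns the number of real changes (0 if the value was already new_value).


def replace_in(values, obs, new_value):
    old = values[obs - 1]
    values[obs - 1] = new_value          # the original id is lost here
    return int(old != new_value)


if __name__ == "__main__":
    fid = [1, 2, 2, 3, 5, 5, 7, 7, 8, 8]     # toy data, not Gauri's
    print(replace_in(fid, 10, 10))           # 1 (one real change)
    print(fid)                               # [1, 2, 2, 3, 5, 5, 7, 7, 8, 10]
    print(replace_in(fid, 10, 10))           # 0 (value already 10)
```

In the toy data, farmer 8's second plot is relabelled 10, so both 8 and 10 become single-observation groups.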
. sort fid
. list fid in 1/10
+-----+
| fid |
|-----|
1. | 1 |
2. | 2 |
3. | 2 |
4. | 3 |
5. | 3 |
|-----|
6. | 3 |
7. | 4 |
8. | 4 |
9. | 5 |
10. | 6 |
+-----+
. by fid: keep if _n==_N & _N>1 & fid <.
(110 observations deleted)
Gauri: only 49 plots had a farmer id (fid) that appeared just once, so this
has dropped more...
. list fid
+-----+
| fid |
|-----|
1. | 2 |
2. | 3 |
3. | 4 |
<snip>