Statalist The Stata Listserver


RE: st: Dropping/Keeping observations-25 April

From   "Maarten Buis" <[email protected]>
To   <[email protected]>
Subject   RE: st: Dropping/Keeping observations-25 April
Date   Tue, 25 Apr 2006 18:45:15 +0200

Michael pointed this out already, but our discussion might 
have been a bit too terse. 

My answer was shaped by the usual reason I use such 
-keep- or -drop- commands in combination with graphs: I 
keep just one case for each combination of x and y, 
since multiple observations would be plotted on top of 
one another, so they don't show but do make your graph 
file a lot larger. This is probably not the case for 
you, so Michael's solution is just fine. 

The command I used that dropped cases was: 
-by rep78: keep if _n==_N & _N>1 & rep78 <. -

So my command contained three conditions a case must 
meet in order to be retained in the dataset:
_n==_N: _n is the current observation number within the
by group, and _N is the total number of observations 
within the by group. So if a group consists of 5 cases, 
then only the last case (with observation number 5) will 
be retained and the other 4 are dropped. That is why this 
command deletes more observations than Michael's.

_N>1: only groups with more than one case are retained.

rep78 < .: only retain cases with non-missing values on
the grouping variable.
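
To see what each condition does before deleting anything, 
one option is to store the three conditions as indicator 
variables and tabulate them. The following is only a sketch 
on the auto data; the variable names is_last, multi, nonmiss 
and retained are made up for illustration.

```stata
* Sketch: flag each condition instead of dropping cases.
sysuse auto, clear
* Create one group of size one, as in the example code.
replace rep78 = 10 in 10
sort rep78
* 1 if this is the last case within its rep78 group.
by rep78: generate byte is_last = (_n == _N)
* 1 if the rep78 group contains more than one case.
by rep78: generate byte multi = (_N > 1)
* 1 if the grouping variable is not missing.
generate byte nonmiss = (rep78 < .)
* A case would survive -keep- only if all three hold.
generate byte retained = is_last & multi & nonmiss
tabulate rep78 retained, missing
```

Cases with retained equal to 1 are the ones the -keep- 
command would leave in the dataset.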

In my example code I needed a group with only one case
to show that the command would drop that group. The 
grouping variable (rep78) in the example data did not 
contain such a group, so I created one by replacing one 
value with a unique number. This was done with the command
-replace rep78 = 10 in 10-. In your case you should not
include this line.
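
To compare the two approaches without deleting anything, 
one could count how many cases each would remove. This is 
a sketch on a small made-up fid column, not your data; the 
variable names singleton and kept_by_keep are hypothetical.

```stata
* Toy data: a made-up farmer id, some ids occurring once.
clear
input fid
1
2
2
3
3
3
4
4
5
6
end
* Michael's rule drops only ids that occur exactly once.
bysort fid: generate byte singleton = (_N == 1)
* My rule keeps just the last case of each id that occurs more than once.
by fid: generate byte kept_by_keep = (_n == _N & _N > 1 & fid < .)
* Number of cases -bysort fid: drop if _N==1- would delete.
count if singleton
* Number of cases -by fid: keep if _n==_N & _N>1 & fid <.- would delete.
count if !kept_by_keep
```

The second count is larger because the -keep- version also 
removes all but one case from every remaining group.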


Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214 

+31 20 5986715

-----Original Message-----
From: [email protected] [mailto:[email protected]]On Behalf Of Gauri Khanna
Sent: dinsdag 25 april 2006 18:15
To: [email protected]
Subject: Re: st: Dropping/Keeping observations-25 April

I tried Michael's approach and typed
bysort fid: drop if _N==1
*113 farmers remained in the dataset and 49 plots with only one farmer id
were dropped.

I decided to try Maarten's method but admit that I don't understand
completely what I am doing. So I followed the instructions exactly
(pseudopanel is the name of my dataset)

. sysuse pseudopanel, clear

. drop if stype==2
(164 observations deleted) (I need to do this for another reason: I needed to
eliminate plots on which another crop was being grown. So now I am down to
the crop of interest and have data on plots where some farmers' ids appear
only once, which I need to get rid of.)

. replace fid = 10 in 10 /*create a unique value in fid*/
(1 real change made)
Gauri: what does this mean ?

. sort fid

. list fid in 1/10

     | fid |
  1. |   1 |
  2. |   2 |
  3. |   2 |
  4. |   3 |
  5. |   3 |
  6. |   3 |
  7. |   4 |
  8. |   4 |
  9. |   5 |
 10. |   6 |

. by fid: keep if _n==_N & _N>1 & fid <.
(110 observations deleted)
Gauri: only 49 plots had one fid (farmerid) associated with it. So this has
dropped more...
. list fid

     | fid |
  1. |   2 |
  2. |   3 |
  3. |   4 |
