Statalist



st: RE: dropping observation


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: dropping observation
Date   Thu, 11 Jun 2009 09:26:23 +0100

The solutions suggested all work with this kind of data and all have a clear logic. 

Note that only Tirthankar's and Kieran's would apply as well to a string identifier. 

They all involve a constructed extra variable. That can be avoided in this way: 

bysort analystID (employerID) : drop if employerID[1] == employerID[_N] 

The logic here is that if all values are the same in a group, then the first will equal the last, provided that we first sort on -employerID- within each group; that is what the parenthesised (employerID) does. 
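As a check on that logic, here is a minimal sketch that applies the one-line command to the example data from the original question. The variable name forecast_no follows the -input- statement used elsewhere in this thread; nothing else is assumed.

```stata
clear
input forecast_no analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
end

* Sort employerID within analystID; in a constant group first == last
bysort analystID (employerID) : drop if employerID[1] == employerID[_N]

* Analyst 1 should be gone; analysts 2 and 3 should remain
list, sepby(analystID)
```

Because the comparison uses only subscripts on the sorted variable, the same command should also serve for a string identifier.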

See also the FAQ 

How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

This may not sound like the same problem, but change != to == and -list- to -drop- and the logic carries over. 
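To make the correspondence concrete, here is a sketch of the listing version of the same idea. It is written in the spirit of the FAQ rather than copied from it, and the indicator name differs is my own choice.

```stata
clear
input analystID employerID
1 1
1 1
1 1
2 1
2 1
2 2
2 2
3 3
3 4
end

* Mirror image of the -drop- command: flag groups whose values differ
bysort analystID (employerID) : gen byte differs = employerID[1] != employerID[_N]

* Show only the analysts who changed employer at least once
list analystID employerID if differs, sepby(analystID)
```

Listing first is a cheap way to inspect what would be kept before dropping anything.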

Experienced users would want me to underline that any missing values on -employerID- would need consideration. 
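What to do about missings is a substantive decision, so the following is only one possibility, sketched under two assumptions of mine: that missing values of -employerID- should be ignored when judging whether an analyst changed employer, and that analysts with no nonmissing values at all should be kept. The data and the variable names miss and same are invented for illustration. Note that this version does use constructed variables.

```stata
clear
input analystID employerID
1 1
1 1
1 .
2 1
2 2
3 .
3 .
end

* Numeric missings sort last, so analyst 1 has first != last and would
* survive the one-line -drop-, although the observed employer never changes.
gen byte miss = missing(employerID)

* Compare first and last among nonmissing values only
bysort analystID miss (employerID) : gen byte same = employerID[1] == employerID[_N] if !miss

* Copy the verdict to that analyst's missing observations too;
* it stays missing if the analyst has no nonmissing values
bysort analystID (miss) : replace same = same[1]

drop if same == 1
drop miss same
list, sepby(analystID)
```

Here analyst 1 is dropped, analyst 2 is kept, and analyst 3 is kept only because of the second assumption; change the -drop- condition if all-missing analysts should go as well.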

Nick 
n.j.cox@durham.ac.uk

Eric A. Booth
==============

bysort analystID: egen max = max(employerID) 
bysort analystID: egen min = min(employerID) 
drop if max==min

Tirthankar Chakravarty
======================

Using Nick Cox's -egenmore- package (SSC):

/* Spells */
clear
// ssc install egenmore, replace
input forecast_no analystID employerID
1                 1            1
2                 1            1
3                 1            1
1                 2            1
2                 2            1
3                 2            2
4                 2            2
1                 3            3
2                 3            4
end
egen nvalsID = nvals(employerID), by(analystID) 
drop if nvalsID==1 
list, clean

Howie Lempel
============

Create a variable with the mean absolute deviation from the mean of employer ID for each analyst.  This will be 0 if the employer ID never changes.

bysort analystID: egen Demp = mdev(employerID)

Drop observations where the employer ID never changed.

drop if Demp==0

Kieran McCaul
=============

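sort analystID employerID
* N1: number of observations for this analyst with this employer
by analystID employerID: gen N1=_N
* N2: number of observations for this analyst in total
by analystID: gen N2=_N
* Equal counts mean the analyst has only ever had one employer
drop if N2==N1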


Stefano Bonini
==============

I have a huge panel dataset containing analyst forecasts. Each analyst is associated with an employer. Sometimes analysts change employer. I want to restrict my dataset, dropping the observations of analysts that never change employer. The dataset may look like this

forecast#     analystID   employer ID
1                 1            1
2                 1            1
3                 1            1

1                 2            1
2                 2            1
3                 2            2
4                 2            2

1                 3            3
2                 3            4

In this case I'd need to drop all observations by analyst 1 because he never changes employer, while keeping those of analysts 2 and 3.

I really cannot figure out the way to do it as visual inspection is just impossible with over 1.2m obs.
