
From: "Martin Weiss" <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: AW: RE: dropping observation
Date: Thu, 11 Jun 2009 17:56:58 +0200

<>

"Experienced users would want me to underline that any missing values on -employerID- would need consideration."

Difficult indeed, everything depends on what Stefano wants to assume about the missing cases. In the code below, I have included several guys with various degrees of "missingness"...

*************
clear*
input forecast analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
1 4 .
2 4 5
3 4 .
4 4 5
1 5 6
2 5 .
3 5 7
4 5 .
1 6 .
2 6 .
end

compress
list, noobs sepby(analystID)

* Get the last nonmissing employer; trick from
* http://www.stata.com/support/faqs/data/dropmiss.html
* (egen allows expressions for some of its functions)
bys analystID (employerID): egen lastnonmiempl = max(cond(!missing(employerID), employerID, .))

* Flag analysts with at least one missing employer
bys analystID: egen miss = total(mi(employerID))
replace miss = miss != 0

list, noobs sepby(analystID)

* Drop analysts who never changed employer; additionally,
* only those without missings on the employer variable
bysort analystID (employerID): drop if employerID[1] == lastnonmiempl[1] & miss == 0

list, noobs sepby(analystID)

/*
Now it really depends on whether you want to drop those who did not
change jobs during the "visible" part of their career.
If so, uncomment this:

bysort analystID (employerID): drop if employerID[1] == lastnonmiempl[1]
*/

/*
OR you could give them the benefit of the doubt, assuming that the
missing indicates a job change. Leave everything as it is, then.
You still have to decide how to go about this business regarding
analyst # 6, who has all missings...
*/

list, noobs sepby(analystID)
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, 11 June 2009 10:26
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: dropping observation

The solutions suggested all work with this kind of data and all have a clear logic.
Note that only Tirthankar's and Kieran's would apply as well to a string identifier. They all involve a constructed extra variable. That can be avoided in this way:

bysort analystID (employerID) : drop if employerID[1] == employerID[_N]

The logic here is that if all values are the same in a group, then the first will equal the last, except that we must sort too.

See also the FAQ

How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

This may not sound like the same problem, but change != to == and -list- to -drop- and the logic carries over.

Experienced users would want me to underline that any missing values on -employerID- would need consideration.

Nick
n.j.cox@durham.ac.uk

Eric A. Booth
==============

bysort analystID: egen max = max(employerID)
bysort analystID: egen min = min(employerID)
drop if max==min

Tirthankar Chakravarty
======================

Using Nick Cox's -egenmore- package (SSC):

/* Spells */
clear
// ssc install egenmore, replace
input forecast_no analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
end
egen nvalsID = nvals(employerID), by(analystID)
drop if nvalsID==1
list, clean

Howie Lempel
============

Create a variable with the mean absolute deviation from the mean of employer ID for each analyst. This will be 0 if the employer ID never changes.

bysort analystID: egen Demp = mdev(employerID)

Drop observations where the employer ID never changed.

drop if Demp==0

Kieran McCaul
=============

sort analystID employerID
by analystID employerID: gen N1=_N
by analystID: gen N2=_N
drop if N2==N1

Stefano Bonini
==============

I have a huge panel dataset containing analyst forecasts. Each analyst is associated with an employer. Sometimes analysts change employer. I want to restrict my dataset, dropping the observations of analysts that never change employer.
The dataset may look like this

forecast#   analystID   employerID
1           1           1
2           1           1
3           1           1
1           2           1
2           2           1
3           2           2
4           2           2
1           3           3
2           3           4

In this case I'd need to drop all observations by analyst 1 because he never changes employer, while keeping those of analysts 2 and 3. I really cannot figure out the way to do it, as visual inspection is just impossible with over 1.2m obs.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
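The question of missing values raised in the thread can be made explicit by counting distinct nonmissing employers per analyst. The following is only a minimal sketch, not any poster's method. It assumes that a missing -employerID- carries no information about a job change, which is one of several defensible readings. The variable names first_seen and n_employers are made up for illustration; the data are the example values from Martin's message.

```stata
* Sketch under a stated assumption: missing employerID is ignored,
* so an analyst counts as a "mover" only with >= 2 distinct nonmissing employers.
clear
input forecast analystID employerID
1 1 1
2 1 1
3 1 1
1 2 1
2 2 1
3 2 2
4 2 2
1 3 3
2 3 4
1 4 .
2 4 5
3 4 .
4 4 5
1 5 6
2 5 .
3 5 7
4 5 .
1 6 .
2 6 .
end

* Tag the first observation of each nonmissing analyst-employer pair
* (first_seen is a hypothetical helper variable)
bysort analystID employerID: gen byte first_seen = (_n == 1) & !missing(employerID)

* Number of distinct nonmissing employers per analyst
* (n_employers is a hypothetical helper variable)
bysort analystID: egen n_employers = total(first_seen)

* Drop analysts who never visibly changed employer
drop if n_employers < 2
drop first_seen n_employers

list, noobs sepby(analystID)
```

Under this assumption, analysts 2, 3 and 5 should remain in the example data: analyst 4 has only one nonmissing employer and analyst 6 has none. If a missing value is instead taken to signal a possible job change, the drop condition would have to be loosened accordingly.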

**References**:

- **st: dropping observation** *From:* Stefano Bonini <sbonini@stern.nyu.edu>
- **st: RE: dropping observation** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

