Statalist The Stata Listserver



st: Re: RE: Re: counting and eliminating data


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: RE: Re: counting and eliminating data
Date   Wed, 25 Jan 2006 13:39:47 -0500

Whoops, I forgot about that. When I looked at how many lines of code the -nvals()- approach required, I started to think that going back to first principles might be better: it needs no user-written package, takes fewer lines of code, and should be faster. (As before, this assumes -year- is coded 1 to 26.)

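* tag one observation per distinct (non-missing) year within each subarea
bysort subarea year: gen byte persubyear = _n==1 & !missing(year)
* running counts of distinct years: all years, years 1-5, and years 22-26
by subarea (year): gen nyears = sum(persubyear)
by subarea (year): gen firstfive = sum(persubyear*(year<=5))
by subarea (year): gen lastfive = sum(persubyear*(year>21))
* the last observation in each subarea holds the totals
by subarea (year): gen byte tokeep = nyears[_N]>=13 & firstfive[_N]>=2 & lastfive[_N]>=2
keep if tokeep
drop persubyear nyears firstfive lastfive tokeep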

Michael Blasnik

----- Original Message -----
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Sent: Wednesday, January 25, 2006 12:09 PM
Subject: st: RE: Re: counting and eliminating data



Note that use of the -egen- function -nvals()-
depends on prior installation of the -egenmore-
package from SSC.
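For anyone following along, a minimal sketch of installing the package from within Stata with the official -ssc- command (an internet connection is needed). The -which- line only checks that the program behind -nvals()- is now on the ado-path; the file name _gnvals.ado is an assumption based on the usual _g<function>.ado naming of -egen- functions.

```stata
* install (or update) egenmore, which supplies the egen function nvals()
ssc install egenmore, replace
* confirm the nvals() implementation can be found (file name assumed)
which _gnvals
```

After that, -help egenmore- lists the other functions the package provides.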

Nick
n.j.cox@durham.ac.uk

Michael Blasnik

There are a couple of approaches you could take, but I think using the -egen- function -nvals()- is the best bet.

sort subarea
by subarea: egen nyears=nvals(year)
keep if nyears>=13
sort subarea
by subarea: egen firstfive=nvals(year) if year<=5
by subarea: egen lastfive=nvals(year) if year>21 & !missing(year)
* fill out missing values within subarea
bysort subarea (firstfive): replace firstfive=firstfive[1]
bysort subarea (lastfive): replace lastfive=lastfive[1]
* missing is larger than any number in Stata, so also exclude subareas
* with no observations at all in the first or last five years
keep if firstfive>=2 & lastfive>=2 & !missing(firstfive, lastfive)
drop nyears firstfive lastfive

Jennifer Devine

> Can someone please set me in the right direction for coding a program to
> count and eliminate data if it doesn't meet a certain criterion?
>
> I have survey data taken over 26 years, and the survey area is divided
> into subareas. I want to include a subarea only if data were collected in
> at least 13 of the 26 years, including at least 2 of the first 5 years and
> at least 2 of the last 5 years. If a subarea does not meet those criteria,
> I want Stata to drop it from the analysis. At the moment I'm having to look
> at everything individually, and it takes several days to eliminate subareas.
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*



© Copyright 1996–2014 StataCorp LP