The Stata listserver

Re:st: Looping the Loop


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   Re:st: Looping the Loop
Date   Mon, 31 Mar 2003 14:27:05 +0100

Joel Clovis

> A couple of weeks ago (more like 5 or 6), Nick Cox helped me with a
> variable dropping schema for my large dataset when observations fell
> below a predefined number (30).  This prog (below) works quite well
> when there is only one country in the dataset.  That is, I have been
> using drop to eliminate the other 130 countries from my dataset, then
> deleting the vars, saving and repeating the process for another
> country. This seems to be crying out for a simpler method.

> I have been following the recent exchange between Chris Rohlfs, Nick
> Winter and Edwin Leuven, and their solution requires that you call
> each macro by name (in Nick's solution `C' and `V' and in Edwin's
> `v1' and `v2'), and this calling is not going to help me.  I need to
> say something like:  by country:  drop var if obs <=30  but I don't
> have the skill to do it. Can anyone help?

> 2. How to -drop- according to your criterion
> ============================================
>
> I'd do it this way:
>
> . foreach v of var cba2tfina-region {
> .  	qui count if !missing(`v')
> .	if r(N) < 30 {
> .		drop `v'
> .	}
> . }

Kit Baum

> It is not clear, in a panel context, how you want to handle this.
> Once you drop a variable, it is dropped for all countries. Do you
> mean that a variable should be dropped if ANY country fails the test
> of having 30 obs, or that it should only be dropped if ALL countries
> fail? The latter will probably never happen, and the former will
> probably cause all variables to be thrown away. Perhaps you would
> like to set a particular variable to missing in each country for
> which it fails the test? Here is a fragment of code in which I do
> something similar to a panel of firm data:

> g byte enn = 1
> bys gvkey : egen byte nobsf=sum(enn)
> bys gvkey : drop if nobsf<4

> here 'gvkey' is the firm identifier, and I want only those firms who
> have four or more years of data. I could, instead, set some variable
> to missing if nobsf<4 (or <30, in your case). Of course, you want to
> test whether that variable is missing, not merely whether that number
> of obs. exist.

The limit is not Joel's skill. It is that a single -drop- command
removes either observations or variables, but not both at once.
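
In syntax terms the two forms look like this. This is only an
illustration: the variable names are borrowed from the loop quoted
above, and the -if- condition is hypothetical.

* drops observations; every variable is kept
drop if missing(cba2tfina)
* drops a variable, for every observation
drop region

Something like -drop region if ...- is not allowed.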

Kit's code can be condensed to

bysort gvkey : drop if _N < 4
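
Under -by:-, _N is the number of observations in the current group,
so no counter variable is needed. A quick check, using the auto
dataset shipped with Stata as a stand-in for the panel (rep78 plays
the part of the group identifier; that choice is mine, not Kit's):

sysuse auto, clear
* groups of rep78 with fewer than 4 observations are dropped whole
bysort rep78 : drop if _N < 4
* every remaining group should now have at least 4 observations
bysort rep78 : assert _N >= 4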

Perhaps what Joel wants is something like

bysort country : drop if _N < 30

That is, his problem sounds more like one of
dropping observations than of dropping variables.
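
If the aim is instead closer to Kit's suggestion, setting a variable
to missing within any country where it has fewer than 30 nonmissing
values, a rough sketch follows. It assumes that the identifier is
called country, that it lies outside the range cba2tfina-region, and
that every variable in that range is numeric. None of that is
confirmed in this thread, and nonmiss is just a hypothetical scratch
name.

foreach v of varlist cba2tfina-region {
	* number of nonmissing values of `v' within each country
	bysort country : egen long nonmiss = count(`v')
	* blank out the variable in countries that fail the test
	replace `v' = . if nonmiss < 30
	drop nonmiss
}

After that, the -foreach- loop quoted above would drop any variable
left with fewer than 30 nonmissing values overall.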

Nick
[email protected]



