Statalist



RE: st: comparing dates: keep if dates<d(DDMMYYYY)


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: comparing dates: keep if dates<d(DDMMYYYY)
Date   Tue, 27 Jan 2009 21:14:11 -0000

Given that Ana Gabriela has indicators (dummies), there is an even
easier way to do it: 

egen tokeep = rowmax(k*) 

Given lots of indicators, which are Booleans with 1 meaning true and 0
meaning false, their row maximum indicates whether any is true and
their row minimum indicates whether all are true in each observation. 
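A quick way to check how the row maximum and row minimum behave on 0/1 indicators is to try them on a toy dataset. This is only a sketch: the three variables k1-k3 and their values are made up, and the names -anytrue- and -alltrue- are mine.

```stata
clear
input byte(k1 k2 k3)
0 0 0
0 1 0
1 1 1
end

* row maximum is 1 as soon as at least one indicator is 1
egen byte anytrue = rowmax(k*)

* row minimum is 1 only when every indicator is 1
egen byte alltrue = rowmin(k*)

list
```

Both functions ignore missing values within a row, so a row that is partly missing is judged on its non-missing indicators only.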

The last paragraphs of my previous emails still stand. 

Nick 
[email protected] 

Nick Cox

Another way to do it would be to use the -egen- function -rany()- which
is included in -egenmore- from SSC. 

egen tokeep = rany(k*), cond(@ == 1) 

The name -rany- was chosen at a time when official -egen- functions for
rowwise operations bore names like -rmean-, the "r" signalling row.
Those in turn were named concisely as a hangover from days when Stata
for DOS was being shipped and command names were limited by the
filename.ext limit for DOS filenames. 

Despite the brevity of this solution, I wouldn't recommend it strongly.
It is better in my view to become accustomed to solving these problems
with a loop over the variables, using -forval- (as below) or -foreach-.
There is no efficiency loss either, as the -egen- function is also
looping over the variables, so the code that runs with -egen- is really
longer, not shorter. 

My guess remains that no dummies are really needed, but that wasn't the
immediate question. 

Nick Cox

Precisely this question was asked by Steve a.k.a. sdm1 on 11 January 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0901/date/article-390.html> 

And the thread attracted several replies. 

It's disappointing that the question should (a) re-emerge so quickly and
(b) not be answered. 

First off, why doesn't that -if k*==1- syntax work? 

The syntax of -if- is that -if- can be followed by an expression. An
expression in Stata, as in mathematics, is such that you can plug in
values and then calculate to get a result. For example, -mpg > 14- is an
expression. For each observation you can get a result for that
expression if you have a numeric variable -mpg- (or, lacking a variable,
a scalar). In this example the result would be 1 or 0, depending on
-mpg-.  
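To see an expression being evaluated observation by observation, here is a small sketch using the auto dataset shipped with Stata. The variable name -over14- is my own choice for illustration.

```stata
sysuse auto, clear

* the expression mpg > 14 yields 1 (true) or 0 (false) in each observation
gen byte over14 = mpg > 14
tabulate over14

* after -if-, the same expression selects the observations where it is true
count if mpg > 14
```

Note that numeric missing counts as larger than any number, so with missing values present you might want -mpg > 14 & !missing(mpg)- instead. In the auto data -mpg- has no missing values.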

A wildcard varlist such as k* will be expanded by Stata in contexts
where Stata expects a varlist. But even if expansion were allowed
whenever a wildcard was part of an expression you would _at the most
generous_ get something like 

... if k1 k2 k3 k4 k5 k6 k7 k8 k9 k10 k11 k12 k13 k14 k15 k16 k17 k18 == 1 

which would be illegal. (Stata could make sense of -if k1-, but the rest
makes no sense.) 

As it happens, expansion is _not_ allowed whenever a wildcard is part
of an expression. So even if in context -if mpg* > 14- could only mean
-if mpg > 14- as far as you were concerned, because -mpg- is the only
variable that satisfies -mpg*-, that still won't work. 

The important point to carry forward is that there is no way that Stata
will ever directly interpret k* in the way you want. 
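The difference between the two contexts can be checked directly. A minimal sketch with the auto dataset, in which -mpg- is the only variable matching mpg*; the macro name -mvars- is just for illustration, and I don't rely on any particular error code, only on the command failing.

```stata
sysuse auto, clear

* where a varlist is expected, the wildcard is expanded
unab mvars : mpg*
display "`mvars'"

* inside an expression it is not: this fails even though only mpg matches
capture noisily count if mpg* > 14
display _rc
```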

Second, how do you do it? The thread started by the posting cited above
gave some possibilities. 

Suppose that we start with the basis that you have several dummy
variables. You could use a loop 

gen tokeep = 0 
quietly forval j = 1/18 { 
	replace tokeep = max(tokeep, k`j') 
} 
keep if tokeep 

or you could use -inlist- 

keep if inlist(1, k1, k2, k3, k4, k5, k6, k7, k8, k9, k10, k11, k12, ///
	k13, k14, k15, k16, k17, k18) 

The last could be automated 

unab kvars : k* 
local kvars : subinstr local kvars " " ",", all 
keep if inlist(1, `kvars') 
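The loop can be automated in the same spirit with -foreach-, which takes the wildcard directly and so does not need to know how many dummies there are. A minimal sketch on made-up data (the three k variables and their values are invented for illustration):

```stata
clear
input byte(k1 k2 k3)
0 0 0
0 1 0
1 1 1
0 0 1
end

gen byte tokeep = 0

* -foreach ... of varlist- expands k* for us
quietly foreach v of varlist k* {
	replace tokeep = max(tokeep, `v')
}

keep if tokeep
```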

Looking at your problem I suspect that your entire problem could be
automated without needing any dummy variables, but quite how would
depend on how the critical dates for the governorates were defined. 
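As one possible sketch of what "without dummies" might look like, suppose the cutoff dates can simply be listed in the sorted order of the governorate variable, which is the order -tabulate, gen()- uses. That ordering is an assumption, as are the toy data, the three-governorate size and the names -gov-, -sdate- and -cutoffs-; the three dates are the first three from the original posting.

```stata
clear
input byte loc2prov str9 sdate
1 "29may2004"
1 "31may2004"
2 "19may2004"
2 "20may2004"
3 "01jan2004"
3 "21may2004"
end
gen date = date(sdate, "DMY")
format date %td

* one cutoff per governorate, in sorted order of loc2prov (assumed)
local cutoffs 30may2004 20may2004 21may2004

* number the governorates 1, 2, 3 in sorted order
egen byte gov = group(loc2prov)

gen byte tokeep = 0
forval j = 1/3 {
	local d : word `j' of `cutoffs'
	quietly replace tokeep = 1 if gov == `j' & date < td(`d')
}

keep if tokeep
```

With the real data the list would hold 18 dates and the loop would run over 1/18. An alternative would be to hold the cutoffs in a small separate dataset and -merge- them in, but that depends on details not given here.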

Nick 
[email protected] 

Ana Gabriela Guerrero Serdan

Just had the time to work this out. I actually had to create some
dummies first: 

tabulate loc2prov, gen(Governorate)

gen k1= date<td(30may2004) & Governorate1==1
gen k2= date<td(20may2004) & Governorate2==1
gen k3= date<td(21may2004) & Governorate3==1
gen k4= date<td(15may2004) & Governorate4==1
gen k5= date<td(25aug2004) & Governorate5==1
gen k6= date<td(15may2004) & Governorate6==1
gen k7= date<td(30aug2004) & Governorate7==1
gen k8= date<td(29may2004) & Governorate8==1
gen k9= date<td(16may2004) & Governorate9==1
gen k10= date<td(20may2004) & Governorate10==1
gen k11= date<td(25may2004) & Governorate11==1
gen k12= date<td(17may2004) & Governorate12==1
gen k13= date<td(23may2004) & Governorate13==1
gen k14= date<td(19may2004) & Governorate14==1
gen k15= date<td(7jun2004) & Governorate15==1
gen k16= date<td(20may2004) & Governorate16==1
gen k17= date<td(19may2004) & Governorate17==1
gen k18= date<td(18may2004) & Governorate18==1


keep if k1==1 | k2==1 | k3==1 | k4==1 | k5==1 | k6==1 | k7==1 | k8==1 | ///
	k9==1 | k10==1 | k11==1 | k12==1 | k13==1 | k14==1 | k15==1 | k16==1 | ///
	k17==1 | k18==1 

I wonder if there is a way to tell Stata to create one variable whenever
the conditions above hold? 

I tried this but it doesn't work: 

gen dummy=1 if k*

invalid syntax
r(198);

