
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: AW: RE: Evaluating a set of conditions

From	Jeph Herrin <[email protected]>
To	[email protected]
Subject	Re: st: RE: AW: RE: Evaluating a set of conditions
Date	Wed, 23 Jun 2010 20:03:55 -0400

gen byte disease=cond(a==1&(-(b+c+d+e)<=-2),1,cond(mi(a,b,c,d,e),.,0))

?

On 6/23/2010 6:58 PM, Thomas Speidel wrote:

I have fixed the missing disease below.
I am trying to have as few lines of code as possible and avoid a series
of replace. Replacements are easy to follow but they create a hierarchy
that is hard to keep track of and error prone. Below I have purposely
created a series of replacements. I am looking for the shortest possible
way to achieve this, of the kind I often see in posts by you and Nick...

======================
input byte(id a b c d e disease)
1 0 0 1 0 0 0
2 1 0 1 1 0 1
3 . 1 1 1 1 .
4 0 1 1 0 1 0
5 1 0 0 0 0 0
6 1 . 1 1 0 1
7 0 0 0 0 0 0
8 1 . . . 1 .
9 1 0 0 0 0 0
10 1 . . 1 1 1
11 1 . 1 0 0 .
12 1 0 1 0 0 0
13 1 0 1 0 0 0
14 . 0 1 0 0 .
15 1 0 1 0 0 0
16 1 0 1 1 0 1
17 0 0 1 0 0 0
18 1 . . 0 . .
19 0 . . . 1 0
end

egen missing = rowmiss(b c d e)
egen sum = rowtotal(b c d e)
gen byte test = 1 if a==1 & missing<=1 & sum>=2
replace test = 0 if a==1 & missing<=1 & sum<2
replace test = . if a==1 & missing==1 & sum==1
replace test = 1 if a==1 & missing==2 & sum>=2
replace test = . if a==1 & missing==2 & sum<2
replace test = . if a==1 & missing>=3
replace test = 0 if a==0
. assert disease==test

.
======================
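
The chain of replacements above appears to encode the following rule: disease is missing when a is missing, 0 when a==0, and for a==1 it is 1 if at least two of b-e are known to be 1, missing if the missing values among b-e could still supply a second 1, and 0 otherwise. Below is a minimal sketch of that reading as a single nested -cond()-, assuming a only takes the values 0, 1 or missing. The names ones, nmiss and test2 are illustrative and do not come from the thread; the data are a subset of the rows posted above.

```stata
* Sketch only: ones, nmiss and test2 are illustrative names, not from the thread
clear
input byte(id a b c d e disease)
2 1 0 1 1 0 1
3 . 1 1 1 1 .
5 1 0 0 0 0 0
6 1 . 1 1 0 1
8 1 . . . 1 .
11 1 . 1 0 0 .
18 1 . . 0 . .
19 0 . . . 1 0
end

* number of b-e known to be 1 (a missing value never equals 1)
gen byte ones = (b==1) + (c==1) + (d==1) + (e==1)
* number of b-e that are missing
egen byte nmiss = rowmiss(b c d e)

* a missing -> missing; a==0 -> 0; a==1 -> 1 if two known ones,
* missing if the missings could still supply the second one, else 0
gen byte test2 = cond(missing(a), ., ///
    cond(a==0, 0, ///
    cond(ones>=2, 1, ///
    cond(ones + nmiss >= 2, ., 0))))

assert test2 == disease
```

Counting with (b==1) rather than summing b + c + d + e keeps missing values out of the arithmetic, so the only place a missing result can arise is where it is requested explicitly.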

Quoting Martin Weiss <[email protected]> Wed 23 Jun 16:22:33 2010:


<>

In your -input- call, the "disease" variable seems to be missing.

Anyway, if the only problem is with a certain combination, why not add

***********
replace test=. if a==1 & missing==3
***********


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Thomas Speidel
Sent: Thursday, 24 June 2010 00:09
To: [email protected]
Subject: RE: st: RE: AW: RE: Evaluating a set of conditions

Forcing myself to use -cond-, I get one step closer, yet not quite there
yet:
(in the code below I added one more obs at id==19)

======================
input byte(id a b c d e disease)
1 0 0 1 0 0
2 1 0 1 1 0
3 . 1 1 1 1
4 0 1 1 0 1
5 1 0 0 0 0
6 1 . 1 1 0
7 0 0 0 0 0
8 1 . . . 1
9 1 0 0 0 0
10 1 . . 1 1
11 1 . 1 0 0
12 1 0 1 0 0
13 1 0 1 0 0
14 . 0 1 0 0
15 1 0 1 0 0
16 1 0 1 1 0
17 0 0 1 0 0
18 1 . . 0 .
19 0 . . . 1
end

egen missing = rowmiss(b c d e)
gen byte test = cond(missing(a), ., cond(a==1, cond(missing<=1, ///
    (b + c + d + e)>=2, cond((b + c + d + e)>=2, 1, .)), 0))

. assert disease==test
3 contradictions in 19 observations
assertion is false
=======================
this seems to fail whenever a==1 & missing==3
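
One likely contributor to these contradictions: Stata treats a numeric missing value as larger than any number, so a sum that includes a missing value is itself missing and compares as >= 2. A short check of that behaviour, offered as a sketch rather than a diagnosis of the exact rows involved:

```stata
* Numeric missing (.) sorts above every number in Stata
display (. >= 2)          // 1
display (1 + . >= 2)      // 1, because 1 + . is missing
display missing(1 + .)    // 1
```

Under that rule, (b + c + d + e)>=2 evaluates to true for any row where at least one of b-e is missing, whatever the observed values are.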

Thomas

Quoting Thomas Speidel <[email protected]> Wed 23 Jun 13:13:54 2010:

While trying to simplify the problem for the list (my variables are
not actually called a, b, c, etc) I must have inadvertently
introduced some problems. Sorry for the confusion.

Nonetheless, the variable called "disease" in the n=18 dataset is
indeed what I am trying to achieve.

Thomas

Quoting Martin Weiss <[email protected]> Wed 23 Jun 13:02:40 2010:


<>

Your own code returns "1" for id==11. Have you changed your mind?

***********
clear*

inp byte(id a b c d e)
1 0 0 1 0 0
2 1 0 1 1 0
3 . 1 1 1 1
4 0 1 1 0 1
5 1 0 0 0 0
6 1 . 1 1 0
7 0 0 0 0 0
8 1 . . . 1
9 1 0 0 0 0
10 1 . . 1 1
11 1 . 1 0 0
12 1 0 1 0 0
13 1 0 1 0 0
14 . 0 1 0 0
15 1 0 1 0 0
16 1 0 1 1 0
17 0 0 1 0 0
18 1 . . 0 .
end

egen anytwo = rowtotal(a b c d e), missing
egen missing = rowmiss(a b c d e)
replace anytwo = . if (anytwo==0 & missing>=2 & missing<.)
replace anytwo = . if (anytwo==1 & missing==1)
replace anytwo = . if (anytwo==1 & missing==3)
replace anytwo = . if (missing>=4)
gen disease = 1 if (a==1 & anytwo>=2 & anytwo<.)
replace disease = 0 if (a==1 & anytwo<2)
replace disease = 0 if a==0
replace disease =. if a==.


list in 11, noo
***********


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Thomas Speidel
Sent: Wednesday, 23 June 2010 17:10
To: [email protected]
Subject: Re: st: RE: AW: RE: Evaluating a set of conditions

Thanks Martin and Nick. Here is an example where I have added more
missing and manually created "disease" to clarify how the missing
would impact the results:

id a b c d e disease
1 0 0 1 0 0 0
2 1 0 1 1 0 1
3 . 1 1 1 1 .
4 0 1 1 0 1 0
5 1 0 0 0 0 0
6 1 . 1 1 0 1
7 0 0 0 0 0 0
8 1 . . . 1 .
9 1 0 0 0 0 0
10 1 . . 1 1 1
11 1 . 1 0 0 .
12 1 0 1 0 0 0
13 1 0 1 0 0 0
14 . 0 1 0 0 .
15 1 0 1 0 0 0
16 1 0 1 1 0 1
17 0 0 1 0 0 0
18 1 . . 0 . .

Take a look at id==11 for example, where I don't have enough
information to determine disease presence.

Thomas Speidel

Quoting Nick Cox <[email protected]> Wed 23 Jun 06:59:44 2010:

Yes, if there are missings it's more complicated than my initial answer
could suggest.

(a == 1) & (((b == 1) + (c ==1) + (d == 1) + (e == 1)) >= 2)

would seem to match the possibilities better.
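
A small sketch of how that expression behaves with missing values, using two rows taken from the example data in this thread (ones and flag are illustrative names, not from the thread):

```stata
* Sketch only: ones and flag are illustrative names
clear
input byte(a b c d e)
1 . 1 1 0
1 . 1 0 0
end

* (b == 1) is 0 when b is missing, so missings simply do not count
gen byte ones = (b == 1) + (c == 1) + (d == 1) + (e == 1)
gen byte flag = (a == 1) & (ones >= 2)
list
```

The first row gets flag 1 and the second gets flag 0. Note that the expression returns 0, not missing, for a row like the second one, where a missing b could still be the second true condition.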

Nick
[email protected]

Martin Weiss

The result does seem to differ much, though, from the one Thomas
evidently wants - as expressed by his example:

*************
clear*
set obs 10000
set seed 12345

foreach var of newlist a b c d e{
gen byte `var'=runiform()<.5
replace `var'=. if runiform()<.15
}

//NJC
gen disease_true = a & (b + c + d + e >= 2) /*
*/ if !missing(a, b, c, d, e)

//Thomas
egen anytwo = rowtotal(a b c d e), missing
egen missing = rowmiss(a b c d e)
replace anytwo = . if (anytwo==0 & missing>=2 & missing<.)
replace anytwo = . if (anytwo==1 & missing==1)
replace anytwo = . if (anytwo==1 & missing==3)
replace anytwo = . if (missing>=4)
gen disease = 1 if (a==1 & anytwo>=2 & anytwo<.)
replace disease = 0 if (a==1 & anytwo<2)
replace disease = 0 if a==0
replace disease =. if a==.

//Comparison
compare disease_true disease
as disease_true ==disease
*************

Nick Cox

I think you need to be clear whether missing means true, false or
indeterminate as far as this is concerned.

Setting aside missings, as a, b, c, d, e are Booleans (1 = true, 0 =
false) then

gen disease_true = a & (b + c + d + e >= 2)

is one way to do it. If missings make the problem indeterminate then
tack on

... if !missing(a, b, c, d, e)

Nick
[email protected]

Thomas Speidel

Following up on my previous post:
http://www.stata.com/statalist/archive/2010-06/msg00984.html
here is an example for something I am trying to achieve in a
nice/efficient/elegant way.

I have a number of dummies: a, b, c, d, e (missing values do exist)
Disease=true if the following conditions are met:

1) a must be true AND
2) any two of b, c, d, e are true

As I said missing values are crucial, especially when evaluating the
second condition.

My current program works, but I don't think it is efficient and it
probably does things that are unnecessary:

*******************************************
egen anytwo = rowtotal(a b c d e), missing
egen missing = rowmiss(a b c d e)
replace anytwo = . if (anytwo==0 & missing>=2 & missing<.)
replace anytwo = . if (anytwo==1 & missing==1)
replace anytwo = . if (anytwo==1 & missing==3)
replace anytwo = . if (missing>=4)

gen disease = 1 if (a==1 & anytwo>=2 & anytwo<.)
replace disease = 0 if (a==1 & anytwo<2)
replace disease = 0 if a==0
replace disease =. if a==.
*******************************************

I tried to play around with cond, but I found it was making this much
more complicated than it is. I know I am complicating my life more
than I need to which is why I am looking for alternative solutions.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




--
Thomas Speidel



