Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Sergiy Radyakin <serjradyakin@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st:any easy alternative way when -egen- is not allowed to combine with by

Date: Tue, 21 Dec 2010 18:40:20 -0500

On Tue, Dec 21, 2010 at 6:01 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> I think your second email gives the healthiest approach. Very little
> of what -egen, by()- does cannot be done with a few lines of -bysort-,
> and it is often more lucid than the -egen- code.
>
> tempvar sumnew
> bysort id: g `sumnew' =sum(indicator)
> bysort id: g byte new=(sumnew>0,1,0)
> assert indicator == 0 if new == 0
> assert new == 1 if indicator > 0 & !missing(indicator)

Hi Stas!

there are at least three problems with your code:
1) you are missing ` ' around sumnew in the second bysort;
2) you are missing cond() in the second bysort;
3) the program does not work :(

To add to the third point: Amanda wanted to have 1 if for ANY year in the given ID group there is an indicator >0. Here are the results of your program for the data example that Amanda provided:

      id   year   indica~r   new
  1.   1   1985          0     0
  2.   1   1986          1     1
  3.   1   1987          2     1
  4.   2   1985          0     0
  5.   2   1986          0     0
  6.   2   1987          0     0

The problem is for (id==1).

I smelled a problem with the solution once I saw the asserts. Indeed if you can check the results with such simple checks (no by or sorts) then you could generate the desired results in the same easy manner.

The assert block is incomplete, it tests the two most trivial cases:
1) if the result is zero - then for none of the observations in the group there should be anything different from zero (missing is not accounted for), and hence for the current observation too, so we check it is equal to zero.
2) if the current observation is non-zero and not missing, then it is already sufficient for the result to be 1, regardless of anything else, so we check the result is equal to 1.

The most interesting assert is missing - the one which tests that if the current observation is zero, but somewhere else in the group there is 1, then the result is 1.
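A sketch of what that missing check could look like, assuming Amanda's six example observations. The variable names running and anypos are hypothetical and do not come from the thread, and the quoted code is used here with the two syntax problems repaired so that it runs:

```stata
clear
input id year indicator
1 1985 0
1 1986 1
1 1987 2
2 1985 0
2 1986 0
2 1987 0
end

* Quoted code with the syntax repaired: sum() is a running sum within id
bysort id (year): generate running = sum(indicator)
generate byte new = cond(running > 0, 1, 0)

* Group-level reference: 1 if any observation in the id has indicator > 0
bysort id: egen byte anypos = max(indicator > 0 & !missing(indicator))

* The check that was missing: new must agree with the group-level answer
* on every observation, including those where indicator itself is 0
capture noisily assert new == anypos
list id year indicator new anypos, sepby(id)
```

On this data the assert should fail for the 1985 observation of id 1, where the running sum is still 0 although later years in the same group are positive.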
The simplest code that satisfies your tests and generates the same result is

    generate byte new=indicator>0 if !missing(indicator)

but this is not the result Amanda wanted.

Also, having two bysorts in this manner is IMHO confusing to read. Can the sort order change as a result of the first statement? The code implies yes. I'd rather clearly write three statements (sort, by, by) than two statements (bysort, bysort), but that is a matter of preference of course.

For Amanda I suggest the following code:

    tempvar temptotal
    sort id
    by id: egen `temptotal' = total(indicator)
    generate byte new2 = `temptotal'>0 if !missing(`temptotal')

There is an important difference between sum() and total(). See help for details.

Best, Sergiy Radyakin

>
>
> On Tue, Dec 21, 2010 at 4:50 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>> Just a supplement to my question:
>>
>> I know I could create an intermediate variable as following:
>>
>> bysort id: g sumnew=sum(indicator)
>> bysort id: g new=(sumnew>0,1,0)
>>
>> But I do not like this way because of the intermediate variable
>> "sumnew". It is created for the purpose of getting "new". If I keep
>> it, it will not be useful in the analysis; if I drop it, what if I
>> want to check if "new" is correct or not ?
>>
>> I am looking forward to hearing how you deal with this kind of
>> intermediate variables. Thank you!
>>
>> Amanda Fu
>> On Tue, Dec 21, 2010 at 5:23 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>>> Hi all,
>>>
>>> I notice some options of -egen- are not allowed to combine with by.
>>> I just wondered if there is any good way to handle these situations.
>>>
>>> Let me use an example.
>>> ----------------------------------------
>>> ID surveyYear indicator (maximum value is 10)
>>> 1 1985 0
>>> 1 1986 1
>>> 1 1987 2
>>> 2 1985 0
>>> 2 1986 0
>>> 2 1987 0
>>> ...............
>>> ----------------------------------------
>>> I want to create a variable "new" that takes value 1 if there is at
>>> least one year for a ID's indicator is above 0 and takes the value 0
>>> (like ID 1) is all the years the indicator is 0 (like ID 2).
>>>
>>> What I wish to use is as following:
>>> . bysort id: egen new=(indicator), anymatch(1/10)
>>>
>>> Is there any simple way to do this? Any comments will be helpful. Thank you!
>>>
>>> Sincerely,
>>> Amanda Fu
>>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>>
>
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
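On the difference between sum() and total() mentioned in the reply: the -generate- function sum() returns a running sum, so under by: only the last observation of each group holds the group total, while the -egen- function total() puts the group total on every observation. The sketch below illustrates this on Amanda's example data. The variable names running, new_last and grouptotal are hypothetical, and the running[_N] variant is only one possible way to avoid -egen-, assuming indicator is never negative:

```stata
clear
input id year indicator
1 1985 0
1 1986 1
1 1987 2
2 1985 0
2 1986 0
2 1987 0
end

sort id year

* sum() is cumulative: within id 1 it takes the values 0, 1, 3
by id: generate running = sum(indicator)

* The last running sum in each group equals the group total
by id: generate byte new_last = running[_N] > 0

* egen total() stores the group total on every row of the group
by id: egen grouptotal = total(indicator)
generate byte new2 = grouptotal > 0

* Both routes should flag the same groups
assert new_last == new2
list, sepby(id)
```

Both sum() and total() treat missing values of indicator as zero, so a group whose values are all missing ends up with 0 rather than missing under either approach.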

**Follow-Ups**:

- **Re: st:any easy alternative way when -egen- is not allowed to combine with by** *From:* Stas Kolenikov <skolenik@gmail.com>

**References**:

- **st:any easy alternative way when -egen- is not allowed to combine with by** *From:* Amanda Fu <mandy.fu1@gmail.com>
- **Re: st:any easy alternative way when -egen- is not allowed to combine with by** *From:* Amanda Fu <mandy.fu1@gmail.com>
- **Re: st:any easy alternative way when -egen- is not allowed to combine with by** *From:* Stas Kolenikov <skolenik@gmail.com>
