



Re: st:any easy alternative way when -egen- is not allowed to combine with by


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st:any easy alternative way when -egen- is not allowed to combine with by
Date   Tue, 21 Dec 2010 18:40:20 -0500

On Tue, Dec 21, 2010 at 6:01 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
> I think your second email gives the healthiest approach. Very little
> of what -egen, by()- does cannot be done with a few lines of -bysort-,
> and it is often more lucid than the -egen- code.
>
> tempvar sumnew
> bysort id: g `sumnew' =sum(indicator)
> bysort id: g byte new=(sumnew>0,1,0)
> assert indicator == 0 if new == 0
> assert new == 1 if indicator > 0 & !missing(indicator)

Hi Stas!

There are at least three problems with your code:
1) you are missing ` ' around sumnew in the second bysort;
2) you are missing cond() in the second bysort;
3) the program does not work :(


To add to the third point: Amanda wanted the result to be 1 if the
indicator is >0 for ANY year in the given ID group. Here are the results
of your program for the data example that Amanda provided:

      id   year   indica~r   new
  1.    1   1985          0     0
  2.    1   1986          1     1
  3.    1   1987          2     1
  4.    2   1985          0     0
  5.    2   1986          0     0
  6.    2   1987          0     0

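The problem is with id==1: the first observation (1985) gets new==0,
even though the indicator is >0 in 1986 and 1987, so new should be 1 for
all three observations of that group.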
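
The listing above can be reproduced with the sketch below. It assumes
the two syntax problems are fixed as described in points 1 and 2, and it
adds (year) to the sort key, which the quoted code did not have, so that
the running sum is taken in year order.

```stata
* Amanda's example data
clear
input id year indicator
1 1985 0
1 1986 1
1 1987 2
2 1985 0
2 1986 0
2 1987 0
end

* The quoted code with macro quotes and cond() restored
tempvar sumnew
bysort id (year): generate `sumnew' = sum(indicator)
by id: generate byte new = cond(`sumnew' > 0, 1, 0)

* Observation 1 still gets new == 0 because sum() is a running sum
list id year indicator new, sepby(id)
```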

I smelled a problem with the solution as soon as I saw the asserts.
Indeed, if you can verify the results with such simple checks (no by or
sort), then you could generate the desired results in the same easy
manner.

The assert block is incomplete. It tests only the two most trivial cases:
1) if the result is zero, then no observation in the group should differ
from zero (missing is not accounted for), and hence neither should the
current observation, so we check that it is equal to zero.
2) if the current observation is non-zero and not missing, then that
alone is sufficient for the result to be 1, regardless of anything else,
so we check that the result is equal to 1.

The most interesting assert is missing: the one which tests that if the
current observation is zero, but somewhere else in the group there is a
positive value, then the result is 1.
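
One possible form of such a group-level check is sketched below. It
assumes the data in memory hold id, indicator and the result variable
new, and that indicator is never negative (Amanda gives a range of 0 to
10). The name anypos is only illustrative.

```stata
* 1 if any observation in the id group has a positive, non-missing
* indicator, 0 otherwise; constant within the group
tempvar anypos
egen byte `anypos' = max(indicator > 0 & !missing(indicator)), by(id)

* new must agree with the group-level flag in every observation
assert new == `anypos'
```

Applied to the listing above, this assert would fail for the first
observation, where new is 0 although id 1 has positive values in later
years.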

The simplest code that satisfies your tests and generates the same result is
generate byte new=indicator>0 if !missing(indicator)
but this is not the result Amanda wanted.
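
A quick way to see this is sketched below, assuming Amanda's example
data are in memory (variables id and indicator, with indicator never
negative). The variable name naive is only illustrative.

```stata
* Observation-level flag: no by, no sort, no group logic at all
generate byte naive = indicator > 0 if !missing(indicator)

* Both of the quoted checks pass for this variable
assert indicator == 0 if naive == 0
assert naive == 1 if indicator > 0 & !missing(indicator)
```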

Also, having two bysorts in this manner is IMHO confusing to read. Can
the sort order change as a result of the first statement? The code
implies yes. I'd rather clearly write three statements (sort, by, by)
than two statements (bysort, bysort), but that is a matter of preference,
of course.

For Amanda I suggest the following code:

tempvar temptotal
sort id
by id: egen `temptotal' = total(indicator)
generate byte new2 = `temptotal'>0 if !missing(`temptotal')

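There is an important difference between sum() and total(): with
generate, sum() returns a running sum, so its value changes from one
observation to the next within the group, while egen's total() returns
the group total, which is the same for every observation in the group.
See help for details.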
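
A small side-by-side sketch of the two functions on Amanda's example
data; the variable names runsum and grptotal are only illustrative.

```stata
clear
input id year indicator
1 1985 0
1 1986 1
1 1987 2
2 1985 0
2 1986 0
2 1987 0
end

sort id year
* running sum within id: 0, 1, 3 for id 1
by id: generate runsum = sum(indicator)
* group total within id: 3, 3, 3 for id 1
by id: egen grptotal = total(indicator)

list id year indicator runsum grptotal, sepby(id)
```

Only the group total is positive in every observation of a group that
has at least one positive indicator, which is why the comparison >0 is
made against total() rather than sum().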

Best,
    Sergiy Radyakin





>
>
> On Tue, Dec 21, 2010 at 4:50 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>> Just a supplement to my question:
>>
>> I know I could create an intermediate variable as following:
>>
>> bysort id: g  sumnew=sum(indicator)
>> bysort id: g  new=(sumnew>0,1,0)
>>
>> But I do not like this way because of the intermediate variable
>> "sumnew". It is created for the purpose of getting "new". If I keep
>> it, it will not be useful in the analysis; if I drop it, what if I
>> want to check if "new" is correct or not ?
>>
>> I am looking forward to hearing how you deal with this kind of
>> intermediate variables. Thank you!
>>
>> Amanda Fu
>> On Tue, Dec 21, 2010 at 5:23 PM, Amanda Fu <mandy.fu1@gmail.com> wrote:
>>> Hi all,
>>>
>>> I notice  some options of -egen- are not allowed to combine with by.
>>> I just wondered if there is any good way to handle these situations.
>>>
>>> Let me use an example.
>>> ----------------------------------------
>>> ID        surveyYear       indicator (maximum value is 10)
>>> 1         1985                     0
>>> 1         1986                     1
>>> 1         1987                     2
>>> 2         1985                     0
>>> 2         1986                     0
>>> 2         1987                     0
>>> ...............
>>> ----------------------------------------
>>> I want to create a variable "new" that takes value 1 if there is at
>>> least one year for a ID's indicator is above 0  and takes the value 0
>>> (like ID 1) is all the years the indicator is 0 (like ID 2).
>>>
>>> What I wish to use is as following:
>>> . bysort id: egen  new=(indicator), anymatch(1/10)
>>>
>>> Is there any simple way to do this? Any comments will be helpful. Thank you!
>>>
>>> Sincerely,
>>> Amanda Fu
>>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
>
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
>
>


