The Stata listserver
st: RE: Re: RE: why the different?

From   "Wanli Zhao" <>
To   <>
Subject   st: RE: Re: RE: why the different?
Date   Thu, 27 Oct 2005 11:55:52 -0400

Thanks a lot. BTW, I think you mean "egen sumgt68=sum(age>68), by(gvkey)".
Much quicker than my way. I learned something.

Wanli Zhao
-----Original Message-----
[] On Behalf Of Michael Blasnik
Sent: Thursday, October 27, 2005 8:13 AM
Subject: st: Re: RE: why the different?

"Wanli Zhao" <> wrote:

>I found something interesting & puzzling. Maybe I just miss something. 
>I  have a dataset like this:
> Now, I want to have the number of people older than 68 by each gvkey. 
> So I do {egen old=count(age) if age>=69,by(gvkey)}. Then I found that 
> the number is correct but it only shows when the age variable is 69 or 
> bigger. I thought it would put the same number within gvkey for each 
> age, just as I experienced a lot of such functions do. Certainly, I 
> did the following:
> gsort gvkey -old
> by gvkey: replace old=old[_n-1] if old==.
> That's OK. But for the outsider, I want the number of 1's within each 
> gvkey so I did {egen outside=sum(outsider), by(gvkey)}. This time, 
> there is no missing value. Why does "count" behave differently? 
> Certainly, I can generate another dummy for age bigger than 68 and 
> then sum that up. Same result. But I just wonder why "count" did not 
> fill in all the values?
> Cheers,
> Wanli Zhao

I think this behavior can be frustrating at times, but it certainly isn't
puzzling and I'd like to know what your examples are of other Stata commands
that don't follow this convention.  Commands that use -if- clauses usually
only operate on observations meeting the qualifier: gen x2=x^2 if x>5  will
create missing values in x2 for any cases where x is not greater than 5.
-egen- follows this same behavior and your example with the egen sum doesn't
have an -if- clause.  I have long thought that there ought to be an egen
option for filling in these missing values when a function yields a constant
for each by group.  Sometimes you can use logical conditions within the
function to accomplish this, as in egen sumgt68=sum(x*(age>68)), by(gvkey).
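
A minimal sketch of the difference, assuming a small made-up dataset (the values and the names old_if, old_all and old_fill are hypothetical, not from the original data). It uses total(), the name for egen's sum() in current Stata releases, and it adds !missing(age) as a guard, because Stata treats a missing age as larger than any number, so age>68 alone would count it.

```stata
clear
input gvkey age outsider
1 70 1
1 65 0
1 72 1
2 60 0
2 69 1
end

* -if- limits which observations receive a result:
* rows with age <= 68 are left missing
egen old_if = count(age) if age > 68, by(gvkey)

* a logical expression inside the function is evaluated for every row,
* so every observation in the group gets the group total
egen old_all = total(age > 68 & !missing(age)), by(gvkey)

* alternative: fill in the -if- result afterwards; missing values sort
* last, so the first observation in each group holds the count
* (a group with nobody over 68 stays missing here, while total() gives 0)
gen old_fill = old_if
bysort gvkey (old_fill): replace old_fill = old_fill[1]

* checks on the toy data
assert old_all == 2 if gvkey == 1
assert old_all == 1 if gvkey == 2
assert missing(old_if) if age <= 68
assert old_fill == old_all

list, sepby(gvkey)
```

On this toy data old_if is missing for the two rows with age 65 and 60, while old_all and old_fill hold 2 for every row of gvkey 1 and 1 for every row of gvkey 2.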

Michael Blasnik
