Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Identifying first observation in each panel after regression

From	Ivan Png <[email protected]>
To	[email protected]
Subject	Re: st: Identifying first observation in each panel after regression
Date	Tue, 5 Jun 2012 07:30:09 -0400
Brilliant.  That must explain my puzzle.  Many thanks.
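
Steve's explanation (quoted below) can be checked on a public dataset. This is only a sketch under stated assumptions: it uses -webuse grunfeld- rather than the company panel discussed in this thread, the missing values in mvalue are introduced artificially, and the variable names insample, first_all and first_reg are made up for illustration.

```stata
* Sketch on a public dataset, not the data from this thread
webuse grunfeld, clear
xtset company year

* Assumption for illustration: make one covariate missing in the
* first year (1935) of the odd-numbered companies
replace mvalue = . if year == 1935 & mod(company, 2)

* xtreg drops the 5 observations with missing mvalue but keeps all 10 groups
xtreg invest mvalue kstock, fe
gen byte insample = e(sample)

* First observation of each company in the full data
bysort company (year): gen byte first_all = _n == 1

* First observation of each company within the estimation sample
bysort company insample (year): gen byte first_reg = insample & _n == 1

count if first_all & insample   // 5: companies whose first year was dropped are missed
count if first_reg              // 10: agrees with the number of groups in xtreg
```

Grouping on insample as well as company restarts _n inside the estimation sample, which is why the -by rdsample gvkey- version quoted below finds every group.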

On 5 June 2012 06:30, Steve Samuels <[email protected]> wrote:
>
> One likely reason: some of your covariates have missing values. Where
> these occur in the first observation (year) a company has, that
> observation is excluded from the analysis sample even when other
> observations from the same company are included.
>
>
>
> Steve
> [email protected]
>
>
>
>
> On Jun 5, 2012, at 5:50 AM, Ivan Png wrote:
>
> Many thanks.  Sorry, you are right.  I misstated the problem.  What I meant was this:
>
> When I run the regression, it shows 2773 groups (companies).  But when I run
> . gen rdsample = 1 if e(sample)
> . by gvkey , sort : gen flag = 1 if _n == 1
>  /* flag first observation of each company */
>
> . su year if flag == 1 & rdsample == 1
> It indicates 1048 unique companies.  I do not understand where the
> other 2773 - 1048 = 1725 companies are.
>
> Anyhow, a friend just suggested the following (and it works)
>
> . sort  rdsample gvkey year
> . by  rdsample gvkey , sort: gen flag = 1 if rdsample == 1 & _n == 1
> . su year if flag == 1
> This shows 2773 companies.  I just do not understand why.
>
>
>
>
>
>
>
> On 4 June 2012 22:36, Steve Samuels <[email protected]> wrote:
>>
>> Correction: the "flag2" statement is run after the regression.
>>
>>
>>
>> Your claim of discrepancy is false, and you did not test it in the do file, which runs the "by gvkey:" statement only after -xtreg-.
>>
>> When I run your do file with:
>>
>> . by gvkey , sort : gen flag1 = 1 if _n ==1   // before the xtreg statement
>>
>> . by gvkey , sort : gen flag2 = 1 if _n ==1  // after the xtreg statement
>>
>> . tab flag1 flag2, missing
>>
>>            |         flag2
>>      flag1 |         1          . |     Total
>> -----------+----------------------+----------
>>          1 |     6,982          0 |     6,982
>>          . |         0     70,797 |    70,797
>> -----------+----------------------+----------
>>      Total |     6,982     70,797 |    77,779
>>
>>
>>
>> Steve
>> [email protected]
>>
>> On Jun 4, 2012, at 8:13 PM, Ivan Png wrote:
>>
>> Thanks, Nick.
>>
>> Here's the code
>> https://docs.google.com/open?id=0Bxt3Gm6VpSgiZmJkRUZUUktJQzA
>>
>> And here's the data
>> https://docs.google.com/open?id=0Bxt3Gm6VpSgiNFhhV3dsang4b3M
>>
>>
>>
>> On 4 June 2012 19:00, Nick Cox <[email protected]> wrote:
>>> It should make absolutely no difference whether you do this before or
>>> after a regression. I think we need to see evidence of what you think
>>> is happening in terms of a dataset you provide in its entirety or
>>> using a dataset downloadable by all. Otherwise I'd advise taking up
>>> your puzzlement with Stata tech-support. They would want a copy of
>>> your dataset.
>>>
>>> On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <[email protected]> wrote:
>>>> What I don't understand: Why the
>>>>
>>>> . by gvkey , sort : gen flag = 1 if _n ==1
>>>>
>>>> works when I invoke it before the regression (it then picks up the
>>>> first observation of each company), but not when I invoke it after the
>>>> regression (it misses many companies).
>>>>
>>>> I used exactly the same command in both cases.
>>>>
>>>>
>>>> On 4 June 2012 18:31, Nick Cox <[email protected]> wrote:
>>>>> Which bit don't you understand?
>>>>>
>>>>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <[email protected]> wrote:
>>>>>> Dear Nick--
>>>>>>
>>>>>> Many thanks for your hint.  I found the solution.  I execute
>>>>>> . by gvkey , sort: gen flag = 1 if  _n == 1
>>>>>> before the regression.
>>>>>>
>>>>>> Then, after the regression, I execute
>>>>>> . gen regsample = 1 if e(sample)
>>>>>>
>>>>>> And, to identify the first observation of each company in the
>>>>>> regression sample, I use
>>>>>>  regsample == 1 & flag == 1
>>>>>>
>>>>>> However, I still don't understand the reason it works.
>>>>>>
>>>>>>
>>>>>> On 4 June 2012 14:24, Nick Cox <[email protected]> wrote:
>>>>>>> What code do you mean by "the code below"?
>>>>>>>
>>>>>>> I suspect there's something else up with your dataset that leads to
>>>>>>> what you see. Examine the data omitted by
>>>>>>>
>>>>>>> . edit if !e(sample)
>>>>>>>
>>>>>>> after your -xtreg- command.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <[email protected]> wrote:
>>>>>>>> Many thanks, Nick.  Incidentally, thanks for the yeoman service to all
>>>>>>>> STATAlisters.
>>>>>>>>
>>>>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>>>>> regression on the sample.  xtreg reported 2773 companies.  Yet, when I
>>>>>>>> used the code below on the regression sample, I got only 1048
>>>>>>>> companies.  So, the only reason I could think of was that the flag
>>>>>>>> identified only companies that were present in year 1.
>>>>>>>
>>>>>>> On 4 June 2012 13:21, Nick Cox <[email protected]> wrote:
>>>>>>>
>>>>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>>>>
>>>>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>>>>
>>>>>>>>> You can check logic in terms of this example:
>>>>>>>>>
>>>>>>>>> . webuse grunfeld
>>>>>>>>>
>>>>>>>>> . su year
>>>>>>>>>
>>>>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>>>>> -------------+--------------------------------------------------------
>>>>>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>>>>>
>>>>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>>>>> (5 observations deleted)
>>>>>>>>>
>>>>>>>>> . tab year
>>>>>>>>>
>>>>>>>>>        year |      Freq.     Percent        Cum.
>>>>>>>>> ------------+-----------------------------------
>>>>>>>>>        1935 |          5        2.56        2.56
>>>>>>>>>        1936 |         10        5.13        7.69
>>>>>>>>>        1937 |         10        5.13       12.82
>>>>>>>>>        1938 |         10        5.13       17.95
>>>>>>>>>        1939 |         10        5.13       23.08
>>>>>>>>>        1940 |         10        5.13       28.21
>>>>>>>>>        1941 |         10        5.13       33.33
>>>>>>>>>        1942 |         10        5.13       38.46
>>>>>>>>>        1943 |         10        5.13       43.59
>>>>>>>>>        1944 |         10        5.13       48.72
>>>>>>>>>        1945 |         10        5.13       53.85
>>>>>>>>>        1946 |         10        5.13       58.97
>>>>>>>>>        1947 |         10        5.13       64.10
>>>>>>>>>        1948 |         10        5.13       69.23
>>>>>>>>>        1949 |         10        5.13       74.36
>>>>>>>>>        1950 |         10        5.13       79.49
>>>>>>>>>        1951 |         10        5.13       84.62
>>>>>>>>>        1952 |         10        5.13       89.74
>>>>>>>>>        1953 |         10        5.13       94.87
>>>>>>>>>        1954 |         10        5.13      100.00
>>>>>>>>> ------------+-----------------------------------
>>>>>>>>>       Total |        195      100.00
>>>>>>>>>
>>>>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>>>>
>>>>>>>>> . l company year  if first
>>>>>>>>>
>>>>>>>>>      +----------------+
>>>>>>>>>      | company   year |
>>>>>>>>>      |----------------|
>>>>>>>>>   1. |       1   1936 |
>>>>>>>>>  20. |       2   1935 |
>>>>>>>>>  40. |       3   1936 |
>>>>>>>>>  59. |       4   1935 |
>>>>>>>>>  79. |       5   1936 |
>>>>>>>>>      |----------------|
>>>>>>>>>  98. |       6   1935 |
>>>>>>>>> 118. |       7   1936 |
>>>>>>>>> 137. |       8   1935 |
>>>>>>>>> 157. |       9   1936 |
>>>>>>>>> 176. |      10   1935 |
>>>>>>>>>      +----------------+
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> [email protected]
>>>>>>>>>
>>>>>>>>> Ivan Png
>>>>>>>>>
>>>>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>>>>> company (gvkey) and year.  I want to create a flag for the first
>>>>>>>>> observation of each company in the panel.  I tried
>>>>>>>>>
>>>>>>>>> . sort gvkey year
>>>>>>>>> . by gvkey , sort: gen flag = 1 if  _n == 1
>>>>>>>>>
>>>>>>>>> However, this set flag = 1 only if a company was present in year 1
>>>>>>>>> of the panel.  It missed any company that appeared in later years.
>>>>>>>>>
>>>>>>>>> I searched statalist and found this:
>>>>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>>>>
>>>>>>>>> But it doesn't work.  I'd be grateful for any relevant help.
>>>>>
>>>>> *
>>>>> *   For searches and help try:
>>>>> *   http://www.stata.com/help.cgi?search
>>>>> *   http://www.stata.com/support/statalist/faq
>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
>>>>
>>>>
>>>> --
>>>> Best wishes
>>>> Ivan Png
>>>> Skype: ipng00
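
Nick's suggestion earlier in the thread, -edit if !e(sample)-, opens the Data Editor. A non-interactive way to look at the same observations might run as follows. Again this is a sketch under assumptions: it uses -webuse grunfeld- with artificially introduced missing values, not the data from this thread.

```stata
* Sketch on a public dataset, not the data from this thread
webuse grunfeld, clear

* Assumption for illustration: artificial gaps in one covariate
replace mvalue = . if year == 1935 & mod(company, 2)

xtreg invest mvalue kstock, fe

* How many observations were left out of the estimation sample?
count if !e(sample)

* Which model variables are missing among the omitted observations?
misstable summarize invest mvalue kstock if !e(sample)

* List the omitted observations, grouped by panel
list company year invest mvalue kstock if !e(sample), sepby(company)
```

If the omitted observations cluster in the first year of many panels, that would account for a first-observation flag built on the full data undercounting the groups in the regression.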



-- 
Best wishes
Ivan Png
Skype: ipng00

References:
- st: Identifying first observation in each panel after regression
  - From: Ivan Png <[email protected]>
- Re: st: Identifying first observation in each panel after regression
  - From: Steve Samuels <[email protected]>