
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Identifying first observation in each panel after regression

From	Steve Samuels <[email protected]>
To	[email protected]
Subject	Re: st: Identifying first observation in each panel after regression
Date	Mon, 4 Jun 2012 22:27:12 -0400

Your claim of a discrepancy is false, and you did not test it in the do file; it runs the "by gvkey:" statement only after -xtreg-.
 
When I run your do file with:

. by gvkey , sort : gen flag1 = 1 if _n ==1   // before the xtreg statement
  
. by gvkey , sort : gen flag2 = 1 if _n ==1  // after the xtreg statement

. tab flag1 flag2, missing

           |         flag2
     flag1 |         1          . |     Total
-----------+----------------------+----------
         1 |     6,982          0 |     6,982 
         . |         0     70,797 |    70,797 
-----------+----------------------+----------
     Total |     6,982     70,797 |    77,779 
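
For anyone without the data, the same check can be sketched on the Grunfeld data that Nick used below. This is only an illustration under assumptions: the model (invest on mvalue and kstock) is a stand-in, not Ivan's regression, and (year) is added in parentheses so that the order within each company is explicit.

. webuse grunfeld, clear
. bysort company (year) : gen flag1 = 1 if _n == 1   // before the xtreg statement
. xtset company year
. xtreg invest mvalue kstock, fe                     // stand-in model, assumed for illustration
. bysort company (year) : gen flag2 = 1 if _n == 1   // after the xtreg statement
. assert flag1 == flag2                              // stops with an error if the flags differ
. tab flag1 flag2, missing

If the two flags agree, -assert- is silent and -tab- shows no off-diagonal cells, as in the table above.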



Steve
[email protected]

On Jun 4, 2012, at 8:13 PM, Ivan Png wrote:

Thanks, Nick.

Here's the code
https://docs.google.com/open?id=0Bxt3Gm6VpSgiZmJkRUZUUktJQzA

And here's the data
https://docs.google.com/open?id=0Bxt3Gm6VpSgiNFhhV3dsang4b3M



On 4 June 2012 19:00, Nick Cox <[email protected]> wrote:
> It should make absolutely no difference whether you do this before or
> after a regression. I think we need to see evidence of what you think
> is happening in terms of a dataset you provide in its entirety or
> using a dataset downloadable by all. Otherwise I'd advise taking up
> your puzzlement with Stata tech-support. They would want a copy of
> your dataset.
> 
> On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <[email protected]> wrote:
>> What I don't understand: Why the
>> 
>> . by gvkey , sort : gen flag = 1 if _n ==1
>> 
>> works when I invoke it before the regression (it then picks up the
>> first observation of each company), but not when I invoke it after the
>> regression (it misses many companies).
>> 
>> I used exactly the same command in both cases.
>> 
>> 
>> On 4 June 2012 18:31, Nick Cox <[email protected]> wrote:
>>> Which bit don't you understand?
>>> 
>>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <[email protected]> wrote:
>>>> Dear Nick--
>>>> 
>>>> Many thanks for your hint.  I found the solution.  I execute
>>>>  . by gvkey , sort: gen flag = 1 if  _n == 1
>>>> before the regression.
>>>> 
>>>> Then, after the regression, I execute
>>>>  . gen regsample = 1 if e(sample)
>>>> 
>>>> And, to identify the first observation of each company in the
>>>> regression sample, I use
>>>>   regsample == 1 & flag == 1
>>>> 
>>>> However, I still don't understand the reason it works.
>>>> 
>>>> 
>>>> On 4 June 2012 14:24, Nick Cox <[email protected]> wrote:
>>>>> What code do you mean by "the code below"?
>>>>> 
>>>>> I suspect there's something else up with your dataset that leads to
>>>>> what you see. Examine the data omitted by
>>>>> 
>>>>> . edit if !e(sample)
>>>>> 
>>>>> after your -xtreg- command.
>>>>> 
>>>>> Nick
>>>>> 
>>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <[email protected]> wrote:
>>>>>> Many thanks, Nick.  Incidentally, thanks for the yeoman service to all
>>>>>> STATAlisters.
>>>>>> 
>>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>>> regression on the sample.  xtreg reported 2773 companies.  Yet, when I
>>>>>> used the code below on the regression sample, I got only 1048
>>>>>> companies.  So, the only reason I could think of was that the flag
>>>>>> identified only companies that were present in year 1.
>>>>> 
>>>>> On 4 June 2012 13:21, Nick Cox <[email protected]> wrote:
>>>>> 
>>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>> 
>>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>> 
>>>>>>> You can check logic in terms of this example:
>>>>>>> 
>>>>>>> . webuse grunfeld
>>>>>>> 
>>>>>>> . su year
>>>>>>> 
>>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>>> -------------+--------------------------------------------------------
>>>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>>> 
>>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>>> (5 observations deleted)
>>>>>>> 
>>>>>>> . tab year
>>>>>>> 
>>>>>>>        year |      Freq.     Percent        Cum.
>>>>>>> ------------+-----------------------------------
>>>>>>>        1935 |          5        2.56        2.56
>>>>>>>        1936 |         10        5.13        7.69
>>>>>>>        1937 |         10        5.13       12.82
>>>>>>>        1938 |         10        5.13       17.95
>>>>>>>        1939 |         10        5.13       23.08
>>>>>>>        1940 |         10        5.13       28.21
>>>>>>>        1941 |         10        5.13       33.33
>>>>>>>        1942 |         10        5.13       38.46
>>>>>>>        1943 |         10        5.13       43.59
>>>>>>>        1944 |         10        5.13       48.72
>>>>>>>        1945 |         10        5.13       53.85
>>>>>>>        1946 |         10        5.13       58.97
>>>>>>>        1947 |         10        5.13       64.10
>>>>>>>        1948 |         10        5.13       69.23
>>>>>>>        1949 |         10        5.13       74.36
>>>>>>>        1950 |         10        5.13       79.49
>>>>>>>        1951 |         10        5.13       84.62
>>>>>>>        1952 |         10        5.13       89.74
>>>>>>>        1953 |         10        5.13       94.87
>>>>>>>        1954 |         10        5.13      100.00
>>>>>>> ------------+-----------------------------------
>>>>>>>       Total |        195      100.00
>>>>>>> 
>>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>> 
>>>>>>> . l company year  if first
>>>>>>> 
>>>>>>>      +----------------+
>>>>>>>      | company   year |
>>>>>>>      |----------------|
>>>>>>>   1. |       1   1936 |
>>>>>>>  20. |       2   1935 |
>>>>>>>  40. |       3   1936 |
>>>>>>>  59. |       4   1935 |
>>>>>>>  79. |       5   1936 |
>>>>>>>      |----------------|
>>>>>>>  98. |       6   1935 |
>>>>>>> 118. |       7   1936 |
>>>>>>> 137. |       8   1935 |
>>>>>>> 157. |       9   1936 |
>>>>>>> 176. |      10   1935 |
>>>>>>>      +----------------+
>>>>>>> 
>>>>>>> Nick
>>>>>>> [email protected]
>>>>>>> 
>>>>>>> Ivan Png
>>>>>>> 
>>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>>> company (gvkey) and year.  I want to create a flag for the first
>>>>>>> observation of each company in the panel.  I tried
>>>>>>> 
>>>>>>>  . sort gvkey year
>>>>>>>  . by gvkey , sort: gen flag = 1 if  _n == 1
>>>>>>> 
>>>>>>> However, this set flag = 1 only if a company was present in year 1
>>>>>>> of the panel.  It missed any company that appeared in later years.
>>>>>>> 
>>>>>>> I searched statalist and found this:
>>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>> 
>>>>>>> But it doesn't work.  I'd be grateful for any relevant help.
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>> 
>> 
>> 
>> --
>> Best wishes
>> Ivan Png
>> Skype: ipng00



-- 
Best wishes
Ivan Png
Skype: ipng00


