Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Identifying first observation in each panel after regression


From   Ivan Png <[email protected]>
To   [email protected]
Subject   st: Identifying first observation in each panel after regression
Date   Mon, 4 Jun 2012 18:16:54 -0400

Dear Nick--

Many thanks for your hint.  I found the solution.  I execute
  . by gvkey , sort: gen flag = 1 if  _n == 1
before the regression.

Then, after the regression, I execute
  . gen regsample = 1 if e(sample)

And, to identify the first observation of each company in the
regression sample, I use
   regsample == 1 & flag == 1

However, I still don't understand the reason it works.
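
[Editorial sketch, not part of the original message.] A flag created before the regression marks each company's first observation in the full dataset. If that observation is dropped from the estimation sample (for example, because of missing values), the condition regsample == 1 & flag == 1 will not pick up that company at all. An alternative, assuming the goal is the first in-sample observation of every company, is to build the flag after the regression. The example below uses the Grunfeld data from earlier in the thread as a stand-in for the gvkey panel; the regression specification and the variable name first_in_sample are illustrative assumptions.

```stata
* Sketch only: Grunfeld data stand in for the gvkey/year panel.
webuse grunfeld, clear
xtset company year
xtreg invest mvalue kstock, fe

* e(sample) is 1 for observations used in the last estimation, 0 otherwise.
gen byte regsample = e(sample)

* Within each company, group by sample status and order by year, then flag
* the earliest observation among those that entered the regression.
bysort company regsample (year): gen byte first_in_sample = regsample & _n == 1

* The count of flagged observations should match the number of groups
* that xtreg reports.
count if first_in_sample
display e(N_g)
```

With the real data, company and year would be replaced by gvkey and the panel's year variable.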


On 4 June 2012 14:24, Nick Cox <[email protected]> wrote:
> What code do you mean by "the code below"?
>
> I suspect there's something else up with your dataset that leads to
> what you see. Examine the data omitted by
>
> . edit if !e(sample)
>
> after your -xtreg- command.
>
> Nick
>
> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <[email protected]> wrote:
>> Many thanks, Nick.  Incidentally, thanks for the yeoman service to all
>> STATAlisters.
>>
>> The discrepancy I found was by using xtreg to run a fixed-effects
>> regression on the sample.  xtreg reported 2773 companies.  Yet, when I
>> used the code below on the regression sample, I got only 1048
>> companies.  So, the only reason I could think of was that the flag
>> identified only companies that were present in year 1.
>
> On 4 June 2012 13:21, Nick Cox <[email protected]> wrote:
>
>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>
>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>
>>> You can check logic in terms of this example:
>>>
>>> . webuse grunfeld
>>>
>>> . su year
>>>
>>>    Variable |       Obs        Mean    Std. Dev.       Min        Max
>>> ------------+--------------------------------------------------------
>>>        year |       200      1944.5    5.780751       1935       1954
>>>
>>> . drop if year == 1935 & mod(company, 2)
>>> (5 observations deleted)
>>>
>>> . tab year
>>>
>>>       year |      Freq.     Percent        Cum.
>>> -----------+-----------------------------------
>>>       1935 |          5        2.56        2.56
>>>       1936 |         10        5.13        7.69
>>>       1937 |         10        5.13       12.82
>>>       1938 |         10        5.13       17.95
>>>       1939 |         10        5.13       23.08
>>>       1940 |         10        5.13       28.21
>>>       1941 |         10        5.13       33.33
>>>       1942 |         10        5.13       38.46
>>>       1943 |         10        5.13       43.59
>>>       1944 |         10        5.13       48.72
>>>       1945 |         10        5.13       53.85
>>>       1946 |         10        5.13       58.97
>>>       1947 |         10        5.13       64.10
>>>       1948 |         10        5.13       69.23
>>>       1949 |         10        5.13       74.36
>>>       1950 |         10        5.13       79.49
>>>       1951 |         10        5.13       84.62
>>>       1952 |         10        5.13       89.74
>>>       1953 |         10        5.13       94.87
>>>       1954 |         10        5.13      100.00
>>> -----------+-----------------------------------
>>>      Total |        195      100.00
>>>
>>> . bysort company (year) : gen first = _n == 1
>>>
>>> . l company year  if first
>>>
>>>      +----------------+
>>>      | company   year |
>>>      |----------------|
>>>   1. |       1   1936 |
>>>  20. |       2   1935 |
>>>  40. |       3   1936 |
>>>  59. |       4   1935 |
>>>  79. |       5   1936 |
>>>      |----------------|
>>>  98. |       6   1935 |
>>> 118. |       7   1936 |
>>> 137. |       8   1935 |
>>> 157. |       9   1936 |
>>> 176. |      10   1935 |
>>>      +----------------+
>>>
>>> Nick
>>> [email protected]
>>>
>>> Ivan Png
>>>
>>> I am analyzing an unbalanced panel of company data, organized by
>>> company (gvkey) and year.  I want to create a flag for the first
>>> observation of each company in the panel.  I tried
>>>
>>>  . sort gvkey year
>>>  . by gvkey , sort: gen flag = 1 if  _n == 1
>>>
>>> However, this set flag = 1 only if a company was present in year 1
>>> of the panel.  It missed any company that first appeared in later years.
>>>
>>> I searched statalist and found this:
>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>
>>> But it doesn't work.  I'd be grateful for any relevant help.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 
Best wishes
Ivan Png
Skype: ipng00

