

From: Ivan Png <iplpng@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Identifying first observation in each panel after regression
Date: Tue, 5 Jun 2012 07:30:09 -0400

Brilliant. That must explain my puzzle. Many thanks.

On 5 June 2012 06:30, Steve Samuels <sjsamuels@gmail.com> wrote:
>
> One likely reason: There are missing values for some of your covariates:
> Some of these occur in the first observation (year) a company has and
> are excluded from analysis sample even when other observations from
> the company are included.
>
> Steve
> sjsamuels@gmail.com
>
> On Jun 5, 2012, at 5:50 AM, Ivan Png wrote:
>
> Many thanks. Sorry, you are right. I wrote wrongly. What I meant was that,
>
> When I run the regression, it shows 2773 groups (companies). But when I run
> . gen rdsample = 1 if e(sample)
> . by gvkey , sort : gen flag = 1 if _n == 1
> /* flag first observation of each company */
>
> . su year if flag == 1 & rdsample == 1
> It indicates 1048 unique companies. I do not understand where are the
> other 2773 - 1048 = 1725 companies.
>
> Anyhow, a friend just suggested the following (and it works)
>
> . sort rdsample gvkey year
> . by rdsample gvkey , sort: gen flag = 1 if rdsample == 1 & _n == 1
> . su year if flag == 1
> This shows 2773 companies. I just do not understand why.
>
> On 4 June 2012 22:36, Steve Samuels <sjsamuels@gmail.com> wrote:
>>
>> Correction: the "flag2" statement is run after the regression.
>>
>> Your claim of discrepancy is false, and you did not test it in the do file, which runs the "by gvkey:" statement only after -xtreg-.
>>
>> When I run your do file with:
>>
>> . by gvkey , sort : gen flag1 = 1 if _n ==1 // before the xtreg statement
>>
>> . by gvkey , sort : gen flag2 = 1 if _n ==1 // after the xtreg statement
>>
>> tab flag1 flag2, missing
>>
>>            |         flag2
>>      flag1 |         1          . |     Total
>> -----------+----------------------+----------
>>          1 |     6,982          0 |     6,982
>>          . |         0     70,797 |    70,797
>> -----------+----------------------+----------
>>      Total |     6,982     70,797 |    77,779
>>
>> Steve
>> sjsamuels@gmail.com
>>
>> On Jun 4, 2012, at 8:13 PM, Ivan Png wrote:
>>
>> Thanks, Nick.
>>
>> Here's the code
>> https://docs.google.com/open?id=0Bxt3Gm6VpSgiZmJkRUZUUktJQzA
>>
>> And here's the data
>> https://docs.google.com/open?id=0Bxt3Gm6VpSgiNFhhV3dsang4b3M
>>
>> On 4 June 2012 19:00, Nick Cox <njcoxstata@gmail.com> wrote:
>>> It should make absolutely no difference whether you do this before or
>>> after a regression. I think we need to see evidence of what you think
>>> is happening in terms of a dataset you provide in its entirety or
>>> using a dataset downloadable by all. Otherwise I'd advise taking up
>>> your puzzlement with Stata tech-support. They would want a copy of
>>> your dataset.
>>>
>>> On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>> What I don't understand: Why the
>>>>
>>>> . by gvkey , sort : gen flag = 1 if _n ==1
>>>>
>>>> works when I invoke it before the regression (it then picks up the
>>>> first observation of each company), but not when I invoke it after the
>>>> regression (it misses many companies).
>>>>
>>>> I used exactly the same command in both cases.
>>>>
>>>> On 4 June 2012 18:31, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> Which bit don't you understand?
>>>>>
>>>>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>>>> Dear Nick--
>>>>>>
>>>>>> Many thanks for your hint. I found the solution. I execute
>>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>> before the regression.
>>>>>>
>>>>>> Then, after the regression, I execute
>>>>>> . gen regsample = 1 if e(sample)
>>>>>>
>>>>>> And, to identify the first observation of each company in the
>>>>>> regression sample, I use
>>>>>> regsample == 1 & flag == 1
>>>>>>
>>>>>> However, I still don't understand the reason it works.
>>>>>>
>>>>>> On 4 June 2012 14:24, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>> What code do you mean by "the code below"?
>>>>>>>
>>>>>>> I suspect there's something else up with your dataset that leads to
>>>>>>> what you see. Examine the data omitted by
>>>>>>>
>>>>>>> . edit if !e(sample)
>>>>>>>
>>>>>>> after your -xtreg- command.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>>>>>> Many thanks, Nick. Incidentally, thanks for the yeoman service to all
>>>>>>>> STATAlisters.
>>>>>>>>
>>>>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>>>>> regression on the sample. xtreg reported 2773 companies. Yet, when I
>>>>>>>> used the code below on the regression sample, I got only 1048
>>>>>>>> companies. So, the only reason I could think of was that the flag
>>>>>>>> identified only companies that were present in year 1.
>>>>>>>
>>>>>>> On 4 June 2012 13:21, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>>>>>>
>>>>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>>>>
>>>>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>>>>
>>>>>>>>> You can check logic in terms of this example:
>>>>>>>>>
>>>>>>>>> . webuse grunfeld
>>>>>>>>>
>>>>>>>>> . su year
>>>>>>>>>
>>>>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>>>>> -------------+--------------------------------------------------------
>>>>>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>>>>>
>>>>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>>>>> (5 observations deleted)
>>>>>>>>>
>>>>>>>>> . tab year
>>>>>>>>>
>>>>>>>>>        year |      Freq.     Percent        Cum.
>>>>>>>>> ------------+-----------------------------------
>>>>>>>>>        1935 |          5        2.56        2.56
>>>>>>>>>        1936 |         10        5.13        7.69
>>>>>>>>>        1937 |         10        5.13       12.82
>>>>>>>>>        1938 |         10        5.13       17.95
>>>>>>>>>        1939 |         10        5.13       23.08
>>>>>>>>>        1940 |         10        5.13       28.21
>>>>>>>>>        1941 |         10        5.13       33.33
>>>>>>>>>        1942 |         10        5.13       38.46
>>>>>>>>>        1943 |         10        5.13       43.59
>>>>>>>>>        1944 |         10        5.13       48.72
>>>>>>>>>        1945 |         10        5.13       53.85
>>>>>>>>>        1946 |         10        5.13       58.97
>>>>>>>>>        1947 |         10        5.13       64.10
>>>>>>>>>        1948 |         10        5.13       69.23
>>>>>>>>>        1949 |         10        5.13       74.36
>>>>>>>>>        1950 |         10        5.13       79.49
>>>>>>>>>        1951 |         10        5.13       84.62
>>>>>>>>>        1952 |         10        5.13       89.74
>>>>>>>>>        1953 |         10        5.13       94.87
>>>>>>>>>        1954 |         10        5.13      100.00
>>>>>>>>> ------------+-----------------------------------
>>>>>>>>>       Total |        195      100.00
>>>>>>>>>
>>>>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>>>>
>>>>>>>>> . l company year if first
>>>>>>>>>
>>>>>>>>>      +----------------+
>>>>>>>>>      | company   year |
>>>>>>>>>      |----------------|
>>>>>>>>>   1. |       1   1936 |
>>>>>>>>>  20. |       2   1935 |
>>>>>>>>>  40. |       3   1936 |
>>>>>>>>>  59. |       4   1935 |
>>>>>>>>>  79. |       5   1936 |
>>>>>>>>>      |----------------|
>>>>>>>>>  98. |       6   1935 |
>>>>>>>>> 118. |       7   1936 |
>>>>>>>>> 137. |       8   1935 |
>>>>>>>>> 157. |       9   1936 |
>>>>>>>>> 176. |      10   1935 |
>>>>>>>>>      +----------------+
>>>>>>>>>
>>>>>>>>> Nick
>>>>>>>>> n.j.cox@durham.ac.uk
>>>>>>>>>
>>>>>>>>> Ivan Png
>>>>>>>>>
>>>>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>>>>> company (gvkey) and year. I want to create a flag to the first
>>>>>>>>> observation of each company in the panel. I tried
>>>>>>>>>
>>>>>>>>> . sort gvkey year
>>>>>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>>>>>
>>>>>>>>> However, this only flagged flag = 1 if a company was present in year 1
>>>>>>>>> of the panel. It missed any company that appeared in later years.
>>>>>>>>>
>>>>>>>>> I searched statalist and found this:
>>>>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>>>>
>>>>>>>>> But it doesn't work. I'd be grateful for any relevant help.
>>>>>
>>>>> *
>>>>> * For searches and help try:
>>>>> * http://www.stata.com/help.cgi?search
>>>>> * http://www.stata.com/support/statalist/faq
>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>
>>>> --
>>>> Best wishes
>>>> Ivan Png
>>>> Skype: ipng00
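
Steve's explanation (missing covariates in a company's first year) can be checked on the Grunfeld data that Nick used in his example. The following is only a minimal sketch, not the poster's actual do-file: the missing values in mvalue are introduced artificially to mimic the problem, and the variable names insample, first_all and first_est are made up for illustration.

```stata
* Sketch on the Grunfeld panel (10 companies, 1935-1954).
* ASSUMPTION: the missing values below are artificial, added only to
* reproduce the situation Steve describes.
webuse grunfeld, clear
xtset company year

* Make one covariate missing in the first year of the odd-numbered companies
replace mvalue = . if year == 1935 & mod(company, 2)

* Rows with a missing covariate drop out of the estimation sample
xtreg invest mvalue kstock, fe
gen byte insample = e(sample)

* First observation of each company in the full data
bysort company (year): gen byte first_all = _n == 1

* First observation of each company within the estimation sample
bysort insample company (year): gen byte first_est = insample & _n == 1

* Undercounts: only companies whose very first row entered the regression
count if first_all & insample

* Matches the number of groups reported by -xtreg-
count if first_est
```

With these artificial missing values the first count should be 5 and the second 10. The first flag marks each company's first row in the full data, and for five companies that row is not in e(sample). Sorting by the sample indicator first, as in the friend's suggestion, makes _n == 1 refer to the first row that was actually used.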

**References**:

- **st: Identifying first observation in each panel after regression**, *From:* Ivan Png <iplpng@gmail.com>
- **Re: st: Identifying first observation in each panel after regression**, *From:* Steve Samuels <sjsamuels@gmail.com>
