

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Identifying first observation in each panel after regression
Date: Tue, 5 Jun 2012 06:30:20 -0400

One likely reason: some of your covariates have missing values. Some of
these occur in the first observation (year) a company has, and those
observations are excluded from the analysis sample even when other
observations from the same company are included.

Steve
sjsamuels@gmail.com

On Jun 5, 2012, at 5:50 AM, Ivan Png wrote:

Many thanks. Sorry, you are right. I wrote wrongly. What I meant was
that when I run the regression, it shows 2773 groups (companies). But
when I run

. gen rdsample = 1 if e(sample)
. by gvkey , sort : gen flag = 1 if _n == 1   /* flag first observation of each company */
. su year if flag == 1 & rdsample == 1

It indicates 1048 unique companies. I do not understand where are the
other 2773 - 1048 = 1725 companies.

Anyhow, a friend just suggested the following (and it works)

. sort rdsample gvkey year
. by rdsample gvkey , sort: gen flag = 1 if rdsample == 1 & _n == 1
. su year if flag == 1

This shows 2773 companies. I just do not understand why.

On 4 June 2012 22:36, Steve Samuels <sjsamuels@gmail.com> wrote:
>
> Correction: the "flag2" statement is run after the regression.
>
> Your claim of discrepancy is false, and you did not test it in the do
> file, which runs the "by gvkey:" statement only after -xtreg-.
>
> When I run your do file with:
>
> . by gvkey , sort : gen flag1 = 1 if _n ==1  // before the xtreg statement
>
> . by gvkey , sort : gen flag2 = 1 if _n ==1  // after the xtreg statement
>
> tab flag1 flag2, missing
>
>            |         flag2
>      flag1 |         1          . |     Total
> -----------+----------------------+----------
>          1 |     6,982          0 |     6,982
>          . |         0     70,797 |    70,797
> -----------+----------------------+----------
>      Total |     6,982     70,797 |    77,779
>
> Steve
> sjsamuels@gmail.com
>
> On Jun 4, 2012, at 8:13 PM, Ivan Png wrote:
>
> Thanks, Nick.
>
> Here's the code
> https://docs.google.com/open?id=0Bxt3Gm6VpSgiZmJkRUZUUktJQzA
>
> And here's the data
> https://docs.google.com/open?id=0Bxt3Gm6VpSgiNFhhV3dsang4b3M
>
> On 4 June 2012 19:00, Nick Cox <njcoxstata@gmail.com> wrote:
>> It should make absolutely no difference whether you do this before or
>> after a regression. I think we need to see evidence of what you think
>> is happening in terms of a dataset you provide in its entirety or
>> using a dataset downloadable by all. Otherwise I'd advise taking up
>> your puzzlement with Stata tech-support. They would want a copy of
>> your dataset.
>>
>> On Mon, Jun 4, 2012 at 11:43 PM, Ivan Png <iplpng@gmail.com> wrote:
>>> What I don't understand: Why the
>>>
>>> . by gvkey , sort : gen flag = 1 if _n ==1
>>>
>>> works when I invoke it before the regression (it then picks up the
>>> first observation of each company), but not when I invoke it after the
>>> regression (it misses many companies).
>>>
>>> I used exactly the same command in both cases.
>>>
>>> On 4 June 2012 18:31, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Which bit don't you understand?
>>>>
>>>> On Mon, Jun 4, 2012 at 11:16 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>>> Dear Nick--
>>>>>
>>>>> Many thanks for your hint. I found the solution. I execute
>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>> before the regression.
>>>>>
>>>>> Then, after the regression, I execute
>>>>> . gen regsample = 1 if e(sample)
>>>>>
>>>>> And, to identify the first observation of each company in the
>>>>> regression sample, I use
>>>>> regsample == 1 & flag == 1
>>>>>
>>>>> However, I still don't understand the reason it works.
>>>>>
>>>>> On 4 June 2012 14:24, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> What code do you mean by "the code below"?
>>>>>>
>>>>>> I suspect there's something else up with your dataset that leads to
>>>>>> what you see. Examine the data omitted by
>>>>>>
>>>>>> . edit if !e(sample)
>>>>>>
>>>>>> after your -xtreg- command.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Mon, Jun 4, 2012 at 6:44 PM, Ivan Png <iplpng@gmail.com> wrote:
>>>>>>> Many thanks, Nick. Incidentally, thanks for the yeoman service to all
>>>>>>> STATAlisters.
>>>>>>>
>>>>>>> The discrepancy I found was by using xtreg to run a fixed-effects
>>>>>>> regression on the sample. xtreg reported 2773 companies. Yet, when I
>>>>>>> used the code below on the regression sample, I got only 1048
>>>>>>> companies. So, the only reason I could think of was that the flag
>>>>>>> identified only companies that were present in year 1.
>>>>>>
>>>>>> On 4 June 2012 13:21, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>>>>>
>>>>>>>> Your code looks fine to me, so I have difficulty understanding why you think it doesn't work.
>>>>>>>>
>>>>>>>> The -sort- on the second command is unnecessary given the previous command, but I don't see that it will change the sort order.
>>>>>>>>
>>>>>>>> You can check logic in terms of this example:
>>>>>>>>
>>>>>>>> . webuse grunfeld
>>>>>>>>
>>>>>>>> . su year
>>>>>>>>
>>>>>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>>>>> -------------+--------------------------------------------------------
>>>>>>>>         year |       200      1944.5    5.780751       1935       1954
>>>>>>>>
>>>>>>>> . drop if year == 1935 & mod(company, 2)
>>>>>>>> (5 observations deleted)
>>>>>>>>
>>>>>>>> . tab year
>>>>>>>>
>>>>>>>>        year |      Freq.     Percent        Cum.
>>>>>>>> ------------+-----------------------------------
>>>>>>>>        1935 |          5        2.56        2.56
>>>>>>>>        1936 |         10        5.13        7.69
>>>>>>>>        1937 |         10        5.13       12.82
>>>>>>>>        1938 |         10        5.13       17.95
>>>>>>>>        1939 |         10        5.13       23.08
>>>>>>>>        1940 |         10        5.13       28.21
>>>>>>>>        1941 |         10        5.13       33.33
>>>>>>>>        1942 |         10        5.13       38.46
>>>>>>>>        1943 |         10        5.13       43.59
>>>>>>>>        1944 |         10        5.13       48.72
>>>>>>>>        1945 |         10        5.13       53.85
>>>>>>>>        1946 |         10        5.13       58.97
>>>>>>>>        1947 |         10        5.13       64.10
>>>>>>>>        1948 |         10        5.13       69.23
>>>>>>>>        1949 |         10        5.13       74.36
>>>>>>>>        1950 |         10        5.13       79.49
>>>>>>>>        1951 |         10        5.13       84.62
>>>>>>>>        1952 |         10        5.13       89.74
>>>>>>>>        1953 |         10        5.13       94.87
>>>>>>>>        1954 |         10        5.13      100.00
>>>>>>>> ------------+-----------------------------------
>>>>>>>>       Total |        195      100.00
>>>>>>>>
>>>>>>>> . bysort company (year) : gen first = _n == 1
>>>>>>>>
>>>>>>>> . l company year if first
>>>>>>>>
>>>>>>>>      +----------------+
>>>>>>>>      | company   year |
>>>>>>>>      |----------------|
>>>>>>>>   1. |       1   1936 |
>>>>>>>>  20. |       2   1935 |
>>>>>>>>  40. |       3   1936 |
>>>>>>>>  59. |       4   1935 |
>>>>>>>>  79. |       5   1936 |
>>>>>>>>      |----------------|
>>>>>>>>  98. |       6   1935 |
>>>>>>>> 118. |       7   1936 |
>>>>>>>> 137. |       8   1935 |
>>>>>>>> 157. |       9   1936 |
>>>>>>>> 176. |      10   1935 |
>>>>>>>>      +----------------+
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> n.j.cox@durham.ac.uk
>>>>>>>>
>>>>>>>> Ivan Png
>>>>>>>>
>>>>>>>> I am analyzing an unbalanced panel of company data, organized by
>>>>>>>> company (gvkey) and year. I want to create a flag to the first
>>>>>>>> observation of each company in the panel. I tried
>>>>>>>>
>>>>>>>> . sort gvkey year
>>>>>>>> . by gvkey , sort: gen flag = 1 if _n == 1
>>>>>>>>
>>>>>>>> However, this only flagged flag = 1 if a company was present in year 1
>>>>>>>> of the panel. It missed any company that appeared in later years.
>>>>>>>>
>>>>>>>> I searched statalist and found this:
>>>>>>>> http://www.stata.com/statalist/archive/2005-04/msg00334.html
>>>>>>>>
>>>>>>>> But it doesn't work. I'd be grateful for any relevant help.
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>> --
>>> Best wishes
>>> Ivan Png
>>> Skype: ipng00
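
The explanation at the top of this message (first-year observations dropped from the estimation sample because of missing covariates) can be checked on a public dataset. What follows is a minimal sketch, not the poster's do-file: it uses the Grunfeld panel from Nick's example, and the missing values in `mvalue` are introduced artificially as an assumption for illustration. The variable names `insample`, `first_all`, and `first_est` are hypothetical.

```stata
* Sketch on the public Grunfeld panel (not the poster's data).
webuse grunfeld, clear

* Assumption for illustration: blank out a covariate in the first year
* of every odd-numbered company, mimicking missing covariate values.
replace mvalue = . if year == 1935 & mod(company, 2)

* Fixed-effects regression; rows with a missing covariate are dropped.
xtreg invest mvalue kstock, fe
gen byte insample = e(sample)

* Flag the first row of each company in the FULL data.
bysort company (year): gen byte first_all = _n == 1

* Flag the first row of each company WITHIN the estimation sample.
bysort insample company (year): gen byte first_est = insample & _n == 1

* Companies counted via the full-data flag restricted to the sample,
* versus companies counted via the within-sample flag.
count if first_all & insample
count if first_est
```

On this constructed example the first count should be 5 and the second 10, matching the 10 groups that -xtreg- reports. The full-data flag sits on the 1935 row of every company, but for five companies that row is not in e(sample), so intersecting the flag with the sample loses them. Grouping by the sample indicator as well as the company, as in the friend's suggestion, picks the earliest row that was actually used in estimation.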

**Follow-Ups**: **Re: st: Identifying first observation in each panel after regression** *From:* Ivan Png <iplpng@gmail.com>

**References**: **st: Identifying first observation in each panel after regression** *From:* Ivan Png <iplpng@gmail.com>
