
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Regression with cluster error hangs program


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Regression with cluster error hangs program
Date   Tue, 11 Feb 2014 21:25:54 +0000

Crucial details on fitting elephantine models to enormous datasets
aside, I want to urge caution and restraint in using the terms
"crashes", "hangs", and so forth.

1. Part of the problem is that these terms are not descriptive unless
you say exactly what you mean.

2. The other part of the problem is a widespread tendency to
exaggerate when talking of programming problems.

Thus to me a "crash" can only mean that a command, or Stata, or your
operating system, or your computer stopped abruptly and threw you out
by terminating prematurely.

A "hang" I take to mean that the program appears to stop doing
anything or to be moving so slowly that no progress is apparent. No
results are evident (yet).

If your terminology is different, then we may have problem 1 or
problem 2 or both. Either way, it is better to describe in detail what
is or is not happening.


Nick
[email protected]


On 11 February 2014 21:15, Sebastian Say <[email protected]> wrote:
> Hi Alfonso, thanks for your time and your explanation. I understand
> your point. I have an unbalanced panel with 4900 firms. Years per
> firm range from 3 to 22.
>
> Yes, you are right. I was looking at the output and I saw several lines
> saying "XXX#yrdiff omitted because of collinearity".
>
> Suppose I am not really interested in the firm-specific time trend per
> se, but would rather see whether the IVs I am interested in remain
> significant even after modelling a firm-specific time trend. Is using
> -xtreg DV IV yrdiff i.firm#c.yrdiff, noconstant- a better
> way?
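Whether that particular -xtreg- call is the better route is a separate question, but the underlying algebra can be checked. By the Frisch-Waugh-Lovell theorem, the IV coefficient from a regression with one intercept and one linear trend per firm equals the slope from regressing within-firm-detrended DV on within-firm-detrended IV. The sketch below checks this in plain Python on a hypothetical toy panel (the firm ids, yrdiff, iv and dv values are made up); it illustrates the algebra only and is not how Stata computes anything. Note that -xtreg, fe- sweeps out only the firm intercepts, so the firm-specific trend interactions still enter as regressors.

```python
# Frisch-Waugh-Lovell check: firm intercepts + firm trends vs. within-firm detrending.
# Toy data: firm ids, yrdiff, iv and dv values are hypothetical.
firms = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3]
yrdiff = [0, 1, 2, 3, 0, 1, 2, 0, 1, 2, 3, 4]
iv = [0.3, 1.1, 0.7, 2.0, 1.5, 0.4, 1.9, 2.2, 1.0, 0.6, 1.8, 2.9]
dv = [1.0, 2.9, 2.6, 5.8, 4.1, 2.5, 6.3, 3.0, 1.7, 1.9, 4.6, 6.8]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(cols, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    return solve(XtX, Xty)


# (1) Full model, no constant: iv plus one intercept and one trend per firm.
levels = sorted(set(firms))
dummies = [[1.0 if f == g else 0.0 for f in firms] for g in levels]
trends = [[d * t for d, t in zip(col, yrdiff)] for col in dummies]
b_full = ols([iv] + dummies + trends, dv)[0]


# (2) Within each firm, remove a fitted straight line in yrdiff from a variable.
def detrend(v):
    out = [0.0] * len(v)
    for g in levels:
        idx = [i for i, f in enumerate(firms) if f == g]
        tbar = sum(yrdiff[i] for i in idx) / len(idx)
        vbar = sum(v[i] for i in idx) / len(idx)
        slope = (sum((yrdiff[i] - tbar) * (v[i] - vbar) for i in idx)
                 / sum((yrdiff[i] - tbar) ** 2 for i in idx))
        for i in idx:
            out[i] = v[i] - vbar - slope * (yrdiff[i] - tbar)
    return out


r_iv, r_dv = detrend(iv), detrend(dv)
b_fwl = sum(a * b for a, b in zip(r_iv, r_dv)) / sum(a * a for a in r_iv)

print(f"full model: {b_full:.6f}   detrended: {b_fwl:.6f}")
```

Only the point estimate carries over directly: standard errors from the second-step regression need a degrees-of-freedom correction for the intercepts and trends that were swept out.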
>
> That said, I'm doing some additional postestimation using the
> -testnl- command and would like to know whether the time saving from -xtreg-
> (used in the manner I described above) carries over. Right now, with this
> pooled OLS, even -testnl- seems to take a very long time.
>
> Thanks
>
> On Tue, Feb 11, 2014 at 2:35 PM, Alfonso Sanchez-Penalver
> <[email protected]> wrote:
>> Hi Seb,
>>
>> As I'm sure you know, the problem is collinearity. The sum of all the firm dummy variables equals a constant term (a vector of ones), which is why Stata ought to be dropping one firm dummy in your estimation. Since you're interested in the firm-specific trend, I would just drop the constant and include i.firm##c.yrdiff in your pooled OLS regression (-regress-).
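The collinearity can be checked directly. In the hypothetical toy panel below (made-up firm ids and yrdiff values), the firm dummies add up to the constant and, by the same logic, the firm#yrdiff interactions add up to yrdiff itself, which matches the "omitted because of collinearity" notes. The plain-Python rank computation is an illustration, not Stata's own collinearity check.

```python
# Why some columns must be dropped: toy panel with hypothetical firm ids and yrdiff.
firms = [1, 1, 1, 2, 2, 3, 3, 3, 3]
yrdiff = [0, 1, 2, 0, 1, 0, 1, 2, 3]
levels = sorted(set(firms))

const = [1.0] * len(firms)
dummies = [[1.0 if f == g else 0.0 for f in firms] for g in levels]
inter = [[d * t for d, t in zip(col, yrdiff)] for col in dummies]

# Row by row, the firm dummies add up to the constant ...
dummy_sum = [sum(vals) for vals in zip(*dummies)]
# ... and the firm#yrdiff interactions add up to yrdiff itself.
inter_sum = [sum(vals) for vals in zip(*inter)]


def rank(cols, tol=1e-9):
    """Column rank of a matrix given as a list of columns (Gauss-Jordan)."""
    rows = [list(row) for row in zip(*cols)]
    r = 0
    for c in range(len(cols)):
        p = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[p][c]) < tol:
            continue  # no usable pivot: column c depends on earlier columns
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(len(rows)):
            if i != r:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Columns of: constant, i.firm, yrdiff, i.firm#c.yrdiff (IV left out for brevity)
design = [const] + dummies + [yrdiff] + inter
print("dummies sum to constant:", dummy_sum == const)
print("interactions sum to yrdiff:", inter_sum == yrdiff)
print("rank", rank(design), "of", len(design), "columns")
```

With 3 firms the 8 columns have rank 6, so two must go: the constant (or one dummy) and yrdiff (or one interaction).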
>>
>> I'm not surprised that -xtreg, fe- was much faster: it demeans the variables by firm, so it does not need to include the firm dummies. That reduces the size of the matrices and makes inverting them much faster. If you don't need the firm-specific effects, this is a much more efficient way of estimating the model.
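A minimal sketch of the demeaning point, assuming a hypothetical toy panel. Regressing y on x plus a full set of firm dummies (no constant) gives the same slope on x as regressing firm-demeaned y on firm-demeaned x. The second regression has one column instead of one per firm. This is plain Python showing the algebra (the Frisch-Waugh-Lovell result), not a reimplementation of -xtreg-.

```python
# Firm demeaning vs. explicit firm dummies (LSDV) on hypothetical data.
firms = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
x = [1.0, 2.0, 4.0, 0.5, 1.5, 3.0, 2.0, 2.5, 5.0, 6.0]
y = [3.1, 5.2, 8.9, 10.4, 12.1, 15.3, -1.0, 0.2, 5.1, 6.8]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            f = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= f * M[c][j]
    out = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * out[j] for j in range(i + 1, n))
        out[i] = (M[i][n] - s) / M[i][i]
    return out


def ols(cols, yv):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    XtX = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    Xty = [sum(a * b for a, b in zip(ci, yv)) for ci in cols]
    return solve(XtX, Xty)


levels = sorted(set(firms))
dummies = [[1.0 if f == g else 0.0 for f in firms] for g in levels]
# LSDV: x plus one dummy per firm and no constant (avoids the dummy trap)
b_lsdv = ols([x] + dummies, y)[0]


def demean(v):
    """Subtract each firm's mean from that firm's observations."""
    tot, cnt = {}, {}
    for vi, g in zip(v, firms):
        tot[g] = tot.get(g, 0.0) + vi
        cnt[g] = cnt.get(g, 0) + 1
    return [vi - tot[g] / cnt[g] for vi, g in zip(v, firms)]


xd, yd = demean(x), demean(y)
b_within = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)

print(f"LSDV ({len(levels) + 1} columns): {b_lsdv:.6f}")
print(f"within (1 column): {b_within:.6f}")
```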
>>
>> I don't know if you mentioned this before, but how many firms do you have, and how many years per firm? Is it a balanced or an unbalanced panel?
>>
>> Let me know if this clears up my point.
>>
>> Alfonso Sanchez-Penalver
>>
>>> On Feb 11, 2014, at 3:20 PM, Gui Deng Say <[email protected]> wrote:
>>>
>>> Hi Alfonso, thanks for your reply and for your time. I see what you mean.
>>>
>>> Could you elaborate on the noconstant point, i.e., why the constant is not necessary?
>>>
>>> At this point I do not need the coefficients for the dummies. But I do
>>> need to take into account the time-trend interaction to see whether it is
>>> driving my results.
>>>
>>> To save some time, I tried -xtreg, fe- without including i.firm but
>>> still including i.firm#yrdiff. This cut the waiting time by about 20
>>> percent. However, when I tried to cluster by firm, the software
>>> once again kept running and hung.
>>>
>>> I'm wondering why Stata crashes whenever I try to cluster errors by
>>> firm. Or, more broadly, under what circumstances does Stata hang
>>> when running regressions?
>>>
>>> Best,
>>> Seb
>>>
>>>> On Tue, Feb 11, 2014 at 1:24 PM, Gui Deng Say <[email protected]> wrote:
>>>> Hi Alfonso, thanks for your reply and for your time. I see what you mean.
>>>>
>>>> Could you elaborate on the noconstant point, i.e., why the constant is not necessary?
>>>>
>>>> At this point I do not need the coefficients for the dummies. But I do need to
>>>> take into account the time-trend interaction to see whether it is driving my
>>>> results.
>>>>
>>>> To save some time, I tried -xtreg, fe- without including i.firm but still
>>>> including i.firm#yrdiff. This cut the waiting time by about 20 percent.
>>>> However, when I tried to cluster by firm, the software once again kept
>>>> running and hung.
>>>>
>>>> I'm wondering why Stata crashes whenever I try to cluster errors by firm. Or,
>>>> more broadly, under what circumstances does Stata hang when running
>>>> regressions?
>>>>
>>>> Best,
>>>> Seb
>>>>
>>>> On Feb 11, 2014 6:48 AM, "Alfonso Sanchez-Penalver"
>>>> <[email protected]> wrote:
>>>>>
>>>>> Hi Seb,
>>>>>
>>>>> A couple of comments. First, if you want both the main effects and the
>>>>> interaction effect, you can write -i.firm##c.yrdiff- instead of having to
>>>>> write things twice.
>>>>>
>>>>> My second question is: why do you expect further correlation of the errors
>>>>> within firms, which is what clustering the variance corrects for? By further
>>>>> correlation I mean that you are already accounting for differences in the
>>>>> unobserved means by firm by introducing the firms' dummies, so how would the
>>>>> errors be correlated within firms now that they no longer differ in their
>>>>> means?
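For reference, the usual cluster-robust ("sandwich") variance estimator behind vce(cluster firm) can be written as follows. Here N is the number of observations, k the number of regressors, M the number of clusters, ê_i the OLS residuals and x_i the 1 × k row of regressors for observation i. The small-sample factor q_c shown is the one commonly applied by -regress-; treat the exact factor as an assumption to check against the manual.

```latex
\[
\hat{V}_{\mathrm{clu}} = q_c \,(X'X)^{-1}
  \left( \sum_{j=1}^{M} u_j' u_j \right) (X'X)^{-1},
\qquad
u_j = \sum_{i \in C_j} \hat{e}_i \, x_i,
\qquad
q_c = \frac{N-1}{N-k} \cdot \frac{M}{M-1}
\]
```

Each u_j' u_j is k × k with rank at most 1, so the middle sum has rank at most M. This thread has roughly 9,800 coefficients and 4,900 firm clusters, so the clustered covariance matrix cannot be of full rank.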
>>>>>
>>>>> Lastly, I would suggest dropping the constant (-noconstant-) in your regression, since you have
>>>>> both firm fixed effects and firm-specific trends.
>>>>>
>>>>> I hope this helps,
>>>>>
>>>>> Alfonso Sanchez-Penalver
>>>>>
>>>>>> On Feb 11, 2014, at 4:03 AM, Gui Deng Say <[email protected]> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>   I am using Stata/MP 13 and I have two questions regarding OLS
>>>>>> regressions. I have an unbalanced firm-year panel consisting of 35k
>>>>>> observations, about 4900 firms.
>>>>>>
>>>>>> I am trying to estimate the following model:
>>>>>>
>>>>>> regress DV IV i.firm yrdiff i.firm#c.yrdiff
>>>>>>
>>>>>> where yrdiff is a time counter variable, measured relative to a
>>>>>> particular year. The reason I'm using i.firm#c.yrdiff is to control
>>>>>> for firm-specific time trends.
>>>>>>
>>>>>> Q1. First, estimating this model takes a very long time (~2 hours). Is this
>>>>>> normal? If not, what might be the reason(s)?
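The thread does not pin down why Q1 takes two hours, but a rough count of the problem size helps. The 35k observations and about 4,900 firms come from the post. Dense 8-byte storage and a textbook normal-equations solve are assumptions, not a description of how -regress- works internally, and these are operation counts, not timings.

```python
# Back-of-the-envelope size of the dummy-variable regression in Q1.
# n_obs and n_firms come from the thread; everything else is an assumption:
# dense double-precision (8-byte) storage and a normal-equations solve.
n_obs = 35_000
n_firms = 4_900
k = 2 * n_firms              # roughly one intercept and one trend column per firm

bytes_X = n_obs * k * 8      # the full design matrix X
bytes_XtX = k * k * 8        # the k-by-k cross-product matrix X'X
flops_XtX = n_obs * k * k    # multiply-adds needed to form X'X
flops_inv = k ** 3           # order of magnitude to invert X'X

print(f"columns k:   ~{k:,}")
print(f"X:           ~{bytes_X / 1e9:.2f} GB")
print(f"X'X:         ~{bytes_XtX / 1e9:.2f} GB")
print(f"form X'X:    ~{flops_XtX:.1e} multiply-adds")
print(f"invert X'X:  ~{flops_inv:.1e} operations")
```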
>>>>>>
>>>>>> Q2. Second, I tried to cluster the standard errors by firm, i.e., I
>>>>>> tried this model:
>>>>>> regress DV IV i.firm yrdiff i.firm#c.yrdiff, vce(cluster firm)
>>>>>>
>>>>>> This regression kept running... and in the end, the Stata program
>>>>>> froze. Any ideas?
>>>>>>
>>>>>> Many thanks,
>>>>>> Seb
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC