Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Regression with cluster error hangs program


From   Alfonso Sánchez-Peñalver <[email protected]>
To   Stata List <[email protected]>
Subject   Re: st: Regression with cluster error hangs program
Date   Thu, 13 Feb 2014 14:28:14 -0500

Hi Sebastian,

I’m sorry I didn’t reply earlier, but I can’t seem to find your last response on this subject, and I know there was something in it I wanted to comment on. You mention that the smallest panel has only 3 observations, yet you want to model correlation across those 3 observations? Notice that this is what you’re doing when you ask for the cluster-robust variance. I’m not sure this is what is causing your estimations to take so long, but it stood out when I read it.
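A short worked sketch of why a 3-observation firm leaves so little room for within-firm correlation in this particular model. It assumes the model from the original post, with both firm dummies and firm-specific trends included; the notation (firm i, year t, OLS residuals \hat u_{it}, s_{it} for yrdiff, T_i for the number of years observed for firm i) is introduced here and is not from the thread.

```latex
% Firm i's dummy and its dummy-times-yrdiff column lie in the regressor space
% (whichever base firm Stata drops), so the OLS normal equations force:
\sum_{t=1}^{T_i} \hat u_{it} = 0,
\qquad
\sum_{t=1}^{T_i} s_{it}\,\hat u_{it} = 0
% Hence firm i's residual vector is confined to a (T_i - 2)-dimensional subspace:
\hat{\mathbf u}_i \in \operatorname{span}\{\mathbf 1_{T_i},\, \mathbf s_i\}^{\perp} \subset \mathbb{R}^{T_i},
\qquad
\dim = T_i - 2 \;\; (= 1 \text{ when } T_i = 3)
```

Under these assumptions, a firm observed for only three years contributes a residual vector with a single free direction, so there is very little within-firm residual variation left for the clustered variance to work with.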

Alfonso Sánchez-Peñalver, PhD

Visiting Assistant Professor
Suffolk University
Senior Instructor
UMass Boston



On Feb 11, 2014, at 4:37 PM, Sebastian Say <[email protected]> wrote:

> Hi Nick, good point. I mean the latter: the program keeps
> running, i.e. I can hear the CPU fan on my Mac spinning at higher
> rpm.
> 
> When I run the regression without clustering by firm, the big model
> takes about 2 hours on average to produce output.
> When I run the regression with cluster(firm) errors, the fan is still
> running even after 4 hours. This seemed odd, so I opened the
> computer's task manager, which showed Stata as 'not responding'.
> 
> The same thing happens when I try it on a PC.
> 
> Thanks
> 
> On Tue, Feb 11, 2014 at 3:25 PM, Nick Cox <[email protected]> wrote:
>> Crucial details aside on fitting elephantine models to enormous
>> datasets, I want to urge caution and restraint on the terms "crashes",
>> "hangs", and so forth.
>> 
>> 1. Part of the problem is that these terms are not descriptive unless
>> you say exactly what you mean.
>> 
>> 2. The other part of the problem is a widespread tendency to
>> exaggerate when talking of programming problems.
>> 
>> Thus to me a "crash" can only mean that a command, or Stata, or your
>> operating system, or your computer stopped abruptly and threw you out
>> by terminating prematurely.
>> 
>> A "hang" I take to mean that the program appears to stop doing
>> anything or to be moving so slowly that no progress is apparent. No
>> results are evident (yet).
>> 
>> If your terminology is different, then we may have problem 1 or
>> problem 2 or both. Either way, it is better to describe in detail what
>> is or is not happening.
>> 
>> 
>> Nick
>> [email protected]
>> 
>> 
>> On 11 February 2014 21:15, Sebastian Say <[email protected]> wrote:
>>> Hi Alfonso, thanks for your time and the explanation. I understand
>>> your point. I have an unbalanced panel with 4900 firms. Years per
>>> firm range from 3 to 22.
>>> 
>>> Yes, you are right: looking at the output, I saw several lines
>>> saying "XXX#yrdiff omitted because of collinearity".
>>> 
>>> Suppose I am not interested in the firm-specific time trends per
>>> se, but would rather see whether the IVs I am interested in remain
>>> significant even after modelling firm-specific time trends. Is
>>> -xtreg DV IV yrdiff i.firm#c.yrdiff, noconstant- a better
>>> way?
>>> 
>>> That said, I'm also doing some additional postestimation with the
>>> -testnl- command and would like to know whether the time savings
>>> from -xtreg- (used as described above) carry over. Right now, with
>>> the pooled OLS, even -testnl- takes a really long time.
>>> 
>>> Thanks
>>> 
>>> On Tue, Feb 11, 2014 at 2:35 PM, Alfonso Sanchez-Penalver
>>> <[email protected]> wrote:
>>>> Hi Seb,
>>>> 
>>>> As I'm sure you know, the problem is collinearity: the sum of all the firm dummy variables equals the constant term (a vector of ones), which is why Stata should be dropping one firm dummy in your estimation. Since you're interested in the firm-specific trends, I would just drop the constant and include i.firm##c.yrdiff in your pooled OLS regression (-regress-).
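The collinearity can be written out directly. A minimal sketch, using notation introduced here rather than taken from the thread: D_{h,it} = 1 if observation (i,t) belongs to firm h and 0 otherwise, G is the number of firms, and s_{it} stands for yrdiff.

```latex
% Every observation belongs to exactly one firm, so for every (i,t):
\sum_{h=1}^{G} D_{h,it} = 1
\qquad\text{and}\qquad
\sum_{h=1}^{G} D_{h,it}\, s_{it} = s_{it}
```

The first identity makes the constant an exact linear combination of the firm dummies; the second does the same for yrdiff and the full set of firm-by-yrdiff interactions. With both a constant and yrdiff in the model, one term from each set therefore has to be dropped.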
>>>> 
>>>> I'm not surprised that -xtreg, fe- was much faster: it demeans the variables by firm, so it does not need to include the firm dummies. That reduces the size of the matrices and makes inverting them much faster. If you don't need the firm-specific effects, this is a much more efficient way of estimating the model.
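The demeaning described above can be written as a worked equation. A minimal sketch, assuming a single regressor x_{it} for brevity; the notation (firm i, year t, firm means \bar y_i and \bar x_i, G firms) is introduced here, and the column counts use the 4900 firms reported in the thread.

```latex
% Firm fixed-effects model and its within (demeaned) form
y_{it} = \mu_i + \beta x_{it} + \varepsilon_{it}
\quad\Longrightarrow\quad
y_{it} - \bar y_i = \beta\,(x_{it} - \bar x_i) + (\varepsilon_{it} - \bar\varepsilon_i)
% \mu_i - \bar\mu_i = 0, so the G firm-dummy columns never enter X'X.
% With G = 4900, dropping the dummies removes about 4900 of the roughly
% 2G = 9800 firm-level columns; the G firm-by-yrdiff columns remain.
```

The same partialling-out idea (the Frisch–Waugh–Lovell theorem) extends to firm-specific trends: regressing y and x on a constant and yrdiff separately within each firm, then regressing the y residuals on the x residuals, gives the same \hat\beta as the full regression with firm dummies and firm trends. The standard errors would need a degrees-of-freedom adjustment, since the roughly 2G partialled-out coefficients are no longer counted.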
>>>> 
>>>> I don't know if you mentioned this before, but how many firms do you have, and how many years per firm? Is it a balanced or an unbalanced panel?
>>>> 
>>>> Let me know if I made my point clear.
>>>> 
>>>> Alfonso Sanchez-Penalver
>>>> 
>>>>> On Feb 11, 2014, at 3:20 PM, Gui Deng Say <[email protected]> wrote:
>>>>> 
>>>>> Hi Alfonso, thanks for your reply and for your time. I see what you mean.
>>>>> 
>>>>> Could you elaborate on the noconstant point, i.e. why the constant is not necessary?
>>>>> 
>>>>> At this point I do not need the coefficients on the dummies, but I do
>>>>> need to take the time-trend interaction into account to see whether it
>>>>> is driving my results.
>>>>> 
>>>>> To save some time, I tried -xtreg, fe- without i.firm but still
>>>>> including i.firm#yrdiff. This cut the waiting time by about 20
>>>>> percent. However, when I tried to cluster by firm, the software once
>>>>> again kept running and hung.
>>>>> 
>>>>> I'm wondering why Stata crashes whenever I try to cluster errors by
>>>>> firm. Or, more broadly, under what circumstances does Stata hang
>>>>> when running regressions?
>>>>> 
>>>>> Best,
>>>>> Seb
>>>>> 
>>>>>> 
>>>>>> On Feb 11, 2014 6:48 AM, "Alfonso Sanchez-Penalver"
>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> Hi Seb,
>>>>>>> 
>>>>>>> A couple of comments. First, if you want both the main effects and
>>>>>>> the interaction effect, you can write -i.firm##c.yrdiff- instead of
>>>>>>> writing things twice.
>>>>>>> 
>>>>>>> My second question is why you expect further correlation of the
>>>>>>> errors by firm, which is what clustering the variance corrects for.
>>>>>>> By further correlation I mean that you are already accounting for
>>>>>>> differences in the unobserved means by firm by including the firm
>>>>>>> dummies, so how would the errors be correlated within firms once
>>>>>>> those mean differences are accounted for?
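For reference, a sketch of the cluster-robust (sandwich) variance that clustering by firm computes, in notation introduced here: X_g and \hat u_g are the regressor rows and OLS residuals for firm g, G is the number of firms, k the number of regressors, and q_c stands for a finite-sample scaling factor whose exact form does not matter for the point below.

```latex
\hat V_{\text{cluster}}
= q_c\,(X'X)^{-1}
\Big(\sum_{g=1}^{G} X_g'\,\hat u_g\,\hat u_g'\,X_g\Big)
(X'X)^{-1}
```

Each X_g'\hat u_g is a single k-vector, so each term in the middle sum has rank at most one and the sum has rank at most G. With firm dummies and firm trends both included, k is roughly 2G, so in this model the clustered variance matrix necessarily has rank below k.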
>>>>>>> 
>>>>>>> Lastly, I would suggest using no constant in your regression, since
>>>>>>> you have both firm fixed effects and firm-specific trends.
>>>>>>> 
>>>>>>> I hope this helps,
>>>>>>> 
>>>>>>> Alfonso Sanchez-Penalver
>>>>>>> 
>>>>>>>> On Feb 11, 2014, at 4:03 AM, Gui Deng Say <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> I am using Stata/MP 13 and have two questions about OLS
>>>>>>>> regressions. I have an unbalanced firm-year panel of 35k
>>>>>>>> observations on about 4900 firms.
>>>>>>>>
>>>>>>>> I am trying to estimate the following model:
>>>>>>>> 
>>>>>>>> regress DV IV i.firm yrdiff i.firm#c.yrdiff
>>>>>>>> 
>>>>>>>> where yrdiff is a time counter variable, measured relative to a
>>>>>>>> particular year. The reason I'm using i.firm#c.yrdiff is to control
>>>>>>>> for firm-specific time trends.
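Written out as an equation, the -regress- model above looks as follows. A minimal sketch, in notation introduced here and not from the thread: firm i, year t, s_{it} for yrdiff, a single IV x_{it} for brevity, G firms, and D_{h,it} the dummy for firm h.

```latex
y_{it} = \alpha + \beta x_{it} + \gamma s_{it}
  + \sum_{h=1}^{G} \mu_h D_{h,it}
  + \sum_{h=1}^{G} \delta_h D_{h,it}\, s_{it}
  + \varepsilon_{it}
% equivalently, a firm-specific intercept and a firm-specific slope on yrdiff:
y_{it} = (\alpha + \mu_i) + (\gamma + \delta_i)\, s_{it} + \beta x_{it} + \varepsilon_{it}
```

Each firm thus gets its own intercept and its own slope on yrdiff, which is what "firm-specific time trend" amounts to here.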
>>>>>>>> 
>>>>>>>> q1. First, estimating this model takes very long (~2 hours). Is this
>>>>>>>> normal? If not, what might be the reason(s)?
>>>>>>>> 
>>>>>>>> q2. Second, I tried to cluster the standard errors by firm, i.e. I
>>>>>>>> tried this model:
>>>>>>>> regress DV IV i.firm yrdiff i.firm#c.yrdiff, vce(cluster firm)
>>>>>>>> 
>>>>>>>> This regression kept running... and in the end, Stata froze. Any
>>>>>>>> ideas?
>>>>>>>> 
>>>>>>>> Many thanks,
>>>>>>>> Seb
>>>>>>>> *
>>>>>>>> *   For searches and help try:
>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC