Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Regression by industry and year excluding firm i
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Regression by industry and year excluding firm i
Date
Fri, 13 Dec 2013 20:18:56 +0000
This is very good advice.
Nick
[email protected]
On 13 December 2013 19:41, Sarah Edgington <[email protected]> wrote:
> Ahmed,
> As an aside, this strikes me as one of those instances where you would
> benefit a great deal from debugging your code on a subset of your data. You
> need enough data for your regressions to run without errors but I'd try
> getting the loop working on a subset of a few hundred observations rather
> than the whole data set. That will run much more quickly. The resulting
> predictions will be nonsense but they'll serve as a proof of concept. Once
> you're happy that you have code that does what you expect you can run it on
> the whole dataset with a certain amount of confidence that even if it takes
> a very long time, you'll get the results that reflect your intended process.
> -Sarah
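One way to follow Sarah's advice is sketched below. It assumes the data contain the sic_2 and datadate variables used elsewhere in the thread. Instead of a random few hundred rows, it keeps a handful of whole industry-years, so each leave-one-out regression still has enough observations. The helper variable ind_year and the cutoff of 5 groups are hypothetical choices, not from the thread.

```stata
* Sketch only: try the loop on a few complete industry-years first
preserve
egen ind_year = group(sic_2 datadate)   // hypothetical helper variable
keep if ind_year <= 5                   // first 5 industry-years (arbitrary cutoff)
count                                   // observations left for the trial run
* ... run the regression loop here and inspect b0-b6 / pred_ce ...
restore                                 // the full dataset comes back unchanged
```

Because of -preserve- and -restore-, anything the trial run creates is discarded. The only goal is to confirm that the loop behaves as intended before committing to the long run.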
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Fernando Rios Avila
> Sent: Friday, December 13, 2013 11:21 AM
> To: [email protected]
> Subject: Re: st: Regression by industry and year excluding firm i
>
> Ahmed,
> In addition to Nick Cox's comments, keep in mind that, based on your
> explanation, you need to run 95,000 regressions, which will be very
> time-consuming. But computer time is "cheap".
> I would suggest, however, that you clarify whether each observation
> represents a different firm, since both your code and Nick's assume this
> when handling the problem.
> Fernando
> HTH
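Fernando's two points can be checked directly. The sketch below counts the regressions a one-per-observation loop needs and tests whether any firm appears more than once within an industry-year. It assumes a firm identifier exists; the name firm_id is hypothetical, since the thread never names one.

```stata
* Sketch only: firm_id is a hypothetical name for the firm identifier
count
display "regressions needed (one per observation): " r(N)
duplicates report firm_id sic_2 datadate
capture isid firm_id sic_2 datadate
display cond(_rc == 0, "each firm appears once per industry-year", ///
    "some firms appear more than once per industry-year")
```

If firms do repeat, excluding firm i rather than observation i would mean replacing a condition such as -_n != `i'- with -firm_id != firm_id[`i']-.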
>
> On Fri, Dec 13, 2013 at 2:12 PM, Nick Cox <[email protected]> wrote:
>> Sorry, no.
>>
>> The code hasn't finished running, so
>>
>> 1. Good news. No obvious bug.
>>
>> 2. I'd expect that code to be slow. You want a regression for every
>> observation.
>>
>> I don't think you've demonstrated anything wrong with my code, so I
>> can't possibly fix it. That doesn't mean the code must be right, but
>> you need to show me incorrect results first. The point is that your
>> code would, I imagine, have been even slower had it been correct.
>> Several of the changes I made would have speeded up things compared
>> with your code.
>>
>> I don't have your data to test anything, but without wanting to seem
>> arrogant, I think you need to be confident that I made a mistake
>> before you change my code.
>>
>> Nick
>> [email protected]
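To see whether "slow" means minutes or days, you can time a short slice of a per-observation loop and then extrapolate. This is a rough sketch. It assumes the thread's variable names, uses an arbitrary 100 trial iterations, and assumes later regressions take about as long as the first ones.

```stata
* Sketch only: time a slice of the per-observation loop and scale up to _N
local X wlag_ce wato wlag_acc wacc wdsale wndsale
local n_try = min(100, _N)              // number of trial iterations (arbitrary)
timer clear 1
timer on 1
forval i = 1/`n_try' {
    local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
    capture regress wce `X' if `same' & _n != `i'
}
timer off 1
quietly timer list 1
display "rough estimate of total run time (minutes): " ///
    r(t1) / `n_try' * _N / 60
```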
>>
>>
>> On 13 December 2013 19:01, Abdalla, Ahmed <[email protected]> wrote:
>>> Dear Nick
>>> Many thanks for that.
>>> I understand your code now, and I ran it. However, Stata has been running the loop for more than 40 minutes now and I have no output.
>>> I will explain more:
>>> I have a model:
>>> wce = b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + b4*wacc + b5*wdsale + b6*wndsale
>>>
>>> I want to run this model using all observations in a particular industry-year excluding firm i. Expected wce for firm i is measured using the coefficients I obtain from the industry-year regressions multiplied by the actual values of the variables in the model for firm i.
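The target can be written as formulas. The notation is introduced here and is not Ahmed's: g is the industry-year of firm i, and x_j stacks a constant with the six regressors. The second part is the standard least-squares leave-one-out identity, which holds when observation i is in the full-group fit and h_i < 1. It shows that one regression per industry-year is enough to obtain every x_i' beta^(-i).

```latex
% g: industry-year of firm i;
% x_j = (1, wlag\_ce_j, wato_j, wlag\_acc_j, wacc_j, wdsale_j, wndsale_j)'
\hat{\beta}^{(-i)} = \arg\min_{b} \sum_{j \in g,\; j \neq i} \left( wce_j - x_j' b \right)^2,
\qquad
\widehat{wce}_i = x_i' \hat{\beta}^{(-i)}

% With \hat{\beta}, e_i = wce_i - x_i'\hat{\beta} and h_i = x_i'(X_g'X_g)^{-1}x_i
% taken from the regression on all of g (including i):
\begin{aligned}
\hat{\beta}^{(-i)} &= \hat{\beta} - \left( X_g' X_g \right)^{-1} x_i \,\frac{e_i}{1 - h_i} \\
x_i' \hat{\beta}^{(-i)} &= x_i'\hat{\beta} - \frac{h_i e_i}{1 - h_i}
  = wce_i - e_i - \frac{h_i e_i}{1 - h_i}
  = wce_i - \frac{e_i}{1 - h_i}
\end{aligned}
```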
>>> As far as I understand, your code should achieve my target, but it is taking a long time and has not given any results.
>>> I also tried another piece of code that worked well and gave me results in seconds, but it doesn't exclude firm i from the estimation. Here is that code:
>>> egen sic2id=group(sic_2 datadate)
>>> egen count=count(sic2id), by(sic2id)
>>> drop if count<10
>>> drop count
>>> drop sic2id
>>> egen sic2id=group(sic_2 datadate)
>>>
>>> gen b0=.
>>> gen b1= .
>>> gen b2=.
>>> gen b3=.
>>> gen b4=.
>>> gen b5=.
>>> gen b6=.
>>>
>>> sum sic2id, meanonly
>>> local k = r(max)
>>> set more off
>>> forvalues x = 1/`k' {
>>>     capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sic2id==`x'
>>>     if _rc continue
>>>     capture replace b0= _b[_cons] if sic2id==`x'
>>>     capture replace b1= _b[wlag_ce] if sic2id==`x'
>>>     capture replace b2= _b[wato] if sic2id==`x'
>>>     capture replace b3= _b[wlag_acc] if sic2id==`x'
>>>     capture replace b4= _b[wacc] if sic2id==`x'
>>>     capture replace b5= _b[wdsale] if sic2id==`x'
>>>     capture replace b6= _b[wndsale] if sic2id==`x'
>>> }
>>>
>>> I would appreciate it if you could explain what was wrong with your code and update the new code I have posted here to exclude firm i.
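A different technique from either loop in this thread is the least-squares leave-one-out identity x_i'b(-i) = wce_i - e_i/(1 - h_i). It needs only one regression per industry-year, so the number of regressions is the number of industry-years rather than one per observation. The sketch below applies it in Ahmed's group loop. It assumes the thread's variable names. The output name pred_ce_loo is hypothetical, and the "more than 10 other observations" rule only approximates the thread's version. Like the thread's code, it excludes observation i, not firm i.

```stata
* Sketch only: leave-one-out predictions via wce_i - e_i/(1 - h_i),
* one regression per industry-year instead of one per observation.
local X wlag_ce wato wlag_acc wacc wdsale wndsale

capture drop sic2id
egen sic2id = group(sic_2 datadate)
summarize sic2id, meanonly
local k = r(max)

gen double pred_ce_loo = .                // hypothetical output variable

forvalues x = 1/`k' {
    capture regress wce `X' if sic2id == `x'
    if _rc continue
    * approximate the thread's "more than 10 other observations" rule
    if e(N) <= 11 continue
    tempvar e h p
    quietly predict double `e' if e(sample), residuals
    quietly predict double `h' if e(sample), leverage
    * rows used in the fit: apply the leave-one-out identity (h = 1 gives missing)
    quietly replace pred_ce_loo = wce - `e' / (1 - `h') if e(sample)
    * rows not used in the fit (e.g. missing wce): the full fit already excludes them
    quietly predict double `p' if sic2id == `x' & !e(sample), xb
    quietly replace pred_ce_loo = `p' if sic2id == `x' & !e(sample)
    drop `e' `h' `p'
}
```

On a test subset, pred_ce_loo should match pred_ce from Nick's per-observation loop wherever both are nonmissing. Because Nick's b0-b6 are stored as float, compare with a loose tolerance, for example -reldif(pred_ce, pred_ce_loo) < 1e-5-. Rare collinearity cases can differ.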
>>>
>>>
>>>
>>>
>>> ________________________________________
>>> From: [email protected]
>>> <[email protected]> on behalf of Nick Cox
>>> <[email protected]>
>>> Sent: 13 December 2013 18:03
>>> To: [email protected]
>>> Subject: Re: st: Regression by industry and year excluding firm i
>>>
>>> Remarks
>>>
>>> 1. If you are cycling over observations, you don't need a variable
>>> containing observation numbers, nor to use -levelsof-.
>>>
>>> 2. -in- is always faster than the corresponding -if-.
>>>
>>> 3. wlag_ce=!=. is presumably a typo, but to Stata it will be illegal syntax.
>>>
>>> 4. -capture replace b0= _b[_cons]- will end with the last intercept
>>> calculated. I guess you don't want that.
>>>
>>> 5. Checking for missing values is redundant as -regress- will never
>>> include them.
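To illustrate remarks 1 and 2 together, here is a sketch (not Nick's code) that loops over observation numbers directly. After sorting, it restricts each regression with -in- to the contiguous block of rows for that industry-year, so the -if- only has to exclude row i within that block. The helper variables obsno, lo and hi are hypothetical names, and the "more than 10" rule mirrors the thread's.

```stata
* Sketch only: restrict each leave-one-out regression with -in-
local X wlag_ce wato wlag_acc wacc wdsale wndsale
tokenize "`X'"

sort sic_2 datadate
gen long obsno = _n
by sic_2 datadate: gen long lo = obsno[1]     // first row of this industry-year
by sic_2 datadate: gen long hi = obsno[_N]    // last row of this industry-year

forval j = 0/6 {
    capture drop b`j'
    gen double b`j' = .
}

forval i = 1/`=_N' {
    local first = lo[`i']
    local last = hi[`i']
    * more than 10 other rows in the industry-year (rows with missing values
    * included, as with -count- in Nick's version)
    if `last' - `first' > 10 {
        capture regress wce `X' if _n != `i' in `first'/`last'
        if _rc == 0 {
            quietly replace b0 = _b[_cons] in `i'
            forval j = 1/6 {
                quietly replace b`j' = _b[``j''] in `i'
            }
        }
    }
}
```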
>>>
>>> With these and some other small tricks, here is an attempt at
>>> rewriting your code.
>>>
>>> local X wlag_ce wato wlag_acc wacc wdsale wndsale
>>> tokenize "`X'"
>>>
>>> forval j = 0/6 {
>>>     gen b`j' = .
>>> }
>>>
>>> forval i = 1/`=_N' {
>>>     local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
>>>     qui count if `same' & _n != `i'
>>>
>>>     if r(N) > 10 {
>>>         capture reg wce `X' if `same' & _n != `i'
>>>
>>>         if _rc == 0 {
>>>             quietly replace b0 = _b[_cons] in `i'
>>>             forval j = 1/6 {
>>>                 quietly replace b`j' = _b[``j''] in `i'
>>>             }
>>>         }
>>>     }
>>> }
>>>
>>> gen pred_ce = b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + ///
>>>     b4*wacc + b5*wdsale + b6*wndsale
>>>
>>> Nick
>>> [email protected]
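A cheap correctness check, in the spirit of Nick's request to see incorrect results first, is to re-run one leave-one-out regression by hand and compare it with what the loop stored. The sketch assumes the loop above has finished and the data have not been re-sorted since. Because b0-b6 are created with Stata's default float storage, the comparison needs a loose tolerance.

```stata
* Sketch only: re-run one leave-one-out regression by hand after the loop
local X wlag_ce wato wlag_acc wacc wdsale wndsale
local i = 0
forval n = 1/`=_N' {
    if b0[`n'] < . {
        local i = `n'                    // first observation with stored results
        continue, break
    }
}
assert `i' > 0                           // the loop produced at least one result
local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
regress wce `X' if `same' & _n != `i'
display "intercept: loop " b0[`i'] "   by hand " _b[_cons]
display "wlag_ce:   loop " b1[`i'] "   by hand " _b[wlag_ce]
```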
>>>
>>>
>>> On 13 December 2013 17:33, Abdalla, Ahmed <[email protected]> wrote:
>>>> Dear Statalist
>>>> I run a regression to estimate core earnings for each observation in my dataset. The regression is run using all observations in a particular industry-year EXCLUDING firm i. Expected core earnings for firm i are estimated by multiplying the coefficients by the actual values of the variables in the model for firm i.
>>>> I ran the following code.
>>>>
>>>> First, I get an error message saying that the macro length has been exceeded.
>>>> Second, when I try other commands for looping, the loop runs but gives me an invalid syntax error.
>>>> My problem is how to exclude firm i. I would be grateful for any suggestions on running regressions by industry and year while excluding firm i from the estimation.
>>>>
>>>>
>>>> gen obs= [_n]
>>>> gen runn=1
>>>>
>>>> gen b0=.
>>>> gen b1= .
>>>> gen b2=.
>>>> gen b3=.
>>>> gen b4=.
>>>> gen b5=.
>>>> gen b6=.
>>>>
>>>> levelsof obs,local(levels)
>>>> foreach x of local levels{
>>>> gen mark=1 if obs==runn
>>>> gen sic_lp= sic_2 if obs ==runn
>>>> qui summ sic_lp
>>>> replace sic_lp = r(mean) if sic_lp==.
>>>> gen datadate_lp= datadate if obs == runn
>>>> qui summ datadate_lp
>>>> replace datadate_lp = r(mean) if datadate_lp==.
>>>> format datadate_lp %d
>>>> gen sample =1 if sic_lp== sic_2 & datadate_lp== datadate & sale !=. & wce !=. & wlag_ce=!=. & wato !=. & wacc !=. & wlag_acc!=. & wdsale !=. & wndsale !=.
>>>> egen sample_sum= sum(sample) if mark != 1
>>>> capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sample==1 & mark != 1 & sample_sum >10
>>>> capture replace b0= _b[_cons]
>>>> capture replace b1= _b[wlag_ce] if obs==runn
>>>> capture replace b2= _b[wato] if obs==runn
>>>> capture replace b3= _b[wlag_acc] if obs==runn
>>>> capture replace b4= _b[wacc] if obs==runn
>>>> capture replace b5= _b[wdsale] if obs==runn
>>>> capture replace b6= _b[wndsale] if obs==runn
>>>> drop mark sic_lp datadate_lp sample sample_sum
>>>> replace runn= runn+1
>>>> }
>>>>
>>>> gen pred_ce= b0+ b1*wlag_ce + b2*wato +b3*wlag_acc + b4*wacc +
>>>> b5*wdsale + b6*wndsale
>>>>
>>>>
>>>> I would appreciate your help.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> * http://www.ats.ucla.edu/stat/stata/
>>>