Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Regression by industry and year excluding firm i
From
Fernando Rios Avila <[email protected]>
To
[email protected]
Subject
Re: st: Regression by industry and year excluding firm i
Date
Fri, 13 Dec 2013 16:20:04 -0500
Certainly, SAS and Stata have different features for handling data.
Your explanation, unfortunately, doesn't seem to cover the whole
process that the SAS code is currently carrying out. Your best strategy
right now is to break up the code and work through both programs
piece by piece (a brute-force approach to learning the code).
Unless you have a coauthor who knows SAS, it will be hard to give you
any further suggestions.
Fernando
On Fri, Dec 13, 2013 at 3:49 PM, Nick Cox <[email protected]> wrote:
> As before, you can't exclude firms *and* still have only one
> regression for each combination of SIC and year. There has to be a
> different regression for each firm, omitting each firm in turn.
>
> Alternatively, what you want is not what I thought I said you want, so
> feel free to change the code to do what you want.
>
> For "STATA" read "Stata".
>
> Nick
> [email protected]
>
>
> On 13 December 2013 20:38, Abdalla, Ahmed <[email protected]> wrote:
>> Thanks Nick.
>> I have unbalanced panel data
>>
>> Firmid datadate
>> 1 1990
>> 1 1991
>> 1 1992
>> 2 1990
>> 2 1992
>> 2 1993
>> 3 1990
>> 3 1991
>>
>> So in each year we have a set of observations (firms) nested in a particular SIC code, and there are many SIC-year groups. As far as I understand, the regressions should run for each SIC-year group, hence we have as many regressions as SIC-year groups. Note that firms 1, 2 and 3 come again in years 1990, 1991, etc. (however, my data are unbalanced).
>>
>> The code works, but does it achieve what I intend to do?
>>
>> Sorry for that, but I have got the SAS code (from the paper's author) and I am used to working in STATA, so I hope I can get equivalent code in STATA. This will benefit a lot of scholars who estimate similar models.
>>
>> Thanks
>> Ahmed
>>
>>
>>
>> ________________________________________
>> From: [email protected] <[email protected]> on behalf of Nick Cox <[email protected]>
>> Sent: 13 December 2013 20:21
>> To: [email protected]
>> Subject: Re: st: Regression by industry and year excluding firm i
>>
>> I understand each observation to be a distinct firm in a distinct
>> year. But as regressions must be for the same SIC and the same year,
>> and exclude each firm in turn, there are still as many regressions as
>> observations.
>> Nick
>> [email protected]
>>
>>
>> On 13 December 2013 20:16, Nick Cox <[email protected]> wrote:
>>> Ahmed said he understood my code. Now he wants to know what it does.
>>>
>>> Please note: I am not an economist (bar A-level Economics grade A (in
>>> joke for the British)).
>>>
>>> As I understand it,
>>>
>>> 1. each observation is a distinct firm (and firms nest within SIC).
>>>
>>> 2. if you want regressions to exclude particular firms, there will be
>>> as many regressions as firms (as observations), regardless of the fact
>>> that many of those regressions will share the same SIC and the same
>>> year.
>>>
>>> Otherwise put, I don't see, Ahmed, that you can have it both ways. If
>>> you want the _same_ regressions "for each group of two digit SIC
>>> code and fiscal year" then they will include all the firms in that
>>> SIC.
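
For contrast, the "same regression for each group" case can be sketched with -statsby-. This is only an illustration, not code from the thread: every firm in a SIC-year group is included, so it does not exclude firm i, and the file name coefs is an assumption.

```stata
* one regression per SIC-year group, all firms included (no exclusion);
* "coefs" is a hypothetical file name for the saved coefficients
statsby _b, by(sic_2 datadate) saving(coefs, replace): ///
    regress wce wlag_ce wato wlag_acc wacc wdsale wndsale
```

The saved dataset holds one row of coefficients per group, which is why this route cannot give a different set of coefficients for each excluded firm.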
>>>
>>> The code sent previously did contain one bug, as if a regression was
>>> not computed, the previous successful regression coefficients would
>>> still have been used. Here is that fixed. This is also quieter.
>>>
>>> local X wlag_ce wato wlag_acc wacc wdsale wndsale
>>> tokenize "`X'"
>>>
>>> * one coefficient variable per term: b0 (intercept), b1-b6 (slopes)
>>> forval j = 0/6 {
>>>     gen b`j' = .
>>> }
>>>
>>> forval i = 1/`=_N' {
>>>     * observations in the same SIC and year as the current observation
>>>     local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
>>>     qui count if `same' & _n != `i'
>>>
>>>     qui if r(N) > 10 {
>>>         * capture sets _rc, so a failed regression is skipped
>>>         capture reg wce `X' if `same' & _n != `i'
>>>
>>>         if _rc == 0 {
>>>             replace b0 = _b[_cons] in `i'
>>>             forval j = 1/6 {
>>>                 replace b`j' = _b[``j''] in `i'
>>>             }
>>>         }
>>>     }
>>> }
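
Once b0-b6 are filled in, the prediction step from the first version of the code still applies. A minimal sketch, assuming the variable names used in this thread; the final -count- is an added check, not part of the original code.

```stata
* combine the leave-one-out coefficients with each firm's own values;
* pred_ce stays missing wherever no regression was run (b0-b6 missing)
gen pred_ce = b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + ///
    b4*wacc + b5*wdsale + b6*wndsale

* how many observations ended up without a prediction
count if missing(pred_ce)
```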
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 13 December 2013 19:58, Abdalla, Ahmed <[email protected]> wrote:
>>>> Dear All
>>>> I might not have explained this well in my initial post, so I just want to be sure we are all on the same page:
>>>>
>>>> I want to run a regression for every two-digit SIC code (industry classification) and fiscal year (cross-sectionally), with firm i excluded from the observations used to estimate the coefficients for its industry and fiscal year. I then take the estimated coefficients from each industry-year regression and multiply them by the actual values for firm i, which was excluded from that regression. This means that the number of regressions will not be 95,000, as regressions will run for each group of two-digit SIC code and fiscal year. I drop any SIC code and fiscal year group with fewer than 10 observations.
>>>>
>>>> At the end of the day, I want to have expected core earnings calculated for each firm in each industry and fiscal year in my sample.
>>>>
>>>> I am sorry that I might not have been clear at the beginning.
>>>>
>>>> Nick, is your code intended to achieve that?
>>>>
>>>> Many thanks
>>>> Ahmed
>>>>
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> From: [email protected] <[email protected]> on behalf of Sarah Edgington <[email protected]>
>>>> Sent: 13 December 2013 19:41
>>>> To: [email protected]
>>>> Subject: RE: st: Regression by industry and year excluding firm i
>>>>
>>>> Ahmed,
>>>> As an aside, this strikes me as one of those instances where you would
>>>> benefit a great deal from debugging your code on a subset of your data. You
>>>> need enough data for your regressions to run without errors but I'd try
>>>> getting the loop working on a subset of a few hundred observations rather
>>>> than the whole data set. That will run much more quickly. The resulting
>>>> predictions will be nonsense but they'll serve as a proof of concept. Once
>>>> you're happy that you have code that does what you expect you can run it on
>>>> the whole dataset with a certain amount of confidence that even if it takes
>>>> a very long time, you'll get the results that reflect your intended process.
>>>> -Sarah
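
One way to set up such a test run is sketched below. It assumes the variables sic_2 and datadate from this thread; the names n_grp and grp and the cutoff of three groups are arbitrary choices for illustration.

```stata
preserve

* size of each SIC-year group; n_grp and grp are hypothetical names
bysort sic_2 datadate: gen long n_grp = _N

* keep groups large enough to pass the r(N) > 10 check after one
* observation is excluded, then take the first three of them
keep if n_grp > 11
egen long grp = group(sic_2 datadate)
keep if grp <= 3

* ... run the leave-one-out loop here and inspect b0-b6 ...

restore
```

Note that -restore- brings back the full dataset and discards anything created on the subset, so results need to be inspected (or saved) before that line.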
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Fernando Rios
>>>> Avila
>>>> Sent: Friday, December 13, 2013 11:21 AM
>>>> To: [email protected]
>>>> Subject: Re: st: Regression by industry and year excluding firm i
>>>>
>>>> Ahmed,
>>>> In addition to Nick Cox's comments, keep in mind that, based on your
>>>> explanation, you need to run 95000 regressions, which will be very
>>>> time-consuming. But computer time is "cheap".
>>>> I would suggest, however, that you clarify whether each observation
>>>> represents a different firm, which is the assumption behind how your code
>>>> and Nick's handle the problem.
>>>> Fernando
>>>> HTH
>>>>
>>>> On Fri, Dec 13, 2013 at 2:12 PM, Nick Cox <[email protected]> wrote:
>>>>> Sorry, no.
>>>>>
>>>>> The code hasn't finished running, so
>>>>>
>>>>> 1. Good news. No obvious bug.
>>>>>
>>>>> 2. I'd expect that code to be slow. You want a regression for every
>>>>> observation.
>>>>>
>>>>> I don't think you've demonstrated anything wrong with my code, so I
>>>>> can't possibly fix it. That doesn't mean the code must be right, but
>>>>> you need to show me incorrect results first. The point is that your
>>>>> code would, I imagine, have been even slower had it been correct.
>>>>> Several of the changes I made would have speeded up things compared
>>>>> with your code.
>>>>>
>>>>> I don't have your data to test anything, but without wanting to seem
>>>>> arrogant, I think you need to be confident that I made a mistake
>>>>> before you change my code.
>>>>>
>>>>> Nick
>>>>> [email protected]
>>>>>
>>>>>
>>>>> On 13 December 2013 19:01, Abdalla, Ahmed <[email protected]> wrote:
>>>>>> Dear Nick
>>>>>> Many Thanks for that.
>>>>>> I understand your code now. I ran it. However, STATA has been running the loop for more than 40 minutes now and I have got no output!
>>>>>> I will explain more:
>>>>>> I have a model:
>>>>>> wce = b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + b4*wacc + b5*wdsale + b6*wndsale
>>>>>>
>>>>>> I want to run this model using all observations in a particular industry-year, excluding firm i. Expected wce for firm i is measured using the coefficients I obtain from the industry-year regression multiplied by the actual values of the variables in the model for firm i.
>>>>>> As far as I understand, your code should achieve my target, but it took a long time and didn't give any results!
>>>>>> I even tried another code that worked well and gave me results in seconds, but it doesn't exclude firm i from the estimation. I will write this code for you here:
>>>>>> egen sic2id=group(sic_2 datadate)
>>>>>> egen count=count(sic2id), by(sic2id)
>>>>>> drop if count<10
>>>>>> drop count
>>>>>> drop sic2id
>>>>>> egen sic2id=group(sic_2 datadate)
>>>>>>
>>>>>> gen b0=.
>>>>>> gen b1= .
>>>>>> gen b2=.
>>>>>> gen b3=.
>>>>>> gen b4=.
>>>>>> gen b5=.
>>>>>> gen b6=.
>>>>>>
>>>>>> sum sic2id
>>>>>> scalar max2=r(max)
>>>>>> local k=max2
>>>>>> set more off
>>>>>> forvalues x=1(1)`k'{
>>>>>> capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sic2id==`x'
>>>>>> capture replace b0= _b[_cons]
>>>>>> capture replace b1= _b[wlag_ce]
>>>>>> capture replace b2= _b[wato]
>>>>>> capture replace b3= _b[wlag_acc]
>>>>>> capture replace b4= _b[wacc]
>>>>>> capture replace b5= _b[wdsale]
>>>>>> capture replace b6= _b[wndsale]
>>>>>> }
>>>>>>
>>>>>> I would appreciate it if you could explain what was wrong with your code and update the new code I have posted here to exclude firm i.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ________________________________________
>>>>>> From: [email protected]
>>>>>> <[email protected]> on behalf of Nick Cox
>>>>>> <[email protected]>
>>>>>> Sent: 13 December 2013 18:03
>>>>>> To: [email protected]
>>>>>> Subject: Re: st: Regression by industry and year excluding firm i
>>>>>>
>>>>>> Remarks
>>>>>>
>>>>>> 1. If you are cycling over observations, you don't need a variable
>>>>>> containing observation numbers, nor to use -levelsof-.
>>>>>>
>>>>>> 2. -in- is always faster than the corresponding -if-.
>>>>>>
>>>>>> 3. wlag_ce=!=. is presumably a typo, but to Stata it will be illegal
>>>>>> syntax.
>>>>>>
>>>>>> 4. -capture replace b0= _b[_cons]- will end with the last intercept
>>>>>> calculated. I guess you don't want that.
>>>>>>
>>>>>> 5. Checking for missing values is redundant as -regress- will never
>>>>>> include them.
>>>>>>
>>>>>> With these and some other small tricks, here is an attempt at
>>>>>> rewriting your code.
>>>>>>
>>>>>> local X wlag_ce wato wlag_acc wacc wdsale wndsale
>>>>>> tokenize "`X'"
>>>>>>
>>>>>> forval j = 0/6 {
>>>>>>     gen b`j'=.
>>>>>> }
>>>>>>
>>>>>> forval i = 1/`=_N' {
>>>>>>     local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
>>>>>>     qui count if `same' & _n != `i'
>>>>>>
>>>>>>     if r(N) > 10 {
>>>>>>         reg wce `X' if `same' & _n != `i'
>>>>>>     }
>>>>>>
>>>>>>     quietly if _rc == 0 {
>>>>>>         replace b0 = _b[_cons] in `i'
>>>>>>         forval j = 1/6 {
>>>>>>             replace b`j' = _b[``j''] in `i'
>>>>>>         }
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> gen pred_ce= b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + ///
>>>>>>     b4*wacc + b5*wdsale + b6*wndsale
>>>>>>
>>>>>> Nick
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On 13 December 2013 17:33, Abdalla, Ahmed <[email protected]>
>>>> wrote:
>>>>>>> Dear Statalist
>>>>>>> I run a regression to estimate core earnings for each firm in my dataset. The regression is run using all observations in a particular industry-year EXCLUDING firm i. Expected core earnings for firm i are estimated using the coefficients multiplied by the actual values of the variables in the model for firm i.
>>>>>>> I run the following code.
>>>>>>>
>>>>>>> First: I get an error message for macro length being exceeded.
>>>>>>> Second: when I try other commands for looping, the loop runs but it gives me an error message for invalid syntax.
>>>>>>> My problem is how to exclude firm i. I would welcome any suggestions on running regressions by industry and year while excluding firm i from the estimation.
>>>>>>>
>>>>>>>
>>>>>>> gen obs= [_n]
>>>>>>> gen runn=1
>>>>>>>
>>>>>>> gen b0=.
>>>>>>> gen b1= .
>>>>>>> gen b2=.
>>>>>>> gen b3=.
>>>>>>> gen b4=.
>>>>>>> gen b5=.
>>>>>>> gen b6=.
>>>>>>>
>>>>>>> levelsof obs,local(levels)
>>>>>>> foreach x of local levels{
>>>>>>> gen mark=1 if obs==runn
>>>>>>> gen sic_lp= sic_2 if obs ==runn
>>>>>>> qui summ sic_lp
>>>>>>> replace sic_lp = r(mean) if sic_lp==.
>>>>>>> gen datadate_lp= datadate if obs == runn
>>>>>>> qui summ datadate_lp
>>>>>>> replace datadate_lp = r(mean) if datadate_lp==.
>>>>>>> format datadate_lp %d
>>>>>>> gen sample =1 if sic_lp== sic_2 & datadate_lp== datadate & sale !=. & wce !=. & wlag_ce=!=. & wato !=. & wacc !=. & wlag_acc!=. & wdsale !=. & wndsale !=.
>>>>>>> egen sample_sum= sum(sample) if mark != 1
>>>>>>> capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sample==1 & mark != 1 & sample_sum >10
>>>>>>> capture replace b0= _b[_cons]
>>>>>>> capture replace b1= _b[wlag_ce] if obs==runn
>>>>>>> capture replace b2= _b[wato] if obs==runn
>>>>>>> capture replace b3= _b[wlag_acc] if obs==runn
>>>>>>> capture replace b4= _b[wacc] if obs==runn
>>>>>>> capture replace b5= _b[wdsale] if obs==runn
>>>>>>> capture replace b6= _b[wndsale] if obs==runn
>>>>>>> drop mark sic_lp datadate_lp sample sample_sum
>>>>>>> replace runn= runn+1
>>>>>>> }
>>>>>>>
>>>>>>> gen pred_ce= b0+ b1*wlag_ce + b2*wato +b3*wlag_acc + b4*wacc + b5*wdsale + b6*wndsale
>>>>>>>
>>>>>>>
>>>>>>> I appreciate your help
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *
>>>>>>> * For searches and help try:
>>>>>>> * http://www.stata.com/help.cgi?search
>>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>> * http://www.ats.ucla.edu/stat/stata/