
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Regression by industry and year excluding firm i


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Regression by industry and year excluding firm i
Date   Sat, 14 Dec 2013 13:31:14 -0500

Sorry, the last two equations in the explanation were incorrect. 

They are corrected below. The code is unchanged.

S.

Stata can compute all delete-one predictions
in a group after just one run of regress:

Let # indicate that an observation has been omitted
before calculation of the corresponding statistic.

If "y" is the response variable, I use the following
regression identity:

(1) y = xb# + e#

xb# =  delete-one predicted mean
e# =  delete-one residual

Thus

(2) xb# = y- e#

So it's just necessary to compute e#

Let e be the ordinary residual.

(3) e# = e/(1-h)

where h is the observation's diagonal element of the
hat matrix   (See any regression text)
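
Identity (3) follows from the standard rank-one update of the least-squares fit. A short sketch of that step, where the notation is an assumption added here and not part of the original explanation: b is the full-sample coefficient vector, x_i is the predictor row of the omitted observation, and h_i = x_i'(X'X)^{-1}x_i.

```latex
\begin{align*}
b_{\#} &= b - \frac{(X^\top X)^{-1} x_i \, e_i}{1 - h_i} \\
x_i^\top b_{\#} &= x_i^\top b - \frac{h_i \, e_i}{1 - h_i} \\
e_{\#} &= y_i - x_i^\top b_{\#} = e_i + \frac{h_i \, e_i}{1 - h_i} = \frac{e_i}{1 - h_i}
\end{align*}
```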

Stata can compute 1- h, because it can
compute the standard error of a residual (stdr)
after regress, and:

(4) stdr = s * sqrt(1 - h), where s is the rmse

(5) So: 1- h = (stdr/s)^2

(6) e# = e/(1-h) = e * (s/stdr)^2

Substituting into (2) gives

(7) xb# = y - e * (s/stdr)^2


Steve

Steven J. Samuels
18 Cantine's Island
Saugerties NY 12477 USA

**********CODE BEGINS***********************
/* Data set to hold results */
clear
save d_hold, emptyok replace

sysuse auto, clear
gen highp= price>5000 /* new category */

/* Set up  variables for regress */
local xvars  trunk turn  /* predictors */
local y mpg       /* outcome */
local byvars  highp foreign  /* grouping variables */

egen group = group(`byvars')
levelsof group, local(levels)
tempfile t1
save `t1'

tempvar e stdr

foreach x of local levels{
use `t1', clear
keep if group==`x'
regress `y' `xvars'        /* outcome must match `y' used below */
scalar mse = e(rmse)^2     /* s^2 from equation (6); e(mss) is the model SS */

predict double `e' , residual
predict double `stdr', stdr
gen double pred_del =  `y' - `e'*mse/`stdr'^2
append using d_hold
save d_hold, replace
}
label var pred_del "Delete-one Prediction"
sum  pred_del
**************CODE ENDS**************
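
A quick way to gain confidence in equation (7) is to compare it with a brute-force refit in a single group. The following is only a sketch under stated assumptions: it uses the auto data restricted to foreign cars, and the variable names pred_fast and pred_slow are hypothetical names chosen for this check.

```stata
* Hypothetical check of equation (7) against a brute-force refit
sysuse auto, clear
keep if foreign == 1                 // one group is enough for the check

* Shortcut: one regression, then equation (7)
regress mpg trunk turn
scalar mse = e(rmse)^2
predict double e, residuals
predict double stdr, stdr
gen double pred_fast = mpg - e*mse/stdr^2

* Brute force: refit without observation i, then predict for observation i
gen double pred_slow = .
forvalues i = 1/`=_N' {
    quietly regress mpg trunk turn if _n != `i'
    quietly replace pred_slow = _b[_cons] + _b[trunk]*trunk + _b[turn]*turn in `i'
}

* The two columns should agree up to rounding error
assert reldif(pred_fast, pred_slow) < 1e-6
```

The brute-force loop runs one regression per observation, which is what makes it slow on large data; the shortcut needs one regression per group.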




On Dec 13, 2013, at 2:58 PM, Abdalla, Ahmed wrote:

Dear All
I may not have explained this well in my initial post, so I just want to be sure we are all on the same page:

I want to run a regression for every two-digit SIC code (industry classification) and fiscal year (cross-sectionally), with firm i excluded from the observations used to estimate the coefficients for its industry and fiscal year. I then take the estimated coefficients from each industry and fiscal year regression and multiply them by the actual values for firm i, which was excluded from that regression. This means that the number of regressions will not be 95,000, as regressions will run for each group of two-digit SIC code and fiscal year. I drop any SIC code and fiscal year group with fewer than 10 observations.

At the end of the day, I want to have expected core earnings calculated for each firm in each industry and fiscal year in my sample.

I am sorry if I was not clear at the beginning.

Nick, is your code intended to achieve that?

Many thanks
Ahmed




________________________________________
From: owner-statalist@hsphsun2.harvard.edu <owner-statalist@hsphsun2.harvard.edu> on behalf of Sarah Edgington <sedging@ucla.edu>
Sent: 13 December 2013 19:41
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Regression by industry and year excluding firm i

Ahmed,
As an aside, this strikes me as one of those instances where you would
benefit a great deal from debugging your code on a subset of your data.  You
need enough data for your regressions to run without errors but I'd try
getting the loop working on a subset of a few hundred observations rather
than the whole data set.  That will run much more quickly.  The resulting
predictions will be nonsense but they'll serve as a proof of concept.  Once
you're happy that you have code that does what you expect you can run it on
the whole dataset with a certain amount of confidence that even if it takes
a very long time, you'll get the results that reflect your intended process.
-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Fernando Rios Avila
Sent: Friday, December 13, 2013 11:21 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Regression by industry and year excluding firm i

Ahmed,
In addition to Nick Cox's comments, keep in mind that based on your
explanation, you need to run 95000 regressions, which will be very time
consuming. But computer time is "cheap".
I would suggest, however, that you clarify whether each observation represents a
different firm, which is the assumption under which your code and Nick's handle
the problem.
Fernando
HTH

On Fri, Dec 13, 2013 at 2:12 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Sorry, no.
> 
> The code hasn't finished running, so
> 
> 1. Good news. No obvious bug.
> 
> 2. I'd expect that code to be slow. You want a regression for every
> observation.
> 
> I don't think you've demonstrated anything wrong with my code, so I
> can't possibly fix it. That doesn't mean the code must be right, but
> you need to show me incorrect results first. The point is that your
> code would, I imagine, have been even slower had it been correct.
> Several of the changes I made would have speeded up things compared
> with your code.
> 
> I don't have your data to test anything, but without wanting to seem
> arrogant, I think you need to be confident that I made a mistake
> before you change my code.
> 
> Nick
> njcoxstata@gmail.com
> 
> 
> On 13 December 2013 19:01, Abdalla, Ahmed <ahmed.abdalla@kcl.ac.uk> wrote:
>> Dear Nick
>> Many Thanks for that.
>> I understand your code now. I ran it. However, Stata has been running the loop for more than 40 minutes now and I have no output!
>> I will explain more:
>> I have a model:
>> wce = b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + b4*wacc + b5*wdsale + b6*wndsale
>> 
>> I want to run this model using all observations in a particular industry-year, excluding firm i. Expected wce for firm i is measured using the coefficients I obtain from the industry-year regressions multiplied by the actual values of the variables in the model for firm i.
>> As far as I understand, your code should achieve my target, but it took a long time and didn't give any results!
>> I even tried another code that worked well and gave me results in seconds, but it doesn't exclude firm i from the estimation. I will write this code for you here:
>> egen sic2id=group(sic_2 datadate)
>> egen count=count(sic2id), by(sic2id)
>> drop if count<10
>> drop count
>> drop sic2id
>> egen sic2id=group(sic_2 datadate)
>> 
>> gen b0=.
>> gen b1= .
>> gen b2=.
>> gen b3=.
>> gen b4=.
>> gen b5=.
>> gen b6=.
>> 
>> sum sic2id
>> scalar max2=r(max)
>> local k=max2
>> set more off
>> forvalues x=1(1)`k'{
>> capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sic2id==`x'
>> capture replace b0= _b[_cons]
>> capture replace b1= _b[wlag_ce]
>> capture replace b2= _b[wato]
>> capture replace b3= _b[wlag_acc]
>> capture replace b4= _b[wacc]
>> capture replace b5= _b[wdsale]
>> capture replace b6= _b[wndsale]
>> }
>> 
>> I would appreciate it if you could explain what was wrong with your code and update the new code I have posted here to exclude firm i.
>> 
>> 
>> 
>> 
>> ________________________________________
>> From: owner-statalist@hsphsun2.harvard.edu
>> <owner-statalist@hsphsun2.harvard.edu> on behalf of Nick Cox
>> <njcoxstata@gmail.com>
>> Sent: 13 December 2013 18:03
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Regression by industry and year excluding firm i
>> 
>> Remarks
>> 
>> 1. If you are cycling over observations, you don't need a variable
>> containing observation numbers, nor to use -levelsof-.
>> 
>> 2. -in- is always faster than the corresponding -if-.
>> 
>> 3. wlag_ce=!=. is presumably a typo, but to Stata it will be illegal syntax.
>> 
>> 4. -capture replace b0= _b[_cons]- will end with the last intercept
>> calculated. I guess you don't want that.
>> 
>> 5. Checking for missing values is redundant as -regress- will never
>> include them.
>> 
>> With these and some other small tricks, here is an attempt at
>> rewriting your code.
>> 
>> local X wlag_ce wato wlag_acc wacc wdsale wndsale
>> tokenize "`X'"
>> 
>> forval j = 0/6 {
>> gen b`j'=.
>> }
>> 
>> forval i = 1/`=_N' {
>> local same sic_2[`i'] == sic_2 & datadate[`i'] == datadate
>> qui count if `same' & _n != `i'
>> 
>> if r(N) > 10 {
>> reg wce `X' if `same' & _n != `i'
>> }
>> 
>> quietly if _rc == 0 {
>> replace b0 = _b[_cons] in `i'
>> forval j = 1/6 {
>> replace b`j' = _b[``j''] in `i'
>> }
>> }
>> }
>> 
>> gen pred_ce= b0 + b1*wlag_ce + b2*wato + b3*wlag_acc + ///
>>     b4*wacc + b5*wdsale + b6*wndsale
>> 
>> Nick
>> njcoxstata@gmail.com
>> 
>> 
>> On 13 December 2013 17:33, Abdalla, Ahmed <ahmed.abdalla@kcl.ac.uk> wrote:
>>> Dear Statalist
>>> I run a regression to estimate core earnings for each firm in my dataset. The regression is run using all observations in a particular industry year EXCLUDING firm i. Expected core earnings for firm i are estimated using the coefficients multiplied by the actual values of the variables in the model for firm i.
>>> I run the following code.
>>> 
>>> First: I get an error message for macro length being exceeded.
>>> Second: When I try other looping commands, the loop runs but gives me an error message for invalid syntax.
>>> My problem is how to exclude firm i. I would welcome any suggestions on running regressions by industry and year while excluding firm i from the estimation.
>>> 
>>> 
>>> gen obs= [_n]
>>> gen runn=1
>>> 
>>> gen b0=.
>>> gen b1= .
>>> gen b2=.
>>> gen b3=.
>>> gen b4=.
>>> gen b5=.
>>> gen b6=.
>>> 
>>> levelsof obs,local(levels)
>>> foreach x of local levels{
>>> gen mark=1 if obs==runn
>>> gen sic_lp= sic_2 if obs ==runn
>>> qui summ sic_lp
>>> replace sic_lp = r(mean) if sic_lp==.
>>> gen datadate_lp= datadate if obs == runn
>>> qui summ datadate_lp
>>> replace datadate_lp = r(mean) if datadate_lp==.
>>> format datadate_lp %d
>>> gen sample =1 if sic_lp== sic_2 & datadate_lp== datadate & sale !=. & wce !=. & wlag_ce=!=. & wato !=. & wacc !=. & wlag_acc!=. & wdsale !=. & wndsale !=.
>>> egen sample_sum= sum(sample) if mark != 1
>>> capture reg wce wlag_ce wato wlag_acc wacc wdsale wndsale if sample==1 & mark != 1 & sample_sum >10
>>> capture replace b0= _b[_cons]
>>> capture replace b1= _b[wlag_ce] if obs==runn
>>> capture replace b2= _b[wato] if obs==runn
>>> capture replace b3= _b[wlag_acc] if obs==runn
>>> capture replace b4= _b[wacc] if obs==runn
>>> capture replace b5= _b[wdsale] if obs==runn
>>> capture replace b6= _b[wndsale] if obs==runn
>>> drop mark sic_lp datadate_lp sample sample_sum
>>> replace runn= runn+1
>>> }
>>> 
>>> gen pred_ce= b0+ b1*wlag_ce + b2*wato +b3*wlag_acc + b4*wacc +
>>> b5*wdsale + b6*wndsale
>>> 
>>> 
>>> I appreciate your help
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>> 
