Statalist


Re: st: AW: How to Reconcile R2 with Economic Significance


From   Joseph McDonnell <jockmcdock@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: AW: How to Reconcile R2 with Economic Significance
Date   Fri, 31 Jul 2009 10:38:01 +0930

I have to disagree with Martin here. I'm assuming you used standardised
variables in the regression. Standardised variables can be a bit
tricky. Here's a little simulation I did.


. clear

* a Q&D simulation
. set seed 123456

* suppose we want to examine the effect of sex and eating carrots on a
* particular outcome

. set obs 1000

. gen sex=(_n<=500) // sex is pretty evenly distributed
. gen osex=sex // keep the original sex because we're going to standardise it
. summ sex
. replace sex=(sex-r(mean))/r(sd)

. bysort osex: gen carrot=(_n<=50) // eating carrots is relatively rare
. gen ocarrot=carrot
. summ carrot
. replace carrot=(carrot-r(mean))/r(sd)

* let's suppose that after standardisation, sex and carrots have
* exactly the same effect
. gen y=2+1*sex+1*carrot+5*(runiform()-0.5)
. regress y sex carrot
. predict yhat, xb

. list osex sex ocarrot carrot yhat if inlist(_n,500,1,1000,501)

With this code, I get the following


      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  539.52
       Model |  2216.14901     2  1108.07451           Prob > F      =  0.0000
    Residual |  2047.66704   997  2.05382853           R-squared     =  0.5198
-------------+------------------------------           Adj R-squared =  0.5188
       Total |  4263.81606   999  4.26808414           Root MSE      =  1.4331

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |    1.01654   .0453419    22.42   0.000     .9275632    1.105516
      carrot |   1.088584   .0453419    24.01   0.000     .9996074     1.17756
       _cons |   1.982797   .0453192    43.75   0.000     1.893865    2.071729
------------------------------------------------------------------------------

      +---------------------------------------------------+
      | osex         sex   ocarrot      carrot       yhat |
      |---------------------------------------------------|
   1. |    0   -.9994999         1      2.9985   4.230884 |
 500. |    0   -.9994999         0   -.3331666   .6040861 |
 501. |    1    .9994999         1      2.9985   6.262946 |
1000. |    1    .9994999         0   -.3331666   2.636148 |
      +---------------------------------------------------+

The regression coefficients and t-values are pretty similar, but if
you compare a change from 0 to 1 in the original variables, the effects
are very different. Comparing rows 1 and 500 (a change in carrots), we
see a change of around 3.6. Comparing rows 1 and 501 (a change in sex),
we see a difference of around 2. A 0 to 1 change is about 2 SD for sex
but about 3.3 SD for carrots, because eating carrots is rare.
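
To make the arithmetic explicit: a 0 to 1 change in the original
variable moves the standardised variable by the gap between its two
listed values, so the predicted change is the coefficient times that
gap. A quick check, using only the numbers from the output above:

```stata
* predicted change = coefficient * (gap between the two standardised values)
* carrots: about 3.63
display 1.088584 * (2.9985 - (-.3331666))
* sex: about 2.03
display 1.01654 * (.9994999 - (-.9994999))
```

These agree with the differences in yhat between rows 1 and 500, and
between rows 1 and 501.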

If we then replace 50 with 250 in

. bysort osex: gen carrot=(_n<=50) // eating carrots is relatively rare

we get


      Source |       SS       df       MS              Number of obs =    1000
-------------+------------------------------           F(  2,   997) =  475.93
       Model |  1961.23173     2  980.615863           Prob > F      =  0.0000
    Residual |  2054.23188   997  2.06041312           R-squared     =  0.4884
-------------+------------------------------           Adj R-squared =  0.4874
       Total |  4015.46361   999  4.01948309           Root MSE      =  1.4354

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sex |    1.01654   .0454145    22.38   0.000     .9274206    1.105659
      carrot |   .9642833   .0454145    21.23   0.000     .8751643    1.053402
       _cons |   1.982797   .0453918    43.68   0.000     1.893723    2.071871
------------------------------------------------------------------------------


      +---------------------------------------------------+
      | osex         sex   ocarrot      carrot       yhat |
      |---------------------------------------------------|
   1. |    0   -.9994999         1    .9994999   1.930567 |
 500. |    0   -.9994999         0   -.9994999   .0029649 |
 501. |    1    .9994999         1    .9994999   3.962629 |
1000. |    1    .9994999         0   -.9994999   2.035027 |
      +---------------------------------------------------+

Again, the regression coefficients are pretty much what we would
expect, but now a change from 0 to 1 in either variable (about 2 SD in
both cases) leads to a change of around 2. Distribution is important.
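
If you want to see both cases side by side, something like the
following do-file sketch should work. It reruns the simulation for 50
and 250 carrot eaters per sex and converts each standardised
coefficient into the implied effect of a 0 to 1 change by dividing by
the SD of the original variable. The loop variable k and the locals
sd_sex and sd_carrot are just names I made up for this illustration,
and the random draws (and so the exact numbers) may differ between
Stata versions.

```stata
* Sketch only: rerun the simulation for two carrot prevalences and
* report the implied effect of a 0 -> 1 change in each variable.
foreach k in 50 250 {
    clear
    set seed 123456
    set obs 1000

    * original 0/1 variables
    gen osex = (_n <= 500)
    bysort osex: gen ocarrot = (_n <= `k')

    * standardise, keeping the SD of each original variable
    quietly summarize osex
    local sd_sex = r(sd)
    gen sex = (osex - r(mean)) / r(sd)

    quietly summarize ocarrot
    local sd_carrot = r(sd)
    gen carrot = (ocarrot - r(mean)) / r(sd)

    * same data-generating process as in the simulation above
    gen y = 2 + 1*sex + 1*carrot + 5*(runiform() - 0.5)
    quietly regress y sex carrot

    * a 0 -> 1 change moves the standardised variable by 1/SD
    display "k = `k': SD(carrot) = " %6.4f `sd_carrot' ///
        "  0->1 effect of carrot = " %6.4f _b[carrot]/`sd_carrot' ///
        "  0->1 effect of sex = " %6.4f _b[sex]/`sd_sex'
}
```

The standardised coefficients stay close to 1 in both runs, while the
0 to 1 effect of carrots changes a lot because its SD changes.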

Cheers

Joseph

On Thu, Jul 30, 2009 at 4:44 AM, Martin Weiss<martin.weiss1@gmx.de> wrote:
>
> What you are describing could mean either of two things: The underlying
> economic theory is wrong and should be replaced by one supported by the
> data. Or you are unlucky and have picked a very special dataset that is not
> representative of the population. You have to make this pick yourself, I am
> afraid...
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Erasmo Giambona
> Sent: Wednesday, 29 July 2009 10:09
> To: statalist
> Subject: st: How to Reconcile R2 with Economic Significance
>
> Dear Statalist,
>
> I am trying to understand how to reconcile statistical and economic
> significance.
>
> Consider a simple model: y = a + b1x1 + b2x2 +e, fitted for panel data
> and estimated via OLS. Suppose the t-values are respectively 10 and 2
> for x1 and x2, implying that x1 contributes more to the R2 for the
> model. Suppose also that a 1 standard deviation increase in x1 causes y
> to increase by 2% from its mean while a 1 standard deviation increase
> in x2 causes y to increase by 25% from its mean. Now, a simple
> interpretation of a model R2 is that it is a proportion in the
> variability of y that is accounted for by the model. Accordingly,
> because of its t-value (and its effect on the R2), x1 would seem to be
> one of the key drivers of this variability in y. However, from an
> economic point of view, x1 seems to have a very marginal ability in
> explaining this variation in y (while x2 seems to be very important).
>
> Statistical and economic significance would seem to lead to
> "contradicting" results. Can someone provide some suggestions that
> could help me reconcile statistical and economic significance?
>
> Thanks,
>
> Erasmo
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP