RE: st: Why do I get two different results from the same specification and the same dataset?
From: Nick Cox <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: RE: st: Why do I get two different results from the same specification and the same dataset?
Date: Sun, 6 Nov 2011 14:28:05 +0000
The same point can be made differently. Yuval's story depends (a) on data that we cannot see and (b) in part on code that is not shown to us. There can be no objection to people working with their own data, naturally, but it does not make remote analysis any easier. However, any bug in Stata, if it exists, can be demonstrated with data accessible to all. More crucially, we have no way of checking how Yuval created the interactions himself.
So, the best hypothesis on this evidence is that Yuval did something differently in the code not shown to us. 
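The kind of demonstration meant here can be sketched with data accessible to all. The following is a minimal sketch, not anything from the original thread: it assumes the auto data shipped with Stata, and the names weight_int, mpg_int, b_fv_weight and b_fv_mpg are invented for illustration. It fits the same model twice, once with factor-variable notation and once with hand-made interactions, and displays the differences in the interaction coefficients.

```stata
* Sketch only: public data stand in for the data we cannot see
sysuse auto, clear

* Version 1: interactions through factor-variable notation
regress price weight mpg foreign c.weight#i.foreign c.mpg#i.foreign
scalar b_fv_weight = _b[1.foreign#c.weight]
scalar b_fv_mpg    = _b[1.foreign#c.mpg]

* Version 2: interactions created by hand (hypothetical variable names)
generate double weight_int = weight * foreign
generate double mpg_int    = mpg * foreign
regress price weight mpg foreign weight_int mpg_int

* If both versions specify the same model, these differences
* should be zero up to rounding error
display b_fv_weight - _b[weight_int]
display b_fv_mpg    - _b[mpg_int]
```

If the two versions disagreed on public data like this, that would be evidence worth reporting; if they agree, the difference lies in how the interaction variables were created.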
Nick 
[email protected] 
Joerg Luedicke
You must have made a mistake when creating your interaction terms
"directly". I cannot think of any other explanation.
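One way to check for such a mistake is to rebuild each interaction from its components and compare it with the existing variable. The following is a sketch under assumptions: it takes the variable names from Yuval's second regression quoted below, assumes min is a 0/1 indicator (as the first output suggests), and uses a hypothetical _chk suffix for the rebuilt variables. The coefficient on units_int in the second output is on a very different scale from that on min#c.units in the first, so units_int is one plausible place to look.

```stata
* Sketch only: rebuild each interaction and compare with the hand-made one
* (assumes min is coded 0/1; the _chk names are invented for illustration)
foreach v in dev_cost bid_num year area units {
    generate double `v'_chk = `v' * min
    * compare reports how many observations differ between the two variables
    compare `v'_int `v'_chk
}
```

Any variable for which compare reports differing observations was not created as the product of the component and min.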
On Sun, Nov 6, 2011 at 7:24 AM, Yuval Arbel <[email protected]> wrote:
> when I run the following regression
>
> reg bid_win dev_cost bid_num year area units min min_price
> c.dev_cost#i.min c.bid_num#i.min c.year#i.min c.area#i.min
> c.units#i.min
>
> I get the following output:
>
>
>      Source |       SS       df       MS              Number of obs =    6802
> -------------+------------------------------           F( 12,  6789) = 2891.19
>       Model |  7.0107e+17    12  5.8423e+16           Prob > F      =  0.0000
>    Residual |  1.3719e+17  6789  2.0207e+13           R-squared     =  0.8363
> -------------+------------------------------           Adj R-squared =  0.8361
>       Total |  8.3826e+17  6801  1.2326e+14           Root MSE      =  4.5e+06
>
> ------------------------------------------------------------------------------
>     bid_win |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>    dev_cost |  -.0782451   .1319286    -0.59   0.553    -.3368666    .1803764
>     bid_num |   53637.98   18087.12     2.97   0.003     18181.54    89094.41
>        year |   61991.65   44544.85     1.39   0.164    -25330.21    149313.5
>        area |   204.3705   105.0742     1.95   0.052    -1.607834    410.3488
>       units |   52691.04   11756.39     4.48   0.000     29644.82    75737.26
>         min |   1.04e+08   9.74e+07     1.06   0.288    -8.73e+07    2.95e+08
>   min_price |   3.956053   .0241168   164.04   0.000     3.908777     4.00333
>             |
>         min#|
>  c.dev_cost |
>          1  |   -.460682   .1391518    -3.31   0.001    -.7334631    -.187901
>             |
>         min#|
>   c.bid_num |
>          1  |  -43194.68   19552.09    -2.21   0.027     -81522.9   -4866.457
>             |
>  min#c.year |
>          1  |  -51639.35   48543.57    -1.06   0.287      -146800    43521.27
>             |
>  min#c.area |
>          1  |   186.6343   110.1857     1.69   0.090    -29.36416    402.6327
>             |
>  min#c.units |
>          1  |  -128569.4   12642.23   -10.17   0.000    -153352.2   -103786.7
>             |
>       _cons |  -1.25e+08   8.94e+07    -1.39   0.163    -3.00e+08    5.05e+07
> ------------------------------------------------------------------------------
>
>
> But when I define directly the interaction variables, and run the
> regression, I get different outcomes:
>
> . reg bid_win dev_cost bid_num year area units min min_price
> dev_cost_int bid_num_int year_int area_int units_int
>
>      Source |       SS       df       MS              Number of obs =    6802
> -------------+------------------------------           F( 12,  6789) = 2840.90
>       Model |  6.9905e+17    12  5.8254e+16           Prob > F      =  0.0000
>    Residual |  1.3921e+17  6789  2.0505e+13           R-squared     =  0.8339
> -------------+------------------------------           Adj R-squared =  0.8336
>       Total |  8.3826e+17  6801  1.2326e+14           Root MSE      =  4.5e+06
>
> ------------------------------------------------------------------------------
>     bid_win |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>    dev_cost |   .3458744   .1259233     2.75   0.006     .0990254    .5927235
>     bid_num |   50612.77   18218.04     2.78   0.005     14899.71    86325.84
>        year |    26138.3   44731.32     0.58   0.559     -61549.1    113825.7
>        area |    796.392   88.98841     8.95   0.000     621.9468    970.8371
>       units |  -56322.78   4522.886   -12.45   0.000    -65189.06   -47456.51
>         min |   2.11e+07   9.78e+07     0.22   0.829    -1.71e+08    2.13e+08
>   min_price |   3.914549   .0241269   162.25   0.000     3.867252    3.961845
> dev_cost_int |  -.9575921   .1316807    -7.27   0.000    -1.215728   -.6994567
>  bid_num_int |  -40191.13   19694.51    -2.04   0.041    -78798.54   -1583.728
>    year_int |  -10450.87   48755.43    -0.21   0.830    -106026.8    85125.05
>    area_int |  -443.5883   91.15801    -4.87   0.000    -622.2866     -264.89
>   units_int |  -.2338972    .131622    -1.78   0.076    -.4919176    .0241233
>       _cons |  -5.29e+07   8.98e+07    -0.59   0.556    -2.29e+08    1.23e+08
> ------------------------------------------------------------------------------
>
> My question is: why do I get two different results from the same specification?
> To give one example: note that the coefficient of "dev_cost" has
> changed sign and become significant.
>