
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Multicollinearity Problem in Stata


From   DE SOUZA Eric <eric.de_souza@coleurope.eu>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Multicollinearity Problem in Stata
Date   Wed, 31 Jul 2013 10:53:54 +0200

If you subtract 8.080053 from 1.929168 (the coefficients of r_ew and r_ow in the regression without a constant), you get -6.150885, which matches, up to rounding, -6.150886 (the coefficient of r_ow in the regression with a constant).
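
The substitution behind this arithmetic can be checked numerically. What follows is a minimal sketch, assuming r_ow + r_ew = 0.2407656 holds exactly as reported; all numbers are copied from the regression output quoted in this thread, and the variable names are illustrative only. Replacing r_ew by (c - r_ow) in the no-constant fit turns it into a slope on r_ow plus a constant, which should reproduce the regression with a constant.

```python
# Coefficients reported for the regression WITHOUT a constant.
b_ow_nocons = 1.929168
b_ew_nocons = 8.080053
c = 0.2407656  # r_ow + r_ew, as reported by the original poster

# Substituting r_ew = c - r_ow into b_ow*r_ow + b_ew*r_ew gives
# (b_ow - b_ew)*r_ow + b_ew*c: a slope on r_ow plus a constant.
implied_b_ow = b_ow_nocons - b_ew_nocons
implied_cons = b_ew_nocons * c

# Values reported for the regression WITH a constant.
b_ow_cons = -6.150886
cons = 1.945399

print(f"implied r_ow coefficient: {implied_b_ow:.6f} (reported {b_ow_cons})")
print(f"implied constant:         {implied_cons:.6f} (reported {cons})")
```

Both implied values agree with the reported ones up to rounding, which suggests the two regressions are the same fitted model written in two parameterizations rather than two competing results.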


Eric de Souza 
College of Europe 
Brugge (Bruges), Belgium 
http://www.coleurope.eu



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of FU Youyan
Sent: 30 July 2013 16:41
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Multicollinearity Problem in Stata

I have double-checked my data. I am sure that r_ew+r_ow equals a constant (0.2407656). The coefficients of lnnc and n1_ln are indeed the same in these two regressions, but the coefficient of r_ow does change. We can also see that the t-value and p-value of the constant in the first regression are exactly the same as the t-value and p-value of r_ew in the second regression, which is consistent with your explanation about the omitted variable. Therefore, I am still confused about the change in the coefficient of r_ow and wonder which result is more reliable.


________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Yuval Arbel [yuval.arbel@gmail.com]
Sent: 30 July 2013 14:04
To: statalist
Subject: Re: st: Multicollinearity Problem in Stata

See the example below on wage and gender. The fact that these variables are continuous rather than dummies is irrelevant here. If indeed r_ew+r_ow equals a constant - the coefficients should be the same in both regressions.
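
The wage and gender example quoted further down this thread can be reproduced with a small least-squares sketch. The six wage values below are made up for illustration (chosen so that the group means are 1200 for males and 1000 for females, as in the example), and the helper `ols` is a hypothetical pure-Python solver for the normal equations, not anything from Stata.

```python
def ols(columns, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(columns)
    # Build the augmented matrix [X'X | X'y].
    a = [
        [sum(ci * cj for ci, cj in zip(columns[i], columns[j])) for j in range(k)]
        + [sum(ci * yi for ci, yi in zip(columns[i], y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for p in range(k):
        pivot = max(range(p, k), key=lambda r: abs(a[r][p]))
        a[p], a[pivot] = a[pivot], a[p]
        for r in range(k):
            if r != p:
                f = a[r][p] / a[p][p]
                a[r] = [x - f * z for x, z in zip(a[r], a[p])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Hypothetical data: three males (mean wage 1200), three females (mean 1000).
wage = [1100.0, 1300.0, 1200.0, 900.0, 1100.0, 1000.0]
male = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
female = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
const = [1.0] * len(wage)

# No constant, both dummies: each coefficient is a group mean.
no_const = ols([male, female], wage)

# Constant plus the female dummy: intercept is the male mean,
# slope is the female-minus-male difference.
with_const = ols([const, female], wage)

print("no constant:  ", no_const)
print("with constant:", with_const)
```

The two fits give identical predicted wages; only the meaning of each coefficient changes, which is the same mechanism at work with r_ow and r_ew.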

On Mon, Jul 29, 2013 at 1:14 PM, FU Youyan <s1150901@sms.ed.ac.uk> wrote:
> Dear Yuval,
>
> Thank you very much for this answer; it is quite helpful. I have a follow-up question:
> r_ew and r_ow are two types of investment return in my research (they are continuous variables rather than dummies). What I want to test is the impact of these two returns on investors' future behavior. In other words, I want to know how investors weight these two types of return. Therefore, I have to include both returns in my regression. In the regression with a constant but omitting r_ew, the coefficient of r_ow is significantly negative (t-value=-3.30). However, in the regression without a constant but including r_ew, the coefficient of r_ow is significantly positive (t-value=2.20). So I would like to know: which result is more reliable?
>
> Best wishes,
> Youyan
> ________________________________________
> From: owner-statalist@hsphsun2.harvard.edu 
> [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Yuval Arbel 
> [yuval.arbel@gmail.com]
> Sent: 29 July 2013 17:58
> To: statalist
> Subject: Re: st: Multicollinearity Problem in Stata
>
> Dear FU,
>
> This outcome is not strange at all. I believe what you encountered is 
> known in econometrics as "the dummy variable trap":
>
> I believe that r_ew+r_ow=constant. Consequently - when you run the 
> model with a constant - you get perfect collinearity with the 
> constant term. But when you omit the constant - the problem is solved.
>
> In fact you can make use of these two specifications. Consider the 
> following exercise. Let's say that w is the wage, male=0 for female and
> 1 for male, and female=1 for female and 0 for male. If the average 
> wage is 1200 for male and 1000 for female - and you run the model 
> without the constant, you will get:
>
> w(hat)=1200*male+1000*female
>
> But if you omit male and use constant (in order to avoid the dummy 
> variable trap), you get
>
> w(hat)=1200-200*female
>
> The second specification is more common because it permits you to test 
> whether wage differences across gender are significant.
>
> On Mon, Jul 29, 2013 at 9:10 AM, FU Youyan <s1150901@sms.ed.ac.uk> wrote:
>> Dear Statalist users,
>>
>> I am encountering a strange multicollinearity problem when I run a regression in Stata. The problem is illustrated below. I would very much appreciate it if any of you could answer my question.
>>
>>
>> *****************************************************************************************************
>> note: r_ew omitted because of collinearity
>>
>> Linear regression                                      Number of obs =     159
>>                                                        F(  3,   155) =   73.74
>>                                                        Prob > F      =  0.0000
>>                                                        R-squared     =  0.4900
>>                                                        Root MSE      =  .88944
>>
>> ------------------------------------------------------------------------------
>>              |               Robust
>>        n2_ln |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>         r_ow |  -6.150886   1.861984    -3.30   0.001    -9.829026   -2.472746
>>         r_ew |          0  (omitted)
>>         lnnc |   .1853104   .0502188     3.69   0.000     .0861089    .2845119
>>        n1_ln |   .2328174   .0912362     2.55   0.012     .0525905    .4130443
>>        _cons |   1.945399   .5489629     3.54   0.001     .8609843    3.029813
>> ------------------------------------------------------------------------------
>>
>> In the above regression table, r_ew is omitted due to the perfect negative collinearity between r_ow and r_ew.
>>
>> (The correlation table is shown below.) The relationship between these two variables is r_ow+r_ew=0.2407656, so there is perfect collinearity.
>>
>>
>>              |    n2_ln     r_ow     r_ew     lnnc    n1_ln
>> -------------+---------------------------------------------
>>        n2_ln |   1.0000
>>         r_ow |  -0.6565   1.0000
>>         r_ew |   0.6565  -1.0000   1.0000
>>         lnnc |   0.4587  -0.4285   0.4285   1.0000
>>        n1_ln |   0.6419  -0.8468   0.8468   0.4103   1.0000
>>
>> However, the variable r_ew is not omitted when I run exactly the same regression without an intercept.
>>
>>
>> Linear regression                                      Number of obs =     159
>>                                                        F(  4,   155) =  441.13
>>                                                        Prob > F      =  0.0000
>>                                                        R-squared     =  0.8909
>>                                                        Root MSE      =  .88944
>>
>> ------------------------------------------------------------------------------
>>              |               Robust
>>        n2_ln |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>         r_ow |   1.929168   .8763971     2.20   0.029     .1979442    3.660391
>>         r_ew |   8.080053   2.280073     3.54   0.001     3.576027    12.58408
>>         lnnc |   .1853104   .0502188     3.69   0.000     .0861089    .2845119
>>        n1_ln |   .2328174   .0912363     2.55   0.012     .0525905    .4130443
>> ------------------------------------------------------------------------------
>>
>> My question is: why does Stata not omit r_ew when the intercept term is excluded? And is the regression result without an intercept valid?
>>
>>
>> Thanks for your help.
>> Youyan
>>
>> --
>> The University of Edinburgh is a charitable body, registered in 
>> Scotland, with registration number SC005336.
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
>
>
> --
> Dr. Yuval Arbel
> School of Business
> Carmel Academic Center
> 4 Shaar Palmer Street,
> Haifa 33031, Israel
> e-mail1: yuval.arbel@carmel.ac.il
> e-mail2: yuval.arbel@gmail.com
> You can access my latest paper on SSRN at: http://ssrn.com/abstract=2263398
> You can access previous papers on SSRN at: http://ssrn.com/author=1313670





© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index