Re: st: Matsize and Estimation of the Variance Matrix in a Regression


From   [email protected] (Jeff Pitblado, StataCorp LP)
To   [email protected]
Subject   Re: st: Matsize and Estimation of the Variance Matrix in a Regression
Date   Thu, 05 Sep 2013 09:54:10 -0500

Alex MacKay <[email protected]> is using -areg- with -vce(cluster ...)- and
is getting a missing value for the model F statistic, inconsistent values
for the model degrees of freedom, and some missing standard errors:

> I have run into an issue that when I increase the matsize, it can
> cause a regression that previously ran with no warnings to return:
> "Warning: variance matrix is nonsymmetric or highly singular."
> 
> It estimates the exact same coefficients across the board. I've put
> the log for the first coefficient below. Notice the Warning in advance
> of the output. With the larger matsize (10000), it does not estimate
> standard errors, and the model degrees of freedom are zero.
> 
> I am using the areg command to absorb the variable product_id. Is it
> possible that Stata is trying to generate a number of fixed effects
> that exceed 800, the original matsize, and decides to drop the
> product_id dummy variables? This may allow it to estimate standard
> errors. If so, I think it should be reported as a bug.

Alex later gave us a count of the levels of each factor variable
participating in the model:

> The levels are: 141 (week), 73 (retailer_id),  24 (state_id), 25
> (product), and 46 (clusterID), for a total of 309. ...

There appear to be 46 clusters and well over 200 estimable regression
coefficients beyond the absorbed 25 levels of the 'product' variable.

The rank of a cluster-robust VCE cannot exceed the number of clusters.
If the number of estimable parameters (those not marked as omitted because of
collinearity) exceeds that rank, the VCE is singular: the model F test will be
missing, and some of the robust standard errors can be missing too.
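
The rank bound can be checked numerically. What follows is only a minimal
sketch in plain Python, not Stata code and not how -areg- computes anything.
It assumes the "meat" of a cluster-robust VCE is a sum of one outer product
per cluster, and the three score vectors are made-up numbers chosen purely
for illustration (3 clusters, 5 parameters).

```python
from fractions import Fraction

def outer(v):
    """Outer product v v' as a list of rows."""
    return [[a * b for b in v] for a in v]

def add(A, B):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matrix_rank(M):
    """Exact rank via Gaussian elimination on Fractions (no rounding)."""
    A = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current pivot row with a nonzero entry.
        pivot = next((r for r in range(rank, n_rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, n_rows):
            factor = A[r][col] / A[rank][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank += 1
    return rank

# Hypothetical per-cluster score vectors: one row per cluster, k = 5 parameters.
scores = [
    [1, 2, 0, -1, 3],
    [0, 1, 4, 2, -2],
    [2, -1, 1, 0, 1],
]
k = len(scores[0])
n_clusters = len(scores)

# Meat matrix: sum over clusters of the outer product of each score vector.
meat = [[0] * k for _ in range(k)]
for s in scores:
    meat = add(meat, outer(s))

meat_rank = matrix_rank(meat)
print("clusters:", n_clusters, "parameters:", k, "rank of meat:", meat_rank)
```

Here the 5 x 5 matrix has rank 3, the number of clusters, so it is singular
and cannot support a joint test of all 5 parameters. Alex's model is the same
situation on a larger scale: 46 clusters against well over 200 estimable
coefficients.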

However, the differing output when changing c(matsize) is troublesome.  If
Alex can share the data and a do-file reproducing the problem, I invite Alex
to contact me privately and we will take a closer look at this.

--Jeff
[email protected]

> (Note: I'm reposting in a way that may more clearly identify the
> issues, now that I am familiar with replying).
> 
> 
> //Matsize = 10000
> 
> 
> note: 2599.week omitted because of collinearity
> note: 597.retailer_id omitted because of collinearity
> note: 866.retailer_id omitted because of collinearity
> note: 877.retailer_id omitted because of collinearity
> note: 9101.retailer_id omitted because of collinearity
> note: 54.state_id omitted because of collinearity
> Warning:  variance matrix is nonsymmetric or highly singular
> note: 3997.retailer_id omitted because of collinearity
> note: 4955.retailer_id omitted because of collinearity
> note: 7005.retailer_id omitted because of collinearity
> note: 7599.retailer_id omitted because of collinearity
> 
> Linear regression, absorbing indicators           Number of obs   =        597
> 
>                                                   F(   0,     45) =          .
>                                                   Prob > F        =          .
>                                                   R-squared       =     0.9256
>                                                   Adj R-squared   =     0.8695
>                                                   Root MSE        =     0.2950
> 
>                       (Std. Err. adjusted for 46 clusters in  clusterID)
> ------------------------------------------------------------------------------
>              |               Robust
>     ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>    treatment |  -4.044072          .        .       .            .           .
> 
> 
> 
> //Matsize == 800
> 
> note: 2599.week omitted because of collinearity
> note: 597.retailer_id omitted because of collinearity
> note: 866.retailer_id omitted because of collinearity
> note: 877.retailer_id omitted because of collinearity
> note: 9101.retailer_id omitted because of collinearity
> note: 54.fips omitted because of collinearity
> note: 3997.retailer_id omitted because of collinearity
> note: 4955.retailer_id omitted because of collinearity
> note: 7005.retailer_id omitted because of collinearity
> note: 7599.retailer_id omitted because of collinearity
> 
> Linear regression, absorbing indicators           Number of obs   =        597
> 
>                                                   F(  49,     45) =          .
>                                                   Prob > F        =          .
>                                                   R-squared       =     0.9256
>                                                   Adj R-squared   =     0.8695
>                                                   Root MSE        =     0.3085
> 
>                       (Std. Err. adjusted for 46 clusters in clusterID)
> ------------------------------------------------------------------------------
>              |               Robust
>     ln_price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>    treatment |  -4.044072   3.152507    -1.28   0.206    -10.39355    2.305404