
# st: IVREG2 vs. REG/CLUSTER2 – Difference in number of observations reported in regressions

From: "Hofbaur, Ulrich"
To: "statalist@hsphsun2.harvard.edu"
Subject: st: IVREG2 vs. REG/CLUSTER2 – Difference in number of observations reported in regressions
Date: Mon, 10 Sep 2012 12:50:49 +0000

Hi everybody,

I am using the "ivreg2" command to run an OLS regression while allowing for two-way clustering of the standard errors. The command is simply "ivreg2 y x, cluster(cs_id ts_id)". It turns out that Stata drops some observations when running this regression (about 30 percent of the sample: it keeps 1806 of 2618 observations), although the information for those observations is available. The same drop occurs when I run "ivreg2" without the cluster() option. I have also tried the related "cluster2" command and the ordinary "reg" command, and both of them use the full set of observations. Does anyone know why the number of observations differs across these regressions?
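
For reference, two-way clustering as implemented by "ivreg2" with two cluster variables (and, as far as I can tell, by "cluster2") follows the approach of Cameron, Gelbach and Miller and of Thompson: the variance estimate adds the one-way cluster-robust estimates for each dimension and subtracts the one for their intersection, so that observations sharing both a cs_id and a ts_id are not counted twice. Leaving out the small-sample adjustments, which differ between the commands and are not shown here, a sketch of the estimator for the OLS coefficients $\hat\beta$ is:

$$
\widehat{V}(\hat\beta) = \widehat{V}_{\text{cs}} + \widehat{V}_{\text{ts}} - \widehat{V}_{\text{cs}\cap\text{ts}},
\qquad
\widehat{V}_{G} = (X'X)^{-1}\Big(\sum_{g \in G} X_g'\hat{u}_g\hat{u}_g'X_g\Big)(X'X)^{-1},
$$

where $G$ is the set of clusters defined by cs_id, by ts_id, or by each cs_id-ts_id combination, and $X_g$ and $\hat{u}_g$ are the regressors and OLS residuals of cluster $g$. All three sums run over the estimation sample only, which would explain why the cluster counts in the output below (1149 and 44 for "ivreg2" versus 1568 and 51 for "cluster2") change together with the number of observations.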

Any help is highly appreciated!

Best,
Ulrich

The code and output are given below.

---------------------------------------------
```
. reg car_m1_1 dv_chng

      Source |       SS       df       MS              Number of obs =    2618
-------------+------------------------------           F(  1,  2616) =   69.36
       Model |  .359573378     1  .359573378           Prob > F      =  0.0000
    Residual |  13.5615124  2616  .005184064           R-squared     =  0.0258
       Total |  13.9210858  2617  .005319483           Root MSE      =    .072

------------------------------------------------------------------------------
    car_m1_1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dv_chng |   .0602397   .0072331     8.33   0.000     .0460565    .0744228
       _cons |  -.0057141    .003311    -1.73   0.084    -.0122064    .0007783
------------------------------------------------------------------------------
```

```
. ivreg2 car_m1_1 dv_chng

OLS estimation
--------------

Estimates efficient for homoskedasticity only
Statistics consistent for homoskedasticity only

                                                      Number of obs =     1806
                                                      F(  1,  1804) =    35.82
                                                      Prob > F      =   0.0000
Total (centered) SS     =  10.15831896                Centered R2   =   0.0195
Total (uncentered) SS   =  12.13900806                Uncentered R2 =   0.1795
Residual SS             =  9.960554106                Root MSE      =   .07426

------------------------------------------------------------------------------
    car_m1_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dv_chng |   .0536313   .0089563     5.99   0.000     .0360774    .0711853
       _cons |  -.0098285   .0042637    -2.31   0.021    -.0181852   -.0014719
------------------------------------------------------------------------------
Included instruments: dv_chng
------------------------------------------------------------------------------
```

```
. ivreg2 car_m1_1 dv_chng, cluster(cs_id ts_id)

OLS estimation
--------------

Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on cs_id and ts_id

Number of clusters (cs_id) =      1149                Number of obs =     1806
Number of clusters (ts_id) =        44                F(  1,    43) =    24.80
                                                      Prob > F      =   0.0000
Total (centered) SS     =  10.15831896                Centered R2   =   0.0195
Total (uncentered) SS   =  12.13900806                Uncentered R2 =   0.1795
Residual SS             =  9.960554106                Root MSE      =   .07426

------------------------------------------------------------------------------
             |               Robust
    car_m1_1 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dv_chng |   .0536313   .0106425     5.04   0.000     .0327725    .0744902
       _cons |  -.0098285   .0052442    -1.87   0.061     -.020107    .0004499
------------------------------------------------------------------------------
Included instruments: dv_chng
------------------------------------------------------------------------------
```

```
. cluster2 car_m1_1 dv_chng, fcluster(cs_id) tcluster(ts_id)

Linear regression with 2D clustered SEs                Number of obs =    2618
                                                       F(  1,  2499) =   64.84
                                                       Prob > F      =  0.0000
Number of clusters (cs_id) =   1568                    R-squared     =  0.0258
Number of clusters (ts_id) =     51                    Root MSE      =  0.0720
------------------------------------------------------------------------------
    car_m1_1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     dv_chng |   .0602397   .0091375     6.59   0.000     .0423219    .0781575
       _cons |  -.0057141   .0040381    -1.42   0.157    -.0136324    .0022042
------------------------------------------------------------------------------

SE clustered by cs_id and ts_id (multiple obs per cs_id-ts_id)
```

```
. count if car_m1_1!=. & dv_chng!=. &cs_id!=. & ts_id!=.
  2618

. distinct cs_id

              |        Observations
     Variable |      total   distinct
--------------+----------------------
        cs_id |       2618       1568

. distinct  ts_id

              |        Observations
     Variable |      total   distinct
--------------+----------------------
        ts_id |       2618         51
```
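
A hypothetical way to pin down which observations "ivreg2" drops is to store the estimation sample of each command with e(sample) and compare the two. The indicator names in_reg and in_iv are made up for illustration, and the sketch assumes the variable names used in the log above:

```stata
* Hypothetical diagnostic: compare the estimation samples of regress and ivreg2.
* in_reg and in_iv are illustrative names; drop them first in case they exist.
capture drop in_reg
capture drop in_iv

quietly regress car_m1_1 dv_chng
generate byte in_reg = e(sample)   // 1 if the observation was used by regress

quietly ivreg2 car_m1_1 dv_chng, cluster(cs_id ts_id)
generate byte in_iv = e(sample)    // 1 if the observation was used by ivreg2

* Off-diagonal cells count observations used by only one of the two commands
tabulate in_reg in_iv

* Describe the observations that regress uses but ivreg2 leaves out
count if in_reg & !in_iv
summarize car_m1_1 dv_chng cs_id ts_id if in_reg & !in_iv
```

Observations with in_reg = 1 and in_iv = 0 are the ones used by "reg" but excluded by "ivreg2"; checking whether they share particular cs_id or ts_id values may help narrow down the cause.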
