
From: Steven Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: trying to compare means and using xi and xi3 for survey data
Date: Tue, 5 Jul 2011 09:21:32 -0500

"Is this interpretation accurate?"

Yes

Steve
sjsamuels@gmail.com

On Jul 5, 2011, at 6:44 AM, Hitesh Chandwani wrote:

Steven,

I used the following commands:

. char insured_pub_pvt_un[omit]2
. xi: svy: regress totchg_num i.insured_pub_pvt_un

And got the following output:

i.insured_pub~n   _Iinsured_p_0-4   (naturally coded; _Iinsured_p_2 omitted)
(running regress on estimation sample)

Survey: Linear regression

Number of strata   =        75          Number of obs      =    103817
Number of PSUs     =       966          Population size    = 469088.57
                                        Design df          =       891
                                        F(   3,    889)    =         .
                                        Prob > F           =         .
                                        R-squared          =    0.0106

------------------------------------------------------------------------------
             |             Linearized
  totchg_num |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iinsured_~0 |  (dropped)
_Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
_Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
_Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
       _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
------------------------------------------------------------------------------

I think the fact that the "0" group was dropped again has something to do
with the fact that all observations in this group have pweights set to zero.

The way I interpret the output is that the coefficients are the differences
in mean between the omitted group (group 2) and the other groups (1, 3, and
4, respectively), with the corresponding t-statistic values being a
comparison of means with the omitted group. Is this interpretation accurate?

Regards,
Hitesh

On Tue, Jul 5, 2011 at 7:30 AM, Hitesh Chandwani wrote:
Hi Steven, 

There is no evident coding error that I can see. If I use the 
-,noomit- option, how do I interpret the results? The coefficients are 
clearly the means, but what do the t-values indicate? 

xi, noomit: svy: reg totchg_num i.insured_pub_pvt_un , nocons 
(running regress on estimation sample) 

Survey: Linear regression 

Number of strata   =        75          Number of obs      =    103817
Number of PSUs     =       966          Population size    = 469088.57
                                        Design df          =       891
                                        F(   4,    888)    =         .
                                        Prob > F           =         .
                                        R-squared          =    0.1513

------------------------------------------------------------------------------
             |             Linearized
  totchg_num |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iinsured_~0 |  (dropped)
_Iinsured_~1 |   20398.81   1171.304    17.42   0.000     18099.97    22697.64
_Iinsured_~2 |   13894.47   837.4082    16.59   0.000     12250.95       15538
_Iinsured_~3 |   10878.49   844.9702    12.87   0.000     9220.121    12536.85
_Iinsured_~4 |   14964.83   1801.761     8.31   0.000     11428.64    18501.02
------------------------------------------------------------------------------

Regards, 
Hitesh 
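
On the question above about the t-values: in the -noomit-, -nocons- parameterization each coefficient is a group mean, so each t statistic (Coef. divided by Std. Err., e.g. 20398.81 / 1171.304 = 17.42) tests whether that mean is zero, not whether the groups differ. A minimal sketch of how the differences could be tested after the same model, assuming the indicator names that -xi- generated in this thread (_Iinsured_p_1 through _Iinsured_p_4):

```stata
* Cell-means model from the message above (pre-Stata 11 syntax)
xi, noomit: svy: regress totchg_num i.insured_pub_pvt_un, nocons

* Pairwise differences in mean charges relative to category 2
lincom _Iinsured_p_1 - _Iinsured_p_2
lincom _Iinsured_p_3 - _Iinsured_p_2
lincom _Iinsured_p_4 - _Iinsured_p_2

* Joint adjusted Wald test that the four estimable means are equal
test _Iinsured_p_1 = _Iinsured_p_2 = _Iinsured_p_3 = _Iinsured_p_4
```

The point estimate from the first -lincom- should be 20398.81 - 13894.47 = 6504.34, which agrees with the coefficient on _Iinsured_p_1 (6504.334) in the regression that omits category 2.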


On Tue, Jul 5, 2011 at 12:34 AM, Steven Samuels wrote: 

I suspect a coding error. 

Suppose insure_cat is your original insurance variable. Have you looked at 

******************************* 
bys insure_cat: sum totchg_num 

***************************** 
Have you tabulated each insurance indicator against insure_cat? 

In any case, direct survey approaches are: 
************************ 
svy: mean totchg_num, over(insure_cat) 
xi, noomit: svy: reg totchg_num i.insure_cat, nocons // pre-Stata 11
svy: reg totchg_num ibn.insure_cat, nocons // Stata 11+
************************ 


Steve 


Steven J. Samuels 
Consultant in Statistics 
18 Cantine's Island 
Saugerties, NY 12477 USA 
Voice: 845-246-0774 
Fax: 206-202-4783 
sjsamuels@gmail.com 
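
For Stata 11 or later, the group comparisons can be written with factor-variable notation instead of -xi-. A rough sketch, assuming insure_cat is the original insurance variable (the placeholder name used above) and that it takes the values 0 to 4 as in this thread; the ib2. prefix plays the role of setting the omitted category to 2 with -char-:

```stata
* Cell-means model: one coefficient per category, no constant
svy: regress totchg_num ibn.insure_cat, nocons

* Difference in mean charges, category 1 versus category 2
lincom 1.insure_cat - 2.insure_cat

* Reference-category model with category 2 as the base level
svy: regress totchg_num ib2.insure_cat

* Joint adjusted Wald test that all category effects are zero
testparm i.insure_cat
```

In the second model each coefficient is already a difference from category 2, so its t statistic is the comparison of that group's mean with the base group's mean.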

On Jul 4, 2011, at 5:02 PM, Hitesh Chandwani wrote: 

Hello Statalisters, 

I am using cost survey data and have 2 questions: 

1) Comparison of means 

Using the svy: mean procedure, I can get means of cost for all 
categories of a particular variable. But since this variable is not 
dichotomous, using -test- or -lincom- as a postestimation command to 
compare the means doesn't yield any results. What I thought of was 
dummy coding the categories and then running a regression. Instead of 
manually creating dummy variables, I decided to use -xi-; which brings 
me to my next question, 

2) -xi- and -xi3- will both omit one category as a reference 
category, which is fine. But, in my output, after omitting the first 
category, another category is indicated as (dropped). Moreover, there 
is still no value for the F-statistic. 

Firstly, is my approach correct? And secondly, why are 2 categories 
being dropped? 

(One explanation that I could come up with for the 2 dropped 
categories is that the pweight for the observations in the omitted 
category " _Iinsured_p_0" is set to zero and hence Stata needs to use 
another category as reference) 

The following is my syntax as well as output: 


xi: svy: regress totchg_num i.insured_pub_pvt_un 
i.insured_pub~n _Iinsured_p_0-4 (naturally coded; _Iinsured_p_0 omitted) 
(running regress on estimation sample) 

Survey: Linear regression 

Number of strata   =        75          Number of obs      =    103817
Number of PSUs     =       966          Population size    = 469088.57
                                        Design df          =       891
                                        F(   3,    889)    =         .
                                        Prob > F           =         .
                                        R-squared          =    0.0106

------------------------------------------------------------------------------
             |             Linearized
  totchg_num |      Coef.  Std. Err.        t   P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
_Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
_Iinsured_~2 |  (dropped)
_Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
_Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
       _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
------------------------------------------------------------------------------

. test _Iinsured_p_1 _Iinsured_p_2 _Iinsured_p_3 _Iinsured_p_4 

Adjusted Wald test 

( 1) _Iinsured_p_1 = 0 
( 2) _Iinsured_p_2 = 0 
( 3) _Iinsured_p_3 = 0 
( 4) _Iinsured_p_4 = 0 
Constraint 2 dropped 

F( 3, 889) = 23.78 
Prob > F = 0.0000 

Any help in understanding this issue will be greatly appreciated. 

Regards, 
-- 
Hitesh S. Chandwani 
University of Texas at Austin
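
One way to check the zero-weight explanation for the dropped category is to look at the weights directly. A minimal sketch, assuming the sampling weight variable is called wt (a hypothetical name; the actual one is whatever -svyset- reports for this dataset):

```stata
* Redisplay the survey design settings, including the pweight variable
svyset

* Unweighted number of observations in each insurance category
tabulate insured_pub_pvt_un, missing

* wt is a hypothetical name: substitute the pweight variable shown by svyset
summarize wt if insured_pub_pvt_un == 0

* Observations in category 0 that carry a positive weight
count if insured_pub_pvt_un == 0 & wt > 0 & !missing(wt)
```

If the count is 0, category 0 contributes nothing to the weighted estimates, which would be consistent with _Iinsured_p_0 being reported as (dropped).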

