Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: trying to compare means and using xi and xi3 for survey data

From	Austin Nichols <[email protected]>
To	[email protected]
Subject	Re: st: trying to compare means and using xi and xi3 for survey data
Date	Tue, 5 Jul 2011 08:38:57 -0400

Hitesh Chandwani <[email protected]>:

I doubt there is a coding error, and your interpretation is
correct--but this all may be clearer if you use an example everyone
can share:

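webuse nhanes2, clear
* subpopulation means of height by race
svy: mean height, over(race)
* white is the omitted group, so coefficients are differences from its mean
svy: reg height black orace
xi: svy: reg height i.race
* set the weights to zero for one group, as in your data, and refit
replace finalwgt = 0 if orace
svy: reg height black orace
xi: svy: reg height i.race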
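
If you are on Stata 11 or later, the comparison of means in your first question can also be sketched directly with factor variables, building on Steve's ibn. suggestion. This is only a sketch under assumptions: it assumes race in nhanes2 is coded 1 = White, 2 = Black, 3 = Other (check with -tab race, nolabel-), and it reloads the data so the weights changed above are restored.

```stata
* reload the example data so finalwgt is unmodified
webuse nhanes2, clear

* cell-means model: one coefficient per race group, no constant
svy: regress height ibn.race, noconstant

* pairwise difference in mean height (group 2 minus group 1)
lincom 2.race - 1.race

* joint adjusted Wald test that all three group means are equal
test (1.race = 2.race) (1.race = 3.race)
```

With the constant suppressed, each coefficient is a group mean, so -lincom- gives a pairwise difference with its standard error and -test- gives the overall comparison.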


On Tue, Jul 5, 2011 at 7:44 AM, Hitesh Chandwani
<[email protected]> wrote:
> Steven,
>
> I used the following commands:
>
> . char insured_pub_pvt_un[omit]2
>
> . xi: svy: regress totchg_num i.insured_pub_pvt_un
>
>
> And got the following output:
>
> i.insured_pub~n   _Iinsured_p_0-4     (naturally coded; _Iinsured_p_2 omitted)
> (running regress on estimation sample)
>
> Survey: Linear regression
>
> Number of strata   =        75                  Number of obs      =    103817
> Number of PSUs     =       966                  Population size    = 469088.57
>                                                Design df          =       891
>                                                F(   3,    889)    =         .
>                                                Prob > F           =         .
>                                                R-squared          =    0.0106
>
> ------------------------------------------------------------------------------
>             |             Linearized
>  totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> _Iinsured_~0 |  (dropped)
> _Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
> _Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
> _Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
>       _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
> ------------------------------------------------------------------------------
>
> I think the fact that the "0" group was dropped again has something to
> do with the fact that all observations in this group have pweights set
> to zero. The way I interpret the output is that the coefficients are
> the differences in mean between the omitted group (group 2) and the
> other groups (1, 3, and 4, respectively) with the corresponding
> t-statistic values being a comparison of means with the omitted group.
>
> Is this interpretation accurate?
>
> Regards,
> Hitesh
>
>
>
>
> On Tue, Jul 5, 2011 at 7:30 AM, Hitesh Chandwani
> <[email protected]> wrote:
>> Hi Steven,
>>
>> There is no evident coding error that I can see. If I use the
>> -,noomit- option, how do I interpret the results? The coefficients are
>> clearly the means, but what do the t-values indicate?
>>
>> xi, noomit: svy: reg totchg_num i.insured_pub_pvt_un , nocons
>> (running regress on estimation sample)
>>
>> Survey: Linear regression
>>
>> Number of strata   =        75                  Number of obs      =    103817
>> Number of PSUs     =       966                  Population size    = 469088.57
>>                                                Design df          =       891
>>                                                F(   4,    888)    =         .
>>                                                Prob > F           =         .
>>                                                R-squared          =    0.1513
>>
>> ------------------------------------------------------------------------------
>>             |             Linearized
>>  totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>> _Iinsured_~0 |  (dropped)
>> _Iinsured_~1 |   20398.81   1171.304    17.42   0.000     18099.97    22697.64
>> _Iinsured_~2 |   13894.47   837.4082    16.59   0.000     12250.95       15538
>> _Iinsured_~3 |   10878.49   844.9702    12.87   0.000     9220.121    12536.85
>> _Iinsured_~4 |   14964.83   1801.761     8.31   0.000     11428.64    18501.02
>> ------------------------------------------------------------------------------
>>
>> Regards,
>> Hitesh
>>
>>
>> On Tue, Jul 5, 2011 at 12:34 AM, Steven Samuels <[email protected]> wrote:
>>>
>>> I suspect a coding error.
>>>
>>> Suppose insure_cat is your original insurance variable.  Have you looked at
>>>
>>> *******************************
>>> bys insure_cat: sum totchg_num
>>>
>>> *****************************
>>> Have you tabulated each insurance indicator against insure_cat?
>>>
>>> In any case,  direct survey approaches are:
>>> ************************
>>> svy: mean totchg_num, over(insure_cat)
>>> xi, noomit: svy: reg totchg_num i.insure_cat, nocons  //pre-Stata 11
>>> svy:  reg totchg_num ibn.insure_cat, nocons   //Stata 11 +
>>> ************************
>>>
>>>
>>> Steve
>>>
>>>
>>> Steven J. Samuels
>>> Consultant in Statistics
>>> 18 Cantine's Island
>>> Saugerties, NY 12477 USA
>>> Voice: 845-246-0774
>>> Fax:   206-202-4783
>>> [email protected]
>>>
>>> On Jul 4, 2011, at 5:02 PM, Hitesh Chandwani wrote:
>>>
>>> Hello Statalisters,
>>>
>>> I am using cost survey data and have 2 questions:
>>>
>>> 1) Comparison of means
>>>
>>> Using the svy: mean procedure, I can get means of cost for all
>>> categories of a particular variable. But since this variable is not
>>> dichotomous, using -test- or -lincom- as a postestimation command to
>>> compare the means doesn't yield any results. What I thought of was
>>> dummy coding the categories and then running a regression. Instead of
>>> manually creating dummy variables, I decided to use -xi-, which brings
>>> me to my next question.
>>>
>>> 2) -xi- and -xi3- will both omit one category as a reference
>>> category, which is fine. But in my output, after omitting the first
>>> category, another category is indicated as (dropped). Moreover, there
>>> is still no value for the F-statistic.
>>>
>>> Firstly, is my approach correct? And secondly, why are 2 categories
>>> being dropped?
>>>
>>> (One explanation that I could come up with for the 2 dropped
>>> categories is that the pweight for the observations in the omitted
>>> category "_Iinsured_p_0" is set to zero, and hence Stata needs to use
>>> another category as reference.)
>>>
>>> The following is my syntax as well as output:
>>>
>>>
>>> xi: svy: regress totchg_num i.insured_pub_pvt_un
>>> i.insured_pub~n   _Iinsured_p_0-4     (naturally coded; _Iinsured_p_0 omitted)
>>> (running regress on estimation sample)
>>>
>>> Survey: Linear regression
>>>
>>> Number of strata   =        75                  Number of obs      =    103817
>>> Number of PSUs     =       966                  Population size    = 469088.57
>>>                                               Design df          =       891
>>>                                               F(   3,    889)    =         .
>>>                                               Prob > F           =         .
>>>                                               R-squared          =    0.0106
>>>
>>> ------------------------------------------------------------------------------
>>>            |             Linearized
>>>  totchg_num |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>>> -------------+----------------------------------------------------------------
>>> _Iinsured_~1 |   6504.334   915.0348     7.11   0.000      4708.46    8300.209
>>> _Iinsured_~2 |  (dropped)
>>> _Iinsured_~3 |  -3015.988   705.0121    -4.28   0.000    -4399.666    -1632.31
>>> _Iinsured_~4 |   1070.352   1961.327     0.55   0.585    -2779.007    4919.711
>>>      _cons |   13894.47   837.4082    16.59   0.000     12250.95       15538
>>> ------------------------------------------------------------------------------
>>>
>>> . test _Iinsured_p_1 _Iinsured_p_2 _Iinsured_p_3 _Iinsured_p_4
>>>
>>> Adjusted Wald test
>>>
>>> ( 1)  _Iinsured_p_1 = 0
>>> ( 2)  _Iinsured_p_2 = 0
>>> ( 3)  _Iinsured_p_3 = 0
>>> ( 4)  _Iinsured_p_4 = 0
>>>      Constraint 2 dropped
>>>
>>>      F(  3,   889) =   23.78
>>>           Prob > F =    0.0000
>>>
>>> Any help in understanding this issue will be greatly appreciated.
