Re: st: logistic regression complex samples


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: logistic regression complex samples
Date   Thu, 8 Dec 2011 09:26:16 -0500

Correction: the command for getting -svylogitgof- is "findit svylogitgof".

Steve

On Dec 7, 2011, at 10:43 PM, Steve Samuels wrote:


Thanks for the detailed command output, but you are on the wrong track. The error message has nothing to do with leaving a cluster variable (PSU or "primary sampling unit") out of the -svyset- command. SAS and Stata behave the same in this regard. The proof is that in the -logistic- results, the "Number of PSUs" and "Number of obs" are the same.

We've already given you suggestions and links for fixing the problem identified by the error message. There are different solutions, and you will have to choose. You have encountered Stata's default behavior when there is only one PSU in some strata. I suspect it's intended to force the user to make a deliberate choice. 

The most conservative approach (biggest standard errors) would be the following:

*********************************************************************
svyset [pweight= var_weight], strata(var_strata) singleunit(centered)
*********************************************************************
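
For comparison, here is a sketch of the four documented -singleunit()- choices, using the same variable names as in your post. Which one is appropriate depends on why the singleton strata exist, so treat this as a list of options rather than a recommendation.

```stata
* Same design variables as above; only the singleunit() choice changes.

* Default: standard errors are reported as missing (the behavior you saw)
svyset [pweight = var_weight], strata(var_strata) singleunit(missing)

* Treat singleton strata as certainty units (no contribution to the variance)
svyset [pweight = var_weight], strata(var_strata) singleunit(certainty)

* Scaled version of the certainty approach
svyset [pweight = var_weight], strata(var_strata) singleunit(scaled)

* Center singleton strata at the grand mean instead of the stratum mean
svyset [pweight = var_weight], strata(var_strata) singleunit(centered)

* Refit after whichever choice you settle on
svy: logistic outcome i.covar1 i.covar2_3cat
```

Each -svyset- call replaces the previous settings, so only the last one issued before the -svy:- command takes effect.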

I suggest that you figure out why there are singleton strata, because the answer might help you choose a better solution. One possibility is that there were more observations in the stratum, but that all but one have missing values for an analysis variable.  Another guess is that you have a subset of the original survey data, and the subgroup has only one member in some strata.  (If you have subset the data, you should use a -subpop- option in your -svy: logistic- model.) Still another possibility is that some observations were marked in advance to be in the sample; in other words they are "certainty" units; and each was given its own stratum. This is one of the possibilities for the -singleunit()- option of the -svyset- command. 
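
A minimal sketch of how you might check these possibilities, assuming the data are already -svyset- and using the variable names from your post. The indicator insub is hypothetical (1 = observation belongs to the subgroup of interest); it does not appear in your data as posted.

```stata
* List only the strata that contain a single sampling unit
svydescribe, single

* Same listing, but counting only observations with complete data on the
* analysis variables; strata that appear here and not above became
* singletons because of missing values
svydescribe outcome covar1 covar2_3cat, single

* If the analysis is for a subgroup, keep the full data set and flag the
* subgroup instead of dropping observations.
* insub is a hypothetical 0/1 indicator, not a variable from the original post.
svy, subpop(insub): logistic outcome i.covar1 i.covar2_3cat
```

Comparing the first two listings should show whether the singletons are in the design itself or arise only in the estimation sample.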

The contributed command you refer to (the FAQ asks that you give exact references) is -svylogitgof-. You can install it by typing "find svylogitgof" and going to the first listed "package".
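
Once the singleton-stratum problem is resolved, the sequence would look roughly like the sketch below. It uses -findit- to locate the package and assumes that -svylogitgof- is run with no arguments immediately after the estimation command; check the package's help file for the exact syntax and for whether it accepts factor-variable notation.

```stata
* Locate the contributed package and install it from the results window
findit svylogitgof

* Fit the survey logistic model, then request the goodness-of-fit test
svy: logistic outcome i.covar1 i.covar2_3cat
svylogitgof
```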


Steve
sjsamuels@gmail.com







On Dec 7, 2011, at 8:01 PM, Antonio silva wrote:

Thanks for the replies. I can run a model using SAS PROC SURVEYLOGISTIC without the cluster variable, but I have had difficulty doing the same in Stata version 11. I am a beginner in Stata programming. My final goal is to calculate the Archer and Lemeshow (A-L; 2006) goodness-of-fit test (with the estat gof command), which is not available in SAS. To do that, I have to run the logistic regression model correctly in Stata (with only weight and strata, without cluster). I hope someone can help with the Stata code.
Consider the following code (an example with 2 categorical covariates) that I used, and its output.

svyset [pweight= var_weight], strata(var_strata)


.  xi: svy: logistic outcome i.covar1  i.covar2_3cat 


i.covar1            _Icovar1_1-2          (naturally coded; _Icovar1_1 omitted)
i.covar2_3cat    _Icovar2_3_1-3     (naturally coded; _Icovar2_3_1 omitted)
(running logistic on estimation sample)

Survey: Logistic regression

Number of strata   =         9                  Number of obs      =       398
Number of PSUs     =       398                  Population size    = 4361.1088
                                                Design df          =       389
                                                F(   0,    389)    =         .
                                                Prob > F           =         .

------------------------------------------------------------------------------
             |             Linearized
     outcome | Odds Ratio   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
  _Icovar1_2 |   1.926984          .        .       .            .           .
 _Icovar2_~2 |   .2875105          .        .       .            .           .
 _Icovar2_~3 |   .1978389          .        .       .            .           .
------------------------------------------------------------------------------
Note: missing standard errors because of stratum with single sampling unit.

Thanks,
Antonio.
> -----Original Message-----
> From: skolenik@gmail.com
> Sent: Wed, 7 Dec 2011 11:18:37 -0600
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: logistic regression complex samples
> 
> Antonio,
> 
> it would help if you mentioned the version of Stata that you are
> using. By default, Stata would use observations as PSUs (and the
> output of -svyset- would state that -- again, it would help if you
> included the output of both commands). You can also achieve the effect
> of specifying observations as PSUs via -svyset _n ...-.
> 
> On Wed, Dec 7, 2011 at 10:05 AM, Antonio silva <asilva@inbox.com> wrote:
>> Hello,
>> I would like to perform binary logistic regression for a stratified
>> sample, incorporating the 2 variables that represent the design:
>> var_weight and var_strata.
>> For a model with 2 covariates, in SAS I would use code
>> like this, which works perfectly:
>> 
>> PROC SURVEYLOGISTIC DATA = dataset;
>>   STRATA var_strata;
>>   WEIGHT var_weight;
>>   CLASS covariate1 covariate2;
>>   MODEL outcome(event='1') = covariate1 covariate2 / clparm vadjust=none;
>> RUN;
>> 
>> 
>> I tried equivalent Stata code, but it does not work. It seems that in
>> Stata it is always necessary to have the cluster variable. But in my
>> design I do not have a cluster variable, only weight and strata.
>> 
>> svyset [pweight= var_weight], strata(var_strata)
>> 
>> svy: logistic outcome i.covariate1 i.covariate2
>> 
>> After running it, the output shows only the estimated ORs and a note:
>> Note: missing standard errors because of stratum with single sampling
>> unit.
>> What is wrong with it?
>> 
>> After that I did some tests with a fictitious cluster variable,
>> and it worked. I suppose this command works only when the 3 design
>> variables (weight, strata, and cluster) are used at the same time.
> 
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: I use this email account for mailing lists only.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



