Statalist


Re: st: Median test & ANOVA with sampling weights


From   Nur.Hikmayani@studentmail.newcastle.edu.au
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Median test & ANOVA with sampling weights
Date   Mon, 22 Sep 2008 07:30:30 +0700

Thanks, Steve, for the enlightening explanation!

hfd--

----- Original Message -----
From: Steven Samuels <sjhsamuels@earthlink.net>
Date: Monday, September 22, 2008 0:22 am
Subject: Re: st: Median test & ANOVA with sampling weights

> I meant to finish the sentence in the second paragraph:
> 
> "This is because the mean for the population will be closer to that
> of the diabetics because it contains a contribution from the
> diabetics."
> -Steve
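
The arithmetic behind this point can be checked directly in Stata. What follows is only a minimal sketch using the numbers from the example (diabetics are 10% of the population, with group means of 10 and 20); the local macro names are made up for this illustration. It shows that the diabetic/population difference is the diabetic/non-diabetic difference multiplied by the non-diabetic share (1 - p), so one is zero exactly when the other is.

```stata
* Illustrative values from the example; macro names are hypothetical
local p      = 0.1
local m_diab = 10
local m_non  = 20

* Population mean is the share-weighted mix of the two group means
local m_pop = `p'*`m_diab' + (1 - `p')*`m_non'

* Difference between the two parts, and between one part and the whole
local d_groups = `m_non' - `m_diab'
local d_pop    = `m_pop' - `m_diab'

* The part-vs-whole difference equals (1 - p) times the part-vs-part one
local d_check  = (1 - `p')*`d_groups'

display "population mean:             " `m_pop'
display "non-diabetic minus diabetic: " `d_groups'
display "population minus diabetic:   " `d_pop'
display "(1 - p) x group difference:  " `d_check'
```

Because the two differences differ only by the fixed factor (1 - p), testing diabetics against non-diabetics answers the same question as testing diabetics against the whole group.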
> 
> On Sep 21, 2008, at 12:53 PM, Steven Samuels wrote:
> 
> > hafida--
> >
> > 1. "Since it's not a case control study, I thought that comparing
> > those with and without diabetes was inappropriate"
> >
> > That's not correct.  You want to compare diabetics to the whole
> > population.  This is *equivalent* to comparing diabetics to
> > non-diabetics.  There is no Stata command that compares part of a
> > sample with the whole sample, but there are plenty (-cendif- is
> > one) that will compare one part to the other part and give you a
> > CI for the difference.
> >
> > This is easiest to illustrate with means:  Suppose the mean for
> > diabetics for a variable is 10 and that for non-diabetics is 10.
> > The difference is zero.  If diabetics are 10% of the population,
> > the mean for the population is (.1 x 10) + (.9 x 10) = 1 + 9 = 10.
> > The difference between this and the diabetics' mean is also zero.
> > On the other hand, suppose that the mean for non-diabetics is 20;
> > the difference from the mean of the diabetics is 10.  Then the
> > population mean is (.1 x 10) + (.9 x 20) = 1 + 18 = 19; the
> > difference from the mean of diabetics is 9.  Notice that the
> > diabetic/population difference is < diabetic/non-diabetic
> > difference.  This is because the d
> >
> >
> > 2. As -cendif- is a rank procedure, you will get the same results
> > for any transformation.  There is no need to transform.
> >
> > 3. If you are uncertain of basic math functions, it is time to
> > review; you will not be happy in epidemiology without a working
> > knowledge of back-transformations.  To answer your question about
> > the "cubic": x^3 and x^(1/3) are inverses in Stata (-help
> > operators-).  Not sure what this means?  Try a Google search on:
> > inverse function introduction.
> >
> > I strongly suggest that you consult a Biostatistics staff member at
> > Newcastle.
> >
> > Good luck!
> >
> > -Steve
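
As a small illustration of the back-transformation in point 3, the sketch below cubes a variable and then recovers it with the cube root. The data and the variable names (y, y_cubed, y_back) are invented for this example and are not from the survey. It assumes non-negative values, as with SF-36 scores; for negative x, Stata returns missing from x^(1/3), so a signed version such as sign(x)*abs(x)^(1/3) would be needed there.

```stata
* Toy data; variable names are hypothetical
clear
set obs 5
generate double y = 10*_n

* Cubic transformation and its inverse (the cube root)
generate double y_cubed = y^3
generate double y_back  = y_cubed^(1/3)

* y_back should agree with y up to floating-point rounding
list y y_cubed y_back
```

Note that back-transforming a summary is not the same as summarizing the original variable: the cube root of the mean of y^3 is in general not the mean of y.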
> >
> >
> >
> >
> > On Sep 19, 2008, at 11:06 PM,  
> > Nur.Hikmayani@studentmail.newcastle.edu.au wrote:
> >
> >> Hi Steve and all,
> >> I think you're correctly recognising my situation: I might have
> >> taken the sampling issue wrong so far.
> >> For additional information, I'm working with a data set from a
> >> national longitudinal survey with three age cohorts (young, mids,
> >> older), which were randomly re-sampled from the Medicare database
> >> employing stratified random sampling.
> >>
> >> . svyset [pweight=o1wtarea], strata(o4state)
> >>       pweight: o1wtarea
> >>           VCE: linearized
> >>   Single unit: missing
> >>      Strata 1: o4state
> >>          SU 1: <observations>
> >>         FPC 1: <zero>
> >>
> >> I focus on the older cohort only at a certain time point (4th
> >> survey), and my sample is those with diabetes. My project aims to
> >> look at whether different patterns of cardiovascular medication
> >> use are associated with quality of life (4 dimensions of SF-36).
> >> The study design is pretty simple: cross-sectional. However, I
> >> have received some input that a comparison between my sample and
> >> the entire cohort (older at survey 4) is worth performing. Since
> >> it's not a case control study, I thought that comparing those with
> >> and without diabetes was inappropriate, leading me to consider
> >> using -svy- (which may be equally or even more inappropriate!).
> >> Your suggestion, however, indicates that my previous thought was
> >> ok and I perhaps needn't use -svy- at all. Did I take it correctly?
> >>
> >> Some of the dependent variables are skewed, and -gladder- suggests
> >> a cubic transformation to best approximate a normal distribution.
> >> If any median test is not fairly robust, is comparing transformed
> >> means acceptable in this case? (My concern is that a cubic
> >> transformation, perhaps unlike log, will inflate type I error.)
> >> Also, what is the command to perform a back transformation from
> >> cubic? (I'm definitely not a maths nerd :)).
> >>
> >>
> >> thanks,
> >> hafida--
> >>
> >>
> >> On Sep 20, 2008, at 1:11 AM Steven Samuels to statalist wrote:
> >>
> >> hafida--
> >>
> >> You've given us very little information about your survey sample
> >> and its design. More would have been helpful.
> >>
> >> You appear to be misusing the terms "sample" and "population". A
> >> "population" is the larger group of people represented by the
> >> sample; statistics for a population are known from outside sources
> >> such as a census. For example, in the U.S. a sample of 1500 people
> >> might represent the population of millions. What you are calling
> >> "sample" and "population" appear to be, respectively, one
> >> subgroup of a sample (those with dmstat=1) and the entire sample.
> >>
> >> The proper way to compare one subgroup to the whole group is to
> >> compare the subgroup to the others. So, form two groups: group = 1
> >> if dmstat = 1 and group = 2 if dmstat is not 1 (the rest of the
> >> sample).
> >>
> >> -pctile- will estimate weighted medians, but the CI's will not be
> >> correct, for they assume independent observations. To proceed, you
> >> must know the sampling design, including cluster and stratum
> >> information. The program -cendif- by Roger Newson (-findit
> >> cendif-) will estimate differences in the medians and accommodates
> >> sampling weights and clustering. The sign test, in contrast, is
> >> for a set of paired independent observations, not for any list of
> >> paired numbers.
> >>
> >> To do ANOVA, you must first -svyset- your data and use -svy: reg-.
> >> There is nothing special about -svy: reg-; just set up the ANOVA as
> >> you would do with ordinary -reg-. To compare individual groups to
> >> one another, after the regression run -test-, with options
> >> -mtest(holm)- or -mtest(sidak)-.
> >>
> >> Your post shows that you are fairly new to sampling concepts.
> >> Before proceeding, I suggest that you look at a good text; I
> >> recommend "Sampling: Design and Analysis", by Sharon Lohr.  Your
> >> faculty may be able to suggest local resources.
> >>
> >> -Steve
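
The steps in this message might be strung together roughly as follows. This is a hedged sketch, not a tested analysis: o4gh, dmstat, o1wtarea and o4state are the variable names that appear in the thread, while group and medgroup (a medication-pattern variable coded 1, 2, 3) are hypothetical. The i. factor-variable notation assumes Stata 11 or later, and -cendif- must be installed first (-findit cendif-). Leaving group missing when dmstat is missing is also an assumption.

```stata
* Declare the design, as in the poster's -svyset- output
svyset [pweight=o1wtarea], strata(o4state)

* Two groups: 1 = diabetics, 2 = the rest of the sample
* (assumption: group stays missing when dmstat is missing)
generate byte group = cond(dmstat == 1, 1, 2) if !missing(dmstat)

* Difference in medians between the two groups, with sampling weights
* (user-written command by Roger Newson; centile(50) requests the median)
cendif o4gh [pweight=o1wtarea], by(group) centile(50)

* One-way ANOVA set up as a design-based regression on a categorical factor
* (medgroup is hypothetical; level 1 is the base category)
svy: regress o4gh i.medgroup

* Test levels 2 and 3 against the base, with Holm-adjusted p-values
test 2.medgroup 3.medgroup, mtest(holm)
```

Replacing -mtest(holm)- with -mtest(sidak)- gives the other adjustment mentioned above.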
> >>
> >>
> >> On Sep 19, 2008, at 7:53 AM,  
> >> Nur.Hikmayani@studentmail.newcastle.edu.au wrote:
> >>
> >>
> >>     I'm using survey data and wonder how I can perform a
> >> comparison between the median in the sample and in the population.
> >> Medians were separately obtained using -pctile- or -_pctile-.
> >>
> >>     . pctile pctGH = o4gh [pw=o1wtarea], nq(4) genp(percent)
> >>     . list percent pct in 1/4
> >>          +-----------------+
> >>          | percent   pctGH |
> >>          |-----------------|
> >>       1. |      25      50 |
> >>       2. |      50      67 |
> >>       3. |      75      77 |
> >>       4. |       .       . |
> >>          +-----------------+
> >>
> >>     . pctile pctileGH1 = o4gh if dmstat==1 [pw=o1wtarea], nq(4) genp(pctGH1)
> >>     . list pctGH1 pctileGH1 in 1/4
> >>          +--------------------+
> >>          | pctGH1   pctileGH1 |
> >>          |--------------------|
> >>       1. |     25          40 |
> >>       2. |     50          60 |
> >>       3. |     75          72 |
> >>       4. |      .           . |
> >>          +--------------------+
> >>
> >>     Should I calculate the difference between each value in the
> >> sample and population first and carry out a sign test then? If so,
> >> how is the sampling weight taken into account? (I mean, can I
> >> subtract the weighted population median from each 'unweighted'
> >> value?)
> >>
> >>     Secondly, is it possible to perform one-way ANOVA with
> >> sampling weights, particularly for post-hoc comparisons? Using
> >> -svy: regress- did not give enough information.
> >> *
> >> *   For searches and help try:
> >> *   http://www.stata.com/help.cgi?search
> >> *   http://www.stata.com/support/statalist/faq
> >> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP