Re: st: Median test & ANOVA with sampling weights

Mon, 22 Sep 2008 07:30:30 +0700

Thanks, Steve, for enlightening me!
hfd--

----- Original Message -----
From: Steven Samuels <[email protected]>
Date: Monday, September 22, 2008 0:22 am
Subject: Re: st: Median test & ANOVA with sampling weights

> I meant to finish the sentence in the second paragraph:
>
> "This is because the mean for the population will be closer to that
> of the diabetics because it contains a contribution from the
> diabetics."
> -Steve
>
> On Sep 21, 2008, at 12:53 PM, Steven Samuels wrote:
>
> > hafida--
> >
> > 1. "Since it's not a case-control study, I thought that comparing
> > those with and without diabetes was inappropriate"
> >
> > That's not correct. You want to compare diabetics to the whole
> > population. This is *equivalent* to comparing diabetics to
> > non-diabetics. There is no Stata command that compares part of a
> > sample with the whole sample, but there are plenty (-cendif- is
> > one) that will compare one part to the other part and give you a
> > CI for the difference.
> >
> > This is easiest to illustrate with means. Suppose the mean of a
> > variable is 10 for diabetics and 10 for non-diabetics. The
> > difference is zero. If diabetics are 10% of the population, the
> > mean for the population is (.1 x 10) + (.9 x 10) = 1 + 9 = 10. The
> > difference between this and the diabetics' mean is also zero. On
> > the other hand, suppose that the mean for non-diabetics is 20; the
> > difference from the mean of the diabetics is 10. Then the
> > population mean is (.1 x 10) + (.9 x 20) = 1 + 18 = 19; the
> > difference from the mean of the diabetics is 9. Notice that the
> > diabetic/population difference is smaller than the
> > diabetic/non-diabetic difference. This is because the d
> >
> > 2. As -cendif- is a rank procedure, you will get the same results
> > for any monotone (order-preserving) transformation. There is no
> > need to transform.
> >
> > 3. If you are uncertain of basic math functions, it is time to
> > review; you will not be happy in epidemiology without a working
> > knowledge of back-transformations. To answer your question about
> > the "cubic": x^3 and x^(1/3) are inverses in Stata (-help
> > operators-). Not sure what this means? Try a Google search on:
> > inverse function introduction.
> >
> > I strongly suggest that you consult a Biostatistics staff member
> > at Newcastle.
> >
> > Good luck!
> >
> > -Steve
> >
> > On Sep 19, 2008, at 11:06 PM,
> > [email protected] wrote:
> >
> >> Hi Steve and all,
> >> I think you're correctly recognising my situation: I may have
> >> misunderstood the sampling issue so far.
> >> For additional information, I'm working with a data set from a
> >> national longitudinal survey with three age cohorts (young, mid,
> >> older), which were randomly re-sampled from the Medicare database
> >> using stratified random sampling.
> >>
> >> . svyset [pweight=o1wtarea], strata(o4state)
> >>       pweight: o1wtarea
> >>           VCE: linearized
> >>   Single unit: missing
> >>      Strata 1: o4state
> >>          SU 1: <observations>
> >>         FPC 1: <zero>
> >>
> >> I focus on the older cohort only, at a single time point (the 4th
> >> survey), and my sample is those with diabetes. My project aims to
> >> examine whether different patterns of cardiovascular medication
> >> use are associated with quality of life (4 dimensions of the
> >> SF-36). The study design is pretty simple: cross-sectional.
> >> However, I have received some input that a comparison between my
> >> sample and the entire cohort (older, at survey 4) is worth
> >> performing. Since it's not a case-control study, I thought that
> >> comparing those with and without diabetes was inappropriate,
> >> leading me to consider using -svy- (which may be equally or even
> >> more inappropriate!). Your suggestion, however, indicates that my
> >> previous thought was OK and I perhaps needn't use -svy- at all.
> >> Did I understand correctly?
> >>
> >> Some of the dependent variables are skewed, and -gladder- shows
> >> that a cubic transformation best approximates a normal
> >> distribution. If median tests are not robust enough, is comparing
> >> transformed means acceptable in this case? (My concern is that the
> >> cubic transformation, perhaps unlike the log, will inflate the
> >> type I error.) Also, what is the command to back-transform from
> >> the cubic? (I'm definitely not a maths nerd :))
> >>
> >> thanks,
> >> hafida--
> >>
> >> On Sep 20, 2008, at 1:11 AM, Steven Samuels wrote to statalist:
> >>
> >> hafida--
> >>
> >> You've given us very little information about your survey sample
> >> and its design. More would have been helpful.
> >>
> >> You appear to be misusing the terms "sample" and "population". A
> >> "population" is the larger group of people represented by the
> >> sample; statistics for a population are known from outside sources
> >> such as a census. For example, in the U.S. a sample of 1500 people
> >> might represent a population of millions. What you are calling
> >> "sample" and "population" appear to be, respectively, one subgroup
> >> of a sample (those with dmstat=1) and the entire sample.
> >>
> >> The proper way to compare one subgroup to the whole group is to
> >> compare the subgroup to the others. So, form two groups: group = 1
> >> if dmstat = 1 and group = 2 if dmstat is not 1 (the rest of the
> >> sample).
> >>
> >> -pctile- will estimate weighted medians, but the CIs will not be
> >> correct, for they assume independent observations. To proceed, you
> >> must know the sampling design, including cluster and stratum
> >> information. The program -cendif- by Roger Newson (-findit
> >> cendif-) will estimate differences in the medians and accommodates
> >> sampling weights and clustering. The sign test, in contrast, is
> >> for a set of paired independent observations, not for any list of
> >> paired numbers.
> >>
> >> To do ANOVA, you must first -svyset- your data and use -svy: reg-.
> >> There is nothing special about -svy: reg-; just set up the ANOVA as
> >> you would with ordinary -reg-. To compare individual groups to one
> >> another, run -test- after the regression, with the option
> >> -mtest(holm)- or -mtest(sidak)-.
> >>
> >> Your post shows that you are fairly new to sampling concepts.
> >> Before proceeding, I suggest that you look at a good text; I
> >> recommend "Sampling: Design and Analysis" by Sharon Lohr. Your
> >> faculty may be able to suggest local resources.
> >>
> >> -Steve
> >>
> >> On Sep 19, 2008, at 7:53 AM,
> >> [email protected] wrote:
> >>
> >> I'm using survey data and wonder how I can compare the median in
> >> the sample with the median in the population. The medians were
> >> obtained separately using -pctile- or -_pctile-.
> >>
> >> . pctile pctGH = o4gh [pw=o1wtarea], nq(4) genp(percent)
> >> . list percent pct in 1/4
> >>      +-----------------+
> >>      | percent   pctGH |
> >>      |-----------------|
> >>   1. |      25      50 |
> >>   2. |      50      67 |
> >>   3. |      75      77 |
> >>   4. |       .       . |
> >>      +-----------------+
> >>
> >> . pctile pctileGH1 = o4gh if dmstat==1 [pw=o1wtarea], nq(4) genp(pctGH1)
> >> . list pctGH1 pctileGH1 in 1/4
> >>      +--------------------+
> >>      | pctGH1   pctileGH1 |
> >>      |--------------------|
> >>   1. |     25          40 |
> >>   2. |     50          60 |
> >>   3. |     75          72 |
> >>   4. |      .           . |
> >>      +--------------------+
> >>
> >> Should I first calculate the difference between each value in the
> >> sample and in the population and then carry out a sign test? If so,
> >> how is the sampling weight taken into account? (I mean, can I
> >> subtract the weighted population median from each 'unweighted'
> >> value?)
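Some readers may be unsure what a weighted median, like the -pctile- results above, actually is. Here is a minimal sketch in Python. Python is used only so the arithmetic can be checked; this is not a Stata command.

The sketch assumes one common weighted-percentile rule:
- sort the values;
- accumulate the weights;
- take the first value whose cumulative weight exceeds p% of the total weight;
- if the cumulative weight lands exactly on the target, average that value and the next one.

This rule is believed to be close to the one -pctile- applies with weights, but the exact tie handling is an assumption. The helper name `weighted_percentile` and the example scores and weights are hypothetical. As noted in the thread, a point estimate like this says nothing about design-based confidence intervals.

```python
import math


def weighted_percentile(values, weights, p):
    """Return the p-th weighted percentile (0 < p < 100).

    Hypothetical helper: sorts by value, accumulates weights, and returns
    the first value whose cumulative weight exceeds p% of the total weight.
    If the cumulative weight lands exactly on the target, the two adjacent
    values are averaged. Point estimate only: no standard error or CI.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal length")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    pairs = sorted(zip(values, weights))
    target = sum(weights) * p / 100.0
    cum = 0.0
    for i, (x, w) in enumerate(pairs):
        cum += w
        # Exact hit on the target: average this value and the next one.
        if math.isclose(cum, target) and i + 1 < len(pairs):
            return (x + pairs[i + 1][0]) / 2.0
        # First value whose cumulative weight passes the target.
        if cum > target:
            return x
    return pairs[-1][0]


# Hypothetical scores and sampling weights, for illustration only.
scores = [40, 55, 60, 72, 80]
wts = [1.0, 2.0, 1.0, 3.0, 1.0]
print(weighted_percentile(scores, wts, 50))  # prints 66.0
```

Here the weighted median (66.0) is higher than the unweighted median of the same five scores (60), because the heavily weighted value 72 pulls it upward.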
> >>
> >> Secondly, is it possible to perform a one-way ANOVA with sampling
> >> weights, particularly for post-hoc comparisons? Using -svy: regress-
> >> did not give enough information.
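Steve suggests running -test- with -mtest(holm)- after -svy: reg-. That option relies on Holm's step-down adjustment of the p-values from several pairwise comparisons. The Python sketch below shows only that adjustment step, applied to a list of unadjusted p-values. It does not reproduce the survey-weighted regression or its linearized standard errors, and the p-values in the example are hypothetical.

The steps are:
- sort the m p-values in ascending order;
- multiply the k-th smallest by (m - k + 1) and cap the result at 1;
- carry forward a running maximum so the adjusted values never decrease.

The output is assumed, not verified, to match what -mtest(holm)- reports for the same unadjusted p-values.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank + 1)-th smallest p-value is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three pairwise group comparisons.
print(holm_adjust([0.01, 0.04, 0.03]))
```

Like Bonferroni, Holm's procedure controls the familywise error rate. It is never more conservative than Bonferroni, because only the smallest p-value is multiplied by the full m.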

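Steve's point 1 about comparing diabetics with the whole population can be stated generally. Suppose diabetics make up a fraction f of the population, with subgroup means mD (diabetics) and mN (non-diabetics). Then:
- the population mean is f*mD + (1 - f)*mN;
- so mD minus the population mean equals (1 - f)*(mD - mN).

The diabetic/population difference is therefore the diabetic/non-diabetic difference shrunk by the factor (1 - f). It is zero exactly when the subgroup difference is zero, which is why comparing the two subgroups answers the same question. The small Python check below reproduces his numbers; the function names are hypothetical.

```python
def population_mean(mean_diab, mean_nondiab, frac_diab):
    """Population mean as a weighted mix of the two subgroup means."""
    return frac_diab * mean_diab + (1 - frac_diab) * mean_nondiab


def diab_minus_population(mean_diab, mean_nondiab, frac_diab):
    """Diabetic mean minus population mean.

    Algebraically equal to (1 - frac_diab) * (mean_diab - mean_nondiab),
    so it is zero exactly when the two subgroup means are equal.
    """
    return mean_diab - population_mean(mean_diab, mean_nondiab, frac_diab)


# Steve's two scenarios: diabetics are 10% of the population, diabetic mean 10.
for mean_nondiab in (10, 20):
    pop = population_mean(10, mean_nondiab, 0.1)
    gap = diab_minus_population(10, mean_nondiab, 0.1)
    print(mean_nondiab, pop, gap)
# prints:
# 10 10.0 0.0
# 20 19.0 -9.0
```

The sign of the gap is negative because the diabetic mean lies below the population mean; Steve quotes the magnitudes (9 versus 10).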

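Point 3 in the thread notes that x^3 and x^(1/3) are inverses. Below is a minimal Python sketch of that round trip. It assumes non-negative scores, as with SF-36 scales scored 0 to 100; the example scores and function names are hypothetical.

The sketch also shows a consequence worth keeping in mind when reporting. Back-transforming the mean of the cubed values gives the cube root of the mean cube, a power mean of order 3. That value is larger than the ordinary mean of the original scores unless all scores are equal.

```python
def cube(x):
    """Forward transformation suggested by -gladder- (x^3)."""
    return x ** 3


def cube_root(y):
    """Inverse of cube() for y >= 0; negative inputs are rejected in this sketch."""
    if y < 0:
        raise ValueError("this sketch only handles non-negative values")
    return y ** (1.0 / 3.0)


scores = [40.0, 60.0, 72.0]  # hypothetical scores on a 0-100 scale
cubed = [cube(s) for s in scores]

# Back-transforming each value recovers it, up to floating-point rounding.
recovered = [cube_root(c) for c in cubed]

# Back-transforming a mean of cubes gives the cubic power mean,
# which exceeds the arithmetic mean unless all scores are equal.
mean_of_cubes = sum(cubed) / len(cubed)
print(recovered)
print(cube_root(mean_of_cubes), sum(scores) / len(scores))
```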