
Re: st: Median test & ANOVA with sampling weights

From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Median test & ANOVA with sampling weights
Date   Sun, 21 Sep 2008 12:53:53 -0400


1. "Since it's not a case control study, I thought that comparing those with and without diabetes was inappropriate"

That's not correct. You want to compare diabetics to the whole population. This is *equivalent* to comparing diabetics to non-diabetics. There is no Stata command that compares part of a sample with the whole sample, but there are plenty (-cendif- is one) that will compare one part to the other part and give you a CI for the difference.

This is easiest to illustrate with means: Suppose the mean for diabetics for a variable is 10 and that for non-diabetics is 10. The difference is zero. If diabetics are 10% of the population, the mean for the population is (.1 x 10) + (.9 x 10) = 1 + 9 = 10. The difference between this and the diabetics' mean is also zero. On the other hand, suppose that the mean for non-diabetics is 20; the difference from the mean of the diabetics is 10. Then the population mean is .1 x 10 + .9 x 20 = 1 + 18 =19; the difference from the mean of diabetics is 9. Notice that the diabetic/population difference is < diabetic/non-diabetic difference. This is because the d
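The arithmetic in this example can be checked in Stata with a few scalars. This is only a sketch of the numbers quoted above (10% diabetics, means of 10 and 20); the scalar names are made up for illustration. It also shows the algebraic identity behind the comparison: the diabetic/population difference equals (1 - p) times the diabetic/non-diabetic difference, where p is the diabetic share.

```stata
* Hypothetical scalar names; values are those in the example above
scalar p      = 0.10    // share of diabetics in the population
scalar m_diab = 10      // mean for diabetics
scalar m_non  = 20      // mean for non-diabetics

* Population mean is the share-weighted average of the two group means
scalar m_pop = p*m_diab + (1 - p)*m_non

display "population mean             = " m_pop
display "non-diabetic minus diabetic = " m_non - m_diab
display "population minus diabetic   = " m_pop - m_diab
display "(1 - p) x group difference  = " (1 - p)*(m_non - m_diab)
```

The last two lines display the same quantity, so a zero difference on one scale is a zero difference on the other.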

2. As -cendif- is a rank procedure, you will get the same results for any transformation. There is no need to transform.

3. If you are uncertain of basic math functions, it is time to review; you will not be happy in epidemiology without a working knowledge of back-transformations. To answer your question about the "cubic": x^3 and x^(1/3) are inverses in Stata (-help operators-). Not sure what this means? Try a Google search on: inverse function introduction.
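A minimal sketch of the cube and cube-root pair, using made-up scalar values rather than the survey data. One caveat that is easy to miss: in Stata a negative number raised to a fractional power returns missing, so for negative values the cube root has to be taken on the absolute value and the sign restored afterwards.

```stata
* Hypothetical value; any non-negative number behaves the same way
scalar x     = 4.2
scalar cubed = x^3
scalar back  = cubed^(1/3)
display "x = " x "   x^3 = " cubed "   (x^3)^(1/3) = " back

* Negative values: (-8)^(1/3) is missing in Stata, so work with the
* absolute value and put the sign back
scalar z     = -8
scalar zroot = sign(z)*abs(z)^(1/3)
display "cube root of " z " = " zroot
```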

I strongly suggest that you consult a Biostatistics staff member at Newcastle.

Good luck!


On Sep 19, 2008, at 11:06 PM, [email protected] wrote:

Hi Steve and all,
I think you're correctly recognising my situation: I might have taken the sampling issue wrong so far.
For additional information, I'm working with a data set from a national longitudinal survey with three age cohorts (young, mids, older) which were randomly re-sampled from Medicare database employing stratified random sampling.

. svyset [pweight=o1wtarea], strata(o4state)
      pweight: o1wtarea
          VCE: linearized
  Single unit: missing
     Strata 1: o4state
         SU 1: <observations>
        FPC 1: <zero>

I focus on the older cohort only at a certain time point (4th survey) and my sample is those with diabetes. My project aims to look at whether different patterns of cardiovascular medication use are associated with quality of life (4 dimensions of SF-36). The study design is pretty simple, cross sectional. However, I have received some input that comparison between my sample and the entire cohort (older at survey 4) is worth performing. Since it's not a case control study, I thought that comparing those with and without diabetes was inappropriate, leading me to consider using -svy- (which may be equally or even more inappropriate!). Your suggestion, however, indicates that my previous thought was ok and I perhaps needn't use -svy- at all. Did I take it correctly?

Some of the dependent variables are skewed and -gladder- offers cubic transformation to best approximate normal distribution. If any median test is not fairly robust, is comparing transformed means acceptable in this case? (My concern is that cubic transformation, perhaps unlike log, will inflate type I error). Also, what is the command to perform a back transformation from cubic? (I'm definitely not a maths nerd :)).


On Sep 20, 2008, at 1:11 AM Steven Samuels to statalist wrote:


You've given us very little information about your survey sample and its design. More would have been helpful.

You appear to be misusing the terms "sample" and "population". A "population" is the larger group of people represented by the sample; statistics for a population are known from outside sources such as a census. For example, in the U.S. a sample of 1500 people might represent the population of millions. What you are calling "sample" and "population" appear to be, respectively, one subgroup of a sample (those with dmstat=1) and the entire sample.

The proper way to compare one subgroup to the whole group is to compare the subgroup to the others. So, form two groups: group = 1 if dmstat =1 and group = 2 if dmstat is not 1 (the rest of the sample).
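One way the two groups might be formed, assuming the survey data are in memory and dmstat is coded 1 for those with diabetes. The variable name group and the value labels are illustrative. Leaving observations with missing dmstat out of both groups is an assumption made here, not something stated above; adjust it if missing should count as "the rest".

```stata
* Assumes the survey data are in memory; group and grouplbl are
* hypothetical names
generate byte group = 2 if !missing(dmstat)   // the rest of the sample
replace group = 1 if dmstat == 1              // those with dmstat == 1

label define grouplbl 1 "dmstat = 1" 2 "Rest of sample"
label values group grouplbl

* Check the coding against the original variable
tabulate group dmstat, missing
```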

-pctile- will estimate weighted medians, but the CI's will not be correct, for they assume independent observations. To proceed, you must know the sampling design, including cluster and stratum information. The program -cendif- by Roger Newson (-findit cendif-) will estimate differences in the medians and accommodates sampling weights and clustering. The sign test, in contrast, is for a set of paired independent observations, not for any list of paired numbers.
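A rough sketch of what a -cendif- call could look like for the general health score, assuming the data are in memory and that group is a hypothetical two-level variable (1 if dmstat is 1, 2 otherwise). The option names by() and centile() are given as I recall them from the package's help file, so check -help cendif- after installing before relying on them. Strata are not declared in this call; how stratification should be handled is a question for the help file.

```stata
* cendif is user-written; locate and install it first
findit cendif

* Difference between the two groups in the weighted median (50th
* percentile) of o4gh, with a confidence interval.
* group is a hypothetical variable: 1 = dmstat is 1, 2 = otherwise.
cendif o4gh [pweight=o1wtarea], by(group) centile(50)
```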

To do ANOVA, you must first -svyset- your data and use -svy: reg-. There is nothing special about -svy: reg-; just set up the ANOVA as you would do with ordinary -reg-. To compare individual groups to one another, after the regression run -test-, with options -mtest(holm)- or -mtest(sidak)-.
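A sketch of that sequence, assuming the data are in memory. The grouping variable medgrp (three medication-use patterns coded 1, 2, 3) is hypothetical, and the i. factor-variable notation needs Stata 11 or later; in earlier versions the -xi:- prefix does the same job with differently named indicator variables.

```stata
* Declare the design (weights and strata as in the svyset output
* quoted elsewhere in this thread)
svyset [pweight=o1wtarea], strata(o4state)

* One-way "ANOVA" as a regression on group indicators.
* medgrp is a hypothetical 3-level variable; level 1 is the base.
svy: regress o4gh i.medgrp

* Overall test that the group means are equal
test 2.medgrp 3.medgrp

* Each non-base group against the base group, Holm-adjusted p-values
test 2.medgrp 3.medgrp, mtest(holm)

* Groups 2 and 3 compared with each other
test 2.medgrp = 3.medgrp
```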

Your post shows that you are fairly new to sampling concepts. Before proceeding, I suggest that you look at a good text; I recommend "Sampling Design and Analysis", by Sharon Lohr. Your faculty may be able to suggest local resources.


On Sep 19, 2008, at 7:53 AM, [email protected] wrote:

I'm using survey data and wonder how I can perform a comparison between the median in the sample and in the population. Medians were separately obtained using -pctile- or -_pctile-.

. pctile pctGH = o4gh [pw=o1wtarea], nq(4) genp(percent)
. list percent pct in 1/4
     | percent   pctGH |
  1. |      25      50 |
  2. |      50      67 |
  3. |      75      77 |
  4. |       .       . |

. pctile pctileGH1 = o4gh if dmstat==1 [pw=o1wtarea], nq(4) genp (pctGH1)
. list pctGH1 pctileGH1 in 1/4
     | pctGH1   pctileGH1 |
  1. |     25          40 |
  2. |     50          60 |
  3. |     75          72 |
  4. |      .           . |

Should I calculate the difference between each value in the sample and population first and then carry out a sign test? If so, how is the sampling weight taken into account? (I mean, can I use the weighted median in the population to subtract each 'unweighted' value?)

Secondly, is it possible to perform one-way ANOVA with sampling weight, particularly for post-hoc comparison? Using svy: regress did not give enough information.