
From: Joseph Coveney <[email protected]>
To: Statalist <[email protected]>
Subject: Re: st: Re: vanelteren
Date: Mon, 01 Aug 2005 01:19:35 +0900

Ricardo Ovaldia wrote:

--- Joseph Coveney wrote:

> That the medians of the pooled data are identical wouldn't bother me so much as the difference between the asymptotic and permutation p-values with 256 dams.
>
> Take a look at what -xtreg percent mm, i(dam) fe- gives you (and also take a look at, say, -pnorm- on the residuals, for starters). I'm guessing that -xtreg, fe- (which would have been my first choice with this design) gives you the same take-home message as what -vanelteren- does.

Yes, I first used -xtreg- and obtained a p-value of 0.018. Although the residuals did not look too bad using -pnorm-, they look bad on the -qnorm- plot. I was concerned that I was not meeting the normality assumption, so I opted to use a nonparametric test. I also tried to find a transformation, but the ones I tried did not perform any better.

--------------------------------------------------------------------------------

Good enough. I'm glad that things worked out with -vanelteren-, then.

I wouldn't necessarily write off -xtreg, fe- completely, though. Some work by Lisa Sullivan and Ralph D'Agostino Sr.* indicates that the power of a t-test on differences of paired ordered-categorical data is still pretty good, even with small samples. The normality assumption isn't very well met in their case.

The do-file below suggests that the findings of Sullivan and D'Agostino can be extended beyond the paired t-test. It compares -vanelteren- and -xtreg, fe- in a simulation with 40 clusters of varying size (two to eleven observations each, as the do-file is coded), with the observations in each cluster divided between two comparison groups. The do-file creates a skewed distribution of ordered categorical data with five categories.
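For anyone who wants to see how skewed the five categories are, the cutpoints in the do-file below imply category probabilities of 14/30, 8/30, 4/30, 2/30 and 2/30 under the null. There each latent variable is standard normal, so norm(latent_variable) is uniform on (0, 1). The lines below are only a sketch of that arithmetic using -display-; the local macro name c is mine and does not appear in the do-file.

```stata
* The manifest variable starts at 1 and gains 1 for each multiple m in
* 2 4 8 16 for which norm(latent_variable) > 1 - m/30.
* With norm(latent_variable) uniform on (0, 1), P(category >= 2) = 16/30,
* P(category >= 3) = 8/30, P(category >= 4) = 4/30, P(category = 5) = 2/30.
local c = 1 / (2 + 4 + 8 + 16)
display "P(category 1) = " %5.3f (1 - 16 * `c')
display "P(category 2) = " %5.3f ((16 - 8) * `c')
display "P(category 3) = " %5.3f ((8 - 4) * `c')
display "P(category 4) = " %5.3f ((4 - 2) * `c')
display "P(category 5) = " %5.3f (2 * `c')
```

Under the alternative, the latent variables with mean 1/8 shift these probabilities toward the higher categories.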
The performance of -xtreg, fe- isn't too shabby for hypothesis testing with two levels of the grouping variable. Under the null, -vanelteren- rejected in 55 of 1000 replicates and -xtreg, fe- in 45 of 1000 at a nominal 5% Type I error rate; under the alternative, the counts were 222 of 1000 and 216 of 1000, respectively. I would expect the findings to generally hold up with ten times as many replicates.

It might be worthwhile to see how well -xtreg, fe- holds up with smaller variable cluster sizes (we know how it does with T = 2 from Sullivan and D'Agostino), fewer clusters (smaller samples) and fewer ordered categories (down to four or even three).

I'm not suggesting -xtreg, fe- for *estimation* here, though.

Joseph Coveney

*L. M. Sullivan & R. B. D'Agostino Sr., Robustness and power of analysis of covariance applied to ordinal scaled data as arising in randomized controlled trials. _Statistics in Medicine_ 22(8):1317-34, 2003.

clear
set more off
* Seed is the Stata date value for 2 Aug 2005
* (newer Stata releases may need the mask "YMD" instead of "ymd")
set seed `=date("2005-08-02", "ymd")'
* Build a 12 x 12 exchangeable correlation matrix: 1 on the diagonal,
* 0.5 elsewhere
set obs 12
forvalues i = 1/12 {
    generate float a`i' = 0.5 + 0.5 * (_n == `i')
    local varlist `varlist' latent_variable`i'
}
mkmat a*, matrix(A)
* Means of the latent variables: all zero under the null; alternating
* 0 and 1/8 under the alternative
local one_eighth = 1/8
forvalues i = 1/6 {
    local null_means `null_means' 0 0
    local alternative_means `alternative_means' 0 `one_eighth'
}
*
capture program drop simem
program define simem, rclass
    syntax namelist, MEANS(numlist)
    * One row per cluster (stratum), twelve correlated latent variables
    drawnorm `namelist', means(`means') corr(A) n(40) clear
    generate byte stratum = _n
    * Cluster size: 2 + floor(uniform() * 10) ranges from 2 to 11
    generate byte number_of_replicates = 2 + floor(uniform() * 10)
    reshape long latent_variable, i(stratum) j(observation)
    drop if observation > number_of_replicates
    * Cut the latent variable into five skewed ordered categories
    generate byte manifest_variable = 1
    scalar lowest_cutpoint = 1 / (2 + 4 + 8 + 16)
    foreach multiple in 2 4 8 16 {
        quietly replace manifest_variable = manifest_variable + ///
            (norm(latent_variable) > (1 - `multiple' * ///
            scalar(lowest_cutpoint)))
    }
    * Odd-numbered observations form one group, even-numbered the other
    generate byte grouping_variable = mod(observation, 2)
    vanelteren manifest_variable, by(grouping_variable) ///
        strata(stratum)
    return scalar vanelteren = r(p)
    xtreg manifest_variable grouping_variable, i(stratum) fe
    return scalar xtregfe = Ftail(e(df_b), e(df_r), e(F))
end
*
* Type I error rate under the null
simulate vanelteren = r(vanelteren) xtregfe = r(xtregfe), ///
    reps(1000) nodots: simem `varlist', means(`null_means')
generate byte positive_vanelteren = vanelteren < 0.05
generate byte positive_xtregfe = xtregfe < 0.05
summarize positive_*
* Power under the alternative
simulate vanelteren = r(vanelteren) xtregfe = r(xtregfe), ///
    reps(1000) nodots: simem `varlist', means(`alternative_means')
generate byte positive_vanelteren = vanelteren < 0.05
generate byte positive_xtregfe = xtregfe < 0.05
summarize positive_*
* Residual diagnostics for a single simulated dataset
simem `varlist', means(`alternative_means')
predict residuals, e
pause on
version 7: kdensity residuals, norm
pause
version 7: pnorm residuals
pause
version 7: qnorm residuals
exit

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

