Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Seed, Paul" <paul.seed@kcl.ac.uk> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: sign test output |
Date | Fri, 18 Jan 2013 10:48:53 +0000 |
Dear Statalist,

While the discussion of Nahla Betelmal's query has been interesting and informative, one point seems to have been missed: the question is ill-defined. It appears that Nahla Betelmal has a variable that she wants or expects, for good theoretical reasons, to have an average of 0, and wants to test whether this is true. We are not told any more.

If (s)he came to me for statistical advice, I would instantly want to know
 - what the theoretical reasons were
 - which average (the mean or the median) was expected to be 0
 - how large a tolerance was acceptable
 - what the implications would be if the average was not 0.
Until I had a clear understanding, I would not want to start analysing data.

The second question is crucial. For a seriously non-Normal distribution, the mean and the median can be quite different, and it is possible to construct examples where the mean is significantly > 0, while the median is significantly < 0.

Normality checks would be mainly graphical, for the reasons discussed; but I might look at measures of skewness and kurtosis, and in particular check whether the mean and median were sufficiently close for it not to matter which I used. (Estimates of the mean are usually more precise, so with low skew and mean close to median, I might prefer to use the mean even if the median were the main object of interest.)

Assuming interest was in the mean, I would advise one or more of
 - one-sample t-test (quick, simple, and usually sufficient)
 - linear regression with robust standard errors (a basic correction for non-Normality)
 - bootstrapped linear regression with BCa confidence intervals (a fuller correction that can give asymmetrical CIs where appropriate, e.g. in cases of extreme non-Normality).
All methods are well described in the Stata manual, and usually give very similar answers (except for extreme cases of non-Normality).
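To make the point about mean and median pulling in opposite directions concrete, here is a minimal sketch in plain Python rather than Stata. Everything in it is an assumption for illustration: the data are made up, the function names `mean_ci_normal` and `sign_test_p` are hypothetical, and the interval for the mean uses a large-sample Normal approximation rather than the t distribution. In Stata the corresponding checks would be run with -ttest- and -signtest-.

```python
import math
from statistics import NormalDist, mean, median, stdev


def mean_ci_normal(xs, level=0.95):
    """Large-sample (Normal-approximation) confidence interval for the mean."""
    n = len(xs)
    se = stdev(xs) / math.sqrt(n)              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    m = mean(xs)
    return m - z * se, m + z * se


def sign_test_p(xs, m0=0.0):
    """Two-sided exact sign test of H0: median == m0.

    Observations equal to m0 are dropped, as is conventional.
    """
    pos = sum(x > m0 for x in xs)
    neg = sum(x < m0 for x in xs)
    n = pos + neg
    k = min(pos, neg)
    # Binomial(n, 1/2) probability of a split at least this uneven, one tail
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up, deliberately skewed sample: 150 small negative values
# and 50 large positive ones.
data = [-1.0] * 150 + [10.0] * 50

lo, hi = mean_ci_normal(data)
print(f"mean   = {mean(data):.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
print(f"median = {median(data):.2f}, sign-test p = {sign_test_p(data):.2g}")
```

In this constructed sample the interval for the mean lies wholly above 0, while the sign test rejects a median of 0 in favour of a negative one. So "is the average 0?" has two different answers, depending on which average is meant.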
If interest was in the median, and I didn't trust the Normal approximation, I would use the -centile- command with the -cci- option to get a confidence interval for the median.

In each case I would direct attention to the confidence interval, and to the question of whether the answer was sufficiently close to 0 (as defined by the third question).

All this assumes that the ultimate interest is in the answer to this question. If it was just a preliminary to another analysis, or the answer was wanted for some deduction that could be made from it, I would also look for other ways of addressing the real question, whatever it might be.

On Jan 17, 2013, at 5:13, Nahla Betelmal <nahlaib@gmail.com> wrote:
> Again, thank you both for your comments.
>
> However, if normality test is proved to be useful only for huge sample
> as Maarten mentioned. How can we determine which test (i.e. parametric
> or non-parametric ) to be used for smaller sample size in hundreds?!
>
> I personally think it is irrational to run both t-test and sign test
> on the same sample and hope they both produce the same conclusion! and
> what if they don't!
>
> I will follow Nick's advise to look deeper in the data, but I still
> believe that there must be another way to give obvious solution to
> this situation.
>
> Thank you both again, I highly appreciate your kind help and time,
>
> Nahla
>
>

Paul T Seed, Senior Lecturer in Medical Statistics,
Division of Women's Health, King's College London
Women's Health Academic Centre, King's Health Partners
(+44) (0) 20 7188 3642.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/