Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sign test output
Date: Thu, 17 Jan 2013 12:05:45 +0000

I can't see your data, so the only answers I can give are

1. If results appear contradictory, the response is not to believe one
or the other. It is a matter of digging deeper to find out what is
going on.

2. Look _much_ more closely at the data, with lots of graphs. It could
easily be, for example, that there are several _small_ positive
changes that are naturally tallied as positive when the sign test is
used but are not a big deal when the t test is applied. That wouldn't
surprise me at all with something like literacy, but I am reduced to
guessing.

3. Think in terms of confidence intervals for the amount of change.

Perhaps this is homework, and you are expected to apply these tests as
a matter of instruction. If it's research, then looking at the data is
likely to show quite as much as these simple tests.

Nick

On Thu, Jan 17, 2013 at 11:33 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
> Dear Nick,
>
> Thanks for your reply, I will look up the reference, and I will use
> the -qnorm- as well (thanks for pointing out).
>
> But if t-test can work out even if the assumptions are not satisfied,
> and I got a contradicting results using sign test (i.e. t-test :
> accept the null U=0, while sign test: reject the null) which one
> should I follow?
>
> Many thanks
>
> Nahla
>
> ttest DA_T_1 == 0
>
> One-sample t test
> ------------------------------------------------------------------------------
> Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
>   DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
> ------------------------------------------------------------------------------
>     mean = mean(DA_T_1)                                           t =   0.9277
> Ho: mean = 0                                     degrees of freedom =      345
>
>    Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
>  Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771
>
> While
>
> ksmirnov DA_T_1 = normal((DA_T_1-DA_T_1_mu)/ DA_T_1_s)
>
> One-sample Kolmogorov-Smirnov test against theoretical distribution
>            normal((DA_T_1-DA_T_1_mu)/ DA_T_1_s)
>
>  Smaller group       D       P-value  Corrected
>  ----------------------------------------------
>  DA_T_1:             0.4878    0.000
>  Cumulative:        -0.4330    0.000
>  Combined K-S:       0.4878    0.000      0.000
>
>
> On 17 January 2013 10:59, Nick Cox <njcoxstata@gmail.com> wrote:
>> Sorry; I misread radically what your variable is, and it is helpful
>> that you have now explained it.
>>
>> My suggestion of a binomial confidence interval still makes sense when
>> understood in this way: equal numbers of positive and negative
>> differences imply a fraction of 0.5 for pr(positive) and also
>> pr(negative).
>>
>> The literature is large and contradictory and the advice you quote
>> from somewhere
>>
>>> Shapiro-Wilk is used to test normality, when the number of
>>> observations is less than 30. Otherwise, we should use
>>> Kolmogorov-Smirnov for large sample (as in my sample).
>>
>> would never be my two sentences of advice. I would always start out
>> with -qnorm- and often end with it. Kolmogorov-Smirnov is more
>> sensitive in the middle than in the tails of a distribution, which is
>> precisely the wrong way round.
>>
>> All that said, there is a lot of literature to the effect that the
>> t-test can work very well even when assumptions are not well
>> satisfied.
>> See for example Rupert Miller, Beyond ANOVA
>>
>> http://www.amazon.com/Beyond-ANOVA-Applied-Statistics-Statistical/dp/0412070111
>>
>> Nick
>>
>> On Thu, Jan 17, 2013 at 10:21 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
>>> Dear Nick,
>>>
>>> Thank you for the comments. The variable I am testing is not binary,
>>> and the literature of my field is concerned whether the mean (median) of
>>> this variable is different than zero. So, U is the mean in case the
>>> variable is normally distributed, or U is the median in case the
>>> distribution is not normal.
>>>
>>> From my readings in statistics, I know that in order to decide
>>> whether to use parametric or non-parametric tests, the data normality
>>> distribution should be checked first.
>>>
>>> Shapiro-Wilk is used to test normality, when the number of
>>> observations is less than 30. Otherwise, we should use
>>> Kolmogorov-Smirnov for large sample (as in my sample).
>>>
>>> So, when the test accepts the null (normality), we should use the
>>> parametric test (i.e. t-test) which examines the mean. On the other
>>> hand if the null of normality was rejected, we should use the
>>> non-parametric test (sign test) instead which examines the median (As
>>> in my case).
>>>
>>> Also, for the comment about robust, I meant exactly what said (I used
>>> the robust term loosely)
>>>
>>> Thanks for suggesting to read again, sure I will do.
>>>
>>> Many thanks again
>>>
>>> Nahla
>>>
>>> On 17 January 2013 09:49, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Your t-test is testing a quite different hypothesis. If the two states
>>>> 0 and 1 of a binary variable have equal frequencies, then its mean is
>>>> 0.5, not 0.
>>>>
>>>> That aside, the t-test can not be more appropriate for a binary
>>>> variable than what you have done already, and this is predictable in
>>>> advance, as a distribution with two distinct states is not a normal
>>>> distribution. You do not need a Kolmogorov-Smirnov test to tell you
>>>> that.
>>>>
>>>> For the record, what I suggested is best not described as a robust
>>>> test. It was calculating a confidence interval, and I showed that for
>>>> your data the result was robust to the method of calculation, meaning
>>>> merely not sensitive. The word "robust" was used informally.
>>>>
>>>> You never define what you mean by u, so I am not commenting on any
>>>> details about u.
>>>>
>>>> I recommend that you read (or re-read) a good introductory text on
>>>> statistics, as you appear confused on some basic matters.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Jan 17, 2013 at 7:52 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
>>>>
>>>>> Thank you Maarten and Nick for the great help.
>>>>>
>>>>> So, in this case I would reject the null in favour of the alternative
>>>>> u>0 as p value 0.000. However, using t-test on the same sample
>>>>> provided the opposite (i.e. accept the null).
>>>>>
>>>>> ttest DA_T_1 == 0
>>>>>
>>>>> One-sample t test
>>>>> ------------------------------------------------------------------------------
>>>>> Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>>>>> ---------+--------------------------------------------------------------------
>>>>>   DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
>>>>> ------------------------------------------------------------------------------
>>>>>     mean = mean(DA_T_1)                                           t =   0.9277
>>>>> Ho: mean = 0                                     degrees of freedom =      345
>>>>>
>>>>>    Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
>>>>>  Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771
>>>>>
>>>>>
>>>>> I think this is due to the distribution of the sample, so I performed
>>>>> K-S normality test. It shows that data is not normally distributed,
>>>>> hence I should use the non-parametric sign test instead of t-test. In
>>>>> other words I would reject the null u=0 in favor of u>0 , right?
>>>>>
>>>>>
>>>>> ksmirnov DA_T_1 = normal((DA_T_1-DA_T_1_mu)/ DA_T_1_s)
>>>>>
>>>>> One-sample Kolmogorov-Smirnov test against theoretical distribution
>>>>>            normal((DA_T_1-DA_T_1_mu)/ DA_T_1_s)
>>>>>
>>>>>  Smaller group       D       P-value  Corrected
>>>>>  ----------------------------------------------
>>>>>  DA_T_1:             0.4878    0.000
>>>>>  Cumulative:        -0.4330    0.000
>>>>>  Combined K-S:       0.4878    0.000      0.000
>>>>>
>>>>>
>>>>> N.B. Thank you so much Nick for the robust test you mentioned, I will
>>>>> use that as well)
>>>>>
>>>>> Many thanks
>>>>>
>>>>> Nahla
>>>>>
>>>>> On 16 January 2013 09:33, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> In addition, it could be as or more useful to think in terms of
>>>>>> confidence intervals. With this sample size and average, 0.5 lies well
>>>>>> outside 95% intervals for the probability of being positive, and that
>>>>>> is robust to method of calculation:
>>>>>>
>>>>>> . cii 346 221
>>>>>>
>>>>>>                                                   -- Binomial Exact --
>>>>>>     Variable |     Obs        Mean    Std. Err.   [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |     346    .6387283    .0258248    .5856497    .6894096
>>>>>>
>>>>>> . cii 346 221, jeffreys
>>>>>>
>>>>>>                                                   ----- Jeffreys -----
>>>>>>     Variable |     Obs        Mean    Std. Err.   [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |     346    .6387283    .0258248    .5871262    .6880204
>>>>>>
>>>>>> . cii 346 221, wilson
>>>>>>
>>>>>>                                                   ------ Wilson ------
>>>>>>     Variable |     Obs        Mean    Std. Err.   [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |     346    .6387283    .0258248    .5868449    .6875651
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 9:13 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>>>>>> On Wed, Jan 16, 2013 at 9:38 AM, Nahla Betelmal wrote:
>>>>>>>> I have generated this output using non-parametric test "one sample
>>>>>>>> sign test" with null: U=0 , & Ua > 0
>>>>>>>>
>>>>>>>> However, I do not understand the output. where is the p-value? is it
>>>>>>>> 0.5 in all cases or the 0.000 ( as in the first and third cases) and
>>>>>>>> 1.000 as in the second case?
>>>>>>>>
>>>>>>>> . signtest DA_T_1= 0
>>>>>>>>
>>>>>>>> Sign test
>>>>>>>>
>>>>>>>>         sign |    observed    expected
>>>>>>>> -------------+------------------------
>>>>>>>>     positive |         221         173
>>>>>>>>     negative |         125         173
>>>>>>>>         zero |           0           0
>>>>>>>> -------------+------------------------
>>>>>>>>          all |         346         346
>>>>>>>>
>>>>>>>> One-sided tests:
>>>>>>>>   Ho: median of DA_T_1 = 0 vs.
>>>>>>>>   Ha: median of DA_T_1 > 0
>>>>>>>>       Pr(#positive >= 221) =
>>>>>>>>          Binomial(n = 346, x >= 221, p = 0.5) =  0.0000
>>>>>>>
>>>>>>> The p-value is the last number, so in your case 0.0000. The stuff
>>>>>>> before the p-value tells you how it is computed: it is based on the
>>>>>>> binomial distribution, and in particular it is the chance of observing
>>>>>>> 221 successes or more in 346 trials when the chance of success at each
>>>>>>> trial is .5. For this test this chance is the p-value, and it is very
>>>>>>> small, less than 0.00005. If you type in Stata -di binomialtail(346,
>>>>>>> 221, 0.5)- you will see that this chance is 1.381e-07, i.e.
>>>>>>> 0.0000001381.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
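
As a cross-check outside Stata, the one-sided sign-test p-value and the Wilson interval discussed in the thread can be recomputed from the two counts alone (221 positive differences out of 346). The Python below is only a minimal sketch under that assumption; the function names sign_test_upper_tail and wilson_interval are hypothetical helpers introduced for illustration, not Stata commands or library functions, and the Wilson formula used is the standard score interval, which is assumed (not confirmed by the thread) to be what -cii, wilson- computes.

```python
from math import comb, sqrt


def sign_test_upper_tail(n, k):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5): the one-sided sign-test p-value."""
    # With p = 0.5 every one of the 2**n sign patterns is equally likely,
    # so the tail probability is an exact integer count divided by 2**n.
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n


def wilson_interval(n, k, z=1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the normal quantile for a 95% interval.
    """
    p_hat = k / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    n, positives = 346, 221  # counts taken from the signtest output in the thread
    p_value = sign_test_upper_tail(n, positives)
    lower, upper = wilson_interval(n, positives)
    print(f"one-sided sign test p-value: {p_value:.3e}")
    print(f"95% Wilson interval for Pr(positive): {lower:.4f} to {upper:.4f}")
```

The first figure should agree with -di binomialtail(346, 221, 0.5)- and the second with -cii 346 221, wilson- as quoted in the thread; a difference beyond rounding would point to a mistake in the sketch rather than in Stata.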
