Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: sign test output

 From Nick Cox To statalist@hsphsun2.harvard.edu Subject Re: st: sign test output Date Thu, 17 Jan 2013 12:05:45 +0000

```
I can't see your data, so the only answers I can give are:

1. If results appear contradictory, the response is not to believe one
or the other. It is a matter of digging deeper to find out what is
going on.

2. Look _much_ more closely at the data, with lots of graphs. It could
easily be, for example, that there are several _small_ positive
changes that are naturally tallied as positive when the sign test is
used but are not a big deal when the t test is applied. That wouldn't
surprise me at all with something like literacy, but I am reduced to
guessing.

3. Think in terms of confidence intervals for the amount of change.

Perhaps this is homework, and you are expected to apply these tests as
a matter of instruction. If it's research, then looking at the data is
likely to show quite as much as these simple tests.

Nick

On Thu, Jan 17, 2013 at 11:33 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
> Dear Nick,
>
> Thanks for your reply. I will look up the reference, and I will use
> -qnorm- as well (thanks for pointing it out).
>
> But if the t-test can work even when the assumptions are not satisfied,
> and I get contradictory results from the sign test (i.e. the t-test
> does not reject the null U=0, while the sign test rejects it), which one
> should I follow?
>
> Many thanks
>
> Nahla
>
> ttest DA_T_1 == 0
>
> One-sample t test
> ------------------------------------------------------------------------------
> Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
> ---------+--------------------------------------------------------------------
>   DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
> ------------------------------------------------------------------------------
>     mean = mean(DA_T_1)                                           t =   0.9277
> Ho: mean = 0                                     degrees of freedom =      345
>
>     Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
>  Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771
>
>  While
>
> ksmirnov  DA_T_1 = normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>
> One-sample Kolmogorov-Smirnov test against theoretical distribution
>            normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>
>  Smaller group       D       P-value  Corrected
>  ----------------------------------------------
>  DA_T_1:             0.4878    0.000
>  Cumulative:        -0.4330    0.000
>  Combined K-S:       0.4878    0.000      0.000
>
>
> On 17 January 2013 10:59, Nick Cox <njcoxstata@gmail.com> wrote:
>> that you have now explained it.
>>
>> My suggestion of a binomial confidence interval still makes sense when
>> understood in this way: equal numbers of positive and negative
>> differences imply a fraction of 0.5 for pr(positive) and also
>> pr(negative).
>>
>> The literature is large and contradictory and the advice you quote
>> from somewhere
>>
>>>  Shapiro-Wilk is used to test normality, when the number of
>>> observations is less than 30. Otherwise, we should use
>>> Kolmogorov-Smirnov for large sample (as in my sample).
>>
>> would never be my two sentences of advice. I would always start out
>> with -qnorm- and often end with it. Kolmogorov-Smirnov is more
>> sensitive in the middle than in the tails of a distribution, which is
>> precisely the wrong way round.
>>
>> All that said, there is a lot of literature to the effect that the
>> t-test can work very well even when assumptions are not well
>> satisfied. See for example Rupert Miller, Beyond ANOVA
>>
>> http://www.amazon.com/Beyond-ANOVA-Applied-Statistics-Statistical/dp/0412070111
>>
>> Nick
>>
>> On Thu, Jan 17, 2013 at 10:21 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
>>> Dear Nick,
>>>
>>> Thank you for the comments. The variable I am testing is not binary,
>>> and the literature of my field is concerned with whether the mean
>>> (median) of this variable is different from zero. So, U is the mean in
>>> case the variable is normally distributed, or U is the median in case
>>> the distribution is not normal.
>>>
>>> From my readings in statistics, I know that in order to decide
>>> whether to use parametric or non-parametric tests, the normality of
>>> the data distribution should be checked first.
>>>
>>>  Shapiro-Wilk is used to test normality, when the number of
>>> observations is less than 30. Otherwise, we should use
>>> Kolmogorov-Smirnov for large sample (as in my sample).
>>>
>>> So, when the test accepts the null (normality), we should use the
>>> parametric test (i.e. t-test), which examines the mean. On the other
>>> hand, if the null of normality is rejected, we should use the
>>> non-parametric test (sign test) instead, which examines the median (as
>>> in my case).
>>>
>>> Also, for the comment about robust, I meant exactly what you said (I
>>> used the term "robust" loosely).
>>>
>>> Thanks for suggesting that I read again; I will certainly do so.
>>>
>>> Many thanks again
>>>
>>> Nahla
>>>
>>> On 17 January 2013 09:49, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Your t-test is testing a quite different hypothesis. If the two states
>>>> 0 and 1 of a binary variable have equal frequencies, then its mean is
>>>> 0.5, not 0.
>>>>
>>>> That aside, the t-test cannot be more appropriate for a binary
>>>> variable than what you have done already, and this is predictable in
>>>> advance, as a distribution with two distinct states is not a normal
>>>> distribution. You do not need a Kolmogorov-Smirnov test to tell you
>>>> that.
>>>>
>>>> For the record, what I suggested is best not described as a robust
>>>> test. It was calculating a confidence interval, and I showed that for
>>>> your data the result was robust to the method of calculation, meaning
>>>> merely not sensitive. The word "robust" was used informally.
>>>>
>>>> You never define what you mean by u, so I am not commenting on any
>>>>
>>>> I recommend that you read (or re-read) a good introductory text on
>>>> statistics, as you appear confused on some basic matters.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Jan 17, 2013 at 7:52 AM, Nahla Betelmal <nahlaib@gmail.com> wrote:
>>>>
>>>>> Thank you Maarten and Nick  for the great help.
>>>>>
>>>>>  So, in this case I would reject the null in favour of the alternative
>>>>> u>0, as the p-value is 0.000. However, a t-test on the same sample
>>>>> gave the opposite result (i.e. accept the null).
>>>>>
>>>>> ttest DA_T_1 == 0
>>>>>
>>>>> One-sample t test
>>>>> ------------------------------------------------------------------------------
>>>>> Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
>>>>> ---------+--------------------------------------------------------------------
>>>>>   DA_T_1 |     346    1.564346     1.68628    31.36663   -1.752338     4.88103
>>>>> ------------------------------------------------------------------------------
>>>>>     mean = mean(DA_T_1)                                           t =   0.9277
>>>>> Ho: mean = 0                                     degrees of freedom =      345
>>>>>
>>>>>     Ha: mean < 0                 Ha: mean != 0                 Ha: mean > 0
>>>>>  Pr(T < t) = 0.8229         Pr(|T| > |t|) = 0.3542          Pr(T > t) = 0.1771
>>>>>
>>>>>
>>>>> I think this is due to the distribution of the sample, so I performed
>>>>> a K-S normality test. It shows that the data are not normally
>>>>> distributed, hence I should use the non-parametric sign test instead
>>>>> of the t-test. In other words, I would reject the null u=0 in favor
>>>>> of u>0, right?
>>>>>
>>>>>
>>>>> ksmirnov  DA_T_1 = normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>>>>>
>>>>> One-sample Kolmogorov-Smirnov test against theoretical distribution
>>>>>            normal((DA_T_1-DA_T_1_mu)/  DA_T_1_s)
>>>>>
>>>>>  Smaller group       D       P-value  Corrected
>>>>>  ----------------------------------------------
>>>>>  DA_T_1:             0.4878    0.000
>>>>>  Cumulative:        -0.4330    0.000
>>>>>  Combined K-S:       0.4878    0.000      0.000
>>>>>
>>>>>
>>>>> (N.B. Thank you so much, Nick, for the robust test you mentioned; I
>>>>> will use that as well.)
>>>>>
>>>>> Many thanks
>>>>>
>>>>> Nahla
>>>>>
>>>>> On 16 January 2013 09:33, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> In addition, it could be as or more useful to think in terms of
>>>>>> confidence intervals. With this sample size and average, 0.5 lies well
>>>>>> outside 95% intervals for the probability of being positive, and that
>>>>>> is robust to method of calculation:
>>>>>>
>>>>>> . cii 346 221
>>>>>>
>>>>>>                                                          -- Binomial Exact --
>>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |        346    .6387283    .0258248        .5856497    .6894096
>>>>>>
>>>>>> . cii 346 221, jeffreys
>>>>>>
>>>>>>                                                          ----- Jeffreys -----
>>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |        346    .6387283    .0258248        .5871262    .6880204
>>>>>>
>>>>>> . cii 346 221, wilson
>>>>>>
>>>>>>                                                          ------ Wilson ------
>>>>>>     Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
>>>>>> -------------+---------------------------------------------------------------
>>>>>>              |        346    .6387283    .0258248        .5868449    .6875651
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Wed, Jan 16, 2013 at 9:13 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>>>>>>> On Wed, Jan 16, 2013 at 9:38 AM, Nahla Betelmal wrote:
>>>>>>>> I have generated this output using the non-parametric "one-sample
>>>>>>>> sign test" with null: U=0, & Ua > 0
>>>>>>>>
>>>>>>>> However, I do not understand the output. Where is the p-value? Is it
>>>>>>>> 0.5 in all cases or the 0.000 (as in the first and third cases) and
>>>>>>>> 1.000 as in the second case?
>>>>>>>>
>>>>>>>> . signtest DA_T_1 = 0
>>>>>>>>
>>>>>>>> Sign test
>>>>>>>>
>>>>>>>>         sign |    observed    expected
>>>>>>>> -------------+------------------------
>>>>>>>>     positive |         221         173
>>>>>>>>     negative |         125         173
>>>>>>>>         zero |           0           0
>>>>>>>> -------------+------------------------
>>>>>>>>          all |         346         346
>>>>>>>>
>>>>>>>> One-sided tests:
>>>>>>>>   Ho: median of DA_T_1 = 0 vs.
>>>>>>>>   Ha: median of DA_T_1 > 0
>>>>>>>>       Pr(#positive >= 221) =
>>>>>>>>          Binomial(n = 346, x >= 221, p = 0.5) =  0.0000
>>>>>>>
>>>>>>> The p-value is the last number, so in your case 0.0000. The stuff
>>>>>>> before the p-value tells you how it is computed: it is based on the
>>>>>>> binomial distribution, and in particular it is the chance of observing
>>>>>>> 221 successes or more in 346 trials when the chance of success at each
>>>>>>> trial is .5. For this test this chance is the p-value, and it is very
>>>>>>> small, less than 0.00005. If you type in Stata -di binomialtail(346,
>>>>>>> 221, 0.5)- you will see that this chance is 1.381e-07, i.e.
>>>>>>> 0.0000001381.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```
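
The thread's main points can be tried out in a short do-file: the sign test p-value is a binomial tail probability, a sign test and a t test can disagree when many small positive changes sit alongside fewer but larger negative ones, and confidence intervals and a graph say more than either test alone. This is only a rough sketch under stated assumptions. The variable `change` and its simulated values are hypothetical and are not the poster's DA_T_1. By construction about 64% of the values are positive while the expected mean is slightly below zero.

```stata
* Hypothetical illustration only: simulated data, not the poster's DA_T_1.
clear
set seed 2013
set obs 346

* About 64% small positive changes in (0, 1); the rest are larger
* negative changes in (-2, 0). Expected mean: 0.64*0.5 - 0.36*1 = -0.04.
generate double change = cond(runiform() < 0.64, runiform(), -2 * runiform())

* Sign test p-value by hand: Pr(#positive >= k) under Binomial(n, 0.5),
* where n excludes zeros.
quietly count if change > 0
local k = r(N)
quietly count if change != 0
local n = r(N)
display "one-sided sign test p-value = " binomialtail(`n', `k', 0.5)

* The same calculation for the counts reported in the thread.
display binomialtail(346, 221, 0.5)

* The two tests side by side; -ttest- also reports a confidence
* interval for the mean.
signtest change = 0
ttest change == 0

* Median with a binomial-based confidence interval.
centile change, centile(50)

* Look at the data: normal quantile plot.
qnorm change
```

The -cii- calls quoted in the thread use the syntax current in 2013. Later Stata releases changed the syntax of -ci- and -cii-, so check -help cii- for the form your version expects.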