
# Nonparametric tests for trend

## Highlights

• nptrend performs four nonparametric tests for trend
    • Cochran–Armitage test
    • Jonckheere–Terpstra test
    • linear-by-linear test
    • Cuzick's test with ranks
• Option for exact p-values

Trend tests involve responses in ordered groups. They test whether response values tend to either increase or decrease across groups.

Trend tests are typically used when there is only a small amount of data and no covariates to control for, and a test yielding a p-value valid in small samples is desired. nptrend has an option to compute exact p-values based on Monte Carlo permutations or a full enumeration of the permutation distribution (the latter practical only for extremely small samples).

nptrend performs four different tests for trend:

• the Cochran–Armitage test,
• the Jonckheere–Terpstra test,
• the linear-by-linear trend test, and
• a test using ranks developed by Cuzick.

To calculate the Cochran–Armitage statistic for trend, you type

. nptrend relief, group(dose) carmitage

## Let's see it work

For the Cochran–Armitage test, the linear-by-linear trend test, and Cuzick's test, the ordered groups also carry numeric scores. The Cochran–Armitage test applies when the response is 0/1; it tests for a trend in the proportions of positive responses across the groups.

Here we have fictional data from a clinical trial of a new drug for treating migraines. The variable dose contains the dose of the drug given to a subject. The variable relief is 0/1, with 0 indicating no relief and 1 partial or total relief.

Here is a tabulation of the data:

. webuse migraine
(Fictional migraine drug data)

. tabulate dose relief, row nokey

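Relief of migraine after 2 hours, by Mycureit dose in mg (each row of counts is followed by its row percentages):

| Dose (mg) | Relief = 0 | Relief = 1 | Total |
| --- | ---: | ---: | ---: |
| 10 | 80 | 120 | 200 |
| | 40.00 | 60.00 | 100.00 |
| 20 | 92 | 108 | 200 |
| | 46.00 | 54.00 | 100.00 |
| 30 | 83 | 117 | 200 |
| | 41.50 | 58.50 | 100.00 |
| 40 | 63 | 137 | 200 |
| | 31.50 | 68.50 | 100.00 |
| Total | 318 | 482 | 800 |
| | 39.75 | 60.25 | 100.00 |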



We will test whether there is a trend by dose in the proportion of subjects reporting relief.

. nptrend relief, group(dose) carmitage

Cochran–Armitage test for trend

Number of observations =      800
Number of groups =        4
Number of response levels =        2

| Group (dose) | Group score | Mean response score | Number of obs |
| --- | ---: | ---: | ---: |
| 10 | 10 | .6 | 200 |
| 20 | 20 | .54 | 200 |
| 30 | 30 | .585 | 200 |
| 40 | 40 | .685 | 200 |

Statistic =     .003
Std. err. = .0015476
z =    1.939
Prob > |z| =   0.0526

Test of departure from trend:
chi2(2) =     5.45
Prob > chi2 =   0.0656


nptrend first displays a table of the mean response score by group. The mean response score in this case is simply the proportion of subjects in the group reporting relief.

The Cochran–Armitage $$z$$ statistic tests for a linear trend. A $$\chi^2$$ statistic that tests for departure from a linear trend is also calculated.

When either the $$z$$ statistic for linear trend or the $$\chi^2$$ statistic for departure from linear trend is large, it means that the test for independence between response and group is rejected. $$z$$ being large means there is a linear trend that rejects independence. $$\chi^2$$ being large means there are differences other than the linear trend that reject independence.

In the example above, the linear test for trend gave a p-value of 0.0526, not quite reaching significance at the 0.05 level. The test of departure from trend gave a p-value of 0.0656, meaning there is weak evidence, not reaching significance, for a nonlinear association between dose and relief.
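The numbers in this output can be reproduced by hand from the tabulation of dose by relief. The sketch below uses the textbook form of the Cochran–Armitage test (the slope of the group proportions on the group scores, with a binomial variance under the null hypothesis) together with the Pearson chi-squared statistic for the 4 × 2 table. That nptrend computes its results exactly this way is an assumption, and the scalar names (ca_*) are ours.

```stata
* Counts of relief==1 per dose group (120, 108, 117, 137), 200 subjects each,
* copied from the tabulation. Scores 10, 20, 30, 40 are centered at their mean, 25.
scalar ca_Sxy  = (-15)*120 + (-5)*108 + 5*117 + 15*137
scalar ca_Sxx  = 200*((-15)^2 + (-5)^2 + 5^2 + 15^2)
scalar ca_pbar = 482/800

* Trend statistic (slope), its null standard error, z, and two-sided p-value
scalar ca_T  = ca_Sxy/ca_Sxx
scalar ca_se = sqrt(ca_pbar*(1 - ca_pbar)/ca_Sxx)
scalar ca_z  = ca_T/ca_se
display "Statistic = " ca_T "   Std. err. = " ca_se
display "z = " ca_z "   Prob > |z| = " 2*normal(-abs(ca_z))

* Pearson chi-squared (3 df) minus z^2 leaves the 2-df departure-from-trend test
scalar ca_chi2 = 200*((120/200 - ca_pbar)^2 + (108/200 - ca_pbar)^2 ///
    + (117/200 - ca_pbar)^2 + (137/200 - ca_pbar)^2)/(ca_pbar*(1 - ca_pbar))
scalar ca_dep  = ca_chi2 - ca_z^2
display "chi2(2) = " ca_dep "   Prob > chi2 = " chi2tail(2, ca_dep)
```

Run from a do-file (the `///` continuation is not available at the interactive prompt), this gives, after rounding, .003, .0015476, 1.939, 0.0526, 5.45, and 0.0656, in agreement with the nptrend output.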

Trends other than linear can also be tested using the scoregroup() option. For this example, specifying scoregroup(1 4 9 16) would test a quadratic trend in dose.
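As a sketch, the command would look as follows; only the scoregroup() option is added to the earlier call. No output is reproduced here because the source does not show it.

```stata
webuse migraine
* Replace the default group scores (10, 20, 30, 40) with 1, 4, 9, 16
nptrend relief, group(dose) carmitage scoregroup(1 4 9 16)
```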

The Cochran–Armitage test requires that responses be 0/1 or else the group indicator be 0/1. The other trend tests computed by nptrend have no restriction on the response; the response variable can have any value.

Here is an example in which the response is ocular exposure to ultraviolet radiation, measured for 32 pairs of sunglasses. The sunglasses are classified into three groups according to the amount of visible light transmitted. We list some of the data:

. webuse sg

. list in 1/12, separator(6)

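| | group | exposure |
| ---: | --- | ---: |
| 1. | < 25% | 1.4 |
| 2. | < 25% | 1.4 |
| 3. | < 25% | 1.4 |
| 4. | < 25% | 1.6 |
| 5. | < 25% | 2.3 |
| 6. | < 25% | 2.5 |
| 7. | 25% to 35% | .9 |
| 8. | 25% to 35% | 1 |
| 9. | 25% to 35% | 1.1 |
| 10. | 25% to 35% | 1.1 |
| 11. | 25% to 35% | 1.2 |
| 12. | 25% to 35% | 1.2 |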



The Jonckheere–Terpstra test is useful when it is not clear what the trend might be and we simply want to test for any trend. It tests whether the ordering of the responses is associated with the ordering of the groups.

To compute the Jonckheere–Terpstra test, we specify the option jterpstra.

. nptrend exposure, group(group) jterpstra

Jonckheere–Terpstra test for trend

Number of observations =       32
Number of groups =        3
Number of response levels =       23

| Group (group) | Group score | Mean response score | Number of obs |
| --- | ---: | ---: | ---: |
| < 25% | 1 | 1.766667 | 6 |
| 25% to 35% | 2 | 2.311111 | 18 |
| > 35% | 3 | 4.85 | 8 |

Statistic =       82
Std. err. = 54.80056
z =    1.496
Prob > |z| =   0.1346


We see that the mean response score increases as the group indicator increases, but the p-value from the Jonckheere–Terpstra test is 0.1346, not reaching significance at the 0.05 level.

Because the Jonckheere–Terpstra statistic tests for any type of trend in responses across ordered groups, it will not be as powerful as a test that accurately hypothesizes the true trend. The linear-by-linear trend test allows you to do just this. The linear-by-linear trend test uses the numeric values of the responses to specify the trend being tested. How the trend is hypothesized to vary across groups is specified by the numeric values of the group variable.

The linear-by-linear statistic is equivalent to the Pearson correlation coefficient, the difference being that the Pearson correlation coefficient is standardized by the standard deviations of the scores. The p-values are slightly different because the p-value for the linear-by-linear test is based on its permutation distribution while the p-value for the Pearson correlation coefficient assumes normality.
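To see this comparison in practice, one could compute the Pearson correlation alongside the linear-by-linear test. This sketch assumes that group in the sg dataset is stored as a numeric variable coded 1, 2, 3 with value labels, as the Group score column in the earlier output suggests. The sig option of pwcorr prints the conventional significance level under each correlation.

```stata
webuse sg
* Pearson correlation between the response and the group code, with its p-value
pwcorr exposure group, sig
* Linear-by-linear trend test on the same two variables
nptrend exposure, group(group) linear notable
```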

To compute the linear-by-linear test, we specify the option linear. We also specify notable to suppress the display of the mean response scores by group.

. nptrend exposure, group(group) linear notable

Linear-by-linear test for trend

Number of observations =       32
Number of groups =        3
Number of response levels =       23

Statistic = .7035156
Std. err. = .3063377
z =    2.297
Prob > |z| =   0.0216


The p-value from the linear-by-linear test is 0.0216, which is considerably different from the p-value computed by the Jonckheere–Terpstra test, which was 0.1346. This is not surprising because the linear-by-linear test assumes a specific trend based on numerical values, whereas the Jonckheere–Terpstra statistic tests for any trend.

The fourth trend test computed by nptrend is a test based on ranks developed by Cuzick. We specify the option cuzick, again with notable.

. nptrend exposure, group(group) cuzick notable

Cuzick's test with rank scores

Number of observations =       32
Number of groups =        3
Number of response levels =       23

Statistic =  1.65625
Std. err. = 1.090461
z =    1.519
Prob > |z| =   0.1288


In this case, Cuzick's test produces a p-value (0.1288) that is similar to the p-value from the Jonckheere–Terpstra test (0.1346).
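In each of the outputs for the sunglasses data, z equals the statistic divided by its standard error, and Prob > |z| matches the two-sided tail area of the standard normal distribution. A quick check with display, using values copied from the three outputs:

```stata
* Jonckheere–Terpstra: z and two-sided normal p-value
display 82/54.80056
display 2*normal(-abs(82/54.80056))

* Linear-by-linear
display .7035156/.3063377
display 2*normal(-abs(.7035156/.3063377))

* Cuzick
display 1.65625/1.090461
display 2*normal(-abs(1.65625/1.090461))
```

After rounding, these reproduce the reported z values (1.496, 2.297, 1.519) and p-values (0.1346, 0.0216, 0.1288).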

## Exact p-values

nptrend will also compute exact p-values using Monte Carlo permutations when the exact option is specified. Here we compute the exact p-value for the Jonckheere–Terpstra test.

. nptrend exposure, group(group) jterpstra notable exact

Permutations (10,000): ..........1,000..........2,000..........3,000..........4,000..........5,00
> 0..........6,000..........7,000..........8,000..........9,000..........10,000 done

Jonckheere–Terpstra test for trend

Number of observations =       32
Number of groups =        3
Number of response levels =       23

Statistic =       82
Std. err. = 54.80056
z =    1.496
Prob > |z| =   0.1346
Exact prob =   0.1510 (10,000 Monte Carlo permutations)


By default, 10,000 Monte Carlo permutations are used. This gave an exact p-value of 0.1510, differing slightly from the p-value of 0.1346 computed using a normal approximation.

Monte Carlo permutations give results with random error, so for more precision, more permutations can be computed. Below, we use 100,000 permutations, and have a dot displayed every 1,000th permutation to monitor the progress. We specify a random-number seed so we can duplicate the results and the option show, which displays a detailed table of the Monte Carlo results.

. nptrend exposure, group(group) jterpstra notable ///
> exact(montecarlo, reps(100000) dots(1000) rseed(1234) show)

Permutations (100,000): ..........10,000..........20,000..........30,000..........40,000....
> ......50,000..........60,000..........70,000..........80,000..........90,000..........100,
> 000 done

Monte Carlo permutation results                Number of observations =      32
Permutation variable: group                    Number of permutations = 100,000

| T | T(obs) | Test | c | n | p | SE(p) | [95% CI(p)] |
| --- | ---: | --- | ---: | ---: | ---: | ---: | --- |
| _pm_1 | 82 | lower | 93358 | 100000 | .9336 | .0008 | .9320  .9351 |
| | | upper | 6874 | 100000 | .0687 | .0008 | .0672  .0703 |
| | | two-sided | | | .1375 | .0011 | .1353  .1396 |

The SE(p) and [95% CI(p)] columns report the Monte Carlo error.

Notes: For lower one-sided test, c = #{T <= T(obs)} and p = p_lower = c/n.
For upper one-sided test, c = #{T >= T(obs)} and p = p_upper = c/n.
For two-sided test, p = 2*min(p_lower, p_upper); SE and CI approximate.

Jonckheere–Terpstra test for trend

Number of observations =       32
Number of groups =        3
Number of response levels =       23

Statistic =       82
Std. err. = 54.80056
z =    1.496
Prob > |z| =   0.1346
Exact prob =   0.1375 (100,000 Monte Carlo permutations)


The exact p-value from the Monte Carlo computation is 0.1375, close to the approximate p-value of 0.1346. From the detailed table of the results, we see that the 95% confidence interval for the Monte Carlo p-value is [0.1353, 0.1396], which does not include the approximate p-value.
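The p and SE(p) columns of the detailed table follow from the counts c and n. The sketch below recomputes them, assuming the usual binomial standard error sqrt(p(1 − p)/n). The confidence limits are not recomputed because the source does not say which interval method is used. The scalar names (mc_*) are ours.

```stata
* Counts copied from the Monte Carlo permutation results table
scalar mc_n     = 100000
scalar mc_lower = 93358/mc_n
scalar mc_upper = 6874/mc_n
scalar mc_two   = 2*min(mc_lower, mc_upper)

* Each p-value with its approximate Monte Carlo standard error
display "lower:     p = " mc_lower "   SE = " sqrt(mc_lower*(1 - mc_lower)/mc_n)
display "upper:     p = " mc_upper "   SE = " sqrt(mc_upper*(1 - mc_upper)/mc_n)
display "two-sided: p = " mc_two   "   SE = " sqrt(mc_two*(1 - mc_two)/mc_n)
```

Rounded to four decimals, the results agree with the table: p = .9336, .0687, and .1375, with standard errors of about .0008, .0008, and .0011.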

This example has only 32 observations. Should we wish to publish the results, we would likely want to run nptrend again, specifying 1,000,000 or more permutations to reduce the Monte Carlo error further. Permutations are generated using a fast algorithm, and the computation is not time-consuming.
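A sketch of such a run, reusing only the suboptions shown earlier; the dots(10000) interval and the seed are arbitrary choices. Under the binomial approximation, and taking the p-value to be near 0.1375, the Monte Carlo standard error would fall to roughly 0.0003.

```stata
* Anticipated Monte Carlo standard error with 1,000,000 permutations
display sqrt(.1375*(1 - .1375)/1000000)

* Same test as before, with ten times as many permutations
nptrend exposure, group(group) jterpstra notable ///
    exact(montecarlo, reps(1000000) dots(10000) rseed(1234))
```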

For extremely small datasets, the exact(enumerate) option can be used to fully enumerate the permutation distribution. It gives an exact p-value without any Monte Carlo error.
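To see why full enumeration is limited to very small datasets, count the distinct ways of assigning the 32 sunglasses observations to groups of sizes 6, 18, and 8. The count below is a multinomial coefficient; that nptrend enumerates exactly these arrangements is our assumption. The second command only illustrates the syntax on the first 12 listed observations (two groups of 6, hence comb(12, 6) = 924 arrangements), and it assumes that nptrend accepts an in range as most Stata commands do.

```stata
* Distinct group assignments for the full dataset: 32!/(6! 18! 8!)
display %20.0fc comb(32, 6)*comb(26, 8)

* Full enumeration on a 12-observation subset (illustrative only)
webuse sg
nptrend exposure in 1/12, group(group) jterpstra exact(enumerate)
```

The count for the full dataset is about 1.4 trillion arrangements, which is why Monte Carlo permutations are used in the examples above.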