# R: st: permutations

 From "Carlo Lazzaro" To Subject R: st: permutations Date Fri, 14 Mar 2008 16:07:33 +0100

```Dear Chris, as quoted by Dave, just two (hopefully useful) references for
Good's textbooks:

Phillip Good. Permutation, Parametric, and Bootstrap Tests of Hypotheses,
3rd Edition. Springer, 2005

Phillip I. Good. Resampling Methods: A Practical Guide to Data Analysis, 3rd
Edition. Birkhäuser, 2006

I would also take a look at Chapter 15 of Efron B, Tibshirani RJ. An
Introduction to the Bootstrap. New York: Chapman & Hall, 1993.

Kind Regards,

Carlo

-----Messaggio originale-----
Da: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] Per conto di David Airey
Inviato: venerdì 14 marzo 2008 15.27
A: statalist@hsphsun2.harvard.edu
Oggetto: Re: st: permutations

.

Here's an example using Stata 9/10 syntax. With permutations, you have
to be careful about how the data are permuted. Sometimes they have to
be permuted in a way that respects the structure (strata) of the data.
Read texts by Good to familiarize with these issues. Here I'm ignoring
any issues like that.

Stata makes the general permutations very easy to do. You first define
a program to grab a statistic associated with the test you want, and
then you use that program in a permute command.

***do_begin***

describe
table age caco snp

program plogistic, rclass
version 10
args caco snp age
xi: logistic `caco' i.`snp' i.`age'
test _Isnp_1 _Isnp_2
return scalar chi2 = r(chi2)
end

set seed 3132008

permute caco chi2=r(chi2), reps(1000): plogistic caco snp age

***do_end***

results pasted:

. describe

Contains data from js_person_1.dta
obs:         2,394
vars:             3                          10 Dec 2007 10:56
size:        40,698 (99.7% of memory free)
----------------------------------------------------------------------------
----
storage  display     value
variable name   type   format      label      variable label
----------------------------------------------------------------------------
----
caco            byte   %8.0g       caco       Group
age             float  %9.0g                  RECODE of age_group
snp             float  %9.0g                  C
----------------------------------------------------------------------------
----
Sorted by:
Note:  dataset has changed since last saved

. table age caco snp

--------------------------------------------------------------------
|                       C and Group
RECODE of | ------- 0 ------    ------- 1 ------    ------- 2 ------
age_group | Control     Case    Control     Case    Control     Case
----------+---------------------------------------------------------
35 |      39       24         90       75         90       57
40 |      37       48        103      119         92      103
45 |      30       42        109      111         80       88
50 |      21       33         70       61         42       61
55 |      22       12         65       58         43       31
60 |      29       21         52       49         59       50
--------------------------------------------------------------------

.
. program plogistic, rclass
1.         version 10
2.         args caco snp age
3.         xi: logistic `caco' i.`snp' i.`age'
4.         test _Isnp_1 _Isnp_2
5.         return scalar chi2 = r(chi2)
6. end

.
. set seed 3132008

.
. permute caco chi2=r(chi2), reps(1000): plogistic caco snp age
(running plogistic on estimation sample)

Permutation replications (1000)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100
..................................................   150
..................................................   200
..................................................   250
..................................................   300
..................................................   350
..................................................   400
..................................................   450
..................................................   500
..................................................   550
..................................................   600
..................................................   650
..................................................   700
..................................................   750
..................................................   800
..................................................   850
..................................................   900
..................................................   950
..................................................  1000

Monte Carlo permutation results                   Number of obs
=       2394

command:  plogistic caco snp age
chi2:  r(chi2)
permute var:  caco

----------------------------------------------------------------------------
--
T            |     T(obs)       c       n   p=c/n   SE(p) [95% Conf.
Interval]
-------------
+----------------------------------------------------------------
chi2 |   .1560796     920    1000  0.9200  0.0086  .
9014202    .936058
----------------------------------------------------------------------------
--
Note:  confidence interval is with respect to p=c/n.
Note:  c = #{|T| >= |T(obs)|}

.
.
.
end of do-file

The p values for the genotype test was 0.9249, not much different.

. xi: logistic caco i.snp i.age
i.snp             _Isnp_0-2           (naturally coded; _Isnp_0 omitted)
i.age             _Iage_35-60         (naturally coded; _Iage_35
omitted)

Logistic regression                               Number of obs
=       2116
LR chi2(7)
=      21.17
Prob > chi2
=     0.0035
Log likelihood = -1455.9021                       Pseudo R2
=     0.0072

----------------------------------------------------------------------------
--
caco | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf.
Interval]
-------------
+----------------------------------------------------------------
_Isnp_1 |   .9580097   .1193996    -0.34   0.731     .7503816
1.223088
_Isnp_2 |   .9524632   .1218649    -0.38   0.703     .7412068
1.223931
_Iage_40 |    1.63372   .2251451     3.56   0.000
1.247017     2.14034
_Iage_45 |   1.545441    .216941     3.10   0.002     1.173722
2.034885
_Iage_50 |   1.634428   .2582141     3.11   0.002     1.199194
2.227627
_Iage_55 |    1.09121    .184666     0.52   0.606     .7831755
1.520399
_Iage_60 |   1.202247   .1956237     1.13   0.258     .8739581
1.653854
----------------------------------------------------------------------------
--

. test _Isnp_1 _Isnp_2

( 1)  _Isnp_1 = 0
( 2)  _Isnp_2 = 0

chi2(  2) =    0.16
Prob > chi2 =    0.9249

On Mar 14, 2008, at 5:44 AM, Christopher Intemann wrote:

> Hi Dave
> Am 13.03.2008 um 15:24 schrieb David Airey:
>
>> .
>>
>> Do you have access to the manual help or online help? The syntax is:
>>
>> permute permvar exp_list [, options] : command
>>
>> You specified permvar but not exp_list, so I'm not sure what
>> permute is defaulting to, maybe just the overall model statistic?
>> You can write a small program to return the coefficients for the
>> genetic dummy variables, and an overall test of those, that gets
>> called by permute. The ANOVA example in the printed help shows how
>> to do this.
>>
>> T is your statistic
>> T(obs) is your observed statistic
>> c is the number of times the statistics is greater than T(obs)
>> n is the default number of permutations since you didn't specify
>> this either
>>
>> Below is an example from ATS UCLA (Stata version 9/10):
>>
>> * read your data into Stata
>> use http://www.ats.ucla.edu/stat/stata/notes/hsb2, clear
>> generate goodread = (read > 60)
>>
>> * run logistic predicting "goodread" from "write"
>> logit goodread female
>>
>> * run permutation test for above with 10000 repetitions
>> permute female x2=e(chi2), reps(10000) nodots: logit goodread female
>>
>> Since you want a statistic for the individual dummy variables and
>> the test for both genetic dummy variables, you don't just want the
>> results from the overall test, and you cannot just collect that
>> using permute as the above example does.
>>
>>
>
> I was using a pre-stata9 syntax of the permute command.
>
> However, a command like
>
> xi:permute affected_stata x2=e(chi2):logistic affected_stata i.gene
> age ethno_x sex_x
>
> does calculate some permutations, but is pretty sure not what I
> want:-(
>
> As i could calculate my p-values from the matrices e(b) and e(V),
> which are generated by logistic, i would  rather like to use these
> in the exp_list. However, this does not work:
> "matrix operators that return matrices not allowed in this context"
>
> Is there any other way to permute or calculate the p-values?
> Just similar to what other programs for genetic analysis (like
> unphased, plink or haploview) do?
> I just came along cvpermute, maybe this could do the trick?
> Many thanks in advance,
>   Chris
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
David C. Airey, Ph.D.
Pharmacology Research Assistant Professor
Center for Human Genetics Research Member

Department of Pharmacology
School of Medicine
Vanderbilt University
Rm 8158A Bldg MR3
465 21st Avenue South
Nashville, TN 37232-8548

TEL   (615) 936-1510
FAX   (615) 936-3747
EMAIL david.airey@vanderbilt.edu
URL   http://people.vanderbilt.edu/~david.c.airey/dca_cv.pdf
URL   http://www.vanderbilt.edu/pharmacology

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```