|  |  | 
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: Re: Construct Null Datasets through Bootstrap Resampling
| From | Erik Ingelsson <[email protected]> | 
| To | [email protected] | 
| Subject | Re: st: Re: Construct Null Datasets through Bootstrap Resampling | 
| Date | Thu, 30 Nov 2006 23:32:27 -0500 | 
Thanks Michael,
Yes, it was this I wanted. And it worked just fine in my do-file as  
well. During the process I have realized that I might perhaps also  
need to keep some variables together in groups (so that it is not  
totally scrambled, but randomly resorted but keeping two groups of  
variables together). For this Austin Nichols suggestion seems to work,  
so I think I have a solution for that as well.
Many thanks,
Erik Ingelsson
Quoting Michael Blasnik <[email protected]>:
It seems like you want something similar to -permute- except you want
to permute all of the variables, not just one.  Here's a short program
that will scramble all of the variables in your dataset.  You can usel
this within your own program and perform a monte carlo simulation
(check out -simulate-).
program define scramble
version 9.2
syntax [varlist]
tempvar hold order rand
gen long `order'=_n
foreach var of local varlist {
gen `rand'=uniform()
gen `hold'=`var'
sort `rand'
replace `var'=`hold'[`order']
drop `rand' `hold'
}
end
Michael Blasnik
----- Original Message ----- From: "Erik Ingelsson"
<[email protected]>
To: <[email protected]>
Sent: Thursday, November 30, 2006 1:31 PM
Subject: st: Construct Null Datasets through Bootstrap Resampling
Dear Statalist users,
I am trying to construct null data sets through bootstrap   
resampling,  to be able to account for multiple testing in genetic   
analyses. I  would like to sample my genotypes and phenotypes   
randomly with  replacement (without keeping linked them together as  
 the original  observations in my dataset), and then run  
regressions  on these samples to evaluate a distribution of minimum  
probability  values. Thereby, I  will obtain empirical p-values by  
comparing the  nominal p-values with  the distribution of  
probability from the  null data sets. I have seen  this implemented  
in SAS, but I hope  that it could be done also in STATA.
In my ?trail-and-error? approach, I have come so far, that I have    
learned how to use bootstrap sampling to get a new dataset with    
p-values from a set of regressions. However, these simulations are   
 still using my original observations (although creating new   
samples),  while I would like the observations to be randomly   
created from the  available variables (not keeping them together as  
 in the original  dataset). Below is the code for what I have done   
so far. Variables  linj001-linj004 are the genotypes, thus the   
important independent  variables; the other variables are   
covariates to adjust for; stset and all definitions, etc are done   
above. In reality, I will have much more  regressions to include in  
 the simulation, but this is just for  learning how to do it.
---
*Program with the commands to be run in all bootstrap samples*
capture program drop myboot
program myboot, rclass
    stcox linj001 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
    return scalar p1 = 2*(1-normal(abs(b[linj001]/_se[linj001])))
    stcox linj002 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
    return scalar p2 = 2*(1-normal(abs(b[linj002]/_se[linj002])))
    stcox linj003 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
    return scalar p3 = 2*(1-normal(abs(b[linj003]/_se[linj003])))
    stcox linj004 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
    return scalar p4 = 2*(1-normal(abs(b[linj004]/_se[linj004])))
end
*Run the program in the original sample*
myboot
ret list
*Bootstrapping in 10000 samples*
bootstrap "myboot" p1=r(p1) p2=r(p2) p3=r(p3) p4=r(p4), reps(10000)  
 saving(C:\bootstrapsample) replace
---
This leaves me with a dataset (C:\bootstrapsample) which consists   
of  the p-values from the 4 regressions derived from 10000   
simulations.  However, this is not exactly what I need, since the   
variables are  still ?connected? in the original observations (and   
then randomly  chosen for my simulated sets). I would like to get   
simulations with  all variables scrambled, so that new observations  
 with all variables  scrambled are created in a number of bootstrap  
 simulations, and then  used for regressions. The present macro can  
 give me 10000 simulated  p-values for the regressions, based on   
samples with replacement, but  these simulations are reusing the   
actual 2000 observations from the  original dataset. Now I would   
like to create a ?null dataset?, in  which I instead of sampling   
from the observations in the real dataset, I would like Stata to   
randomly ?make up? observations from the  existing variables and   
values, so that I have 2000 fake observations  (with random   
selection of all variables) to base the regressions on,  in 10000   
simulations.
I have read the help files, manual, Statalist, searched at   
Internet,  even asked Technical Support (which helped me to come   
this far, but  not the last part). I am using Stata 8.2 for   
Windows. Is there a way  to do this? Did I explain what I want to   
do properly? Is there anyone  who can help me with this?
Thanks a lot in advance,
Erik Ingelsson
---
Erik Ingelsson, MD, PhD
Current affiliation (until June 30, 2007):
Framingham Heart Study
73 Mt. Wayte Avenue, Suite 2
Framingham, MA 01702-5827
Phone: 508-935-3453
Fax: 508-626-1262
Cell: 508-202-8493
Permanent affiliation:
Uppsala University, Department of Public Health and Caring Sciences, Uppsala
Science Park, SE-751 85 Uppsala, SWEDEN.
Fax: +46-18-611 79 76
E-mail: [email protected]
---
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
---
Erik Ingelsson, MD, PhD
Current affiliation (until June 30, 2007):
Framingham Heart Study
73 Mt. Wayte Avenue, Suite 2
Framingham, MA 01702-5827
Phone: 508-935-3453
Fax: 508-626-1262
Cell: 508-202-8493
Permanent affiliation:
Uppsala University, Department of Public Health and Caring Sciences, Uppsala
Science Park, SE-751 85 Uppsala, SWEDEN.
Fax: +46-18-611 79 76
E-mail: [email protected]
---
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/