
From  "Michael Blasnik" <michael.blasnik@verizon.net> 
To  <statalist@hsphsun2.harvard.edu> 
Subject  st: Re: Construct Null Datasets through Bootstrap Resampling 
Date  Thu, 30 Nov 2006 14:54:22 -0500 
Dear Statalist users,
I am trying to construct null datasets through bootstrap resampling in order to account for multiple testing in genetic analyses. I would like to sample my genotypes and phenotypes randomly with replacement (without keeping them linked together as in the original observations in my dataset), and then run regressions on these samples to obtain a distribution of minimum p-values. I will then obtain empirical p-values by comparing the nominal p-values with the distribution of minimum p-values from the null datasets. I have seen this implemented in SAS, but I hope that it can also be done in Stata.
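As a rough sketch of the comparison described above, the empirical p-value for one test can be computed as the share of null datasets whose minimum p-value is at least as small as the nominal p-value. Everything in this block is a placeholder assumption: the variable minp, the nominal value 0.01, and the simulated uniform draws that stand in for real null results. The +1 in numerator and denominator is one common convention that avoids reporting an empirical p-value of exactly zero; it is a choice, not part of the original description.

```stata
* Sketch only: the null minimum p-values are simulated placeholders here
clear
set obs 1000
set seed 12345
* stand-in for the minimum of 4 p-values in each of 1000 null datasets
generate double minp = min(uniform(), uniform(), uniform(), uniform())
* hypothetical nominal p-value from the analysis of the real data
local nominal = 0.01
* count the null datasets whose minimum p-value is at least as extreme
quietly count if minp <= `nominal'
scalar emp_p = (r(N) + 1)/(_N + 1)
display "empirical p-value = " emp_p
```

With real results, minp would instead be built from the saved simulation file, for example as min(p1, p2, p3, p4) across the four regressions.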
In my "trial-and-error" approach, I have come far enough to learn how to use bootstrap sampling to get a new dataset of p-values from a set of regressions. However, these simulations still use my original observations (although they create new samples), whereas I would like the observations to be randomly assembled from the available variables (not kept together as in the original dataset). Below is the code for what I have done so far. Variables linj001-linj004 are the genotypes, and thus the important independent variables; the other variables are covariates to adjust for; stset and all definitions, etc., are done above. In reality, I will have many more regressions to include in the simulation, but this is just for learning how to do it.

*Program with the commands to be run in all bootstrap samples*
capture program drop myboot
program myboot, rclass
* Each return scalar is the two-sided Wald p-value for the genotype coefficient
stcox linj001 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
return scalar p1 = 2*(1-normal(abs(_b[linj001]/_se[linj001])))
stcox linj002 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
return scalar p2 = 2*(1-normal(abs(_b[linj002]/_se[linj002])))
stcox linj003 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
return scalar p3 = 2*(1-normal(abs(_b[linj003]/_se[linj003])))
stcox linj004 whoht70 adadiab70 ami70 vit70 z972 z290 z085 zekg_lvh
return scalar p4 = 2*(1-normal(abs(_b[linj004]/_se[linj004])))
end
*Run the program in the original sample*
myboot
ret list
*Bootstrapping in 10000 samples*
bootstrap "myboot" p1=r(p1) p2=r(p2) p3=r(p3) p4=r(p4), reps(10000) saving(C:\bootstrapsample) replace

This leaves me with a dataset (C:\bootstrapsample) that consists of the p-values from the 4 regressions in each of 10000 simulations. However, this is not exactly what I need, since the variables are still "connected" within the original observations (which are then randomly chosen for my simulated sets). I would like new observations, with all variables scrambled, to be created in each bootstrap simulation and then used for the regressions. The present program gives me 10000 simulated p-values for each regression, based on samples drawn with replacement, but these simulations reuse the actual 2000 observations from the original dataset. Now I would like to create a "null dataset": instead of sampling from the observations in the real dataset, I would like Stata to randomly "make up" observations from the existing variables and values, so that I have 2000 fake observations (with random selection of all variables) on which to base the regressions, in 10000 simulations.
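One possible way to build such null datasets is to resample each variable separately, with replacement, so that values no longer stay together within an observation. The sketch below is only an illustration under stated assumptions: the program names scramble and nullsims and their options are hypothetical, the data are assumed to be stset already, and myboot is assumed to be defined as above and to return r(p1) to r(p4). It uses uniform() so that it should also run under Stata 8.2.

```stata
* Hypothetical helper: resample each listed variable independently
capture program drop scramble
program define scramble
    version 8.2
    syntax varlist(numeric)
    tempvar pick copy
    quietly generate long `pick' = .
    foreach v of local varlist {
        * draw a random source row between 1 and _N for every observation
        quietly replace `pick' = 1 + int(_N*uniform())
        * copy first: replacing in place would read already-changed values
        quietly generate double `copy' = `v'[`pick']
        quietly replace `v' = `copy'
        drop `copy'
    }
end

* Hypothetical driver: rerun myboot on many scrambled copies of the data
capture program drop nullsims
program define nullsims
    version 8.2
    syntax varlist(numeric), Reps(integer) SAVing(string)
    tempname memhold
    postfile `memhold' p1 p2 p3 p4 minp using "`saving'", replace
    forvalues i = 1/`reps' {
        * always start each replicate from the original data
        preserve
        scramble `varlist'
        quietly myboot
        post `memhold' (r(p1)) (r(p2)) (r(p3)) (r(p4)) (min(r(p1), r(p2), r(p3), r(p4)))
        restore
    }
    postclose `memhold'
end
```

A call might look like: nullsims linj001 linj002 linj003 linj004, reps(10000) saving(C:\nullsample), where the file name is made up for illustration. Listing only the genotype variables leaves the stset survival variables and the covariates linked to each other, which breaks the genotype-phenotype association without disturbing the survival-time setup; whether the covariates should be scrambled as well is a design decision this sketch does not settle.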
I have read the help files and the manual, searched Statalist and the Internet, and even asked Technical Support (who helped me get this far, but not with the last part). I am using Stata 8.2 for Windows. Is there a way to do this? Have I explained properly what I want to do? Is there anyone who can help me with this?
Thanks a lot in advance,
Erik Ingelsson

Erik Ingelsson, MD, PhD
Current affiliation (until June 30, 2007):
Framingham Heart Study
73 Mt. Wayte Avenue, Suite 2
Framingham, MA 01702-5827
Phone: 508-935-3453
Fax: 508-626-1262
Cell: 508-202-8493
Permanent affiliation:
Uppsala University, Department of Public Health and Caring Sciences, Uppsala
Science Park, SE-751 85 Uppsala, SWEDEN.
Fax: +46-18-611 79 76
Email: erik.ingelsson@pubcare.uu.se

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/