Re: st: Shuffling of data in ttest bootstrapping


From   [email protected] (Jeff Pitblado, StataCorp LP)
To   [email protected]
Subject   Re: st: Shuffling of data in ttest bootstrapping
Date   Thu, 15 Jul 2004 12:24:58 -0500

Sorry for the repost; I forgot the subject line in the previous post.

Michael Malette <[email protected]> asks how -bootstrap- resamples
the data:

> I have a question about the bootstrapping command.  I'm trying to
> compare 2 means from populations with different sample sizes (n=400,
> n=3000) using a t-test.  The bootstrapping command should shuffle data
> and generate t-values after each shuffle.  This leads me to my question,
> how does stata shuffle the data.  
> 
> Does it take an equal subsample from each population (n=200 of the 400
> and n=200 of the 3000) and calculate t or does it take an overall
> subsample with an unequal number from each population (n=100 of 400 and
> n=300 of 3000)? 
> 
> This is the syntax that we are using: 
> program define TTestBoot      
> version 8.2      
> args AHI FWIN      
> ttest AHI == FWIN, unpaired
> end      
> 
> use "H:\ahifwin.dta", clear 
> bootstrap "TTestBoot AHI FWIN" T=r(t), reps(1000) saving("H:\work\test.dta")

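By default, -bootstrap- resamples observations (rows) with replacement from
the whole dataset; it does not sample the two groups separately.  To get
-bootstrap- to resample independently within each of two (or more) groups, use
the -strata()- option.  Each replication then draws, with replacement, as many
observations from each group as that group has in the original data (400 and
3000 in Michael's case).  It seems that Michael's dataset is in wide format,
so Michael will have to use -reshape- and a little data management to get a
dataset that will work with -bootstrap- and the -strata()- option.  Here is a
short example: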

***** BEGIN: example.do
clear

// some example data in wide format, notice that the groups are unbalanced
input x y
3 6
13 3
12 9
13 15
14 18
3 10
13 9
2 18
12 2
18 14
. 15
. 14
. 5
. 2
. 12
. 8
. 17
. 3
. 12
. 3
end

// call to -ttest- using originally shaped data
ttest x == y, unpaired

// rename the variables so they can be used with -reshape-
rename x x1
rename y x2
gen obsid = _n

// use reshape to stack the values in x1 and x2 into a new variable x
reshape long x, i(obsid) j(group)
// drop the missing values that made the original data unbalanced
drop if missing(x)

// two sample -ttest- using the stacked data, verify that the results are the
// same as the above -ttest- results
ttest x, by(group)

// bootstrap the t-statistic from the two sample -ttest-, resampling
// independently within each group via -strata()-
bootstrap "ttest x, by(group)" T=r(t), reps(100) strata(group) saving(test.dta) replace dots

exit
***** END: example.do
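
To see what a single stratified replication looks like, one resample can be
drawn by hand with -bsample-.  This is only a sketch and not part of
example.do: it assumes the stacked data from example.do (variables x and
group) are still in memory, and the file name and seed value are arbitrary.

***** BEGIN: onedraw.do
// keep a copy of the stacked data so it can be restored afterwards
preserve

// arbitrary seed, used only so that the draw can be reproduced
set seed 12345

// draw one bootstrap sample, with replacement, separately within each group
bsample, strata(group)

// each group keeps its original number of observations (10 and 20 here)
tabulate group

// the t statistic for this one replication
ttest x, by(group)
display r(t)

// bring back the original stacked data
restore
***** END: onedraw.do

The -bootstrap- call with -strata(group)- repeats this kind of draw once per
replication and collects r(t) from each.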

--Jeff
[email protected]