Statalist The Stata Listserver



st: too many duplicates with bsample, weight()?


From   Matissa Hollister <[email protected]>
To   [email protected]
Subject   st: too many duplicates with bsample, weight()?
Date   Mon, 27 Feb 2006 13:41:30 -0800 (PST)

I've been experimenting with the bootstrap commands,
and something seems to be wrong with the bsample
command when the weight() option is used.  As I
understand it, the weight() option is supposed to draw
a sample with replacement without deleting the
non-selected observations from memory.  Instead, it
records in a frequency variable which observations
have been sampled and how many times each was sampled
(since the sampling is with replacement, an
observation can be included in the sample more than
once).  When I tested this option, though, it seemed
to create duplicate observations in the sample far too
often.  I repeatedly drew a sample of 10 from 3000
observations and never got a sample consisting of all
unique values!  The bsample command without the
weight() option seems to work much better.  I'm
pasting an example below.
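
As a rough baseline for how rare duplicates should be, the chance
that n draws with replacement from N observations are all distinct
is (1 - 0/N)(1 - 1/N)...(1 - (n-1)/N).  The sketch below evaluates
that product for n = 10 with N = 1000 and N = 3000; it assumes
simple equal-probability resampling, and the local macro names are
only illustrative.

```stata
* Probability that n draws with replacement from N observations
* are all distinct, assuming equal-probability resampling:
*   p = (1 - 0/N) * (1 - 1/N) * ... * (1 - (n-1)/N)
local n 10
local last = `n' - 1
foreach N of numlist 1000 3000 {
    local p = 1
    forvalues k = 0/`last' {
        local p = `p' * (1 - `k'/`N')
    }
    display "N = `N': P(all `n' draws distinct) = " %6.4f `p'
}
```

The product comes to about 0.956 for N = 1000 and about 0.985 for
N = 3000, so under this assumption nearly every sample of 10 should
consist of 10 distinct observations.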

On a related note, is there an equivalent of the
weight() option for the bootstrap command, that is, a
way to leave the full dataset in memory?  I saw the
-nodrop- option, but it's not completely clear to me
what it does.

Here's the example.  It takes ten bootstrap samples of
10 observations out of 1000, first with plain bsample
and then with the weight() option, and counts the
observations that appear exactly once in each sample.


. clear

. set seed 12345

. forvalues i=1/10 {
  2.         quietly set obs 1000
  3.         gen id=_n
  4.         bsample 10
  5.         gen n=1
  6.         collapse (count) n, by(id)
  7.         count if n==1
  8.         clear
  9. }
   10
   10
   10
   10
   10
    8
   10
   10
   10
   10

. 
. clear

. quietly  set obs 1000

. gen id=_n

. quietly gen freq=.

. forvalues i=1/10 {
  2.         bsample 10, weight(freq)
  3.         count if freq==1
  4. }
    3
    4
    4
    4
    2
    4
    4
    4
    1
    2

As you can see, the weight option produces samples
that have many more duplicates.  Any idea what's going
on?  I'm running Stata 9, but I just ran the same
do-file on Stata 8 with the same results.
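
One way to narrow this down is to look at the frequency variable
directly after a single draw.  The sketch below is only a
diagnostic under the assumption stated above, namely that freq
holds the number of times each observation was drawn; it reuses
the variable names from the example and makes no claim about what
bsample actually stores.

```stata
clear
set seed 12345
quietly set obs 1000
generate id = _n
quietly generate freq = .
bsample 10, weight(freq)

* Total number of draws recorded in freq; this should equal the
* requested sample size (10) if freq holds per-observation counts
quietly summarize freq
display "sum of freq = " r(sum)

* Number of distinct observations drawn at least once
count if freq > 0 & !missing(freq)

* Number of observations drawn exactly once
count if freq == 1

* Full distribution of the recorded frequencies
tabulate freq, missing
```

If the sum of freq is not 10, or the tabulation shows frequencies
well above 1 for a sample this small, that would point at how the
weights are generated rather than at the counting in the loop.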

Matissa



