



Re: st: Illustrate SRS in a graph


From   Maarten Buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Illustrate SRS in a graph
Date   Fri, 17 Sep 2010 08:08:44 +0000 (GMT)

--- On Thu, 16/9/10, Richard Moverare wrote:
> I would like to illustrate the uncertainty of a SRS
> (without replacement) by first creating a dataset with
> one variable that identifies a number of different
> groups in the population (N), e.g. 415 units in group A,
> 634 units in group B, and so forth. Then I would like to
> draw a number of samples from that population, e.g. 20
> different samples and get estimates for the proportion of
> the population belonging to group A, B, ..., and the
> confidence interval (95 percent) for those estimates. And
> finally I would like to, in a graph, illustrate the true
> population proportion and the 20 different samples with
> their confidence intervals. This in order to illustrate
> the uncertainty but also that the confidence intervals
> sometimes do not include the true population value.

As I understand Simple Random Sampling, it would be sampling
with replacement (but if the population is large compared
to the sample that should not matter too much).
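
To put a rough number on "should not matter too much": the standard textbook variances of a sample proportion under the two schemes differ only by the finite population correction. The sketch below uses N = 10000 and n = 100, the sizes in the example further down; the notation (p, n, N) is introduced here for illustration and does not appear in the original post.

```latex
\operatorname{Var}_{\text{with}}(\hat p) = \frac{p(1-p)}{n},
\qquad
\operatorname{Var}_{\text{without}}(\hat p) = \frac{p(1-p)}{n}\cdot\frac{N-n}{N-1}

\frac{\operatorname{se}_{\text{without}}}{\operatorname{se}_{\text{with}}}
  = \sqrt{\frac{N-n}{N-1}}
  = \sqrt{\frac{9900}{9999}}
  \approx 0.995
```

So for a 1% sample the two standard errors differ by about half a percent.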

For such an exercise I would use the -simulate- command,
as in the example below. I recovered the confidence
intervals as discussed in Buis (2007).

*------------------- begin example --------------------
program drop _all
program define sim, rclass

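    // create a population of 10,000 units in three groups:
    // group 1 = 500 (5%), group 2 = 4,500 (45%), group 3 = 5,000 (50%)
    drop _all
    set obs 10000
    gen x = cond(_n <=  500, 1, ///
            cond(_n <= 5000, 2, 3))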

    // draw a 1% sample without replacement
    sample 1

    // estimate the proportions and return the estimate for group 1
    // together with the bounds of its 95% confidence interval
    proportion x
    return scalar p  = _b[x:1]
    return scalar lb = _b[x:1] - invttail(e(df_r),0.025)*_se[x:1]
    return scalar ub = _b[x:1] + invttail(e(df_r),0.025)*_se[x:1]
end

// repeat this 20 times and store the results in a dataset
simulate p=r(p) lb=r(lb) ub=r(ub), reps(20) : sim

// graph the results; the vertical line marks the true
// proportion of group 1 in the population (500/10000 = .05)
gen sample = _n
twoway scatter sample p || ///
       rcap lb ub sample, horizontal xline(.05)
*-------------------- end example ---------------------
(For more on examples I sent to the Statalist see: 
http://www.maartenbuis.nl/example_faq )
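
As a possible follow-up (a minimal sketch, not part of the original example), once the example above has run, the dataset in memory holds one row per sample with the variables p, lb, ub and sample. Assuming that, the intervals that miss the true proportion of .05 can be flagged and counted. The variable name miss is a hypothetical choice made here for illustration.

```stata
// flag intervals that do not contain the true proportion (.05)
gen byte miss = (lb > .05) | (ub < .05)

// count how many of the 20 intervals miss the true value
count if miss

// show which samples they are
list sample p lb ub if miss
```

With 95 percent intervals one would expect about 1 in 20 to miss on average, but in any single run of 20 samples the count can easily be 0, 1, or 2.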

Hope this helps,
Maarten

M.L. Buis (2007), "Stata tip 54: Where did my p-values go?", 
The Stata Journal, 7(4), pp.584-586. 


--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------


      
