help sample dialog: sample
-------------------------------------------------------------------------------
Title
[D] sample -- Draw random sample
Syntax
sample # [if] [in] [, count by(groupvars)]
by is allowed; see [D] by.
Menu
Statistics > Resampling > Draw random sample
Description
sample draws random samples of the data in memory. "Sampling" here is
defined as drawing observations without replacement; see [R] bsample for
sampling with replacement.
The size of the sample to be drawn can be specified as a percentage or as
a count:
sample without the count option draws a #% pseudorandom sample of the
data in memory, thus discarding (100 - #)% of the observations.
sample with the count option draws a #-observation pseudorandom
sample of the data in memory, thus discarding _N - # observations. #
can be larger than _N, in which case all observations are kept.
In either case, observations not meeting the optional if and in criteria
are kept (sampled at 100%).
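A minimal do-file sketch of this behavior, using the nlswork dataset from the Examples below; the local macro name n_outside is illustrative only:

```stata
* Sketch: observations that fail the if condition are never dropped.
webuse nlswork, clear
count if race != 1
local n_outside = r(N)        // hypothetical macro name; count before sampling

sample 10 if race == 1        // sample only among race == 1

count if race != 1
assert r(N) == `n_outside'    // assert fails if any race != 1 observation was dropped
```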
To reproduce results, set the random-number seed before running sample;
see [R] set seed.
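As a hedged illustration, the do-file sketch below draws the same 10% sample twice from the same seed and compares the results. The seed value 12345 and the tempfile name first_draw are arbitrary choices, and the sort on idcode and year assumes those two variables uniquely identify observations in nlswork:

```stata
* Sketch: the same seed applied to the same data gives the same sample.
webuse nlswork, clear
set seed 12345                // arbitrary seed value
sample 10
sort idcode year              // fix the row order before saving
tempfile first_draw           // hypothetical tempfile name
save `first_draw'

webuse nlswork, clear
set seed 12345                // same seed as the first draw
sample 10
sort idcode year
cf _all using `first_draw'    // cf exits with an error if any value differs
```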
Options
count specifies that # in sample # be interpreted as an observation count
rather than as a percentage. Typing sample 5 without the count
option draws a 5% sample; typing sample 5, count draws a sample of
5 observations.
Specifying # as greater than the number of observations in the
dataset is not considered an error.
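The do-file sketch below illustrates this case by requesting more observations than the dataset holds. The macro names n_before and n_request, and the margin of 100, are illustrative only:

```stata
* Sketch: requesting more observations than exist keeps the whole dataset.
webuse nlswork, clear
local n_before  = _N
local n_request = _N + 100    // deliberately larger than _N
sample `n_request', count
assert _N == `n_before'       // assert fails if any observation was dropped
```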
by(groupvars) specifies that a #% sample be drawn within each set of
values of groupvars, thus maintaining the proportion of each group.
count may be combined with by(). For example, typing
sample 50, count by(sex) would draw a sample of size 50 for men and
50 for women.
Specifying by varlist: sample # is equivalent to specifying
sample #, by(varlist); use whichever syntax you prefer.
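The forms of by() described above can be sketched in a do-file as follows. The nlswork dataset has no sex variable, so race stands in as the grouping variable, and bysort is used for the prefix form on the assumption that the data are not already sorted by race:

```stata
* Sketch: count combined with by() draws 50 observations per race category
* (each category of race in nlswork is assumed to hold at least 50).
webuse nlswork, clear
sample 50, count by(race)
tabulate race

* Option form: a 10% sample within each category of race.
webuse nlswork, clear
sample 10, by(race)
tabulate race

* Prefix form of the same request; bysort sorts by race first.
webuse nlswork, clear
bysort race: sample 10
tabulate race
```

The two 10% draws need not select the same observations, because the random-number state and the sort order differ between them; only the within-group sampling rate is the same.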
Examples
---------------------------------------------------------------------------
Setup
. webuse nlswork
Describe the data
. describe, short
Draw a 10% sample
. sample 10
Describe the resulting data
. describe, short
---------------------------------------------------------------------------
Setup
. webuse nlswork, clear
Create a one-way table of frequency counts
. tab race
Keep 100% of the women with race != 1, but only 10% of the women with race == 1
. sample 10 if race == 1
---------------------------------------------------------------------------
Setup
. webuse nlswork, clear
Keep 10% of each of the three categories of race
. sample 10, by(race)
---------------------------------------------------------------------------
Setup
. webuse nlswork, clear
Draw a sample of 2,500 observations
. sample 2500, count
Describe the resulting data
. describe, short
---------------------------------------------------------------------------
Also see
Manual: [D] sample
Help: [R] bsample