Stata 15 help for sample

[D] sample -- Draw random sample


sample # [if] [in] [, count by(groupvars)]

by is allowed; see [D] by.


Statistics > Resampling > Draw random sample


sample draws random samples of the data in memory. "Sampling" here is defined as drawing observations without replacement; see [R] bsample for sampling with replacement.

The size of the sample to be drawn can be specified as a percentage or as a count:

sample without the count option draws a #% pseudorandom sample of the data in memory, thus discarding (100 - #)% of the observations.

sample with the count option draws a #-observation pseudorandom sample of the data in memory, thus discarding _N - # observations. # can be larger than _N, in which case all observations are kept.
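As a minimal sketch of the oversized-count case, the lines below ask for more observations than the dataset holds. The choice of auto.dta (shipped with Stata, 74 observations) and the count of 100 are assumptions made only for illustration.

```stata
* Load the shipped auto dataset (74 observations)
sysuse auto, clear

* Request 100 observations; # exceeds _N, so nothing is discarded
sample 100, count

* Report the number of observations still in memory
count
```

Per the rule above, count should still report 74 observations, and no error is issued.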

In either case, observations not meeting the optional if and in criteria are kept (sampled at 100%).

If you are interested in reproducing results, you must first set the random-number seed; see [R] set seed.
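A minimal sketch of a reproducible draw follows. The seed value 12345 is arbitrary and is an assumption of this example; any value works, as long as the same one is set before each draw.

```stata
* Load the data used in the examples
webuse nlswork, clear

* Fix the random-number seed so the draw can be repeated
set seed 12345

* Draw a 10% pseudorandom sample
sample 10
```

The seed must be set before sample is run; setting it afterward has no effect on a sample already drawn.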


count specifies that # in sample # be interpreted as an observation count rather than as a percentage. Typing sample 5 without the count option means that a 5% sample be drawn; typing sample 5, count, however, would draw a sample of 5 observations.

Specifying # as greater than the number of observations in the dataset is not considered an error.

by(groupvars) specifies that a #% sample be drawn within each set of values of groupvars, thus maintaining the proportion of each group.

count may be combined with by(). For example, typing sample 50, count by(sex) would draw a sample of size 50 for men and 50 for women.
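The same pattern can be sketched with the nlswork data used in the examples, which has no sex variable; race is substituted here purely for illustration.

```stata
* Load the data used in the examples
webuse nlswork, clear

* Draw 50 observations from each category of race
sample 50, count by(race)

* Tabulate the result to check the per-group sizes
tab race
```

Each category of race holds more than 50 observations in this dataset, so the tabulation should show 50 observations per category.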

Specifying by varlist: sample # is equivalent to specifying sample #, by(varlist); use whichever syntax you prefer.
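The two forms can be sketched side by side as follows. The dataset and grouping variable are assumptions taken from the examples; the explicit sort is included because the by prefix requires the data to be sorted by the by-variables.

```stata
* Option form
webuse nlswork, clear
sample 10, by(race)

* Prefix form; the data must first be sorted by race
webuse nlswork, clear
sort race
by race: sample 10
```

Both forms draw a 10% sample within each category of race. This sketch does not claim that the two forms select the same observations for a given seed.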


---------------------------------------------------------------------------
Setup
    . webuse nlswork

Describe the data
    . describe, short

Draw a 10% sample
    . sample 10

Describe the resulting data
    . describe, short

---------------------------------------------------------------------------
Setup
    . webuse nlswork, clear

Create a one-way table of frequency counts
    . tab race

Keep 100% of race != 1 women, but only 10% of race == 1 women
    . sample 10 if race == 1

---------------------------------------------------------------------------
Setup
    . webuse nlswork, clear

Keep 10% of each of the three categories of race
    . sample 10, by(race)

---------------------------------------------------------------------------
Setup
    . webuse nlswork, clear

Draw a sample of 2,500
    . sample 2500, count

Describe the resulting data
    . describe, short
---------------------------------------------------------------------------

© Copyright 1996–2018 StataCorp LLC