**[D] sample** -- Draw random sample

__Syntax__

**sample** *#* [*if*] [*in*] [**,** __c__**ount** **by(***groupvars***)**]

**by** is allowed; see **[D] by**.

__Menu__

**Statistics > Resampling > Draw random sample**

__Description__

**sample** draws random samples of the data in memory. "Sampling" here is
defined as drawing observations without replacement; see **[R] bsample** for
sampling with replacement.

The size of the sample to be drawn can be specified as a percentage or as
a count:

**sample** without the **count** option draws a *#*% pseudorandom sample of the
data in memory, thus discarding (100 - *#*)% of the observations.

**sample** with the **count** option draws a *#*-observation pseudorandom
sample of the data in memory, thus discarding **_N** - *#* observations. *#*
can be larger than **_N**, in which case all observations are kept.

In either case, observations not meeting the optional **if** and **in** criteria
are kept (sampled at 100%).
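
One way to confirm this behavior is to count the observations that fall outside the **if** condition before and after sampling. The following is a minimal sketch, assuming the nlswork dataset from the Examples section; the local macro name is illustrative only.

```stata
* Count the observations that the if condition excludes
webuse nlswork, clear
count if race != 1
local n_outside = r(N)

* Sample 10% of the race == 1 observations only
sample 10 if race == 1

* The excluded observations should all still be in memory
count if race != 1
assert r(N) == `n_outside'
```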

To reproduce results, set the random-number seed before running
**sample**; see **[R] set seed**.
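
As an illustration, the sketch below sets the seed immediately before sampling. The seed value is arbitrary and chosen only for this example. It also assumes that the data are loaded in the same order on every run, because a different observation order can lead to a different sample even with the same seed.

```stata
* Same starting data and same seed: rerunning these lines
* should draw the same 10% sample each time
webuse nlswork, clear
set seed 339487731
sample 10
```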

__Options__

**count** specifies that *#* in **sample** *#* be interpreted as an observation count
rather than as a percentage. Typing **sample 5** without the **count**
option draws a 5% sample; typing **sample 5, count**
draws a sample of 5 observations.

Specifying *#* as greater than the number of observations in the
dataset is not considered an error.
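
A short check of this case, assuming the nlswork dataset from the Examples section, might look like the following. The local macro names are hypothetical.

```stata
webuse nlswork, clear
local n_before = _N

* Request one more observation than the dataset contains
local n_request = `n_before' + 1
sample `n_request', count

* No error is issued and every observation is kept
assert _N == `n_before'
```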

**by(***groupvars***)** specifies that a *#*% sample be drawn within each set of
values of *groupvars*, thus maintaining the proportion of each group.

**count** may be combined with **by()**. For example, typing
**sample 50, count by(sex)** would draw a sample of size 50 for men and
50 for women.
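
The nlswork dataset used in the Examples section contains only women, so the sketch below groups on **race** instead of sex. It assumes every category of **race** has at least 50 observations.

```stata
webuse nlswork, clear

* Draw 50 observations from each category of race
sample 50, count by(race)

* Each category should now show a frequency of 50
tabulate race
```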

Specifying **by** *varlist***:** **sample** *#* is equivalent to specifying
**sample** *#***,** **by(***varlist***)**; use whichever syntax you prefer.
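
To see the two forms side by side, the sketch below runs each on a fresh copy of the nlswork dataset. It uses **bysort** so that the data are sorted on **race** before the **by** prefix is applied. The group sizes in the two tables should match, although the particular observations kept may differ.

```stata
* Option form
webuse nlswork, clear
sample 10, by(race)
tabulate race

* Prefix form; bysort sorts on race first, as the by prefix requires
webuse nlswork, clear
bysort race: sample 10
tabulate race
```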

__Examples__

Setup
**. webuse nlswork**

Describe the data
**. describe, short**

Draw a 10% sample
**. sample 10**

Describe the resulting data
**. describe, short**

Setup
**. webuse nlswork, clear**

Create a one-way table of frequency counts
**. tab race**

Keep 100% of the women with **race** != 1, but only 10% of the women with **race** == 1
**. sample 10 if race == 1**

Setup
**. webuse nlswork, clear**

Keep 10% of each of the three categories of **race**
**. sample 10, by(race)**

Setup
**. webuse nlswork, clear**

Draw a sample of 2,500 observations
**. sample 2500, count**

Describe the resulting data
**. describe, short**
