**[G-2] graph twoway histogram** -- Histogram plots

__Syntax__

__tw__**oway** __hist__**ogram** *varname* [*if*] [*in*] [*weight*] [**,**
[*discrete_options*|*continuous_options*] *common_options*]

*discrete_options*        Description
-------------------------------------------------------------------------
__disc__**rete**          specify that data are discrete
__w__**idth(***#***)**    width of bins in *varname* units
**start(***#***)**        theoretical minimum value
-------------------------------------------------------------------------

*continuous_options*      Description
-------------------------------------------------------------------------
**bin(***#***)**          number of bins
__w__**idth(***#***)**    width of bins in *varname* units
**start(***#***)**        lower limit of first bin
-------------------------------------------------------------------------

*common_options*          Description
-------------------------------------------------------------------------
__den__**sity**           draw as density; the default
__frac__**tion**          draw as fractions
__freq__**uency**         draw as frequencies
**percent**               draw as percents

__vert__**ical**          vertical bars; the default
__hor__**izontal**        horizontal bars
**gap(***#***)**          reduce width of bars, 0 <= *#* < 100

*barlook_options*         change look of bars

*axis_choice_options*     associate plot with alternative axis

*twoway_options*          titles, legends, axes, added lines and text,
                          by, regions, name, aspect ratio, etc.
-------------------------------------------------------------------------

**fweight**s are allowed; see weight.

__Menu__

**Graphics > Twoway graph (scatter, line, etc.)**

__Description__

**twoway** **histogram** draws histograms of *varname*. Also see **[R] histogram** for
an easier-to-use alternative.

__Options for use in the discrete case__

**discrete** specifies that *varname* is discrete and that each unique value of
*varname* be given its own bin (bar of histogram).

**width(***#***)** is rarely specified in the discrete case; it specifies the width
of the bins. The default is **width(***d***)**, where *d* is the observed
minimum difference between the unique values of *varname*.

Specify **width()** if you are concerned that your data are sparse. For
example, *varname* could in theory take on the values 1, 2, 3, ..., 9,
but because of sparseness, perhaps only the values 2, 4, and 8 are
observed. Here the smallest observed difference is 2, so the default
width calculation would produce **width(2)**, and you would want to
specify **width(1)**.

**start(***#***)** is also rarely specified in the discrete case; it specifies the
theoretical minimum value of *varname*. The default is **start(***m***)**, where
*m* is the observed minimum value.

As with **width()**, specify **start()** when you are concerned about
sparseness. In the previous example, you would also want to specify
**start(1)**. **start()** does nothing more than add white space to the left
side of the graph.

**start()**, if specified, must be less than or equal to *m*, or an error
will be issued.
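
As a minimal sketch of the sparse-data case, the commands below build a small hypothetical dataset in which a variable that could take the values 1 through 9 is observed only at 2, 4, and 8. The variable name `x`, its values, and the graph names are invented for illustration. The first graph uses the defaults described above; the second overrides both **width()** and **start()**.

```stata
* Hypothetical data: x could be 1, 2, ..., 9, but only 2, 4, and 8 occur
clear
input byte x
2
4
4
8
8
8
end

* Defaults: width is the smallest gap between observed values (2 here),
* and start is the observed minimum (2 here)
twoway histogram x, discrete frequency name(sparse_default, replace)

* Override both so that bins are one unit wide and begin at 1
twoway histogram x, discrete width(1) start(1) frequency name(sparse_fixed, replace)
```

Comparing the two graphs shows what the options change: the bars become one unit wide, and white space appears to the left of the first bar.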

__Options for use in the continuous case__

**bin(***#***)** and **width(***#***)** are alternatives that specify how the data are to be
aggregated into bins. **bin()** specifies the number of bins (from which
the width can be derived), and **width()** specifies the bin width (from
which the number of bins can be derived).

If neither option is specified, the results are the same as if **bin(***k***)**
were specified, where

*k* = min(sqrt(*N*), 10*ln(*N*)/ln(10))

and where *N* is the number of nonmissing observations of *varname*.
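
The default can be worked out by hand. The sketch below evaluates the formula for the variable used in the examples later in this entry, assuming its full name in lifeexp.dta is `lexp` (abbreviated as `le` there). The local macro names are illustrative, and how the result is converted to a whole number of bins is not stated here, so the code reports the unrounded value.

```stata
sysuse lifeexp, clear

* N = number of nonmissing observations of the plotted variable
quietly count if !missing(lexp)
local N = r(N)

* Default number of bins according to the formula, before any rounding
local k = min(sqrt(`N'), 10*ln(`N')/ln(10))
display "N = `N', k = " %6.3f `k'
```

For small samples, sqrt(*N*) is the smaller of the two terms. For larger samples the logarithmic term takes over; with *N* = 1000, for instance, 10*ln(1000)/ln(10) = 30, whereas sqrt(1000) is about 31.6.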

**start(***#***)** specifies the theoretical minimum of *varname*. The default is
**start(***m***)**, where *m* is the observed minimum value of *varname*.

Specify **start()** when you are concerned about sparse data. For
instance, you might know that *varname* can go down to 0, but you are
concerned that 0 may not be observed.

**start()**, if specified, must be less than or equal to *m*, or else an
error will be issued.
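
One way to stay within this restriction is to derive the starting value from the data. The sketch below, again assuming the variable `lexp` from lifeexp.dta, rounds the observed minimum down to a multiple of 10 and uses the result in **start()**; the bin width of 5 and the macro name `s` are arbitrary choices for illustration.

```stata
sysuse lifeexp, clear

* Pick a starting value at or below the observed minimum, as start() requires
quietly summarize lexp
local s = 10*floor(r(min)/10)

* Bins of width 5 that begin at the rounded-down minimum
twoway histogram lexp, width(5) start(`s')
```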

__Options for use in both cases__

**density**, **fraction**, **frequency**, and **percent** are alternatives that specify
whether you want the histogram scaled to density, fractional, or
frequency units, or percentages. **density** is the default.

**density** scales the height of the bars so that the sum of their areas
equals 1.

**fraction** scales the height of the bars so that the sum of their
heights equals 1.

**frequency** scales the height of the bars so that each bar's height is
equal to the number of observations in the category, and thus the sum
of the heights is equal to the total number of nonmissing
observations of *varname*.

**percent** scales the height of the bars so that the sum of their
heights equals 100.
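
The four scalings can be reproduced by hand for the discrete case. The following is a sketch of the definitions above, not of how **twoway histogram** computes its bars internally. It assumes the variable `lexp` from lifeexp.dta, and the new variable names (`n`, `gap`, `frequency`, and so on) are illustrative.

```stata
sysuse lifeexp, clear

* One row per observed value of lexp, with its count stored in n
contract lexp, freq(n) nomiss

* Discrete bin width: the smallest gap between observed values
sort lexp
generate double gap = lexp - lexp[_n-1]
quietly summarize gap
local d = r(min)

* Total number of nonmissing observations
quietly summarize n
local N = r(sum)

* Bar heights under each scaling
generate double frequency = n
generate double fraction  = n/`N'
generate double percent   = 100*fraction
generate double density   = fraction/`d'

* Check the sums described above
quietly summarize fraction
assert reldif(r(sum), 1) < 1e-8
quietly summarize percent
assert reldif(r(sum), 100) < 1e-8
quietly summarize density
assert reldif(r(sum)*`d', 1) < 1e-8
```

The last check multiplies the summed density heights by the bin width because each bar's area is its height times its width.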

**vertical** and **horizontal** specify whether the bars are to be drawn
vertically (the default) or horizontally.

**gap(***#***)** specifies that the bar width be reduced by *#* percent. **gap(0)** is
the default; **histogram** sets the width so that adjacent bars just
touch. If you wanted gaps between the bars, you would specify, for
instance, **gap(5)**.
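
For instance, the following sketch combines **horizontal** with a larger gap, assuming the variable `lexp` from lifeexp.dta; the value 20 is an arbitrary choice.

```stata
sysuse lifeexp, clear

* Horizontal bars, each narrowed by 20 percent so that gaps appear between them
twoway histogram lexp, discrete frequency horizontal gap(20)
```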

Also see **[G-2] graph twoway rbar** for other ways to set the display
width of the bars. Histograms are actually drawn using **twoway rbar**
with a restriction that 0 be included in the bars; **twoway histogram**
will accept any options allowed by **twoway rbar**.

*barlook_options* set the look of the bars. The most important of these
options is **color(***colorstyle***)**, which specifies the color and opacity
of the bars; see **[G-4]** *colorstyle* for a list of color choices. See
**[G-3]** *barlook_options* for information on the other *barlook_options*.
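
Opacity is most useful when histograms are overlaid. The sketch below is one way to do this, assuming the variables `lexp` and `region` from lifeexp.dta and a version of Stata that accepts the `%` opacity suffix in *colorstyle*. The colors, bin width, and legend text are illustrative. A common **start()** and **width()** are given so that both histograms use the same bins, and the `///` continuation lines assume the commands are run from a do-file.

```stata
sysuse lifeexp, clear

* A common starting value at or below the minimum of every subsample
quietly summarize lexp
local s = 10*floor(r(min)/10)

* Two semitransparent histograms on the same bins
twoway (histogram lexp if region == 1, width(5) start(`s') color(navy%50))   ///
       (histogram lexp if region != 1, width(5) start(`s') color(maroon%50)), ///
       legend(order(1 "region 1" 2 "other regions"))
```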

*axis_choice_options* associate the plot with a particular *y* or *x* axis on
the graph; see **[G-3]** *axis_choice_options*.

*twoway_options* are a set of common options supported by all **twoway**
graphs. These options allow you to title graphs, name graphs,
control axes and legends, add lines and text, set aspect ratios,
create graphs over **by()** groups, and change some advanced settings.
See **[G-3]** *twoway_options*.

__Remarks__

Remarks are presented under the following headings:

Relationship between graph twoway histogram and histogram
Typical use
Use with by()
History

__Relationship between graph twoway histogram and histogram__

**graph** **twoway** **histogram** -- documented here -- and **histogram** -- documented
in **[R] histogram** -- are almost the same command. **histogram** has the
advantages that

1. it allows overlaying of a normal density or a kernel estimate of
the density;

2. if a density estimate is overlaid, it scales the density to
reflect the scaling of the bars.

**histogram** is implemented in terms of **graph** **twoway** **histogram**.
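
As a brief illustration of both points, the sketch below uses **histogram** with its **normal** option, assuming the variable `lexp` from lifeexp.dta. The second command draws the bars in frequency units so that the rescaling of the overlaid curve can be seen.

```stata
sysuse lifeexp, clear

* Bars in density units with a normal density overlaid
histogram lexp, normal

* Bars in frequency units; the overlaid curve is scaled to match the bars
histogram lexp, frequency normal
```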

__Typical use__

When you do not specify otherwise, **graph** **twoway** **histogram** assumes that
the variable is continuous:

**. sysuse lifeexp, clear**

**. twoway histogram le**

Even with a continuous variable, you may specify the **discrete** option to
see the individual values:

**. twoway histogram le, discrete**

__Use with by()__

**graph** **twoway** **histogram** may be used with **by()**:

**. sysuse lifeexp, clear**

**. twoway histogram le, discrete by(region, total)**

Here specifying **frequency** is a good way to show both the distribution and
the overall contribution to the total:

**. twoway histogram le, discrete freq by(region, total)**

The height of the bars reflects the number of countries. Here -- and in
all the above examples -- we would do better by obtaining population data
on the countries and then typing

**. twoway histogram le [fw=pop], discrete freq by(region, total)**

so that bar height reflected total population.

__History__

According to Beniger and Robyn (1978, 4), although A. M. Guerry published
a histogram in 1833, the word "histogram" was first used by Karl Pearson
in 1895.

__References__

Beniger, J. R., and D. L. Robyn. 1978. Quantitative graphics in
statistics: A brief history. *American Statistician* 32: 1-11.

Guerry, A.-M. 1833. *Essai sur la Statistique Morale de la France*. Paris:
Crochard.

Pearson, K. 1895. Contributions to the mathematical theory of evolution
-- II. Skew variation in homogeneous material. *Philosophical
Transactions of the Royal Society of London, Series A* 186: 343-414.