
Re: st: how to make an area graph showing distribution?

From   Maarten buis <>
Subject   Re: st: how to make an area graph showing distribution?
Date   Sun, 30 Nov 2008 10:52:56 +0000 (GMT)

--- Gisella Young wrote:
> I am trying to make a chart showing the distribution of income by
> occupation. On the x-axis I would like the distribution of income
> from 0 to the highest. Then on the y-axis I want to show the
> proportion of people in different occupations. I have a variable
> (occup) with 6 different occupational categories. In other words, I
> want to show how the different occupations fit into income
> distribution, by showing how the occupational breakdown of income
> changes moving up the income spectrum. I thought an area chart
> (summing to 100) would be the best way to do this, although there
> may be better ways and I am open to suggestions. I have
> tried the twoway area function with different variations, but it
> doesn't seem to be right (just gives a crazy chart with lines all
> over) and I'm not sure how to do it.

You'll probably need to smooth the proportions: for any given wage in
your data you won't have enough cases within each occupational category
to estimate the proportions reliably. In the example below I do so by
estimating an -mlogit- model that predicts occupational category from
wage, with wage entered as a restricted cubic spline (see -help
mkspline-). I treat the predicted probabilities as the smoothed
proportions.
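To see why raw proportions at each wage would be too noisy, you can
check how many observations share each observed wage value. This is a
minimal sketch using the same nlsw88 example data as below; it only
reports the counts and does not assume anything about what they will be.
If most wage values occur only once or a handful of times, a proportion
computed at each wage rests on almost no data, which is the motivation
for smoothing.

```stata
*--------------- begin check -----------------------
// load the example data used below
sysuse nlsw88, clear

// tabulate how many observations share each distinct wage value
duplicates report wage
*---------------------- end check -----------------
```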

For the graph I created the variables zero, one, and l1 through l5. The
logic is that on the y-axis the first band should run from 0 (zero) to
the first proportion (l1), the second band should start at the first
proportion and end at the first plus the second proportion, and so on;
the last band runs from l5 to 1 (one). Two things are worth noting:
1) you need to sort on wage first (or use the sort option) to avoid
creating modern art, and 2) I reversed the order of the legend (going
from 6 to 1) so that the order of the entries in the legend matches the
order of the bands in the graph (1 at the bottom and 6 at the top).

*--------------- begin example -----------------------
// prepare the example data
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
label define ind_gr 1 "manual"                ///
                    2 "trade"                 ///
                    3 "finance"               ///
                    4 "other services"        ///
                    5 "professional services" ///
                    6 "public administration"
label value ind_gr ind_gr

// smooth the proportions
mkspline s_w=wage, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*

// create the graph
gen zero = 0
gen one = 1
gen l1 = pr1
gen l2 = pr1 + pr2
gen l3 = pr1 + pr2 + pr3
gen l4 = pr1 + pr2 + pr3 + pr4
gen l5 = pr1 + pr2 + pr3 + pr4 + pr5

sort wage
twoway rarea zero l1 wage || ///
       rarea l1 l2 wage   || ///
       rarea l2 l3 wage   || ///
       rarea l3 l4 wage   || ///
       rarea l4 l5 wage   || ///
       rarea l5 one wage,    ///
       legend(order( 6 "public administration" ///
                     5 "professional services" ///
                     4 "other services"        ///
                     3 "finance"               ///
                     2 "trade"                 ///
                     1 "manual" ))
*---------------------- end example -----------------
(For more on how to use examples I sent to the Statalist, see )
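Drawing the top band from l5 to one only works because the six
predicted probabilities from -mlogit- sum to one for every observation,
so the band between l5 and 1 has height pr6. A minimal sketch of a check
for that, rebuilding the smoothed proportions as in the example above
(here with -recode, generate()- instead of -gen- plus -recode-, and with
the predictions stored in double precision so float rounding does not
blur the comparison; the variable name total and the 1e-6 tolerance are
my own choices):

```stata
*--------------- begin check -----------------------
// rebuild the smoothed proportions
sysuse nlsw88, clear
recode industry (1/5=1) (6=2) (7=3) (8/10=4) (11=5) (12=6), generate(ind_gr)
mkspline s_w=wage, cubic nknots(5)
mlogit ind_gr s_w*
predict double pr*

// the six predicted probabilities should add up to one for every woman,
// so l5 + pr6 = 1 and the top band (l5 to one) has height pr6
egen double total = rowtotal(pr1-pr6)
assert abs(total - 1) < 1e-6
*---------------------- end check -----------------
```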

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715
