
Re: st: how to make an area graph showing distribution?

From   Maarten Buis
Subject   Re: st: how to make an area graph showing distribution?
Date   Sun, 30 Nov 2008 15:37:09 +0000 (GMT)

--- Gisella Young wrote:
> On Maarten Buis's suggestion, I am not sure why I would really need
> a regression - I get from his email that this is basically for
> smoothing? 

Yes, as income in the example dataset (and I assume in your dataset as
well) is a continuous variable, there just aren't enough cases for each
income value to estimate the proportions.
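
A quick way to see this, sketched here on the nlsw88 example data (with
wage standing in for income, which is an assumption about your data), is
to count how many observations share each distinct value:

*--------------- begin example -----------------------
sysuse nlsw88, clear
// number of observations sharing each distinct wage value
bysort wage: gen n_same = _N
// flag one observation per distinct wage value
by wage: gen byte first = _n == 1
// distribution of group sizes across the distinct wage values
summarize n_same if first, detail
*---------------------- end example -----------------

If most of these groups contain only a handful of observations, the raw
proportion of each occupation at a single income value is too noisy to
plot directly.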

> Since I actually want to plot the actual data (but realise
> that this needs smoothing),

You have to choose one or the other, and if you choose to smooth, then
my use of -mlogit- is probably the easiest method that will ensure that
the smoothed proportions add up to one.
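
The adding-up property can be checked directly. The sketch below is only
an illustration: it uses wage itself rather than a percentile rank, and
the variable name total is my own choice.

*--------------- begin example -----------------------
sysuse nlsw88, clear
// same six industry groups as in the full example
recode industry (1/5=1) (6=2) (7=3) (8/10=4) (11=5) (12=6), gen(ind_gr)
// smooth function of wage, then a multinomial logit on it
mkspline s_w = wage, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*
// the six predicted probabilities should sum to one in every row
egen double total = rowtotal(pr1-pr6)
assert abs(total - 1) < 1e-5
*---------------------- end example -----------------

Smoothing each proportion separately (for example with six separate
-lowess- fits) would not guarantee this.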

> what I would prefer to do would be to have income plotted in for
> example percentiles (on the x-axis) showing for each percentile the
> composition of that income percentile in terms of occupation on the
> y-axis. 

If I understand you correctly, all that is different from my example is
that you want to do that on a transformed metric of income. The way to
do the percentile rank transformation is discussed here:

This has been implemented in the example below (and just because I felt
like it, I replaced the legend with a second y-axis).

*--------------- begin example -----------------------
// prepare the example data
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
label define ind_gr 1 "manual"                ///
                    2 "trade"                 ///
                    3 "finance"               ///
                    4 "other services"        ///
                    5 "professional services" ///
                    6 "public administration"
label value ind_gr ind_gr

// compute percentile ranks
egen n = count(wage)
egen i = rank(wage)
gen hazen = (i - 0.5) / n * 100
label variable hazen "percentile rank of income"

// smooth the proportions
mkspline s_w=hazen, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*

// create the graph
gen zero = 0
gen one = 100
gen l1 = (pr1)*100
gen l2 = (pr1 + pr2)*100
gen l3 = (pr1 + pr2 + pr3)*100
gen l4 = (pr1 + pr2 + pr3 + pr4)*100
gen l5 = (pr1 + pr2 + pr3 + pr4 + pr5)*100

sort hazen

// collect the labels for the second y-axis
local mid = l1[_N]/2
local yaxis `"`mid' "manual""'

local mid = (l2[_N]-l1[_N])/2 + l1[_N]
local yaxis `"`yaxis' `mid' "trade""'

local mid = (l3[_N]-l2[_N])/2 + l2[_N]
local yaxis `"`yaxis' `mid' "finance""'

local mid = (l4[_N]-l3[_N])/2 + l3[_N]
local yaxis `"`yaxis' `mid' "other services""'

local mid = (l5[_N]-l4[_N])/2 + l4[_N]
local yaxis `"`yaxis' `mid' "professional services""'

local mid = (100-l5[_N])/2 + l5[_N]
local yaxis `"`yaxis' `mid' "public administration""'

twoway rarea zero l1 hazen, yaxis(1) || ///
       rarea l1 l2 hazen, yaxis(2)   || ///
       rarea l2 l3 hazen   ||           ///        
       rarea l3 l4 hazen   ||           ///
       rarea l4 l5 hazen   ||           ///
       rarea l5 one hazen,              ///
       ytitle("percentage")             ///
       ylab(`yaxis', axis(2))           ///
       yscale(range(0 100) axis(1))     ///
       yscale(range(0 100) axis(2))     ///
       ytitle("", axis(2))              ///
       plotregion(margin(zero))         ///
       aspect(1)                        ///
       legend(off) // the second y-axis replaces the legend
*---------------------- end example -----------------
(For more on how to use examples I sent to the Statalist, see )
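
If you would rather take the other route and show the raw data without
any smoothing, one possibility is to group the percentile ranks into
coarse bins and tabulate the composition of each bin. This is a minimal
sketch under my own assumptions: ten bins are an arbitrary choice, and
the variable name bin is hypothetical.

*--------------- begin example -----------------------
sysuse nlsw88, clear
recode industry (1/5=1) (6=2) (7=3) (8/10=4) (11=5) (12=6), gen(ind_gr)

// percentile ranks, as in the example above
egen n = count(wage)
egen i = rank(wage)
gen hazen = (i - 0.5) / n * 100

// ten bins: 1 = bottom tenth of the wage distribution, 10 = top tenth
gen bin = ceil(hazen/10)

// raw row percentages: composition of each bin by industry group
tabulate bin ind_gr, row nofreq

// the same information as stacked bars
graph bar (count) n, over(ind_gr) over(bin) asyvars stack percentages
*---------------------- end example -----------------

With fewer bins the proportions are more stable but the picture is
coarser; with more bins you run into the sparseness problem again.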

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715
