# Re: st: how to make an area graph showing distribution?

- From: Gisella Young
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: how to make an area graph showing distribution?
- Date: Sun, 30 Nov 2008 07:13:09 -0800 (PST)

```
Thank you. Here is a small example dataset (omitting the date, assuming all observations are from the relevant date, and omitting weights), where occupation 1 is manual workers, occupation 4 is professionals, and 2 and 3 are in between:

occup         income
4             5000
2             2000
3             4000
3             3000
4             6000
1             1000
etc

Then I want to show, for each part of the income distribution (in percentiles), how much of that income percentile falls in each occupation. I've tried to illustrate this below but am not sure how it will come out in email. The letters shown here are not points; they stand for the different colours/patterns of a continuous area chart. The chart does not come directly from the numerical example above, and would of course be continuous rather than in 6 rows.

The chart would show that, e.g., at the bottom end of the income spectrum it is almost exclusively manual workers, whereas at the top it is mostly professionals and no manual workers. I guess it is basically a continuous series of 'bar charts' showing the proportions of each occupation for each income percentile, but 'joined' into an area chart.

100 |bcccccccddddddddddddd
    |abbbccccccccccccddddd
    |aabbbbbcccccccccccddd
    |aaaaabbbbbbbbbccccccd
    |aaaaaaaaaaaaabbbbbccd
  0 |aaaaaaaaaaaaaaaaaabbc
    +---------------------
     0                 100
      income (percentiles)

As I mentioned, as a first step I tried creating 100 percentile groups of the income distribution using sumdist (-sumdist income if date==2007 [fw=weight], n(100) qgp(test)-), which I was then going to plot by occupation, but this didn't work because the groups are not equally sized. Even if that worked, I am not sure how to proceed to the chart.

Thanks!!

Gisella

--- On Sun, 11/30/08, Martin Weiss <martin.weiss1@gmx.de> wrote:

> From: Martin Weiss <martin.weiss1@gmx.de>
> Subject: Re: st: how to make an area graph showing distribution?
> To: statalist@hsphsun2.harvard.edu
> Date: Sunday, November 30, 2008, 1:46 PM
> No need to say "sorry", the list is there - in
> particular- for insidious problems. So could you post some
> kind of example dataset with instructions of what you are
> trying to accomplish? The dotplot was a guess in the absence
> of an example...
>
> HTH
> Martin
> _______________________
> ----- Original Message -----
> From: "Gisella Young" <gisellayoung@yahoo.com>
> To: <statalist@hsphsun2.harvard.edu>
> Sent: Sunday, November 30, 2008 2:43 PM
> Subject: Re: st: how to make an area graph showing distribution?
>
>
> > Thank you for your replies. However, sorry to come back, but I am
> > still stuck and wonder whether I could bother people for further
> > advice. On Maarten Buis's suggestion, I am not sure why I would
> > really need a regression - I get from his email that this is
> > basically for smoothing? Since I actually want to plot the actual
> > data (but realise that this needs smoothing), what I would prefer to
> > do would be to have income plotted in, for example, percentiles (on
> > the x-axis), showing for each percentile the composition of that
> > income percentile in terms of occupation on the y-axis.
> >
> > I guess one way would be a histogram or bar chart, but what I really
> > want is a continuous area plot with percentiles of income on the
> > x-axis and percentage in each occupation (for each income
> > percentile) on the y-axis.
> >
> > I'm not sure whether a dot graph as Martin Weiss suggests will
> > really do this. I've looked into it but it seems quite different,
> > unless I am misunderstanding? Also, I am struggling with even the
> > first step of creating another variable with the proportions of each
> > occupation for each income group (e.g. percentile). I have tried
> > functions, but they are not dividing the population into equally
> > sized percentile groups. I have tried for instance -sumdist income
> > if date==2007 [fw=weight], n(100) qgp(test)- but the groups are not
> > of the same size.
> >
> > I'm hopeful that there must be a simple way to do this, in part
> > because in Excel it can be done in a few minutes (but Excel of
> > course can't handle large survey data such as I am dealing with).
> > Sorry to bother the list with these follow-up enquiries.
> >
> > best,
> > Gisella
> >
> >
> > --- On Sun, 11/30/08, Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> >
> >> From: Maarten buis <maartenbuis@yahoo.co.uk>
> >> Subject: Re: st: how to make an area graph showing distribution?
> >> To: statalist@hsphsun2.harvard.edu
> >> Date: Sunday, November 30, 2008, 10:52 AM
> >> --- Gisella Young wrote:
> >> > I am trying to make a chart showing the distribution of income by
> >> > occupation. On the x-axis I would like the distribution of income
> >> > from 0 to the highest. Then on the y-axis I want to show the
> >> > proportion of people in different occupations. I have a variable
> >> > (occup) with 6 different occupational categories. In other words, I
> >> > want to show how the different occupations fit into income
> >> > distribution, by showing how the occupational breakdown of income
> >> > changes moving up the income spectrum. I thought an area chart
> >> > (summing to 100) would be the best way to do this, although there
> >> > might be better ways which I would be open to suggestions. I have
> >> > tried the twoway area function with different variations, but it
> >> > doesn't seem to be right (just gives a crazy chart with lines all
> >> > over) and I'm not sure how to do it.
> >>
> >> You'll probably need to smooth the proportions, as you won't have
> >> enough cases within each occupational category at each wage in your
> >> data to reliably estimate the proportions. In the example below I
> >> have done so by estimating a -mlogit- predicting occupational
> >> category with wage represented as a restricted cubic spline (see
> >> -help mkspline-). I treat the predicted probabilities as the
> >> smoothed proportions.
> >>
> >> For the graph I created the variables zero, one, and l1 till l5. The
> >> logic is that on the y-axis the first band should range from 0 (zero)
> >> to the first proportion (l1), the second band should start at the
> >> first proportion and end at the first + the second proportion, etc.
> >> Two things are worth noting: 1) you need to sort first on wage (or
> >> use the sort option) to avoid creating modern art, and 2) I reversed
> >> the order in the legend (going from 6 to 1) so that the order in
> >> which they appear in the legend corresponds with the order in which
> >> they appear in the graph (1 at the bottom and 6 at the top).
> >>
> >> *--------------- begin example -----------------------
> >> // prepare the example data
> >> sysuse nlsw88, clear
> >> gen ind_gr = industry
> >> recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
> >> // labels 2 and 6 are truncated in the archived post; the names
> >> // below are inferred from nlsw88's industry categories
> >> label define ind_gr 1 "manual"                ///
> >>                     2 "wholesale/retail"      ///
> >>                     3 "finance"               ///
> >>                     4 "other services"        ///
> >>                     5 "professional services" ///
> >>                     6 "public administration"
> >> label value ind_gr ind_gr
> >>
> >> // smooth the proportions
> >> mkspline s_w=wage, cubic nknots(5)
> >> mlogit ind_gr s_w*
> >> predict pr*
> >>
> >> // create the graph: cumulative probabilities are the band edges
> >> gen zero = 0
> >> gen one = 1
> >> gen l1 = pr1
> >> gen l2 = pr1 + pr2
> >> gen l3 = pr1 + pr2 + pr3
> >> gen l4 = pr1 + pr2 + pr3 + pr4
> >> gen l5 = pr1 + pr2 + pr3 + pr4 + pr5
> >>
> >> sort wage
> >> twoway rarea zero l1 wage || ///
> >>        rarea l1 l2 wage   || ///
> >>        rarea l2 l3 wage   || ///
> >>        rarea l3 l4 wage   || ///
> >>        rarea l4 l5 wage   || ///
> >>        rarea l5 one wage,    ///
> >>        legend(order( 6 "public administration" ///
> >>                      5 "professional services" ///
> >>                      4 "other services"        ///
> >>                      3 "finance"               ///
> >>                      2 "wholesale/retail"      ///
> >>                      1 "manual" ))
> >> *---------------------- end example -----------------
> >> (For more on how to use examples I sent to the Statalist, see
> >> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
> >>
> >> Hope this helps,
> >> Maarten
> >>
> >> -----------------------------------------
> >> Maarten L. Buis
> >> Department of Social Research Methodology
> >> Vrije Universiteit Amsterdam
> >> Boelelaan 1081
> >> 1081 HV Amsterdam
> >> The Netherlands
> >>
> >> Buitenveldertselaan 3 (Metropolitan), room N515
> >>
> >> +31 20 5986715
> >>
> >> http://home.fsw.vu.nl/m.buis/
> >> -----------------------------------------
> >>
> >>
> >>
> >> *
> >> *   For searches and help try:
> >> *   http://www.stata.com/help.cgi?search
> >> *   http://www.stata.com/support/statalist/faq
> >> *   http://www.ats.ucla.edu/stat/stata/
```
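For readers who want the unsmoothed version Gisella describes (shares of each occupation within income percentile groups, joined into a stacked area chart), the following is a minimal sketch and not anything posted in the thread. It assumes -xtile- is an acceptable way to form the groups, and it uses the nlsw88 example data, a three-way grouping (`grp`), and 20 quantile groups purely for illustration. With survey data one would presumably add the weight to both -xtile- and -collapse-, for example `[fw=weight]`.

```stata
// Sketch only: raw (unsmoothed) shares by wage quantile group.
// nlsw88, the 3-level variable grp, and 20 groups are illustrative assumptions.
sysuse nlsw88, clear
drop if missing(wage, industry)

// hypothetical coarse grouping standing in for "occupation"
recode industry (1/5 = 1) (6/10 = 2) (11/12 = 3), gen(grp)

// quantile groups of wage; with weights: xtile q = wage [fw=weight], nquantiles(20)
xtile q = wage, nquantiles(20)

// one 0/1 indicator per group; its mean within q is that group's share
forvalues k = 1/3 {
    gen byte d`k' = (grp == `k')
}
collapse (mean) d1 d2 d3, by(q)

// cumulative shares give the lower and upper edge of each band
gen zero = 0
gen c1   = d1
gen c2   = d1 + d2
gen one  = 1

sort q
twoway rarea zero c1 q || ///
       rarea c1 c2 q   || ///
       rarea c2 one q,    ///
       legend(order(3 "group 3" 2 "group 2" 1 "group 1")) ///
       xtitle("wage quantile group") ytitle("share")
```

Because the shares are computed within each quantile group, the bands sum to 1 even when ties in income make the groups slightly unequal in size, which may be what happened with -sumdist-. With few observations per group the bands will look jagged, which is the motivation for the smoothing in Maarten's example.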