Thank you for your replies. I am sorry to come back, but I am still stuck and wonder whether I could bother people for further advice.

On Maarten Buis's suggestion, I am not sure why I would really need a regression; I gather from his email that it is basically for smoothing? Since I want to plot the actual data (though I realise that this needs smoothing), I would prefer to plot income in, for example, percentiles on the x-axis, showing for each percentile its composition in terms of occupation on the y-axis. I guess one way would be a histogram or bar chart, but what I really want is a continuous area plot with percentiles of income on the x-axis and the percentage in each occupation (for each income percentile) on the y-axis. I'm not sure whether a dot graph, as Martin Weiss suggests, will really do this. I've looked into it, but it seems quite different, unless I am misunderstanding?

Also, I am struggling with even the first step: creating another variable with the proportions of each occupation for each income group (e.g. percentile). I have tried commands such as sumdist, pctile, and xtile (downloaded from SSC), but they are not dividing the population into equally sized percentile groups. I have tried, for instance, -sumdist income if date==2007 [fw=weight], n(100) qgp(test)-, but the groups are not of the same size.

I'm hopeful that there must be a simple way to do this, in part because in Excel it can be done in a few minutes (but Excel of course can't handle large survey data such as I am dealing with).

Sorry to bother the list with these follow-up enquiries.

best,
Gisella

--- On Sun, 11/30/08, Maarten buis <maartenbuis@yahoo.co.uk> wrote:

> From: Maarten buis <maartenbuis@yahoo.co.uk>
> Subject: Re: st: how to make an area graph showing distribution?
> To: statalist@hsphsun2.harvard.edu
> Date: Sunday, November 30, 2008, 10:52 AM
>
> --- Gisella Young wrote:
> > I am trying to make a chart showing the distribution of income by
> > occupation. On the x-axis I would like the distribution of income
> > from 0 to the highest. Then on the y-axis I want to show the
> > proportion of people in different occupations. I have a variable
> > (occup) with 6 different occupational categories. In other words, I
> > want to show how the different occupations fit into income
> > distribution, by showing how the occupational breakdown of income
> > changes moving up the income spectrum. I thought an area chart
> > (summing to 100) would be the best way to do this, although there
> > might be better ways which I would be open to suggestions. I have
> > tried the twoway area function with different variations, but it
> > doesn't seem to be right (just gives a crazy chart with lines all
> > over) and I'm not sure how to do it.
>
> You'll probably need to smooth the proportions, as you won't have
> enough cases within each occupational category at each wage in your
> data to reliably estimate the proportions. In the example below I have
> done so by estimating a -mlogit- predicting occupational category with
> wage represented as a restricted cubic spline (see -help mkspline-). I
> treat the predicted probabilities as the smoothed proportions.
>
> For the graph I created the variables zero, one, and l1 till l5. The
> logic is that on the y-axis the first band should range from 0 (zero)
> to the first proportion (l1), the second band should start at the
> first proportion and end at the first + the second proportion, etc.
> Two things are worth noting: 1) you need to sort first on wage (or use
> the sort option) to avoid creating modern art, and 2) I reversed the
> order in the legend (going from 6 to 1) so that the order in which
> they appear in the legend corresponds with the order in which they
> appear in the graph (1 at the bottom and 6 at the top).
>
> *--------------- begin example -----------------------
> // prepare the example data
> sysuse nlsw88, clear
> gen ind_gr = industry
> recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
> label define ind_gr 1 "manual"                ///
>                     2 "trade"                 ///
>                     3 "finance"               ///
>                     4 "other services"        ///
>                     5 "professional services" ///
>                     6 "public administration"
> label value ind_gr ind_gr
>
> // smooth the proportions
> mkspline s_w=wage, cubic nknots(5)
> mlogit ind_gr s_w*
> predict pr*
>
> // create the graph
> gen zero = 0
> gen one = 1
> gen l1 = pr1
> gen l2 = pr1 + pr2
> gen l3 = pr1 + pr2 + pr3
> gen l4 = pr1 + pr2 + pr3 + pr4
> gen l5 = pr1 + pr2 + pr3 + pr4 + pr5
>
> sort wage
> twoway rarea zero l1 wage ||                      ///
>        rarea l1 l2 wage   ||                      ///
>        rarea l2 l3 wage   ||                      ///
>        rarea l3 l4 wage   ||                      ///
>        rarea l4 l5 wage   ||                      ///
>        rarea l5 one wage,                         ///
>        legend(order( 6 "public administration"    ///
>                      5 "professional services"    ///
>                      4 "other services"           ///
>                      3 "finance"                  ///
>                      2 "trade"                    ///
>                      1 "manual" ))
> *---------------------- end example -----------------
> (For more on how to use examples I sent to the Statalist, see
> http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> Hope this helps,
> Maarten
>
> -----------------------------------------
> Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
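
The unsmoothed version asked about above (raw shares of each category within income groups, drawn as stacked bands) could be sketched roughly as follows. This is only an illustration on the nlsw88 example data from the quoted message, not a tested solution for the survey data: the variable names wgrp, d1-d6 and l1-l6 and the choice of 20 groups are assumptions made for the example. With frequency weights, quantile groups are equal in weighted size rather than in number of records, and ties in income can also make groups unequal, which may be why the groups looked uneven.

```stata
* Sketch only: raw (unsmoothed) category shares within wage groups.
* wgrp, d1-d6, l1-l6 and the 20 groups are illustrative choices.
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
keep if !missing(ind_gr, wage)

* 20 quantile groups of wage; ties can make group sizes slightly unequal.
* With survey weights one might instead write something like
*   xtile wgrp = income [fw=weight], nquantiles(100)
xtile wgrp = wage, nquantiles(20)

* one 0/1 indicator per category; its mean within a group is the share
forvalues k = 1/6 {
    gen byte d`k' = (ind_gr == `k')
}

* one row per wage group holding the six shares
* (collapse also accepts weights, e.g. [fw=weight])
collapse (mean) d1-d6, by(wgrp)

* cumulative shares give the band boundaries, as in the quoted example
gen double zero = 0
gen double l1 = d1
forvalues k = 2/6 {
    local j = `k' - 1
    gen double l`k' = l`j' + d`k'
}

sort wgrp
twoway rarea zero l1 wgrp ||                      ///
       rarea l1 l2 wgrp   ||                      ///
       rarea l2 l3 wgrp   ||                      ///
       rarea l3 l4 wgrp   ||                      ///
       rarea l4 l5 wgrp   ||                      ///
       rarea l5 l6 wgrp,                          ///
       legend(order( 6 "public administration"    ///
                     5 "professional services"    ///
                     4 "other services"           ///
                     3 "finance"                  ///
                     2 "trade"                    ///
                     1 "manual" ))                ///
       xtitle("wage group (1 = lowest)") ytitle("proportion")
```

With few observations per group the bands will be jagged; using fewer groups is one crude way of smoothing, while the -mlogit- approach in the quoted message smooths along wage itself.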



