HTH
Martin
_______________________

To: <statalist@hsphsun2.harvard.edu>
Sent: Sunday, November 30, 2008 2:43 PM
Subject: Re: st: how to make an area graph showing distribution?

Thank you for your replies. However, sorry to come back, but I am still stuck and wonder whether I could bother people for further advice.

On Maarten Buis's suggestion, I am not sure why I would really need a regression - I get from his email that this is basically for smoothing? Since I actually want to plot the actual data (but realise that this needs smoothing), what I would prefer to do would be to have income plotted in, for example, percentiles (on the x-axis), showing for each percentile the composition of that income percentile in terms of occupation on the y-axis. I guess one way would be a histogram or bar chart, but what I really want is a continuous area plot with percentiles of income on the x-axis and percentage in each occupation (for each income percentile) on the y-axis.

I'm not sure whether a dot graph as Martin Weiss suggests will really do this. I've looked into it but it seems quite different, unless I am misunderstanding?

Also, I am struggling with even the first step of creating another variable with the proportions of each occupation for each income group (e.g. percentile). I have tried functions such as sumdist, pctile, and xtile (downloaded from SSC) but they are not dividing the population into equally sized percentile groups. I have tried for instance -sumdist income if date==2007 [fw=weight], n(100) qgp(test)- but the groups are not of the same size.

I'm hopeful that there must be a simple way to do this, in part because in Excel it can be done in a few minutes (but Excel of course can't handle large survey data such as I am dealing with). Sorry to bother the list with these follow-up enquiries.

best,
Gisella

--- On Sun, 11/30/08, Maarten buis <maartenbuis@yahoo.co.uk> wrote:

From: Maarten buis <maartenbuis@yahoo.co.uk>
Subject: Re: st: how to make an area graph showing distribution?
To: statalist@hsphsun2.harvard.edu
Date: Sunday, November 30, 2008, 10:52 AM

--- Gisella Young wrote:
> I am trying to make a chart showing the distribution of income by
> occupation.
> On the x-axis I would like the distribution of income
> from 0 to the highest. Then on the y-axis I want to show the
> proportion of people in different occupations. I have a variable
> (occup) with 6 different occupational categories. In other words, I
> want to show how the different occupations fit into income
> distribution, by showing how the occupational breakdown of income
> changes moving up the income spectrum. I thought an area chart
> (summing to 100) would be the best way to do this, although there
> might be better ways which I would be open to suggestions. I have
> tried the twoway area function with different variations, but it
> doesn't seem to be right (just gives a crazy chart with lines all
> over) and I'm not sure how to do it.

You'll probably need to smooth the proportions, as your data won't contain enough cases at each wage within each occupational category to estimate the proportions reliably. In the example below I have done so by estimating an -mlogit- predicting occupational category, with wage represented as a restricted cubic spline (see -help mkspline-). I treat the predicted probabilities as the smoothed proportions.

For the graph I created the variables zero, one, and l1 through l5. The logic is that on the y-axis the first band should range from 0 (zero) to the first proportion (l1), the second band should start at the first proportion and end at the first + the second proportion, etc.

Two things are worth noting:
1) you need to sort first on wage (or use the sort option) to avoid creating modern art, and
2) I reversed the order in the legend (going from 6 to 1) so that the order in which the categories appear in the legend corresponds with the order in which they appear in the graph (1 at the bottom and 6 at the top).
*--------------- begin example -----------------------
// prepare the example data
sysuse nlsw88, clear
gen ind_gr = industry
recode ind_gr 1/5=1 6=2 7=3 8/10=4 11=5 12=6
label define ind_gr 1 "manual"                ///
                    2 "trade"                 ///
                    3 "finance"               ///
                    4 "other services"        ///
                    5 "professional services" ///
                    6 "public administration"
label value ind_gr ind_gr

// smooth the proportions
mkspline s_w=wage, cubic nknots(5)
mlogit ind_gr s_w*
predict pr*

// create the graph
gen zero = 0
gen one = 1
gen l1 = pr1
gen l2 = pr1 + pr2
gen l3 = pr1 + pr2 + pr3
gen l4 = pr1 + pr2 + pr3 + pr4
gen l5 = pr1 + pr2 + pr3 + pr4 + pr5
sort wage
twoway rarea zero l1 wage ||                  ///
       rarea l1 l2 wage   ||                  ///
       rarea l2 l3 wage   ||                  ///
       rarea l3 l4 wage   ||                  ///
       rarea l4 l5 wage   ||                  ///
       rarea l5 one wage,                     ///
       legend(order( 6 "public administration" ///
                     5 "professional services" ///
                     4 "other services"        ///
                     3 "finance"               ///
                     2 "trade"                 ///
                     1 "manual" ))
*---------------------- end example -----------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

Hope this helps,
Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
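
For the unsmoothed version asked about in the follow-up (income percentiles on the x-axis, occupational shares within each percentile on the y-axis), a minimal sketch might look like the one below. It is not from the thread. It assumes the variable names mentioned in the question (income, occup, weight, date), that occup is coded 1 to 6, and that weight holds integer frequency weights. The names pct, occ1-occ6 and c1-c5 are made up for illustration. It uses official -xtile- for the grouping. With weights and tied incomes the groups will generally not contain exactly the same number of observations, which may be part of what was observed with -sumdist-.

```stata
*--------------- begin sketch -------------------------
// assumed variable names: income, occup (coded 1-6), weight, date
keep if date == 2007

// 100 weighted income groups; ties in income can make sizes unequal
xtile pct = income [fw=weight], nquantiles(100)
drop if missing(pct)

// one 0/1 indicator per occupation (hypothetical names occ1-occ6)
forvalues k = 1/6 {
    gen occ`k' = (occup == `k') if !missing(occup)
}

// weighted share of each occupation within each income percentile
collapse (mean) occ1-occ6 [fw=weight], by(pct)

// cumulative shares give the band boundaries, as in the example above
gen zero = 0
gen one  = 1
gen c1   = occ1
forvalues k = 2/5 {
    local j = `k' - 1
    gen c`k' = c`j' + occ`k'
}

// sort on the x variable first, then stack the six bands
sort pct
twoway rarea zero c1 pct ||                   ///
       rarea c1 c2 pct   ||                   ///
       rarea c2 c3 pct   ||                   ///
       rarea c3 c4 pct   ||                   ///
       rarea c4 c5 pct   ||                   ///
       rarea c5 one pct,                      ///
       xtitle("income percentile")            ///
       ytitle("share of occupation")          ///
       legend(order(6 "occup 6" 5 "occup 5" 4 "occup 4" ///
                    3 "occup 3" 2 "occup 2" 1 "occup 1"))
*---------------- end sketch --------------------------
```

Because each percentile's shares are computed from the raw data, the bands are likely to look jagged where a percentile has few cases in some occupation. Using fewer groups, for example nquantiles(20), is one simple way to trade resolution for stability without fitting a model.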

