Statalist


st: RE: problem with dividing dataset into equally sized groups


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: problem with dividing dataset into equally sized groups
Date   Tue, 2 Dec 2008 15:41:44 +0100

Line for the server...

Try -egen, cut()- with the -group(#)- option.
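
As a rough sketch of that suggestion, the call might look like the following. The variable name income and the simulated data are assumptions made for illustration, not taken from the thread, and -clear- drops whatever data is in memory. Because the cutpoints depend only on the value of income, observations with identical incomes should still fall into the same group, so heavy ties may leave the groups unequal.

```stata
* Sketch only: simulated incomes stand in for the real data (assumption)
clear
set seed 12345
set obs 1000
generate double income = exp(rnormal(10, 1))

* Ask -egen, cut()- for 100 groups of roughly equal frequency
egen cat = cut(income), group(100)

* Inspect the group sizes
tabulate cat
```

Checking -tabulate cat- afterwards, as in the original question, is the quickest way to see whether the groups came out equal on the real data.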


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Gisella Young
Sent: Tuesday, December 02, 2008 3:29 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: problem with dividing dataset into equally sized groups

I am trying to divide my dataset into equally sized groups on the basis of
an income variable (e.g. 100 groups from lowest to highest income). I have
tried several methods but the groups are not equally sized. For example,

-xtile cat=income, n(100)-
 (similarly with pctile)
and
-sumdist income, n(100) qgp(cat)-

It produces the desired number of groups but they are not equally sized.
(Which I see by looking at the frequencies when I say -tab cat- thereafter).
The differences are not small - some groups are many times larger than
others. This is not because of weighting as I have tried even without
weights. It is also not related to the size of groups. I wonder whether it
might be because incomes cluster around certain values (e.g. 10 000,
15 000), so that all of those values are lumped into certain categories.
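
The clustering explanation can be checked on made-up data. The sketch below is an illustration only: the data are simulated, and the variable name income is reused from the commands above. It builds incomes that take just ten distinct values and then asks -xtile- for 100 groups. Since -xtile- assigns categories by value, tied observations cannot be split, so far fewer than 100 groups appear and their sizes follow the heaps in the data.

```stata
* Sketch only: simulated, heavily heaped incomes (assumption)
clear
set seed 12345
set obs 1000

* Ten distinct income values: 5000, 10000, ..., 50000
generate income = 5000 * (1 + floor(10 * runiform()))

* Request 100 quantile groups, as in the question
xtile cat = income, nquantiles(100)

* Tied incomes stay together, so at most ten categories are used
tabulate cat
```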

Can anyone suggest a way to partition the sample into equally sized groups?
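
One way to force equal counts, sketched here under stated assumptions rather than offered as the list's recommended method, is to break ties at random and then cut the sorted data by position. The data are simulated and the helper variable u is hypothetical. The sketch ignores survey weights, and it places people with the same income in different groups arbitrarily, which may or may not be acceptable for the analysis.

```stata
* Sketch only: simulated, heavily heaped incomes (assumption)
clear
set seed 12345
set obs 1000
generate income = 5000 * (1 + floor(10 * runiform()))

* Random tie-breaker (hypothetical helper variable)
generate double u = runiform()

* Missing incomes sort last, so positions 1..r(N) are the nonmissing ones
sort income u
quietly count if !missing(income)

* Assign groups by position in the sorted data, not by income value
generate int cat = ceil(100 * _n / r(N)) if !missing(income)
tabulate cat
```

When the number of nonmissing observations is not a multiple of 100, the group sizes differ by at most one.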


This actually stems from an earlier thread (but no need to read that for the
above) about plotting a chart of income distribution with the occupational
composition of each percentile. Austin's suggestion (below) comes close to
that. However, even with his code the groups are not equally sized, but they
are sized the same as when I use the sumdist or xtile commands mentioned
above.

best,
Gisella

--- On Mon, 12/1/08, Austin Nichols <austinnichols@gmail.com> wrote:

> From: Austin Nichols <austinnichols@gmail.com>
> Subject: Re: st: how to make an area graph showing distribution?
> To: statalist@hsphsun2.harvard.edu
> Date: Monday, December 1, 2008, 2:02 AM
> Gisella Young <gisellayoung@yahoo.com>:
> It may be that you are looking for a simple stacked bar graph over
> income quintiles or deciles or the like, as opposed to a parametric
> smooth over income quantiles.  If so, you might want to adapt one of
> this pair of example graphs to your needs:
> 
> clear all
> sysuse nlsw88
> ren industry i
> tab i, g(ind)
> g w=round(uniform()*20)
> la var w "fake survey weight"
> _pctile wage [pw=w], nq(5)
> g q=1 if wage<=r(r1)
> forv i=2/5 {
>  replace q=`i' if wage>r(r`=`i'-1') & wage<=r(r`i')
>  }
> loc y
> forv i=1/12 {
>  loc l "`=substr("`: var la ind`i''",4,.)'"
>  loc y `"`y' lab(`i' "`l'")"'
>  loc lv`i' `"la var ind`i' "`l'" "'
>  }
> gr bar ind* [pw=w], stack over(q) name(b) leg(`y')
> collapse ind* [pw=w], by(q)
> forv i=2/12 {
>  replace ind`i'=ind`i'+ind`=`i'-1'
>  }
> loc v
> forv i=1/12 {
>  `lv`i''
>  loc v "ind`i' `v'"
>  }
> tw bar `v' q, name(tw)
> 
> Note that the commands above destroy the data in memory, so make sure
> you -preserve- or -save- first as appropriate.  Also note that there
> is no guarantee that the distributions of income by occupation, or
> occupation by income category, display any sort of stochastic
> dominance that would allow easy ranking of occupations.
> 
> See also
> http://www.stata.com/capabilities/graphexamples.html
> 
> 
> On Sun, Nov 30, 2008 at 10:37 AM, Maarten Buis
> <maartenbuis@yahoo.co.uk> wrote:
> > --- Gisella Young <gisellayoung@yahoo.com> wrote:
> >> On Maarten Buis's suggestion, I am not sure why I would really need
> >> a regression - I get from his email that this is basically for
> >> smoothing?
> >
> > Yes, as income in the example dataset (and I assume in your dataset as
> > well) is a continuous variable, there just aren't enough cases for each
> > income value to estimate the proportions.
> >
> >> Since I actually want to plot the actual data (but realise
> >> that this needs smoothing),
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




      



