Ranking with the unique option before cutting will make the groups as equal in size as possible, whereas simply using -egen ... cut()- will not.

E.g.:

sysuse auto, clear

* Per Martin's suggestion
egen group1 = cut(mpg), group(4)
lab var group1 "Just use cut(mpg)"
tab group1

* Alternate using ranking first
egen rank = rank(mpg), unique
egen group2 = cut(rank), group(4)
lab var group2 "Use rank(mpg), then cut(rank)"
tab group2

* Cross-tabulate the two groupings to see where they differ
table group2 group1, stubw(15) row col

DCE

On Tue, Dec 2, 2008 at 10:45 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Exactly equal-sized groups are only guaranteed if
>
> 1. the number of observations is an exact multiple of the number of
> groups (which usually bites minutely)
>
> 2. there are no problems with ties (which often bites substantially).
>
> Your problem is evidently #2.
>
> You can only force equal-sized groups if you assign the same value to
> different groups in at least some cases. You can always force that by
> perturbing your data with random noise before passing them to -xtile-,
> but that's hardly a satisfactory approach.
>
> But the whole approach is pretty unsatisfactory anyway: this kind of
> subdivision throws away information which is not obviously dispensable.
>
> I've not been following this thread carefully but my impression is that
> you've had some excellent advice from Maarten Buis that you've chosen to
> ignore. That's your prerogative, but you'll get diminishing returns from
> asking small variants on the same question. A modern approach to this
> uses some kind of smoothing to try to get over the granularity in your
> data, which you can do in a controlled way.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Gisella Young
>
> I am trying to divide my dataset into equally sized groups on the basis
> of an income variable (eg 100 groups from lowest to highest income). I
> have tried several methods but the groups are not equally sized. For
> example,
>
> -xtile cat=income, n(100)-
> (similarly with pctile)
> and
> -sumdist income, n(100) qgp(cat)-
>
> It produces the desired number of groups but they are not equally sized.
> (Which I see by looking at the frequencies when I say -tab cat-
> thereafter). The differences are not small - some groups are many times
> larger than others. This is not because of weighting as I have tried
> even without weights. It is also not related to the size of groups. I
> wonder whether it might be because of clustering of incomes around
> certain values (e.g. 10 000, 15 000) and all of those values being
> lumped into certain categories.
>
> Can anyone suggest a way to partition the sample into equally sized
> groups?
>
> This actually stems from an earlier thread (but no need to read that for
> the above) about plotting a chart of income distribution with the
> occupational composition of each percentile. Austin's suggestion (below)
> comes close to that. However, even with his code the groups are not
> equally sized, but they are sized the same as when I use the sumdist or
> xtile commands mentioned above.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
David Elliott
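
Along the lines of the random-noise idea Nick mentions, ties can also be broken at random before grouping, without perturbing the data values themselves. What follows is only a minimal sketch on the same auto data, under some assumptions: the variable names u and group3 and the seed value are hypothetical choices for illustration, the variable being grouped has no missing values, and -runiform()- is available (older releases spell it -uniform()-).

```stata
sysuse auto, clear

* Arbitrary seed so the random tie-breaking is reproducible
set seed 12345

* u is a hypothetical helper variable used only to order tied mpg values
gen double u = runiform()
sort mpg u

* Assign 4 groups purely by position in the sorted data
gen byte group3 = ceil(4 * _n / _N)
lab var group3 "Sort on mpg, random tie-break"
tab group3
```

With the 74 observations in auto.dta the four groups come out at 18, 19, 18 and 19 observations, which is as equal as 74 allows. The cost is the one Nick points out: cars with the same mpg can land in different groups, and which ones do depends on the seed.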

