
RE: st: even categories

From   "Nick Cox" <>
To   <>
Subject   RE: st: even categories
Date   Tue, 14 Oct 2008 15:57:16 +0100

I agree with Maarten. Unequal frequencies from -xtile- typically reflect
ties in the data: observations with the same value must all go into the
same category.
Here is a simple example:

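sysuse auto, clear
* four quantile-based groups of mpg
xtile cmpg = mpg, n(4)
tab cmpg
su mpg, detail
* reference lines at 18, 20 and 25, the quartiles of mpg
dotplot mpg, yli(18 20 25)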
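To see the ties directly, here is a small sketch (an illustration added to the example above, not a recommended procedure) that cross-tabulates each distinct value of mpg against the groups and then uses -bysort- with -assert- to check that no value is split across two groups. It assumes only the auto data shipped with Stata.

```stata
* Sketch: show that -xtile- never splits tied values across groups
sysuse auto, clear
xtile cmpg = mpg, nquantiles(4)
* Each row is one distinct value of mpg; all its observations sit in one group
tab mpg cmpg
* Within each distinct value of mpg, the lowest and highest group must agree
bysort mpg (cmpg): assert cmpg[1] == cmpg[_N]
```

A value shared by many cars has to go wholly into one group, which is why -tab cmpg- shows unequal counts.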

A larger question is why categorise at all. Usually this just throws away
information. If working with a measured variable is felt to be a difficulty,
statistics has plenty of solutions for that; in fact, a measured variable is
a feature, not a problem.


Maarten Buis

If -xtile- does not give you categories with approximately the same
number of observations, that indicates extreme ties or spikes in your
data. With ties that extreme, almost any automated procedure will fail.
In that case you should take a good look at your data and see whether
there is some substantive reason behind the clustering. You could, for
instance, use -spikeplot var-, where var is the variable in question.
For example, if you ask people how many hours a week they work, the
answer will typically be a multiple of 4 (i.e. half an 8-hour workday),
and there will be a spike at 36, 38 or 40, depending on what is
considered full time, which differs between countries. Once you have
found such a substantive reason, I would use it to classify people into
categories.
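A minimal sketch of that workflow, again using the auto data purely for illustration: -duplicates report- and -spikeplot- show how heavily the values are tied, and -recode- then builds categories from chosen cut points. The cut points (19/20 and 24/25), the labels and the new variable name mpgcat are hypothetical, picked only to show the syntax; in practice they should come from whatever substantive reason you found.

```stata
* Sketch only: inspect ties, then categorise on chosen cut points
sysuse auto, clear
* Report how many observations share the same value of mpg
duplicates report mpg
* One spike per distinct value; tall spikes are heavy ties
spikeplot mpg
* Hypothetical cut points and labels, for illustration only
recode mpg (min/19 = 1 "low") (20/24 = 2 "medium") (25/max = 3 "high"), generate(mpgcat)
tab mpgcat
```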

Philip Sinclair 

> I would like to make a categorical variable with ordered categories of
> approximately equal size, but I have found that -xtile- does not always
> quite do the trick. Is there an alternative, please?

