Re: st: correct way to divide the sample into deciles?
You are correct. -group()- went undocumented in Stata 9.
This means the function -group()-, not the -egen- function of the same name.
It's still there, naturally, so that previous programs
and do-files are not broken. But...
In essence, as I interpret the situation, -group()- is problematic
for various reasons. Here I take the standard use to be yours, namely
gen group = group(#)
1. The definition wasn't nearly precise enough to be of any
use for really careful work. The online help for Stata 8 said:
"group(x) creates a categorical variable that divides the data into x as
nearly equal-sized subsamples as possible, numbering the first group
1, the second group 2, etc."
but that's too vague for anyone to understand or reproduce. As -group()-
is part of the executable, the code is not inspectable. The
documentation could have been fixed, but that wasn't the only problem.
2. Examples show that -group()- can assign observations with
the same value of -myvar- to different groups. That would be widely
considered pathological, i.e. bad. Presumably the result is reproducible
only if you -set seed- and record the seed.
3. -group()- doesn't seem to pay special attention to missing values.
4. The name is overloaded. There is, as said, an -egen, group()- which
is different. Svend Juul threw some stones at StataCorp which
started a small avalanche of re-naming, in which StataCorp
tried to tackle various inconsistencies whereby the same
thing had different names and different things had the same
name among various functions and -egen- functions. -group()-
is far less useful, really, than -egen, group()-, so it was
a marked function from the start.
5. As in your case, people who ask for this usually really want quantiles
(e.g. deciles), and there are much better-documented Stata commands
for that. Type -search quantile- for some suggestions, but be warned
that no agreed-to-be-correct method exists. There is a literature on
different definitions of quantile, hinging on what is to be done about
ties and what you do when the number of values is too awkward to be
divisible in the way
6. Probably some more problems. Really, -group()- had passed its
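As a minimal sketch of the quantile route in point 5: -xtile- assigns decile membership directly, with no prior -sort- needed. The variable in question was -xyz-; here the auto data's -price- stands in for it, and -price10- is just an illustrative name.

```stata
* Illustrative only: -price- from the auto data stands in for -xyz-
sysuse auto, clear

* decile membership, numbered 1 to 10
xtile price10 = price, nquantiles(10)

* group sizes need not be exactly equal, especially when values are tied
tabulate price10
```

With your own data, replace -price- by -xyz- and -price10- by whatever name you like.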
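Points 2 and 3 can be checked rather than taken on trust. A sketch, again using the auto data as a stand-in: -rep78- has heavy ties and some missing values, so it shows whether a grouping keeps tied values together and leaves missing values alone. The name -rep_q- is illustrative.

```stata
sysuse auto, clear
xtile rep_q = rep78, nquantiles(4)

* ties: all observations sharing a value of rep78 should share a group
bysort rep78 (rep_q): assert rep_q[1] == rep_q[_N]

* missing: a missing rep78 should give a missing group, and only then
assert missing(rep_q) == missing(rep78)
```

The same two -assert-s can be run on a variable created with -gen ... = group(#)-; given point 2, the first may well fail there.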
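To make the naming clash in point 4 concrete: the -egen- functions do different jobs from the old function. A sketch with the auto data (the new variable names are illustrative): -egen, group()- numbers the distinct combinations of its arguments, while -egen, cut()- with its -group(#)- option is the -egen- route to roughly equal-frequency groups.

```stata
sysuse auto, clear

* one identifier per distinct combination of foreign and rep78
egen origin_rep = group(foreign rep78)

* ten roughly equal-frequency groups of price
egen price_g = cut(price), group(10)
tabulate price_g
```

As I understand it, -cut()- with -group()- codes the groups 0, 1, ..., 9 rather than 1 to 10, so check the codes before comparing them with -xtile- output.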
I am wondering what the correct way is to divide a sample into
deciles based on the value of variable xyz. What I would do is:
The 'group' function will divide the sample into 10 subgroups of as nearly
equal size as possible. Given that the variable of interest is sorted
beforehand, it looks fine to me. I am not sure whether this is the right
way. Is there a more accurate way to do the job?
Another question: I just upgraded from Stata 7 to Stata 9. I
couldn't find an explanation of the function 'group' in the Stata 9 manuals
or online documentation. The 'group' function under 'generate' still works
as it did under Stata 7, though. I am wondering whether 'group' is called by
another name in Stata 9.