# Re: st:compare distribution of a variable across groups

From: Austin Nichols
To: statalist@hsphsun2.harvard.edu
Subject: Re: st:compare distribution of a variable across groups
Date: Tue, 3 Feb 2009 15:01:57 -0500

```
Mandy fu <mandy.fu1@gmail.com>:
You can look at the whole distribution fairly easily--
help kdensity
help tw kdensity
gets you started, or you can modify this example:

ssc install vioplot
sysuse nlsw88, clear
keep if race<3
tw kdensity wage, by(race)
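* assumed recode: pool grades below 8 so the "8 or less" label applies
g g=max(grade,8) if !mi(grade)
la def grade 8 "8 or less"
la val g grade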
vioplot wage, over(g) horiz yla(,angle(0)) name(w)
g c=cond(wage>0,wage^(1/3),-abs(wage)^(1/3))
vioplot c, over(g) horiz yla(,angle(0)) name(c)

You might have to take the cube root of wage if it is too skewed to
see the distribution without transformation (wages are positive, but
earnings and such need not be, which is why I use the convoluted
syntax above).

Note the large literature on how difficult it is to estimate the
effect of education on wages without obtaining badly biased results;
see e.g. http://www.nber.org/papers/w7769

Percentiles and such can be obtained from _pctile (and many other
commands, not all of which allow survey weights) or qreg (in a
regression framework).
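
* A minimal sketch of the percentile comparison, assuming the nlsw88
* data from the example above (not run on your data):
sysuse nlsw88, clear
keep if race<3
* deciles of wage within each racial group (10th, 20th, ..., 90th)
forv r=1/2 {
 _pctile wage if race==`r', nq(10)
 forv i=1/9 {
  di "race `r' decile `i': " r(r`i')
 }
}
* the same comparison in a regression framework: median wage by race
* (factor-variable notation i.race assumes Stata 11 or later)
qreg wage i.race, q(.5)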

I think you mean "standard deviations" rather than "standard errors"
below, but don't make the mistake of thinking wages or incomes are
normally (or even symmetrically) distributed.
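
* A minimal sketch, again assuming the nlsw88 data from the example
* above: sd is the standard deviation of wages, semean is the standard
* error of the mean, and skewness gives a rough check on symmetry
tabstat wage, by(race) s(n mean sd semean min max skewness)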

On Mon, Feb 2, 2009 at 4:46 PM, Mandy fu <mandy.fu1@gmail.com> wrote:
> Hi all,
>
> I'm wondering if anyone here could give me some suggestion on
> comparing a variable's distribution across several groups.
>
> I'm going to compare the effect of education on wage rates by racial
> groups. Here's what I'm thinking:
>
> step 1.
>  I will check the means, standard errors, and ranges of wages by racial groups.
> step 2.
>  I will check percentile wage rates, like decimal percentile, for each group.
>
> What I'd like to ask is: what is the usual way to compare the
> distribution of a variable across several groups? My concern is
> that, maybe I would miss something if only comparing the means or
> standard errors of wages. So, I'm curious how most researchers deal
> with this.
```