Statalist



Re: st:compare distribution of a variable across groups


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st:compare distribution of a variable across groups
Date   Tue, 3 Feb 2009 15:01:57 -0500

Mandy fu <[email protected]>:
You can look at the whole distribution fairly easily--
help kdensity
help twoway kdensity
gets you started, or you can modify this example:

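* vioplot is a user-written command; install it once from SSC
ssc install vioplot
sysuse nlsw88, clear
* keep the first two race categories (drops "other")
keep if race<3
* kernel density of wage, one panel per race
tw kdensity wage, by(race)
* pool grades below 9 into a single labeled category
replace grade=8 if grade<9
la def grade 8 "8 or less"
la val grade grade
* one group per race-by-grade combination
egen g=group(race grade), label
vioplot wage, over(g) horiz yla(,angle(0)) name(w)
* signed cube root, defined for negative values too
g c=cond(wage>0,wage^(1/3),-abs(wage)^(1/3))
vioplot c, over(g) horiz yla(,angle(0)) name(c)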

You might have to take the cube root of wage if it is too skewed to
see the distribution without transformation (wages are positive, but
earnings and such need not be, which is why I use the convoluted
syntax above).

Note the large literature on how difficult it is to estimate the
effect of education on wages without obtaining badly biased results;
see e.g. http://www.nber.org/papers/w7769

Percentiles and such can be obtained from _pctile (and many other
commands, not all of which allow survey weights) or qreg (in a
regression framework).
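
As a minimal sketch of those options, assuming the same nlsw88 example data as above (the indicator variable black is made up for illustration, and none of this uses survey weights):

```stata
* Sketch only: percentiles of wage by race in the nlsw88 example data
sysuse nlsw88, clear
keep if race<3
* tabstat reports selected percentiles for each group
tabstat wage, by(race) statistics(p10 p25 p50 p75 p90)
* _pctile leaves the nine decile cutpoints in r(r1)-r(r9)
_pctile wage if race==1, nquantiles(10)
return list
* median regression of wage on a hypothetical race indicator
generate byte black = race==2
qreg wage black, quantile(.5)
```

Changing quantile(.5) to, say, quantile(.1) or quantile(.9) compares the groups at other points of the distribution.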

I think you mean "standard deviations" rather than "standard errors"
below, but don't make the mistake of thinking wages or incomes are
normally (or even symmetrically) distributed.
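
A quick way to see the difference, again only a sketch on the nlsw88 example data: sd describes the spread of wages within a group, semean is the standard error of the group mean, and a skewness far from zero signals an asymmetric distribution.

```stata
* Sketch only: spread, precision of the mean, and asymmetry by race
sysuse nlsw88, clear
keep if race<3
tabstat wage, by(race) statistics(n mean sd semean skewness)
```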

On Mon, Feb 2, 2009 at 4:46 PM, Mandy fu <[email protected]> wrote:
> Hi all,
>
> I'm wondering if anyone here could give me some suggestions on
> comparing a variable's distribution across several groups.
>
> I'm going to compare the effect of education on wage rates by racial
> groups. Here's what I'm thinking:
>
> step 1.
>  I will check the means, standard errors, and ranges of wages by racial group.
> step 2.
>  I will check percentiles of wage rates, such as deciles, for each group.
>
> What I'd like to ask is: what is the usual way to compare the
> distribution of a variable across several groups? My concern is
> that maybe I would miss something if only comparing the means or
> standard errors of wages. So, I'm curious how most researchers deal
> with this.