RE: RE: st: significance of mean and median

From   "Lachenbruch, Peter" <>
To   <>
Subject   RE: RE: st: significance of mean and median
Date   Wed, 26 Nov 2008 10:02:19 -0800

In general, I've found that bad skewness/asymmetry messes up
significance tests more than heavy tails.  I know I read this somewhere
long ago, and it seems to work pretty well.
When you have skewness, looking for transformations is a good idea.
Where you can get messed up is when the skewness is caused by a lumping
at a value - e.g. the number of subjects who have 0 days of
hospitalization.  Then no transformation will help - and it's probably
better to fit one model for whether there is any response and another
for the response given that it's greater than 0.  This might be a
two-part or hurdle model or a mixture of distributions (such as -zip-
or -zinb-).
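
A minimal sketch of both approaches on simulated data, in case it
helps.  The variable names (days, x, never, any) and all parameter
values are made up for illustration, and the gamma GLM for the positive
part is only one possible choice:

*-------------- begin example --------------------
clear
set seed 12345
set obs 1000
gen x = rnormal()

// hypothetical data: some subjects are never hospitalized (lump at 0)
gen byte never = runiform() < invlogit(-0.5 - 0.5*x)
gen days = cond(never, 0, rpoisson(exp(1 + 0.3*x)))

// two-part model: part 1, any days at all; part 2, how many given > 0
gen byte any = days > 0
logit any x
glm days x if days > 0, family(gamma) link(log)

// mixture alternative: zero-inflated Poisson (-zinb- if overdispersed)
zip days x, inflate(x)
*--------------- end example --------------------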


Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
[] On Behalf Of Maarten buis
Sent: Wednesday, November 26, 2008 6:49 AM
Subject: Re: RE: st: significance of mean and median

--- Bastian Steingros <> wrote:
> using 
> sysuse auto, clear
> reg mpg, nohe
> mean mpg
> ttest mpg==0
> displays the same results. However, how do these tests deal with the
> assumption that mpg has to be normally distributed? 
> More precisely, how important is it that mpg is normally
> distributed? Most of the variables in my sample are left or right
> skewed... 
> Is ttest still reliable in this case? 

You can find that out using -simulate-. You declare your data to be the
population and repeatedly test a true hypothesis on a random sample
from your "population" (N out of N with replacement, just like the
bootstrap), and then you look at whether the p-value follows a uniform
distribution, and whether you reject the null in only 5% of the
samples. See the example below.

*-------------- begin example --------------------
capture program drop sim
program define sim, rclass
	sysuse auto, clear
	sum mpg, meanonly
	replace mpg = mpg - r(mean) // center mpg so the null (mean = 0) is true
	bsample                     // draw N out of N with replacement
	ttest mpg = 0
	return scalar p = r(p)
end

simulate p=r(p), reps(5000): sim

hist p // should be a uniform distribution

gen sig = p < .05

sum sig // mean should be .05
*--------------- end example --------------------
(For more on how to use examples I sent to the Statalist, see )
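
The same check can be sketched for a clearly right-skewed variable,
here price from the same data. The program name simskew, the seed, and
the use of -ksmirnov- and -bitest- as formal checks are additions for
illustration only; they are not part of the example above.

*-------------- begin example --------------------
capture program drop simskew
program define simskew, rclass
	sysuse auto, clear
	sum price, meanonly
	replace price = price - r(mean) // make the null hypothesis true
	bsample                         // N out of N with replacement
	ttest price = 0
	return scalar p = r(p)
end

set seed 12345
simulate p=r(p), reps(5000) nodots: simskew

ksmirnov p = p     // one-sample test against the uniform(0,1) cdf, F(p) = p
gen sig = p < .05
bitest sig == .05  // is the rejection rate compatible with 5%?
*--------------- end example --------------------

A rejection rate well away from .05 would suggest that the skewness
matters for a sample of this size.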

> By the way, -median mpg- requires a by() option. So, how can I test
> if the median of a variable is significant without using this command?
> Because I have no idea which by-option would make sense in my sample.

I think that the term "significant" has done more harm than good
because it hides the null hypothesis. As a consequence too many
nonsensical hypotheses are being tested. What you need to do is to
specify a null hypothesis and justify why anyone should care about this
hypothesis. The hypothesis that the mean or the median of a variable is
zero is almost never of interest, and thus should almost never be
tested. It is usually much more interesting to compare the mean/median
between groups, for example men and women. So this is probably why it
never occurred to anyone (or no one thought it was worth their time) to
implement a test of whether or not the median is equal to a certain
fixed value.
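
A minimal sketch of such group comparisons with the auto data. The
grouping variable foreign and the null value of 20 are picked purely
for illustration. That said, -signtest- is one existing command that
does test a median against a fixed number, if a substantively
meaningful null value is available:

*-------------- begin example --------------------
sysuse auto, clear
ttest mpg, by(foreign)   // compare means between two groups
median mpg, by(foreign)  // compare medians between two groups
signtest mpg = 20        // H0: the median of mpg is 20
*--------------- end example --------------------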

> Nick Cox seems not to fully agree with LAD/qreg...

Nick can speak for himself, but I got the impression that he wasn't
negative about -qreg-, but just noted that -qreg- did not have a neat
test equivalent like -regress- and -ttest-. 

-- Maarten

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

*   For searches and help try:
