Statalist



R: RE: st: significance of mean and median


From   "Carlo Lazzaro" <carlo.lazzaro@tiscalinet.it>
To   <statalist@hsphsun2.harvard.edu>
Subject   R: RE: st: significance of mean and median
Date   Wed, 26 Nov 2008 19:45:47 +0100

For those who are interested in the debate raised by Bastian's thread, I
would refer to the following article, which covers transformation issues for
skewed sampling distributions of continuous variables (costs):


Barber JA and Thompson SG. Analysis of cost data in randomized trials: an
application of the non-parametric bootstrap. Statistics in Medicine 2000;
19:3219-3236

Kind Regards,
Carlo
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lachenbruch,
Peter
Sent: Wednesday, 26 November 2008 19:02
To: statalist@hsphsun2.harvard.edu
Subject: RE: RE: st: significance of mean and median

In general, I've found that bad skewness/asymmetry messes up
significance tests more than heavy tails.  I know I read this somewhere
long ago, and it seems to work pretty well.
When you have skewness, looking for transformations is a good idea.
Where you can get messed up is when the skewness is caused by a lumping
at a value - e.g. the number of subjects who have 0 days of
hospitalization.  Then no transformation will help - and it's probably
better to fit separate models for whether there is any response and for
the size of the response given that it's greater than 0.  This might be
a two-part or hurdle model or a mixture of distributions (such as -zip-
or -zinb-).
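
To make the two approaches concrete, here is a minimal sketch on simulated data. The variable names (x, user, days) and the data-generating values are made up for illustration, and -tpoisson- is assumed to be available (older releases of Stata call it -ztp-).

```stata
* Sketch only: simulated data with a lump at 0 (hypothetical variables)
clear
set seed 12345
set obs 500
gen x = rnormal()
* user == 0 means the subject can only have 0 days
gen byte user = runiform() < invlogit(0.5*x)
gen days = cond(user, rpoisson(exp(1 + 0.3*x)), 0)

* Two-part (hurdle) approach
* Part 1: any days at all (0 versus > 0)
gen byte any = days > 0
logit any x
* Part 2: number of days, given that it is greater than 0
tpoisson days x if days > 0, ll(0)

* Mixture approach: zero-inflated Poisson
* inflate() models the probability of being an "always zero" subject
zip days x, inflate(x)
```

Which of the two fits better depends on whether the zeros are thought to come from one process (hurdle) or from two (mixture).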

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten buis
Sent: Wednesday, November 26, 2008 6:49 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: RE: st: significance of mean and median

--- Bastian Steingros <Steingros@gmx.de> wrote:
> using 
> sysuse auto, clear
> reg mpg, nohe
> mean mpg
> ttest mpg==0
> 
> displays the same results. However, how do these tests deal with the
> assumption that mpg has to be normally distributed?
> More precisely, how important is it that mpg is normally
> distributed? Most of the variables in my sample are left or right
> skewed...
> Is ttest still reliable in this case?

You can find that out by simulation, using -simulate-. You declare your
data to be the population and repeatedly test a true hypothesis on a
random sample from your "population" (N out of N with replacement, just
like the bootstrap), and then you look at whether the p-values follow a
uniform distribution and whether you reject the null in only 5% of the
samples. See the example below and
http://ideas.repec.org/p/boc/nsug08/14.html .

*-------------- begin example --------------------
capture program drop sim
program define sim, rclass
	// treat the auto data as the "population"
	sysuse auto, clear
	// center mpg so that H0: mean == 0 is true
	sum mpg, meanonly
	replace mpg = mpg - r(mean)
	// draw N out of N observations with replacement
	bsample
	ttest mpg == 0
	return scalar p = r(p)
end
simulate p=r(p), reps(5000): sim

hist p // should be a uniform distribution

gen sig = p < .05

sum sig // mean should be .05
*--------------- end example --------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
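
The same check can be repeated on a more clearly skewed variable and summarized with formal tests instead of eyeballing the histogram. The following is only a sketch: the program name simskew, the choice of price, the seed and the number of replications are arbitrary choices for illustration.

```stata
* Sketch: same simulation, applied to the right-skewed variable price
capture program drop simskew
program define simskew, rclass
	sysuse auto, clear
	sum price, meanonly
	replace price = price - r(mean)   // make H0: mean == 0 true
	bsample                            // N out of N, with replacement
	ttest price == 0
	return scalar p = r(p)
end
set seed 20081126
simulate p=r(p), reps(2000) nodots: simskew

gen byte sig = p < .05
bitest sig == .05   // is the rejection rate compatible with 5%?
ksmirnov p = p      // one-sample KS test of p against uniform(0,1)
```

With only a few thousand replications these tests have limited power, so a non-rejection is not proof that -ttest- behaves well.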

> By the way, -median mpg- requires a by() option. So how can I test
> whether the median of a variable is significant without using this
> command? I have no idea which by() variable would make sense in my
> sample.

I think that the term "significant" has done more harm than good
because it hides the null hypothesis. As a consequence too many
nonsensical hypotheses are being tested. What you need to do is to
specify a null hypothesis and justify why anyone should care about this
hypothesis. The hypothesis that the mean or the median of a variable is
zero is almost never of interest, and thus should almost never be
tested. It is usually much more interesting to compare the mean/median
between groups, for example men and women. So this is probably why it
never occurred to anyone (or no one thought it was worth their time) to
implement a test of whether the median is equal to a certain fixed
value.
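
As an illustration of such a between-group comparison, here is a short sketch that uses the auto data, with foreign standing in for the grouping variable (in the auto data there is no men/women variable, so this choice is only for illustration).

```stata
sysuse auto, clear
* Compare means between domestic and foreign cars
ttest mpg, by(foreign)
* Nonparametric test of equal medians between the two groups
median mpg, by(foreign)
* The same comparison as a median regression: the coefficient on
* foreign estimates the difference in medians
qreg mpg foreign
```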

> Nick Cox does not seem to fully agree with LAD/qreg...

Nick can speak for himself, but I got the impression that he wasn't
negative about -qreg-, but just noted that -qreg- did not have a neat
test equivalent like -regress- and -ttest-. 

-- Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

