
# Re: st: Interpretation of Two-sample t test with equal variances?

- From: David Hoaglin
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Interpretation of Two-sample t test with equal variances?
- Date: Wed, 20 Mar 2013 19:50:32 -0400

```
Jay,

If the way people teach boxplots is the (main) source of the
difficulty, I would not be inclined to blame the boxplot!

I'm not aware of an assumption that outliers are an issue.  If the
data contain outliers, a boxplot will show them as individual points,
beyond the ends of the "whiskers."  The aim is to show observations
that are "outside" and may need further scrutiny.  People do refer,
incorrectly, to observations that are beyond the "fences" as
"outliers."  In data from a normal distribution, however, much more
than 5% of small to moderate-sized samples contain one or more
"outside" observations.

I'm not sure what you mean by "the box ends up being too big" if the
data are light-tailed.  I would expect the "whiskers" to be unusually
short.

A boxplot can do only so much.  The display was not designed to reveal
bimodal or multimodal data.  A dotplot would usually show that
structure easily.

David Hoaglin

On Wed, Mar 20, 2013 at 7:19 PM, JVerkuilen (Gmail)
<jvverkuilen@gmail.com> wrote:
> On Wed, Mar 20, 2013 at 3:22 PM, David Hoaglin <dchoaglin@gmail.com> wrote:
>> Jay,
>>
>> I'm not aware that boxplots make any assumptions.  They show what they
>> are intended to show.  Their "performance" comes from the way people
>> interpret them.  Boxplots of skewed data will tend to have certain
>> characteristics, boxplots of light-tailed data will have other
>> characteristics, and so on.  Some patterns suggest bimodal data.
>
> Oh, definitely they show what they were intended to show, and they are
> incredibly useful, but the way we teach them, I think, leads many folks
> down the garden path. The assumptions I'm thinking of include ones
> such as the largely unstated background assumption that outliers are
> an issue. I've become adept at recognizing when a boxplot is showing me
> a light-tailed distribution because the box ends up being too big, but
> if the data have multiple modes, those get blown away: the boxplot
> provides too much reduction.
```
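Hoaglin's remark that, for data from a normal distribution, far more than 5% of small to moderate-sized samples contain at least one "outside" observation can be checked with a quick simulation. The following is a minimal sketch in Python, assuming the conventional fences at 1.5 × IQR beyond the quartiles and the `inclusive` quartile method of Python's `statistics.quantiles`. Other quartile definitions, such as Tukey's hinges, give slightly different rates. The function names are hypothetical, chosen for illustration.

```python
import random
import statistics


def tukey_fences(data, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    Assumes the 'inclusive' quartile definition; boxplot software may differ.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def has_outside(data, k=1.5):
    """True if any observation lies beyond the fences ("outside")."""
    lower, upper = tukey_fences(data, k)
    return any(x < lower or x > upper for x in data)


def fraction_with_outside(n, reps=5000, seed=2013):
    """Share of simulated N(0, 1) samples of size n with >= 1 outside value."""
    rng = random.Random(seed)
    count = 0
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if has_outside(sample):
            count += 1
    return count / reps


if __name__ == "__main__":
    # Compare the simulated rate with 5% for several sample sizes.
    for n in (10, 20, 50, 100):
        print(f"n = {n:3d}: {fraction_with_outside(n):.3f}")
```

Because each observation in a normal population has roughly a 0.7% chance of falling outside the fences, the chance that a sample contains at least one such point grows with the sample size. This is why flagging every "outside" point as an "outlier" is misleading.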