Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: OLS on a Small Sample

From: Nick Cox; To: "statalist@hsphsun2.harvard.edu"; Subject: Re: st: OLS on a Small Sample; Date: Wed, 5 Jun 2013 23:28:30 +0100

```
With nothing else said, we assume that this is plain linear
regression.  Actually if "performance" is something positive, it's
quite possible that Poisson regression is a better choice. See
William Gould's classic blog post at
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/

What you should care about, in approximate order according to my instincts, are

0. The 27 plants all belong to the same company. What kind of
dependencies does that create? Do we have as much information as from
27 plants from different companies?  Is there a cluster structure?
What is the population from which this is the sample? 4 plants were
excluded too (one refused to share data), so that's a possible source
of bias.

1. Whether the model is a good choice.  It is a plane in 3D space, so
there is some scope to visualize data points in that space, although
all the pairwise scatter plots, some basic residual plots and some
added variable plots might give sharper answers to what works and what
does not work well.

2. How reliable the coefficient estimates are.

3. How reliable the standard errors are.

4. How reliable everything based on the SEs is.

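As far as sample size goes, t-based statistics are designed to
reflect the extra uncertainty that comes with a small sample. With
n = 27 and three estimated coefficients there are 24 residual degrees
of freedom, and the t distribution is used accordingly.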

If other assumptions are wrong, the regression will be problematic,
but that's not necessarily related to sample size.

Bootstrapping the SEs won't do any harm. Checking the model fit by
something quite different, say -qreg- or a different -glm-, is a
better check. Finding out how fuzzy an answer is does not imply that
you asked the right question. By the way, I would not say that
bootstrapping OLS makes an analysis nonparametric: it's still OLS.

Nick
njcoxstata@gmail.com

On 5 June 2013 22:36, Lloyd Dumont <lloyddumont@yahoo.com> wrote:

> A colleague of mine asked me to give him some feedback on a paper.
> In it, he runs OLS on a sample of 27 of the 31 plants in a single
> company, predicting performance as a function of two independent
> variables (and a constant, of course).  (Four plants are excluded for
> idiosyncratic reasons, e.g., one makes an oddball product, one
> refused to share data, etc.)
>
> Assuming all of the other regression assumptions hold, should the
> relatively small sample size lead me to call these results into
> question?  I mean, how reliable are the standard errors and
> resulting t-tests when n = 27?
>
> And, assuming others are as uncomfortable as I am, is there an
> obvious nonparametric alternative to OLS for this situation?  If
> nothing else, he could use bootstrap standard errors instead of the
> standard variance estimator, right?
```
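The checks suggested in the reply could be sketched in Stata roughly as follows. This is only an illustration under stated assumptions: the variable names `performance`, `x1` and `x2` are hypothetical (the thread never names the variables), the number of bootstrap replications and the seed are arbitrary, and the gamma family with log link is just one possible reading of "a different -glm-", not something the thread specifies.

```stata
* Hypothetical variable names: performance, x1, x2 (not given in the thread)

* Baseline: plain linear regression, n = 27, so 24 residual df
regress performance x1 x2

* Graphical checks after -regress-: residual-versus-fitted and added-variable plots
rvfplot, yline(0)
avplots

* All pairwise scatter plots of response and predictors
graph matrix performance x1 x2

* How much the small sample widens intervals: t critical value
* with 24 df (roughly 2.06) versus the normal value (roughly 1.96)
display invttail(24, 0.025)
display invnormal(0.975)

* Bootstrap standard errors (reps and seed are arbitrary choices);
* the point estimates are still OLS
regress performance x1 x2, vce(bootstrap, reps(1000) seed(12345))

* Quite different fits, to see whether the conclusions survive
qreg performance x1 x2                           // median regression
poisson performance x1 x2, vce(robust)           // if performance is positive
glm performance x1 x2, family(gamma) link(log)   // assumed alternative -glm-
```

If the signs and rough sizes of the effects agree across these fits, that says more about whether the model is a sensible choice than the bootstrap standard errors alone would.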