Re: st: OLS on a Small Sample


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: OLS on a Small Sample
Date   Wed, 5 Jun 2013 23:28:30 +0100

With nothing else said, we assume that this is plain linear
regression.  Actually, if "performance" is something positive, it's
quite possible that Poisson regression is a better choice. See
William Gould's classic blog post at
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/

What you should care about, in approximate order according to my instincts, are

0. The 27 plants all belong to the same company. What kind of
dependencies does that create? Do we have as much information as from
27 plants from different companies?  Is there a cluster structure?
What is the population from which this is the sample? 4 plants were
excluded too, so that's a possible source of bias.

1. Whether the model is a good choice.  It is a plane in 3D space, so
there is some scope to visualize data points in that space, although
all the pairwise scatter plots, some basic residual plots and some
added variable plots might give sharper answers to what works and what
does not work well.

2. How reliable the coefficient estimates are.

3. How reliable the standard errors are.

4. How reliable everything based on the SEs is.

As far as sample size goes, t-based statistics are designed to allow
for the extra uncertainty that comes with a small sample.
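
To put a number on that for this case: with 27 observations and three
estimated parameters (two predictors plus a constant) there are 24
residual degrees of freedom. The sketch below is only an illustration
in plain Python, not part of the original analysis; the function names
are made up here. It finds the two-sided 95% critical value of the t
distribution numerically, so that it can be compared with the familiar
normal value of about 1.96.

```python
import math

def t_pdf(x, df):
    # Density of Student's t distribution with df degrees of freedom.
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    # P(T <= x) for x >= 0: one half plus Simpson's rule on [0, x].
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_crit(df, level=0.95):
    # Two-sided critical value found by bisection on the CDF.
    target = 1 - (1 - level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n, k = 27, 3          # 27 plants; two predictors plus a constant
df = n - k            # residual degrees of freedom
print(df, round(t_crit(df), 3))
```

The critical value comes out a little above 2.06, so intervals are
only about 5% wider than the large-sample normal ones would be.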

If other assumptions are wrong, the regression will be problematic,
but that's not necessarily related to sample size.

Bootstrapping the SEs won't do any harm. Checking the model fit by
something quite different, say -qreg- or a different -glm-, is a
better check. Finding out how fuzzy an answer is does not imply that
you asked the right question. By the way, I would not say that
bootstrapping OLS makes an analysis nonparametric: it's still OLS.
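
A minimal sketch of that last point, in plain Python with simulated
stand-in data (the real plant data are not available, and every name
and number below is an assumption made for illustration). Each
bootstrap replicate resamples whole observations and refits the same
least-squares model, so only the way the SEs are estimated changes.

```python
import random

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small square system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    # Least squares via the normal equations (X'X) b = X'y.
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def sd(v):
    # Sample standard deviation.
    m = sum(v) / len(v)
    return (sum((a - m) ** 2 for a in v) / (len(v) - 1)) ** 0.5

def bootstrap_se(X, y, reps=1000, seed=1):
    # Resample observations with replacement and refit OLS each time;
    # the SD of each coefficient across replicates is its bootstrap SE.
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols([X[i] for i in idx], [y[i] for i in idx]))
    return [sd([d[j] for d in draws]) for j in range(len(X[0]))]

# Simulated stand-in data: 27 observations, a constant and two predictors.
rng = random.Random(2013)
X = [[1.0, rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(27)]
y = [2.0 + 1.0 * r[1] - 0.5 * r[2] + rng.gauss(0, 1) for r in X]

b = ols(X, y)              # point estimates: ordinary OLS
se = bootstrap_se(X, y)    # bootstrap SEs for the same OLS fit
print([round(v, 3) for v in b])
print([round(v, 3) for v in se])
```

The coefficients in b are exactly what OLS gives; the bootstrap only
supplies a different estimate of their variability.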

Nick
njcoxstata@gmail.com


On 5 June 2013 22:36, Lloyd Dumont <lloyddumont@yahoo.com> wrote:

> A colleague of mine asked me to give him some feedback on a paper.
> In it, he runs OLS on a sample of 27 of the 31 plants in a single
> company, predicting performance as a function of two independent
> variables (and a constant, of course).  (Four plants are excluded for
> idiosyncratic reasons, e.g., one makes an oddball product, one
> refused to share data, etc.)
>
> Assuming all of the other regression assumptions hold, should the
> relatively small sample size lead me to call these results into
> question?  I mean, how reliable are the standard errors and resulting
> t-tests when n = 27?
>
> And, assuming others are as uncomfortable as I am, is there an
> obvious nonparametric alternative to OLS for this situation?  If
> nothing else, he could use bootstrap standard errors instead of the
> standard variance estimator, right?