Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: OLS on a Small Sample

From	Nick Cox
Subject	Re: st: OLS on a Small Sample
Date	Wed, 5 Jun 2013 23:28:30 +0100

With nothing else said, we assume that this is plain linear
regression. Actually, if "performance" is something positive, it's
quite possible that Poisson regression is a better choice. See
William Gould's classic blog post at
http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/
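
A minimal sketch of that comparison, not taken from the thread. As a
stand-in for the plant data it uses the first 27 observations of
Stata's shipped auto dataset, with price playing the role of
"performance" and weight and mpg as the two predictors; all of these
choices are assumptions made purely for illustration. The vce(robust)
option follows the recommendation in the blog post.

```stata
* Stand-in data: first 27 cars from the shipped auto dataset
* (an assumption; the plant data are not available here)
sysuse auto, clear
keep in 1/27

* Plain linear regression, as presumably used in the paper
regress price weight mpg

* Poisson regression of a positive outcome, with robust standard errors
poisson price weight mpg, vce(robust)
```

The two sets of coefficients are not directly comparable: -regress-
models the mean on the original scale, whereas -poisson- models the
log of the mean.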

What you should care about, in approximate order of importance according to my instincts:

0. The 27 plants all belong to the same company. What kind of
dependencies does that create? Do we have as much information as from
27 plants from different companies? Is there a cluster structure? What
is the population from which this is the sample? Four plants were
excluded too, at least one because it refused to share data, so that's
a possible source of bias.

1. Whether the model is a good choice. With two predictors the fitted
model is a plane in 3D space, so there is some scope to visualize the
data points in that space, although all the pairwise scatter plots,
some basic residual plots and some added variable plots might give
sharper answers about what works well and what does not.

2. How reliable the coefficient estimates are.

3. How reliable the standard errors are.

4. How reliable everything based on the SEs is.
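
The plots mentioned under point 1 can be sketched as follows. This is
an illustration only: the data are again the first 27 observations of
the shipped auto dataset, and the variable names (price, weight, mpg)
are stand-ins for the unknown performance measure and its two
predictors.

```stata
* Stand-in data (an assumption, not the plant data)
sysuse auto, clear
keep in 1/27

* All pairwise scatter plots of outcome and predictors
graph matrix price weight mpg, half

* Fit the model whose diagnostics we want
regress price weight mpg

* Residuals versus fitted values, with a reference line at zero
rvfplot, yline(0)

* Added variable plots, one panel per predictor
avplots
```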

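As far as sample size goes, t-based statistics are designed to
reflect the extra uncertainty that comes with a small sample. With
n = 27 and three estimated coefficients, inference is based on 24
residual degrees of freedom.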
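
To see how much the t distribution allows for n = 27, one can compare
two-sided 5% critical values. This is a small sketch, assuming a model
with two predictors and a constant, hence 27 - 3 = 24 residual degrees
of freedom.

```stata
* Residual degrees of freedom: 27 observations minus 3 coefficients
local df = 27 - 3

* Two-sided 5% critical values: t with 24 df versus standard normal
display "t critical value (df = `df'): " %6.4f invttail(`df', 0.025)
display "normal critical value:        " %6.4f invnormal(0.975)
```

The t value is about 2.064 against about 1.960 for the normal, so
confidence intervals are roughly 5% wider than a large-sample
approximation would give.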

If other assumptions are wrong, the regression will be problematic,
but that's not necessarily related to sample size.

Bootstrapping the SEs won't do any harm. Checking the model fit with
something quite different, say -qreg- or a different -glm-, is a
better check. Finding out how fuzzy an answer is does not imply that
you asked the right question. By the way, I would not say that
bootstrapping OLS makes an analysis nonparametric: it's still OLS.
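
A hedged sketch of these checks, again using the first 27 observations
of the shipped auto dataset as stand-in data. The variable names, the
number of replications, the seed and the gamma family with log link
are all illustrative assumptions, not anything specified in the
thread.

```stata
* Stand-in data (an assumption, not the plant data)
sysuse auto, clear
keep in 1/27

* Conventional OLS standard errors
regress price weight mpg

* Same point estimates, bootstrap standard errors
regress price weight mpg, vce(bootstrap, reps(1000) seed(2013))

* Quite different fits: median regression and a GLM with log link
qreg price weight mpg
glm price weight mpg, family(gamma) link(log)
```

The first two commands give identical coefficients and differ only in
their standard errors, which is the point made above: bootstrapping
changes the measure of fuzziness, not the model.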

Nick

On 5 June 2013 22:36, Lloyd Dumont wrote:

> A colleague of mine asked me to give him some feedback on a paper. In
> it, he runs OLS on a sample of 27 of the 31 plants in a single company,
> predicting performance as a function of two independent variables (and
> a constant, of course). (Four plants are excluded for idiosyncratic
> reasons, e.g., one makes an oddball product, one refused to share data,
> etc.)
>
> Assuming all of the other regression assumptions hold, should the
> relatively small sample size lead me to call these results into
> question? I mean, how reliable are the standard errors and resulting
> t-tests when n = 27?
>
> And, assuming others are as uncomfortable as I am, is there an obvious
> nonparametric alternative to OLS for this situation? If nothing else,
> he could use bootstrap standard errors instead of the standard
> variance estimator, right?