

**From:** Nick Cox <njcoxstata@gmail.com>

**To:** "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

**Subject:** Re: st: OLS on a Small Sample

**Date:** Wed, 5 Jun 2013 23:28:30 +0100

With nothing else said, we assume that this is plain linear regression. But if "performance" is something that is always positive, Poisson regression may well be a better choice. See William Gould's classic blog post at http://blog.stata.com/2011/08/22/use-poisson-rather-than-regress-tell-a-friend/

What you should care about, roughly in order of importance according to my instincts, is:

0. The 27 plants all belong to the same company. What kind of dependencies does that create? Do we have as much information as we would from 27 plants in different companies? Is there a cluster structure? What is the population from which this is the sample? Four plants were also left out (one of them refused to share data), so that is a potential source of bias.
1. Whether the model is a good choice. With two predictors and a constant, the model is a plane in 3D space, so there is some scope to visualize the data points in that space. Even so, the pairwise scatter plots, some basic residual plots and some added-variable plots may give sharper answers about what works well and what does not.
2. How reliable the coefficient estimates are.
3. How reliable the standard errors are.
4. How reliable everything based on the standard errors is.

As far as sample size goes, t-based inference is designed to allow for the extra uncertainty that comes with a small sample. If other assumptions are wrong, the regression will be problematic, but that is not necessarily related to sample size. Bootstrapping the SEs won't do any harm, but checking the model fit with something quite different, say -qreg- or a different -glm-, is a better check. Finding out how fuzzy an answer is does not tell you whether you asked the right question.

By the way, I would not say that bootstrapping OLS makes an analysis nonparametric: it's still OLS.

Nick
njcoxstata@gmail.com

On 5 June 2013 22:36, Lloyd Dumont <lloyddumont@yahoo.com> wrote:

> A colleague of mine asked me to give him some feedback on a paper. In it, he runs OLS on a sample of 27 of the 31 plants in a single company, predicting performance as a function of two independent variables (and a constant, of course). (Four plants are excluded for idiosyncratic reasons, e.g., one makes an oddball product, one refused to share data, etc.)
>
> Assuming all of the other regression assumptions hold, should the relatively small sample size lead me to call these results into question? I mean, how reliable are the standard errors and resulting t-tests when n = 27?
>
> And, assuming others are as uncomfortable as I am, is there an obvious nonparametric alternative to OLS for this situation? If nothing else, he could use bootstrap standard errors instead of the standard variance estimator, right?

For searches and help try:

- http://www.stata.com/help.cgi?search
- http://www.stata.com/support/faqs/resources/statalist-faq/
- http://www.ats.ucla.edu/stat/stata/
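To make the bootstrap point concrete, here is a minimal sketch in Python/NumPy (not Stata) on simulated, hypothetical data: 27 observations, two predictors plus a constant. It fits OLS, computes the classical standard errors (residual variance on n − k = 24 df, the df a t-test would use), and then computes pairs-bootstrap SEs by resampling whole rows. The helper names `ols_coef`, `ols_classical_se` and `pairs_bootstrap_se` are hypothetical. Note that the bootstrap changes only the SEs: the coefficients are still the OLS ones, which is why bootstrapping does not make the analysis nonparametric. In Stata the rough equivalent would be `regress y x1 x2, vce(bootstrap)`.

```python
import numpy as np

def ols_coef(X, y):
    """OLS coefficients via least squares; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def ols_classical_se(X, y):
    """Classical OLS standard errors, residual variance estimated on n - k df."""
    n, k = X.shape
    resid = y - X @ ols_coef(X, y)
    sigma2 = resid @ resid / (n - k)
    return np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

def pairs_bootstrap_se(X, y, reps=2000, seed=0):
    """Resample whole rows with replacement, refit OLS, return the SD of each coefficient."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((reps, X.shape[1]))
    for r in range(reps):
        idx = rng.integers(0, n, size=n)
        draws[r] = ols_coef(X[idx], y[idx])  # coefficients are still plain OLS
    return draws.std(axis=0, ddof=1)

# Hypothetical data: 27 "plants", two predictors plus a constant.
rng = np.random.default_rng(42)
n = 27
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

beta = ols_coef(X, y)
se_classical = ols_classical_se(X, y)
se_boot = pairs_bootstrap_se(X, y)
print("coefficients:     ", beta)
print("classical SEs:    ", se_classical)
print("bootstrap SEs:    ", se_boot)
print("residual df for t:", n - X.shape[1])
```

With well-behaved simulated errors like these, the two sets of SEs should be broadly similar; a large disagreement on real data is more a hint that some other assumption is off than a sample-size problem.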
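For the suggestion that Poisson regression may suit a positive "performance" measure, the sketch below fits a log-link Poisson model by iteratively reweighted least squares and reports robust (sandwich) SEs, in the spirit of Gould's post. It is Python/NumPy on simulated, hypothetical data, and `poisson_irls` is a hypothetical helper; in Stata one would instead use `poisson y x1 x2, vce(robust)` or `glm y x1 x2, family(poisson) link(log) vce(robust)`. The SEs here omit any small-sample scaling factor, so they need not match Stata's output exactly.

```python
import numpy as np

def poisson_irls(X, y, tol=1e-10, max_iter=100):
    """Log-link Poisson regression by IRLS, with robust (sandwich) standard errors.

    Assumes column 0 of X is the constant and y is non-negative with a positive mean.
    """
    n, k = X.shape
    beta = np.zeros(k)
    beta[0] = np.log(y.mean())          # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu         # working response
        XtW = X.T * mu                  # X'W with W = diag(mu)
        new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv((X.T * mu) @ X)
    meat = (X.T * (y - mu) ** 2) @ X
    return beta, np.sqrt(np.diag(bread @ meat @ bread))

# Hypothetical count-like outcome for 27 "plants".
rng = np.random.default_rng(1)
n = 27
x1, x2 = rng.normal(size=n), rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = rng.poisson(np.exp(1.0 + 0.3 * x1 - 0.2 * x2)).astype(float)

beta_p, se_p = poisson_irls(X, y)
print("Poisson coefficients:", beta_p)
print("robust SEs:          ", se_p)
```

The outcome does not have to be a count: the same estimator works for any non-negative y, which is the point of the blog post.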

**References**:

- **st: OLS on a Small Sample**, *From:* Lloyd Dumont <lloyddumont@yahoo.com>
