Statalist



RE: st: When number of regressors greater than the number of clusters in OLS regression


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: When number of regressors greater than the number of clusters in OLS regression
Date   Tue, 2 Sep 2008 12:57:57 +0100

In the back-and-forth, with several penetrating comments from Mark
Schaffer and Steve Samuels, one key question was raised by Steve but,
as far as I can see, not really answered, and another key question was
not raised at all.

First off, at the risk of being obvious, the states for which data are
available as the sampled population seem most unlikely, on the face of
it, to be an undistorted sample of the target population, presumably all
India. My guess would be that various states with no data, say those in
remote or mountainous areas or those that are politically or militarily
sensitive, are also often states with low provision. (I'll bet Kashmir
or Himachal Pradesh is not in the 17, for example.) As your research
question seems likely
to entail extra-statistical inference to all India, it would be vital to
take account as far as you possibly can of the likely biases. For
example, you could try to see where the 17 lie in the all-India
frequency distributions for your predictors or for other
standard-of-living measures or proxies. 
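
One way to make that check concrete is to compute, for each sampled
state, the fraction of all states with a value at or below its own (its
position in the empirical distribution). The sketch below is only an
illustration: the index values, the number of states, and the in_sample
flags are all made up, and ecdf_position is a hypothetical helper, not
anything from the original thread.

```python
import numpy as np

# Made-up index values for 10 hypothetical states (NOT real data).
all_states = np.array([31.0, 38.0, 42.0, 47.0, 52.0,
                       55.0, 61.0, 66.0, 72.0, 80.0])
# Made-up flags marking which states have data available.
in_sample = np.array([False, False, False, True, False,
                      True, True, True, True, True])

def ecdf_position(values, reference):
    """Fraction of reference values that are <= each entry of values."""
    reference = np.sort(reference)
    return np.searchsorted(reference, values, side="right") / reference.size

# Where do the sampled states sit in the distribution of all states?
positions = ecdf_position(all_states[in_sample], all_states)
median_position = np.median(positions)

print("positions of sampled states:", positions)
print("median position:", median_position)
```

If the sampled states were an undistorted sample, their positions should
spread over the whole 0-1 range with a median near 0.5. A median well
above that, as in this made-up example, would suggest that the states
with data are concentrated among the better-off.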

Second, share, whether measured as proportion (0-1) or percent (0-100%),
is bounded, and that raises the question, often addressed on this list,
of whether your modelling should pay direct attention to that. There is
nothing in standard regression that guarantees that predictions for such
a response fall within the feasible range, and worrying
econometrics-style about how to handle the error term should surely take
second place to thinking about the best handling of the response
variable! At best this may not
bite much in practice if values are near the middle of the range, 0.5 or
50%, and vary little. However, a wild guess is that your likely range is
much larger than that and that values near 0.1 or 0.9 may arise in some
districts. The problem will be compounded if your project tempts you
into making out-of-sample predictions for areas where share is expected
to be low. 
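
The point about infeasible predictions can be seen in a small sketch.
The data below are made up purely for illustration (x and share are
hypothetical), and the alternative shown, least squares on the
logit-transformed share, is just one simple option for respecting the
bounds, not necessarily the one best suited to your data.

```python
import numpy as np

# Made-up predictor and share (proportion) values, NOT real data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
share = np.array([0.05, 0.08, 0.15, 0.30, 0.50, 0.70])

# Ordinary least squares on the raw share.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(x, share, 1)

# Least squares on the logit scale, log(p / (1 - p)).
logit_share = np.log(share / (1.0 - share))
b1, b0 = np.polyfit(x, logit_share, 1)

def predict_linear(x_new):
    """Prediction from the straight-line fit; nothing keeps it in [0, 1]."""
    return intercept + slope * x_new

def predict_logit(x_new):
    """Prediction from the logit-scale fit, back-transformed to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x_new)))

# Out-of-sample points beyond both ends of the observed range of x.
x_new = np.array([-3.0, 8.0])
linear_pred = predict_linear(x_new)
logit_pred = predict_logit(x_new)

print("linear fit:", linear_pred)
print("logit-scale fit:", logit_pred)
```

With these made-up numbers the straight-line fit gives a negative share
at the low end and a share above 1 at the high end, while the
back-transformed logit fit stays strictly between 0 and 1 by
construction. Note that the logit transform is undefined for shares of
exactly 0 or 1, which is one reason other approaches are often
preferred.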

Kit Baum recently surveyed the leading options here in a concise and
highly informative Stata Journal Tip: 

SJ-8-2  st0147  . . . . . . . . . . .  Stata tip 63: Modeling proportions
        . . . . . . . . . . . . . . . . . . . . . . . . . . .  C. F. Baum
        Q2/08   SJ 8(2):299--303                            (no commands)
        tip on how to model a response variable that appears
        as a proportion or fraction

and, as said, there has been much discussion on the list on how to
handle proportional responses.   

Nick
n.j.cox@durham.ac.uk 

Divya Balasubramaniam

Thank you all for your invaluable suggestions. I really appreciate it.

