
RE: st: When number of regressors greater than the number of clusters in OLS regression

From   Divya Balasubramaniam <>
Subject   RE: st: When number of regressors greater than the number of clusters in OLS regression
Date   Tue, 2 Sep 2008 08:32:00 -0400 (EDT)

Hello Dr. Cox,

Thanks a lot for pointing out the issue with the share variable. I will look into the reference.


---- Original message ----
>Date: Tue, 2 Sep 2008 12:57:57 +0100
>From: "Nick Cox" <>  
>Subject: RE: st: When number of regressors greater than the number of clusters in OLS regression  
>To: <>
>In the back-and-forth with several penetrating comments from Mark
>Schaffer and Steve Samuels one key question was raised by Steve but not
>as far as I can see really answered and another key question was not
>raised at all. 
>First off, at the risk of being obvious, states for which data are
>available as sampled population seem most unlikely on the face of it to
>be an undistorted sample of the target population, presumably all India.
>My guess would be that various states with no data, say those in remote
>or mountainous areas or politically or militarily sensitive, are also
>often states with low provision. (I'll bet Kashmir or Himachal Pradesh
>is not in the 17, for example.) As your research question seems likely
>to entail extra-statistical inference to all India, it would be vital to
>take account as far as you possibly can of the likely biases. For
>example, you could try to see where the 17 lie in the all-India
>frequency distributions for your predictors or for other
>standard-of-living measures or proxies. 
>Second, share whether measured as proportion (0-1) or percent (0-100%)
>is bounded and that raises the question, often addressed on this list,
>of whether your modelling should pay direct attention to that. There is
>nothing in standard regression that guarantees predictions for such a
>response within feasible ranges, and worrying econometrics-style about
>how to handle the error term should surely take second place to thinking
>about the best handling of the response variable! At best this may not
>bite much in practice if values are near the middle of the range, 0.5 or
>50%, and vary little. However, a wild guess is that your likely range is
>much larger than that and that values near 0.1 or 0.9 may arise in some
>districts. The problem will be compounded if your project tempts you
>into making out-of-sample predictions for areas where share is expected
>to be low. 
>Kit Baum recently surveyed the leading options here in a concise and
>highly informative Stata Journal Tip: 
>SJ-8-2  st0147  Stata tip 63: Modeling proportions, C. F. Baum
>        Q2/08   SJ 8(2):299--303   (no commands)
>        tip on how to model a response variable that appears
>        as a proportion or fraction
>and, as said, there has been much discussion on the list on how to
>handle proportional responses.   
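
A minimal sketch of the point about bounded responses made above, written in plain Python rather than Stata and using made-up data. The names `x`, `share`, `fit_ols`, `fit_fractional_logit` and `predict_logit` are hypothetical and are not from the thread or from the Stata tip. It fits a straight line and a logit-link quasi-likelihood model (one common choice for a fractional response) to the same shares, then compares out-of-sample predictions for a low-share area.

```python
import math

# Made-up data: one predictor and a share measured as a proportion (0-1).
x = [0, 1, 2, 3, 4, 5, 6, 7]
share = [0.02, 0.05, 0.12, 0.27, 0.50, 0.73, 0.88, 0.95]


def fit_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def predict_logit(a, b, xi):
    """Inverse logit of the linear predictor; always strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(a + b * xi)))


def fit_fractional_logit(x, y, iters=50):
    """Logit-link quasi-likelihood fit for y in [0, 1], by Newton steps."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        mu = [predict_logit(a, b, xi) for xi in x]
        w = [m * (1.0 - m) for m in mu]
        # Score vector of the Bernoulli quasi-likelihood.
        g0 = sum(yi - m for yi, m in zip(y, mu))
        g1 = sum((yi - m) * xi for yi, m, xi in zip(y, mu, x))
        # 2x2 information matrix, inverted by hand.
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_ols, b_ols = fit_ols(x, share)
a_fl, b_fl = fit_fractional_logit(x, share)

# Out-of-sample prediction for an area where share is expected to be low.
x_new = -2
print("linear prediction:", a_ols + b_ols * x_new)
print("logit prediction: ", predict_logit(a_fl, b_fl, x_new))
```

With these invented numbers the straight-line prediction at `x_new` falls below zero, which is not a feasible share, while the logit-link prediction stays between 0 and 1 by construction. In Stata a comparable fit is usually obtained with something like `glm share x, family(binomial) link(logit) vce(robust)`, where `share` and `x` again stand in for your own variables.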
>Divya Balasubramaniam
>Thank you all for your invaluable suggestions. I really appreciate it.
Divya Balasubramaniam
Economics PhD Student
Terry College of Business
University of Georgia
Athens -30602.