

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Right-skewed dependent variable and spatial autocorrelation
Date: Thu, 21 Jun 2012 12:17:08 -0400

Francisco Rowe <frowe@ucn.cl> :

Note also that neither skewness nor a theoretical limit on y prevents you using a linear model; it is true that if you use continuous RHS predictors you can get predictions outside the allowable range, but often a linear model is preferable on various grounds, especially if you have dummies and their interactions as predictors. Papke and Wooldridge often point out that the linear analog gives the right marginal effects inference, provided you use suitably robust SE calculations. Skewness in the depvar need not even be skewness in errors; many predictors may also be right skewed.

On Thu, Jun 21, 2012 at 10:14 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I am puzzled by this. The idea that a proportional response is best modelled using a logit link function seems natural to me, but I can't see that spatial autocorrelation threatens or undermines that in any theoretical or practical sense. Much depends on how you intend to model the autocorrelation, whether through some assumption about error structure, through extra predictors, etc., but I don't see why you can't keep within a logit framework. You are just changing what is on the right-hand side of the model. (I would want to add predictors personally, but there you go.)
>
> Also, you seem to be worrying about zeros, but much of the point of a logit link is that observed zeros (or indeed observed ones) are _not_ problematic as the implication is that the _mean_ response given predictors is in (0,1) which is easy to buy. (If this were not true, -logit- would be quite impossible!)
>
> I can see no case for mapping a proportion p to log(p + 1); that does not even treat values near the extremes symmetrically, which is important in principle for a proportion. The right skewness here also does not sound worrying as for p near 0, logit p = ln [ p / (1 - p)] behaves like ln p and does what you want for skewness. (In any case, skewness of marginal distribution is of secondary concern.)
>
> I wouldn't move towards Poisson for this. You are evidently interested in proportions, not counts, so keep it there.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Francisco Rowe
>
> I am trying to estimate a model in which the dependent variable is the share of people with a bachelor's degree or above. Its distribution has some particular properties. It is right skewed, non-negative and fluctuates between 0 and 0.5 (theoretically up to 1).
>
> To estimate a model with a dependent variable with these characteristics, the strategy proposed by Papke and Wooldridge (1996) could be used (Baum 2008). However, it does not account for spatial autocorrelation, which based on theoretical grounds and previous research seems to play a role. How can spatial autocorrelation be taken into account in the strategy proposed by Papke and Wooldridge (1996)?
>
> Regarding this, two options appear as alternatives:
>
> 1) Normalise the dependent variable and run either a spatial autoregressive or error model. However, normalisation is problematic given the zero values in the dependent variable. Someone suggested that I could use this transformation: y*=log(y+1), which maps zeros to zeros. Does it seem appropriate?
>
> 2) Instead of using the share as dependent variable, use the count of people and run a Poisson or zero-inflated Poisson model with spatial autocorrelation. However, for this alternative, I don't know any command in Stata that can do this. Is there any? I also know a package in R that does this (spatcounts), but it is poorly documented so it is hard to know the input data structure and learn what it does.
>
> Do you suggest any other alternatives?
>
> Baum, C 2008, 'Stata tip 63: Modeling proportions', Stata Journal, vol. 8, no. 2, pp. 299-303.
> Papke, L and Wooldridge, J 1996, 'Econometric methods for fractional response variables with an application to 401(k) plan participation rates', Journal of Applied Econometrics, vol. 11, no. 6, pp. 619-32.
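
Nick Cox's remarks on transformations in the quoted message can be checked numerically. The following is only a minimal sketch in Python using the standard library; the function names `logit`, `log1p_transform` and `compare` are hypothetical helpers introduced for illustration and are not part of Stata or of any package mentioned in the thread. It compares ln p, logit p and log(p + 1) at a few small proportions, and checks whether each transformation treats p and 1 - p symmetrically.

```python
import math


def logit(p):
    """Log-odds ln[p / (1 - p)] of a proportion p strictly inside (0, 1)."""
    return math.log(p / (1.0 - p))


def log1p_transform(p):
    """The transformation y* = log(y + 1) suggested to the original poster."""
    return math.log1p(p)


def compare(p):
    """Return ln p, logit p and log(p + 1) for one proportion p in (0, 1)."""
    return {
        "p": p,
        "ln_p": math.log(p),
        "logit_p": logit(p),
        "log1p_p": log1p_transform(p),
    }


if __name__ == "__main__":
    # Illustrative proportions only; these are not data from the thread.
    for p in (0.001, 0.01, 0.1, 0.5):
        row = compare(p)
        print(
            f"p={row['p']:<6} ln p={row['ln_p']:8.4f} "
            f"logit p={row['logit_p']:8.4f} log(p+1)={row['log1p_p']:7.4f}"
        )

    # Symmetry check: logit(p) + logit(1 - p) is zero up to rounding,
    # whereas log(p + 1) + log((1 - p) + 1) is not.
    p = 0.2
    print("logit sum:", logit(p) + logit(1 - p))
    print("log(p+1) sum:", log1p_transform(p) + log1p_transform(1 - p))
```

For small p the logit and ln p columns nearly coincide, while log(p + 1) stays confined to the interval from 0 to ln 2 and barely stretches the lower tail, which is consistent with the objection raised in the quoted reply.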