
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: RE: Right-skewed dependent variable and spatial autocorrelation

From   Nick Cox <>
To   "''" <>
Subject   st: RE: Right-skewed dependent variable and spatial autocorrelation
Date   Thu, 21 Jun 2012 15:14:45 +0100

I am puzzled by this. The idea that a proportional response is best modelled using a logit link function seems natural to me, but I can't see that spatial autocorrelation threatens or undermines that in any theoretical or practical sense. Much depends on how you intend to model the autocorrelation, whether through some assumption about error structure, through extra predictors, etc., but I don't see why you can't keep within a logit framework. You are just changing what is on the right-hand side of the model. (I would want to add predictors personally, but there you go.) 
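As one possible reading of "extra predictors", a spatially lagged covariate (the weighted average of a predictor over neighbouring areas) could simply be added to the right-hand side of the logit model. The sketch below is only an illustration under that assumption: the contiguity matrix, the values of `x`, and the helper names `row_standardise` and `spatial_lag` are all hypothetical.

```python
def row_standardise(w):
    """Scale each row of a weights matrix to sum to 1 (all-zero rows are left as zeros)."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def spatial_lag(w, x):
    """Return W x: for each area, the weighted average of x over its neighbours."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


# Hypothetical example: three areas on a line, so area 1 borders areas 0 and 2.
contiguity = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
x = [1.0, 2.0, 4.0]  # made-up predictor values

w = row_standardise(contiguity)
x_lag = spatial_lag(w, x)  # candidate extra predictor for the right-hand side
print(x_lag)
```

The lagged variable would then enter the model alongside `x` like any other predictor; whether that is an adequate treatment of the autocorrelation is a modelling judgement this sketch does not settle.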

Also, you seem to be worrying about zeros, but much of the point of a logit link is that observed zeros (or indeed observed ones) are _not_ problematic as the implication is that the _mean_ response given predictors is in (0,1) which is easy to buy. (If this were not true, -logit- would be quite impossible!) 
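To illustrate why observed zeros are harmless, here is a minimal sketch of a logit-link quasi-likelihood fit (the point estimates of a fractional logit in the spirit of Papke and Wooldridge 1996) computed by Newton's method in plain Python. The data, the single predictor and the function names are made up for illustration, and no standard errors are computed. In Stata the same kind of model is usually fitted with -glm- using a binomial family, a logit link and robust standard errors.

```python
import math


def invlogit(z):
    """Inverse logit: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, iters=100):
    """Newton's method for E[y | x] = invlogit(b0 + b1 * x).

    Solves the Bernoulli quasi-likelihood score equations; y may contain
    exact zeros (or ones) because only the mean is passed through the link.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        mu = [invlogit(b0 + b1 * xi) for xi in x]
        # Score vector: sum of residuals, and residuals times x.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Information matrix entries, using weights mu * (1 - mu).
        wts = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(wts)
        h01 = sum(wi * xi for wi, xi in zip(wts, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(wts, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system H d = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1


# Hypothetical data: shares in [0, 0.5], including two exact zeros.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.05, 0.10, 0.20, 0.45]

b0, b1 = fit_fractional_logit(x, y)
fitted = [invlogit(b0 + b1 * xi) for xi in x]
score = sum(yi - mi for yi, mi in zip(y, fitted))
print(b0, b1)
print(fitted)
```

The fit never evaluates the logit of an observed value, so the zeros in `y` cause no difficulty, and every fitted mean lies strictly inside (0, 1).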

I can see no case for mapping a proportion p to log(p + 1); that does not even treat values near the extremes symmetrically, which is important in principle for a proportion. The right skewness here also does not sound worrying, as for p near 0, logit p = ln[p / (1 - p)] behaves like ln p and does what you want for skewness. (In any case, skewness of the marginal distribution is of secondary concern.)
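A quick numerical check of these two points, as a sketch using only Python's standard library (the particular values of p are arbitrary choices for illustration):

```python
import math


def logit(p):
    """logit p = ln[p / (1 - p)], defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Symmetry: logit treats p and 1 - p as mirror images about p = 0.5.
sym_error = max(abs(logit(p) + logit(1.0 - p)) for p in (0.01, 0.1, 0.3))

# log(p + 1) does not: 0.1 and 0.9 are equally far from 0.5 on the raw
# scale, but not after the transformation.
mid = math.log1p(0.5)
gap_low = mid - math.log1p(0.1)
gap_high = math.log1p(0.9) - mid

# Near zero, logit p and ln p differ only by -ln(1 - p), which is about p.
p_small = 0.001
diff_near_zero = logit(p_small) - math.log(p_small)

print(sym_error, gap_low, gap_high, diff_near_zero)
```

Note also that log(p + 1) squeezes [0, 1] into [0, ln 2], so it changes the shape of the distribution very little, whereas the logit stretches values near 0 much as ln p would.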

I wouldn't move towards Poisson for this. You are evidently interested in proportions, not counts, so keep it there. 


Francisco Rowe

I am trying to estimate a model in which the dependent variable is the share of people with a bachelor's degree or above. Its distribution has some particular properties. It is right skewed, non-negative and fluctuates between 0 and 0.5 (theoretically up to 1).

To estimate a model with a dependent variable with these characteristics, the strategy proposed by Papke and Wooldridge (1996) could be used (Baum 2008). However, it does not account for spatial autocorrelation, which based on theoretical grounds and previous research seems to play a role. How can spatial autocorrelation be taken into account in the strategy proposed by Papke and Wooldridge (1996)?

Regarding this, two options appear as alternatives:
1) Normalise the dependent variable and run either a spatial autoregressive or a spatial error model. However, normalisation is problematic given the zero values in the dependent variable. Someone suggested that I could use the transformation y* = log(y + 1), so that zeros map to zeros. Does that seem appropriate?

2) Instead of using the share as the dependent variable, use the count of people and run a Poisson or zero-inflated Poisson model with spatial autocorrelation. However, I don't know of any command in Stata that can do this. Is there one? I also know of an R package that does this (spatcounts), but it is poorly documented, so it is hard to know what input data structure it expects and what it does.

Do you suggest any other alternatives?

Baum, C 2008, 'Stata tip 63: Modeling proportions', Stata Journal, vol. 8, no. 2, pp. 299-303.
Papke, L and Wooldridge, J 1996, 'Econometric methods for fractional response variables with an application to 401 (k) plan participation rates', Journal of Applied Econometrics, vol. 11, no. 6, pp. 619-32.

I would appreciate your comments and suggestions.

