# Re: st: RE: Right-skewed dependent variable and spatial autocorrelation

From: Austin Nichols <[email protected]>  
To: [email protected]  
Subject: Re: st: RE: Right-skewed dependent variable and spatial autocorrelation  
Date: Thu, 21 Jun 2012 12:17:08 -0400

```
Francisco Rowe <[email protected]>:
Note also that neither skewness nor a theoretical limit on y prevents
you from using a linear model. It is true that with continuous RHS
predictors you can get predictions outside the allowable range, but
a linear model is often preferable on various grounds, especially if
you have dummies and their interactions as predictors. Papke and
Wooldridge often point out that the linear analog gives the right
marginal effects inference, provided you use suitably robust SE
calculations. Skewness in the depvar need not even imply skewness in
the errors; many predictors may also be right-skewed.

On Thu, Jun 21, 2012 at 10:14 AM, Nick Cox <[email protected]> wrote:
> I am puzzled by this. The idea that a proportional response is best modelled using a logit link function seems natural to me, but I can't see that spatial autocorrelation threatens or undermines that in any theoretical or practical sense. Much depends on how you intend to model the autocorrelation, whether through some assumption about error structure, through extra predictors, etc., but I don't see why you can't keep within a logit framework. You are just changing what is on the right-hand side of the model. (I would want to add predictors personally, but there you go.)
>
> Also, you seem to be worrying about zeros, but much of the point of a logit link is that observed zeros (or indeed observed ones) are _not_ problematic as the implication is that the _mean_ response given predictors is in (0,1) which is easy to buy. (If this were not true, -logit- would be quite impossible!)
>
> I can see no case for mapping a proportion p to log(p + 1); that does not even treat values near the extremes symmetrically, which is important in principle for a proportion. The right skewness here also does not sound worrying as for p near 0, logit p = ln [ p / (1 - p)] behaves like ln p and does what you want for skewness. (In any case, skewness of marginal distribution is of secondary concern.)
>
> I wouldn't move towards Poisson for this. You are evidently interested in proportions, not counts, so keep it there.
>
> Nick
> [email protected]
>
> Francisco Rowe
>
> I am trying to estimate a model in which the dependent variable is the share of people with a bachelor's degree or above. Its distribution has some particular properties. It is right-skewed, non-negative and ranges between 0 and 0.5 (theoretically up to 1).
>
> To estimate a model with a dependent variable with these characteristics, the strategy proposed by Papke and Wooldridge (1996) could be used (Baum 2008). However, it does not account for spatial autocorrelation, which based on theoretical grounds and previous research seems to play a role. How can spatial autocorrelation be taken into account in the strategy proposed by Papke and Wooldridge (1996)?
>
> Regarding this, two options appear as alternatives:
>
> 1) Normalise the dependent variable and run either a spatial autoregressive or a spatial error model. However, normalisation is problematic given the zero values in the dependent variable. Someone suggested that I could use this transformation: y*=log(y+1), so that zeros map to zeros. Does it seem appropriate?
>
> 2) Instead of using the share as the dependent variable, use the count of people and run a Poisson or zero-inflated Poisson model with spatial autocorrelation. However, for this alternative, I don't know of any command in Stata that can do this. Is there any? I also know of a package in R that does this (spatcounts), but it is poorly documented, so it is hard to know the input data structure and learn what it does.
>
> Do you suggest any other alternatives?
>
> Baum, C 2008, 'Stata tip 63: Modeling proportions', Stata Journal, vol. 8, no. 2, pp. 299-303.
> Papke, L and Wooldridge, J 1996, 'Econometric methods for fractional response variables with an application to 401(k) plan participation rates', Journal of Applied Econometrics, vol. 11, no. 6, pp. 619-32.

```
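The two approaches discussed in the thread, a logit-link model for the share and its linear analog with robust standard errors, could be compared along the following lines. This is only a minimal sketch on simulated data, not anything posted by the participants: the variable names (`share`, `x1`, `d1`), the seed, and all parameter values are hypothetical, and no spatial structure is modelled. The `glm` call follows the fractional logit setup of Papke and Wooldridge (1996) as described in Baum (2008).

```stata
* Simulated data: every name and parameter value here is an assumption
clear
set seed 12345
set obs 500
generate x1 = rnormal()
generate d1 = runiform() < 0.5

* A right-skewed share, mostly well below 0.5
generate share = invlogit(-2 + 0.5*x1 + 0.3*d1 + rnormal(0, 0.5))

* Observed zeros are allowed: the logit link constrains the mean, not y
replace share = 0 in 1/10

* Fractional logit with robust standard errors
glm share x1 i.d1, family(binomial) link(logit) vce(robust)

* Average marginal effects on the share scale
margins, dydx(*)

* Linear analog with robust standard errors, for comparison
regress share x1 i.d1, vce(robust)
```

The average marginal effects reported by `margins` after `glm` are on the same scale as the coefficients from `regress`, so the two sets of estimates can be read side by side. Spatial terms, whether extra predictors or an assumed error structure, would enter on the right-hand side of either model.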