



Re: st: RE: Right-skewed dependent variable and spatial autocorrelation


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Right-skewed dependent variable and spatial autocorrelation
Date   Fri, 22 Jun 2012 00:08:31 +0100

You can use -findit- to look for one. If not, Stata is eminently programmable.

I would try a predictor which was a weighted average of neighbouring
values. I would certainly try something simple first.
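
A minimal sketch of what such a predictor could look like, written in Python purely for illustration. The region labels, the neighbour lists, the equal default weights and the function name `spatial_lag` are all assumptions made for this example; none of them come from the thread, and the message does not say whether the averaged values should be the response or a covariate.

```python
def spatial_lag(values, neighbours, weights=None):
    """Return, for each region, the weighted average of its neighbours' values.

    values     : dict mapping region -> observed value
    neighbours : dict mapping region -> list of neighbouring regions
    weights    : optional dict mapping (region, neighbour) -> weight;
                 equal weights are assumed when omitted
    """
    lag = {}
    for region, nbrs in neighbours.items():
        if not nbrs:
            # A region with no neighbours has no defined average.
            lag[region] = None
            continue
        w = [1.0 if weights is None else weights[(region, n)] for n in nbrs]
        # Dividing by the sum of weights is the usual row standardisation.
        lag[region] = sum(wi * values[n] for wi, n in zip(w, nbrs)) / sum(w)
    return lag


# Hypothetical data: share of graduates in four regions and who borders whom.
share = {"A": 0.10, "B": 0.30, "C": 0.20, "D": 0.00}
neighbours = {"A": ["B", "C"], "B": ["A"], "C": ["A", "D"], "D": ["C"]}

lagged = spatial_lag(share, neighbours)
```

The resulting `lagged` values would then enter the right-hand side of the model as one more predictor, which is the "something simple" route rather than a model of the error structure.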

On Thu, Jun 21, 2012 at 11:59 PM, Francisco Rowe <frowe@ucn.cl> wrote:
> Thanks Maarten, Nick and Austin,
>
> Nick: My only problem implementing the logit approach is incorporating and estimating a model with spatial autocorrelation through the error structure. How could it be implemented in Stata? Is there any user-written package?
>
> Regards,
>
> Francisco.
>
>
>
> On 22/06/2012, at 2:17 AM, Austin Nichols wrote:
>
>> Francisco Rowe <frowe@ucn.cl> :
>> Note also that neither skewness nor a theoretical limit on y prevents
>> you using a linear model; it is true that if you use continuous RHS
>> predictors you can get predictions outside the allowable range, but
>> often a linear model is preferable on various grounds, especially if
>> you have dummies and their interactions as predictors. Papke and
>> Wooldridge often point out that the linear analog gives the right
>> marginal effects inference, provided you use suitably robust SE
>> calculations.  Skewness in the depvar need not even be skewness in
>> errors; many predictors may also be right skewed.
>>
>> On Thu, Jun 21, 2012 at 10:14 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>>> I am puzzled by this. The idea that a proportional response is best modelled using a logit link function seems natural to me, but I can't see that spatial autocorrelation threatens or undermines that in any theoretical or practical sense. Much depends on how you intend to model the autocorrelation, whether through some assumption about error structure, through extra predictors, etc., but I don't see why you can't keep within a logit framework. You are just changing what is on the right-hand side of the model. (I would want to add predictors personally, but there you go.)
>>>
>>> Also, you seem to be worrying about zeros, but much of the point of a logit link is that observed zeros (or indeed observed ones) are _not_ problematic as the implication is that the _mean_ response given predictors is in (0,1) which is easy to buy. (If this were not true, -logit- would be quite impossible!)
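
To illustrate the point about zeros, here is a rough sketch in Python of a logit-link fit to proportions that include exact zeros. It maximises the Bernoulli quasi-likelihood by Newton's method for a single predictor; the data, the function names and the fixed iteration count are assumptions for this example, and it is not a substitute for Stata's own estimation commands.

```python
import math


def inv_logit(z):
    """Inverse of the logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_fractional_logit(x, y, iterations=25):
    """Fit mean(y | x) = inv_logit(b0 + b1 * x) by Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = [inv_logit(b0 + b1 * xi) for xi in x]
        # Score (gradient) of the Bernoulli quasi-log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian, built from the weights mu * (1 - mu).
        w = [mi * (1.0 - mi) for mi in mu]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical proportions, two of which are exactly zero.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.0, 0.0, 0.05, 0.10, 0.20, 0.45]

b0, b1 = fit_fractional_logit(x, y)
fitted = [inv_logit(b0 + b1 * xi) for xi in x]
```

The observed zeros cause no difficulty: nothing in the fit takes the logit of `y`, and every element of `fitted` lies strictly between 0 and 1.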
>>>
>>> I can see no case for mapping a proportion p to log(p + 1); that does not even treat values near the extremes symmetrically, which is important in principle for a proportion. The right skewness here also does not sound worrying as for p near 0, logit p = ln [ p / (1 - p)] behaves like ln p and does what you want for skewness. (In any case, skewness of marginal distribution is of secondary concern.)
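
A small numerical check of both claims, sketched in Python; the particular values of p are arbitrary choices for illustration.

```python
import math


def logit(p):
    """Logit transform ln[p / (1 - p)] for 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Near zero, logit(p) and ln(p) differ only by -ln(1 - p), which is tiny.
for p in (0.001, 0.01, 0.1):
    print(p, logit(p), math.log(p), logit(p) - math.log(p))

# The logit treats p and 1 - p symmetrically: logit(1 - p) = -logit(p).
p = 0.1
logit_sum = logit(p) + logit(1.0 - p)  # zero up to rounding error

# log(p + 1) maps [0, 1] onto [0, ln 2] and is not symmetric in that sense:
# the distance of p from the lower end differs from the distance of
# 1 - p from the upper end.
low_gap = math.log(p + 1.0) - math.log(1.0)
high_gap = math.log(2.0) - math.log((1.0 - p) + 1.0)
```

Here `logit_sum` is zero apart from floating-point rounding, while `low_gap` and `high_gap` differ, which is the asymmetry referred to above.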
>>>
>>> I wouldn't move towards Poisson for this. You are evidently interested in proportions, not counts, so keep it there.
>>>
>>> Nick
>>> n.j.cox@durham.ac.uk
>>>
>>> Francisco Rowe
>>>
>>> I am trying to estimate a model in which the dependent variable is the share of people with a bachelor's degree or above. Its distribution has some particular properties: it is right-skewed, non-negative and ranges between 0 and 0.5 (theoretically up to 1).
>>>
>>> To estimate a model with a dependent variable with these characteristics, the strategy proposed by Papke and Wooldridge (1996) could be used (Baum 2008). However, it does not account for spatial autocorrelation, which, on theoretical grounds and according to previous research, seems to play a role. How can spatial autocorrelation be taken into account in the strategy proposed by Papke and Wooldridge (1996)?
>>>
>>> Regarding this, two options appear as alternatives:
>>>
>>> 1) Normalise the dependent variable and run either a spatial autoregressive or a spatial error model. However, normalisation is problematic given the zero values in the dependent variable. Someone suggested that I could use the transformation y* = log(y + 1), which maps zeros to zeros. Does that seem appropriate?
>>>
>>> 2) Instead of using the share as the dependent variable, use the count of people and run a Poisson or zero-inflated Poisson model with spatial autocorrelation. However, I don't know of any command in Stata that can do this. Is there one? I also know of an R package that does this (spatcounts), but it is poorly documented, so it is hard to work out the input data structure and learn what it does.
>>>
>>> Do you suggest any other alternatives?
>>>
>>> Baum, C 2008, 'Stata tip 63: Modeling proportions', Stata Journal, vol. 8, no. 2, pp. 299-303.
>>> Papke, L and Wooldridge, J 1996, 'Econometric methods for fractional response variables with an application to 401 (k) plan participation rates', Journal of Applied Econometrics, vol. 11, no. 6, pp. 619-32.


