From: Austin Nichols
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Graphing quadratic relationship
Date: Tue, 22 Sep 2009 08:12:40 -0400

```
Nick, Ronan, et al.--
I believe the bell or Gaussian shape comes up a lot, but including a
quadratic term in a logit need not get you good estimates if it is not
the "true" model. Though of course it may be good enough to give
predicted probabilities with approximately the right joint
distribution with X. It is my impression the shape often arises due to
competing risks, e.g. in the case of Dietary Reference Intakes in the
nutrition field, where we can imagine two competing logistic curves:
one of increasing risk of some body system breakdown given too little
of nutrient X and another of increasing risk given too much of
nutrient X.  The risk is then of morbidity (or mortality), but the
illnesses, symptoms, and treatments are quite different in the two
directions.  Of course the original poster does not indicate what kind
of theoretical model underlies the "need to show how both the 0's and
1's vary according to x and x^2."  I think -locpr- on SSC is a good
starting point in the absence of any compelling theory. A true local
logit regression could be even better. Doesn't -fracpoly- assume an
unbounded depvar?

On Tue, Sep 22, 2009 at 7:02 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Ronan's prejudices usually match mine, but I think what he's warning about does not bite in this case.
>
> It's true that a quadratic model is in general dangerous because the squared term will become arbitrarily large for extreme values of the relevant predictor.
>
> In practice that may not bite if the squared term is just acting to flatten or add slight curvature within the range of the data, as seems often to be so when income is regressed against age.
>
> In this case, the logit link will ensure that even if the squared term is very large within, or beyond, the data range the predicted probability will just go to 0 or 1 as the case may be.
>
> In addition, logit or logistic with a quadratic on the RHS is a very interesting model because it is a neat way of fitting a Gaussian shape.
> This is well known in at least some fields: ecology is one known to me.
>
> See e.g. Jongman, R.H.G., ter Braak, C.J.F. and van Tongeren, O.F.R. (eds) 1995. Data analysis in community and landscape ecology. Cambridge University Press.
>
> Thus it is common that probability of occurrence of some species can be modelled as increasing to and then decreasing from a maximum along environmental gradients, e.g. rather too wet -- just right -- rather too dry; rather too hot etc. Farmers, gardeners and observant tourists know this too!
>
> This may be too well known to deserve emphasis, but equally there may be others who like me were surprised to come across this neat idea. Conversely, I'd be interested in non-ecological examples or references.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ronan Conroy
>
> I would be wary of quadratic terms, which tend to produce nonsense
> estimates at the extremes of the data, or to extrapolate to nonsense
> estimates beyond the observed range. Have you tried fractional
> polynomials?
>
> On 18 Sep 2009, at 18:35, Stephanie L Kent wrote:
>
>> I would like to graph a relationship between a quadratic independent
>> variable and my dependent variable to see how y varies according to
>> x and x^2.  It's a logistic regression so my DV is 0-1 and I need to
>> show how both the 0's and 1's vary according to x and x^2.  Any advice
>> on how to get started is much appreciated!
>

```
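The two shapes discussed in this thread can be sketched numerically. The block below is only an illustration, not anything the posters ran: the coefficients, the function names, and the independence of the two risks in `competing_risk` are all assumptions made for the example. With logit(p) = b0 + b1*x + b2*x^2 and b2 < 0, the odds follow a Gaussian curve in x with peak at u = -b1/(2*b2) and width t = 1/sqrt(-2*b2), and the logistic transform keeps every prediction between 0 and 1 however large x^2 gets. The competing-risks idea is shown as one logistic risk that falls with x and one that rises with x.

```python
import math


def logistic(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def quadratic_logit(x, b0, b1, b2):
    """P(y = 1 | x) when logit(p) = b0 + b1*x + b2*x^2."""
    return logistic(b0 + b1 * x + b2 * x * x)


def optimum_and_tolerance(b1, b2):
    """Peak location u = -b1/(2*b2) and width t = 1/sqrt(-2*b2)."""
    if b2 >= 0:
        raise ValueError("a bell shape requires b2 < 0")
    return -b1 / (2.0 * b2), 1.0 / math.sqrt(-2.0 * b2)


def competing_risk(x, lo_mid, hi_mid, slope):
    """Risk of any breakdown from two logistic risks.

    Assumption for this sketch: the two risks are independent, one
    falling in x (too little of X) and one rising in x (too much).
    """
    p_low = logistic(-slope * (x - lo_mid))
    p_high = logistic(slope * (x - hi_mid))
    return 1.0 - (1.0 - p_low) * (1.0 - p_high)


# Hypothetical coefficients, chosen only for illustration.
B0, B1, B2 = -4.0, 2.0, -0.25
u, t = optimum_and_tolerance(B1, B2)  # u = 4.0, t = sqrt(2)
print("optimum", u, "tolerance", t)

# Predictions stay inside [0, 1] even far outside any plausible data range.
for x in range(-20, 31, 5):
    p_bell = quadratic_logit(x, B0, B1, B2)
    p_risk = competing_risk(x, 0.0, 8.0, 1.0)
    print(x, round(p_bell, 4), round(p_risk, 4))
```

Plotting the two printed columns against x gives a bell for the quadratic logit and a U for the combined risk, with one minus that risk giving a flat-topped hump rather than a true Gaussian. That difference is one way the quadratic logit can fit reasonably without being the "true" model.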