st: RE: areg question

From: "Nick Cox"
Subject: st: RE: areg question
Date: Sun, 26 Apr 2009 16:22:14 +0100

```
Evidently Sue wants to explain or predict variations in infant
mortality.

At a guess, the *rain* variables are powers of rainfall (likely meaning,
mean annual rainfall) which may be serving, or intended to serve, as
proxies for various direct and indirect effects of climate.

I am pretty clear as an occasional climatologist that there's no
theoretical basis [pun intended] for using a polynomial representation
here and even if there were it would not be a good idea in practice. I'd
recommend instead some more stable method, say orthogonal polynomials or
a restricted cubic spline representation. See -orthpoly- or -rcspline-.

Nick
n.j.cox@durham.ac.uk

Sue

I'm running the following regression:

areg infant_mort *rain* urban country ethn_rc, absorb(mother_rc)

where urban, country and ethn_rc are variables that don't vary within
mother_rc (the FE category).

My questions are:

1) since urban, country and ethn_rc don't vary within mother_rc, they
should all get dropped. However, ethn_rc gets estimated. What is odd
is that when I generate
bys mother_rc: egen ddd = mean(ethn_rc)
gen diff = ethn_rc - ddd

diff has only values of zero and it still gets estimated with fixed
effects. Again, ethn_rc is constant within mother_rc.

2) there are 4 variables in rain: long_rain1, long_rain2, long_rain3
and long_rain4.

long_rain1 and long_rain3 are highly correlated, and long_rain2 and
long_rain4 are highly correlated (0.88). My understanding was that a
variable shouldn't get dropped unless two are perfectly correlated.
However, one variable gets dropped (long_rain4). I looked at the raw
data but the numbers are not identical. The values are very small, in
the range of 0.0xxx - 0.00xxx. Could this be causing the problem?

```
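Nick's suggestion of replacing the raw powers with orthogonal polynomials or a restricted cubic spline could be sketched roughly as follows. This is only an illustration under assumptions the thread does not confirm: that long_rain1 holds the untransformed rainfall measure, that long_rain2 to long_rain4 are its higher powers, and that rainfall varies within mother_rc. The new variable names (op_rain*, rcs_rain*) and the choice of 4 knots are made up for the example. The spline basis is built here with the official -mkspline- command rather than -rcspline-.

```stata
* Assumption: long_rain1 is raw rainfall; long_rain2-long_rain4 are its powers.
* urban, country and ethn_rc are left out because they are constant within
* mother_rc and so cannot be identified once mother_rc is absorbed.

* Option 1: orthogonal polynomials of degree 4 in place of the raw powers
orthpoly long_rain1, generate(op_rain1 op_rain2 op_rain3 op_rain4) degree(4)
areg infant_mort op_rain*, absorb(mother_rc)

* Option 2: restricted cubic spline basis with 4 knots
* (mkspline creates nknots - 1 = 3 variables: rcs_rain1-rcs_rain3)
mkspline rcs_rain = long_rain1, cubic nknots(4)
areg infant_mort rcs_rain*, absorb(mother_rc)
```

Either basis should be far less collinear than powers of values in the 0.0xxx range, which may make it less likely that a term such as long_rain4 is dropped for numerical reasons.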