
# Re: st: Regression with multiple age groups

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Regression with multiple age groups
Date: Wed, 25 Apr 2012 09:09:33 -0400

Dear Shirley,

Others will agree that you need to tell the list more about the data
and the analysis that you intend to do, before we can make useful
suggestions.

It would be appropriate to handle the age categories, at least
initially, by using a separate dummy variable for each category except
the first (which will be fitted by the intercept term in your model).
You can then plot the coefficients against the midpoint of the
category and consider whether to revise the model.
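
As a rough sketch of this first step in Stata: the two rows below are the 1985 and 1986 counts quoted from the original message (leaving out "not stated"). The variable names (n1-n6, agegrp, mid, b), the midpoints (especially for the two open-ended categories), and the use of -poisson- as the fitting command are assumptions made for illustration, not something specified in the thread.

```stata
* Illustration only. Counts are the 1985-1986 rows quoted in the thread
* ("not stated" omitted); all variable names are invented for this sketch.
clear
input year n1 n2 n3 n4 n5 n6
1985 458 1154 78 52 3 2
1986 221  956 50 59 9 5
end

* Reshape to one observation per year and age category
reshape long n, i(year) j(agegrp)

* i.agegrp enters a dummy for each category except the first,
* which is picked up by the intercept
poisson n i.agegrp i.year

* Assumed midpoints; the values for "under 20" and "60 plus" are guesses
recode agegrp (1=18) (2=25) (3=35) (4=45) (5=55) (6=65), generate(mid)

* Collect the fitted coefficients (0 for the base category) and plot them
generate b = 0
forvalues k = 2/6 {
    replace b = _b[`k'.agegrp] if agegrp == `k'
}
twoway connected b mid if year == 1985, sort
```

If the plotted coefficients fall close to a straight line or a smooth curve, that would suggest replacing the dummies with a function of the midpoint.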

You said that you have the age category, separately, for the husband
and the wife.  Thus, you would use two separate sets of dummy
variables, one for the husband's age and the other for the wife's age.
You may want to explore the possibility of interactions between the
two ages.
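
A minimal sketch of the two sets of dummies and their interaction, assuming the counts are available for every combination of husband's and wife's age category (the original message does not say whether they are). The data here are simulated, and the names hage, wage, and n, as well as the years 1985-2004, are hypothetical.

```stata
* Simulated data, for illustration only: one row per
* year x husband's age category x wife's age category
clear
set seed 12345
set obs 20
generate year = 1984 + _n          // hypothetical years 1985-2004
expand 6
bysort year: generate hage = _n    // husband's age category, 1-6
expand 6
bysort year hage: generate wage = _n   // wife's age category, 1-6
generate n = rpoisson(50)          // made-up counts, no real structure

* Two separate sets of dummies (main effects only)
poisson n i.year i.hage i.wage
estimates store main

* ## adds the husband-by-wife interaction to both main effects
poisson n i.year i.hage##i.wage
estimates store inter

* Likelihood-ratio test of the 25 interaction terms
lrtest main inter
```

With six categories each, the full interaction uses 25 parameters, so a more parsimonious form (for example, a term for the difference between the two categories) may be worth exploring.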

Please say more about the counts that you listed for 1985 and 1986 (I
assume that those are two of the 20 years).  They appear to be the
frequency distribution of the numbers of divorces by one of the sets
of age categories.  If you are projecting divorce rates, and those
counts represent the number of divorces in one year, what is the
denominator for the rate?  Do you have the denominator for each age
category (and even the combination of husband's age category and
wife's age category) or only for the year as a whole?
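
If a denominator is available for each year and age category, one way to bring it into a count model is as an exposure. The sketch below uses the counts quoted in the original message, but the variable pop is a placeholder generated by an arbitrary formula; it is not real population data and only shows where the denominator would go.

```stata
* Counts are the 1985-1986 rows quoted in the thread; pop is a
* PLACEHOLDER denominator from an arbitrary formula, not real data
clear
input year agegrp n
1985 1  458
1985 2 1154
1985 3   78
1985 4   52
1985 5    3
1985 6    2
1986 1  221
1986 2  956
1986 3   50
1986 4   59
1986 5    9
1986 6    5
end
generate pop = 10000*agegrp + 500*(year - 1985)

* exposure() includes ln(pop) with its coefficient fixed at 1,
* so the model describes the rate n/pop; irr reports rate ratios
poisson n i.agegrp i.year, exposure(pop) irr
scalar b_exp = _b[2.agegrp]

* The same model written with an explicit offset
generate lnpop = ln(pop)
poisson n i.agegrp i.year, offset(lnpop)
```

If the denominator is available only for the year as a whole, it is constant within each year, so in a model with a full set of year dummies it shifts the year coefficients and leaves the age coefficients unchanged.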

Do the data come from a survey?  If so, you will need to take the
sampling design and the weights into account.

What type of regression are you planning to use?  Ordinary regression
will probably not be appropriate for rates.  You should consider
Poisson regression (and perhaps negative binomial regression).
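
A hedged sketch of how the two models might be compared. The counts are simulated with extra-Poisson variation (a gamma multiplier with mean 1), and the variable names and parameter values are invented for the example.

```stata
* Simulated, overdispersed counts for illustration only
clear
set seed 2012
set obs 20
generate year = _n
expand 6
bysort year: generate agegrp = _n
generate mu = exp(5 - 0.3*agegrp + 0.02*year)
generate n = rpoisson(mu*rgamma(2, 0.5))

* Poisson regression, followed by goodness-of-fit tests;
* a very small p-value points to overdispersion
poisson n i.agegrp i.year
estat gof

* Negative binomial regression; the output includes a
* likelihood-ratio test of alpha = 0 (no overdispersion)
nbreg n i.agegrp i.year
```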

So far, I can envision a model that contains a time pattern (initially
based on a dummy variable for each year after the first), effects for
husband's age, and effects for wife's age (and perhaps some form of
husband-wife interaction).  Do you have covariates that you are
planning to include?
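
One possible way to examine the time pattern after starting with a dummy for each year is to compare that model with a simpler one that has a linear trend. This is only a sketch on simulated data; the variable names, the years, and the values used to generate the counts are all assumptions.

```stata
* Simulated data, for illustration only
clear
set seed 4321
set obs 20
generate year = 1984 + _n          // hypothetical years 1985-2004
generate t = year - 1985           // years since the first year
expand 6
bysort year: generate hage = _n
expand 6
bysort year hage: generate wage = _n
generate n = rpoisson(exp(3 + 0.02*t - 0.2*abs(hage - wage)))

* Time pattern as a dummy for each year after the first
poisson n i.year i.hage i.wage
estimates store dummies

* Simpler alternative: a linear trend in time
poisson n c.t i.hage i.wage
estimates store linear

* Do the 19 year dummies fit better than the single trend term?
lrtest dummies linear
```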

David Hoaglin

On Wed, Apr 25, 2012 at 12:36 AM, Shirley Sy <shirleysy@hotmail.co.uk> wrote:
> Dear Statalisters,
> I am a complete beginner at Stata so my question is very basic but I am having trouble finding an answer on the web. I am doing a time series regression project forecasting divorce rates. My data spans 20 years and for both husband and wife the 'age at divorce' variable is split into groups, i.e., it looks something like this:
>
> Year    under 20    20 to 29    30 to 39    40 to 49    50 to 59    60plus    not stated
> 1985    458         1154        78          52          3           2         3
> 1986    221         956         50          59          9           5         0
>
> How would I run a regression with the total numbers in each age group? Would I use dummy variables? I understand how I would do it if I had individual ages but since this is a time series model and I have the total number in each age group, I am finding it slightly more complicated.
>
> Thanks
> Shirley
