# Re: st: The dependent variable is a multi-proportion in actual values

 From Maarten Buis To statalist@hsphsun2.harvard.edu Subject Re: st: The dependent variable is a multi-proportion in actual values Date Mon, 15 Jun 2009 09:55:04 +0000 (GMT)

```
--- On Sat, 13/6/09, jverkuilen wrote:
> > See John Aitchison, 2003, Compositional Data Analysis.
> > -dirifit- implements the Dirichlet model, which is highly
> > restrictive. Otherwise you need to transform the proportions
> > and use a multivariate multiple regression type approach.

--- On Sun, 14/6/09, sjsamuels@gmail.com wrote:
> -fmlogit- by Maarten Buis (downloadable from SSC) does
> regression on such fractional or compositional data.

To add a bit of context: When you think of linear regression,
-regress-, you model two elements of the dependent variable:
the mean (and how it changes with the explanatory variables)
and the variance conditional on the explanatory variables
(i.e. the variance of the error term; its square root is shown
in the output of -regress- as "Root MSE"). You are modeling
multiple dependent variables (proportion spent on food, on
clothes, and on recreation), so apart from the means and the
variances you also have the covariances between the dependent
variables.

-dirifit- assumes that this covariance is always negative
(for the exact formula see page 32 of
http://home.fsw.vu.nl/m.buis/presentations/UKsug06.pdf ).
This can make sense: if you spend more on clothing then
there is less income left to spend on recreation or food.
But this does not necessarily have to be the case: We could
imagine that "fun-loving people" would spend high
proportions on both clothing and recreation, thus creating
a positive correlation between the two. The correlation
structure of -dirifit- does not allow for this possibility
and can thus be considered to be pretty restrictive.

Often (but not always) we only care about how the means
(i.e. predicted proportions) change when the explanatory
variables change; the variances and covariances are in that
case just nuisance parameters. If you have a large sample
then you can use quasi-likelihood to get correct inference
even if you mis-specify the model of the nuisance
parameters. This is what -fmlogit- does. The basic idea
is discussed in (Papke and Wooldridge 1996). A critique
of quasi-likelihood / robust standard errors in general
can be found in (Freedman 2006).

Hope this helps,
Maarten

Freedman, David A. (2006) On the So-Called "Huber Sandwich
Estimator" and "Robust Standard Errors", The American
Statistician, 60(4), pp. 299-302.

Papke, Leslie E. and Jeffrey M. Wooldridge (1996)
Econometric Methods for Fractional Response Variables with
an Application to 401(k) Plan Participation Rates, Journal
of Applied Econometrics, 11(6), pp. 619-632.

-----------------------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
```
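
The point in the message above about -dirifit- forcing negative covariances can be checked with a few lines of arithmetic. The sketch below is in Python rather than Stata and uses only the standard textbook covariance of a Dirichlet distribution; it is not the code behind -dirifit-, and the function name `dirichlet_cov` and the parameter values are made up for illustration.

```python
def dirichlet_cov(alpha):
    """Covariance matrix of a Dirichlet(alpha) random vector.

    Standard result, with a0 = sum(alpha):
      Var(y_i)      =  alpha_i * (a0 - alpha_i) / (a0**2 * (a0 + 1))
      Cov(y_i, y_j) = -alpha_i * alpha_j        / (a0**2 * (a0 + 1)),  i != j
    """
    a0 = sum(alpha)
    denom = a0 ** 2 * (a0 + 1)
    k = len(alpha)
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                cov[i][j] = alpha[i] * (a0 - alpha[i]) / denom
            else:
                # Always negative because every alpha is positive.
                cov[i][j] = -alpha[i] * alpha[j] / denom
    return cov


# Hypothetical parameters for three budget shares (assumed values,
# not taken from the post or from any data set).
shares = ["food", "clothing", "recreation"]
alpha = [5.0, 2.0, 3.0]

cov = dirichlet_cov(alpha)
for i in range(len(shares)):
    for j in range(i + 1, len(shares)):
        print(f"Cov({shares[i]}, {shares[j]}) = {cov[i][j]:.5f}")
```

Whatever positive values are chosen for `alpha`, every off-diagonal entry comes out negative, so a "fun-loving people" pattern with a positive correlation between clothing and recreation cannot be represented by this model.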