
Re: st: The dependent variable is a multi-proportion in actual values

From   Maarten Buis
Subject   Re: st: The dependent variable is a multi-proportion in actual values
Date   Mon, 15 Jun 2009 09:55:04 +0000 (GMT)

--- On Sat, 13/6/09, jverkuilen wrote:
> > See John Aitchison, 2003, Compositional Data Analysis.
> > -dirifit- implements the Dirichlet model, which is highly
> > restrictive. Otherwise you need to transform the proportions
> > and use a multivariate multiple regression type approach.

--- On Sun, 14/6/09, wrote:
> -fmlogit- by Maarten Buis (downloadable from SSC) does
> regression on such fractional or compositional data.

To add a bit of context: when you think of linear regression,
-regress-, you model two elements of the dependent variable:
the mean (and how it changes over the explanatory variables)
and the variance conditional on the explanatory variables
(i.e. the variance of the error term; its square root is shown
in the output of -regress- as "Root MSE"). You are modeling
multiple dependent variables (proportion spent on food, on
clothes, and on recreation), so apart from the means and the
variances you also have the covariances between the dependent
variables.

-dirifit- assumes that this covariance is always negative
(for the exact formula see page 32 of ).
This can make sense: if you spend more on clothing then
there is less income left to spend on recreation or food.
But this does not necessarily have to be the case: we could
imagine that "fun-loving people" would spend high
proportions on both clothing and recreation, thus creating
a positive correlation between the two. The correlation
structure of -dirifit- does not allow for this possibility
and can thus be considered pretty restrictive.
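
The sign restriction can be checked with a small simulation. What follows is only a sketch in Python (standard library only), not anything taken from -dirifit-: the alpha values are made up for illustration, and the three shares are merely labelled food, clothes and recreation. It uses the standard result that for a Dirichlet with parameters alpha_1..alpha_K and alpha_0 = sum of the alphas, Cov(y_j, y_k) = -alpha_j*alpha_k / (alpha_0^2 * (alpha_0 + 1)) for j != k, which is negative for every pair.

```python
import random


def dirichlet_draw(alphas, rng):
    """One Dirichlet draw: independent Gamma(alpha_k, 1) variates divided by their sum."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def theoretical_cov(alphas, j, k):
    """Dirichlet covariance between components j and k (j != k)."""
    a0 = sum(alphas)
    return -alphas[j] * alphas[k] / (a0 ** 2 * (a0 + 1))


# Hypothetical parameters (an assumption for illustration only):
# shares spent on food, clothes and recreation.
alphas = [2.0, 3.0, 5.0]
labels = ["food", "clothes", "recreation"]

rng = random.Random(12345)
draws = [dirichlet_draw(alphas, rng) for _ in range(20000)]
columns = list(zip(*draws))  # one tuple of simulated shares per component

sample_covs = {
    (j, k): covariance(columns[j], columns[k])
    for j in range(len(alphas))
    for k in range(j + 1, len(alphas))
}

for (j, k), cov in sample_covs.items():
    print(labels[j], labels[k],
          "sample:", round(cov, 5),
          "theory:", round(theoretical_cov(alphas, j, k), 5))
```

Whatever positive alphas you plug in, every pairwise covariance comes out negative, so a "fun-loving people" pattern with a positive correlation between clothes and recreation cannot be represented.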

Often (but not always) we only care about how the means
(i.e. predicted proportions) change when the explanatory
variables change; the variances and covariances are in that
case just nuisance parameters. If you have a large sample
then you can use quasi-likelihood to get correct inference
even if you mis-specify the model for the nuisance
parameters. This is what -fmlogit- does. The basic idea
is discussed in (Papke and Wooldridge 1996). A critique
of quasi-likelihood / robust standard errors in general
can be found in (Freedman 2006).
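
To make the quasi-likelihood idea concrete, here is a minimal sketch in Python (standard library only) of the estimating problem behind a fractional multinomial logit. It is not the code of -fmlogit-: the function names, the toy data and the plain gradient-ascent optimizer are all assumptions made for illustration. It only models the means: the predicted proportions are a multinomial-logit (softmax) function of the explanatory variables, and the coefficients maximize sum_i sum_k y_ik * log(p_ik), where the y_ik are observed proportions rather than 0/1 outcomes. The sketch stops at point estimates and leaves out the robust standard errors on which the inference rests.

```python
import math


def softmax_probs(x, betas):
    """Predicted proportions for one observation with covariates x."""
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_fmlogit(X, Y, steps=5000, lr=0.5):
    """Maximize the multinomial quasi-log-likelihood by gradient ascent.

    X: rows of covariates (include a 1.0 for the constant).
    Y: rows of proportions, each row summing to 1.
    The coefficients of the first outcome are fixed at 0 (base outcome).
    """
    n, p, k_out = len(X), len(X[0]), len(Y[0])
    betas = [[0.0] * p for _ in range(k_out)]
    for _ in range(steps):
        grads = [[0.0] * p for _ in range(k_out)]
        for x, y in zip(X, Y):
            probs = softmax_probs(x, betas)
            for k in range(1, k_out):
                resid = y[k] - probs[k]  # score contribution: (y - p) * x
                for j in range(p):
                    grads[k][j] += resid * x[j]
        for k in range(1, k_out):
            for j in range(p):
                betas[k][j] += lr * grads[k][j] / n
    return betas


# Toy data (hypothetical): a constant plus one binary covariate;
# the outcomes are shares spent on food, clothes and recreation.
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0],
     [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
Y = [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2],
     [0.3, 0.3, 0.4], [0.2, 0.4, 0.4], [0.4, 0.2, 0.4]]

fitted = fit_fmlogit(X, Y)
pred_group0 = softmax_probs([1.0, 0.0], fitted)
pred_group1 = softmax_probs([1.0, 1.0], fitted)
print("predicted shares, covariate = 0:", [round(v, 4) for v in pred_group0])
print("predicted shares, covariate = 1:", [round(v, 4) for v in pred_group1])
```

Because this toy model is saturated (a constant plus one dummy), the predicted shares reproduce the average shares within each group, without any assumption about the variances or covariances of the shares.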

Hope this helps,

Freedman, David A. (2006) On The So-Called "Huber Sandwich
Estimator" and "Robust Standard Errors", The American 
Statistician, 60(4), pp. 299-302.

Papke, Leslie E. and Jeffrey M. Wooldridge (1996)
Econometric Methods for Fractional Response Variables with
an Application to 401(k) Plan Participation Rates, Journal
of Applied Econometrics, 11(6), pp. 619-632.

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

