On Thu, Feb 28, 2013 at 4:19 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> 2. For different reasons log and logit transformations might be
> considered. There is a very inward-looking literature on compositional
> data analysis centred on more exotic transformations tailored to the
> problem. The reference I gave earlier is one entry into that.
I was going to suggest the same reference. It's not a trivial
problem, but it comes across as a narrow one because of the way the
literature has been written. Still, the take-away message of most of
it is that the log-ratio transformation is the most reasonable choice.
With only two components it reduces to the logit, i.e. the log-odds.
The logic is very similar to that of the multinomial logit, with the
same difficult dependence structure.
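
To make the two-component case concrete, here is a minimal Python sketch
of an additive log-ratio transform (each part divided by a reference part,
then logged). The function names `alr` and `logit` and the choice of the
last part as the reference are my own assumptions for illustration, not
anything taken from the reference above.

```python
import math

def alr(parts):
    """Additive log-ratio: log of each share relative to the last share.

    Assumes every part is strictly positive; the last part is used as the
    reference (an arbitrary choice made for this illustration).
    """
    total = sum(parts)
    shares = [p / total for p in parts]
    ref = shares[-1]
    return [math.log(s / ref) for s in shares[:-1]]

def logit(p):
    """Ordinary log-odds of a single proportion p in (0, 1)."""
    return math.log(p / (1 - p))

# Two components: the single log-ratio is the logit of the first share.
two = alr([0.3, 0.7])

# Three components: two log-ratios, both against the same reference part,
# which is the same device the multinomial logit uses with a base outcome.
three = alr([0.2, 0.3, 0.5])
```

With k components you get k - 1 transformed values, all sharing the
reference part in the denominator, which is one way to see where the
dependence between them comes from.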
> 3. The two previous points are often complicated by measured zeros.
> There is then a long slow agony about whether they are structural or
> sampling zeros and what to do about them. The more components are
> measured, the worse this usually gets, whether it is fractions of a
> budget spent on different things, or proportions of a material by
> elements or compounds or particle size classes, or whatever.
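Yes, this is a real issue, and unfortunately the transformations used
can create huge outlier problems, just as log transforms do when
there's a 0 value: the log of 0 is undefined, and any tiny value
substituted for it lands far out in the tail on the log scale.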
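
A small Python sketch of that outlier effect, under stated assumptions: the
helper `log_ratio_two` and the replacement values `1e-3`, `1e-6`, `1e-9`
are hypothetical and chosen only for illustration. They are not a
recommended way of handling zeros.

```python
import math

def log_ratio_two(share, eps=None):
    """Log-ratio of a share against its complement (two-part case).

    If `share` is exactly 0 and `eps` is given, the zero is replaced by
    `eps` first. This replacement is an illustrative assumption only.
    """
    if share == 0 and eps is not None:
        share = eps
    return math.log(share / (1 - share))

# An unremarkable observation for comparison.
typical = log_ratio_two(0.2)

# The same measured zero under three arbitrary replacement values:
# the transformed value moves further out as eps shrinks.
replaced = {eps: log_ratio_two(0.0, eps) for eps in (1e-3, 1e-6, 1e-9)}

# Without any replacement the transform is simply undefined.
try:
    log_ratio_two(0.0)
    zero_is_defined = True
except ValueError:
    zero_is_defined = False
```

The point is that the transformed value of a zero is driven entirely by the
arbitrary replacement, so it can dominate whatever is fitted afterwards.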