Statalist



st: RE: RE: RE: RE: RE: Re: Compositional data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: RE: RE: Re: Compositional data
Date   Wed, 5 Mar 2008 21:58:42 -0000

As you know, Maarten Buis is first author of -dirifit-. 

I am listed as second author because -dirifit- was written in the first
instance as a modification of something else I wrote (which in turn was
written as a modification of something else by Stephen Jenkins). But the
design is Maarten's.

I guess you're mixing three quite different issues on the exact 0s and
1s.

1. Substantively, exact 0s and 1s can certainly be part of genuine and
informative observations. I agree completely.

2. Theoretically, I wasn't aware that the Dirichlet is general enough to
include spikes in its density function. -dirifit- is aimed at fitting
Dirichlet distributions. It is not intended for other distributions.

3. Practically, I guess Maarten, like most authors, wanted most of all
to get a program working that was suitable for his purposes. As
-dirifit- is just a wrapper for -ml-, your reference raises a very good
question of whether the procedure you refer to can be implemented with
-ml-. I don't know.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Verkuilen,
Jay
Sent: 05 March 2008 21:36
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: RE: Re: Compositional data

Nick Cox wrote

>>If three variables, say x, y, z, add to 1, then x + y + z = 1 defines
a plane in 3-space and you can lay such a plane flat, i.e. project it
onto 2-space without distortion. That, as everyone knows, is the reason
you can draw a triangular plot (or whatever else it's called). What is
it about that which is not Euclidean? I think Euclid would have felt
very much at home with that triangle.<<

The issue is that the Euclidean distance between points in a triangle
plot doesn't say what most people think it says in analogy to an
ordinary scatterplot. That's all I meant. 
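
The geometric point under discussion can be checked numerically. What follows is only a minimal Python sketch, not something from the original exchange: it assumes one particular orthonormal basis for the plane x + y + z = 1, and the names U, V, CENTRE, to_plane and dist are illustrative. It shows that the distance between two compositions is unchanged when the plane is laid flat in 2-space.

```python
import math

# Assumed orthonormal basis for the plane x + y + z = 1.
# Both vectors have unit length, are orthogonal, and sum to zero.
U = (-1 / math.sqrt(2), 1 / math.sqrt(2), 0.0)
V = (-1 / math.sqrt(6), -1 / math.sqrt(6), 2 / math.sqrt(6))
CENTRE = (1 / 3, 1 / 3, 1 / 3)  # centre of the triangle


def to_plane(comp):
    """Map a 3-part composition to 2-D coordinates within the plane."""
    d = [c - m for c, m in zip(comp, CENTRE)]
    return (
        sum(a * b for a, b in zip(d, U)),
        sum(a * b for a, b in zip(d, V)),
    )


def dist(p, q):
    """Ordinary Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


a = (0.2, 0.3, 0.5)
b = (0.6, 0.1, 0.3)

d3 = dist(a, b)                      # distance in 3-space
d2 = dist(to_plane(a), to_plane(b))  # distance after laying the plane flat
print(d3, d2)
```

A conventional triangular plot drawn with unit-length sides would rescale every distance by the same constant factor, so relative distances are still preserved. Whether those Euclidean distances mean what a reader expects is a separate question.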


>>Anyway, all the alternatives I know to that stretch and shrink
different parts of the space, and none is more intuitive than the
original. But some can be more convenient.<<

The fact that compositional data are dependent due to the sum constraint
makes them strange. Unordered choice data has the exact same problem.
Aitchison provides some ways of dealing with the issue, but only at the
expense of having to look at nasty things like log-ratios.
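
To make the log-ratio idea concrete, here is a small Python sketch under stated assumptions: it uses the centred log-ratio (clr), one of the transformations associated with Aitchison, and the function names clr, euclid and aitchison_dist are illustrative. The example compositions are made up. Two pairs of points are the same Euclidean distance apart, yet the pair near the edge of the simplex is much further apart on the log-ratio scale, because a move from 1% to 2% doubles a share while a move from 33% to 34% barely changes it.

```python
import math


def clr(comp):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(c) for c in comp]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def euclid(p, q):
    """Ordinary Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def aitchison_dist(p, q):
    """Euclidean distance between the clr-transformed compositions."""
    return euclid(clr(p), clr(q))


# Hypothetical compositions: one pair near an edge, one pair in the middle.
edge_a, edge_b = (0.01, 0.49, 0.50), (0.02, 0.48, 0.50)
mid_a, mid_b = (0.33, 0.17, 0.50), (0.34, 0.16, 0.50)

print(euclid(edge_a, edge_b), euclid(mid_a, mid_b))
print(aitchison_dist(edge_a, edge_b), aitchison_dist(mid_a, mid_b))
```

The transformation needs strictly positive parts: with an exact zero, math.log raises a ValueError, which is the boundary problem in another guise.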

As an aside, why did you guys kick out simplex corner observations in
the Dirichlet model? These are perfectly valid observations, indeed
quite possibly very informative ones since they say "I spent all my
budget on X". A boundary point is a pain because the likelihood is
undefined there, but the procedure described in one of Tim Fry's
articles (Fry, et al, Modelling Zeroes in Microdata, Applied Economics,
2000, 33, 383-392) avoids the problem and preserves subcompositional
invariance. 
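
The boundary problem can be seen directly in the Dirichlet log-density, log f(x; a) = lgamma(sum a) - sum lgamma(a_i) + sum (a_i - 1) log x_i. The Python sketch below is only an illustration under assumptions: it is not how -dirifit- is coded, it does not implement the procedure in the cited article, and the function name dirichlet_logpdf is hypothetical. It evaluates the log-density for interior points and refuses points with an exact 0 or 1, where the log x_i term breaks down.

```python
import math


def dirichlet_logpdf(x, alpha):
    """Log-density of a Dirichlet(alpha) at composition x (interior only)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0.0 or xi >= 1.0 for xi in x):
        # log(0) is not finite, so a corner or edge observation cannot
        # contribute a usable term to the log-likelihood.
        raise ValueError("log-density not evaluated on the simplex boundary")
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))


# Interior point: fine.
print(dirichlet_logpdf((0.2, 0.3, 0.5), (2.0, 3.0, 4.0)))

# Boundary point ("I spent all my budget on X"): rejected.
try:
    dirichlet_logpdf((1.0, 0.0, 0.0), (2.0, 3.0, 4.0))
except ValueError as err:
    print("boundary:", err)
```

Strictly, the limiting density at a boundary point may be zero or infinite depending on whether the relevant a_i is above or below 1, but in either case the contribution is unusable in a standard maximum likelihood fit.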

Jay

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



