Re: st: Proportional Independent Variables

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Proportional Independent Variables
Date	Thu, 28 Feb 2013 13:32:06 +0000

You will have to fudge the zeros (#2) before you apply logratios (#1).
As before, a key question is: are they structural (inevitably 0) or
sampling (happen to be or to be reported as 0)?

I got some of the guts of this field coded up as Mata functions a
while back, but there is no documentation and that may not help much.
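In symbols, and only as a sketch read off the code, the main functions below appear to compute the following for one row x = (x_1, ..., x_D) with all parts positive. The symbols D, g(x), r_j, delta_j and c are notation introduced for this note, not names from the code; c stands for the optional total argument of cda_mrzero(), which defaults to 1.

```latex
\begin{align*}
\operatorname{alr}(x)_j &= \ln\frac{x_j}{x_D}, & j &= 1,\dots,D-1,\\
\operatorname{clr}(x)_j &= \ln\frac{x_j}{g(x)}, \quad
  g(x)=\Bigl(\prod_{k=1}^{D}x_k\Bigr)^{1/D}, & j &= 1,\dots,D,\\
\operatorname{ilr}(x)_j &= \sqrt{\frac{j}{j+1}}\,
  \ln\frac{\bigl(\prod_{k=1}^{j}x_k\bigr)^{1/j}}{x_{j+1}}, & j &= 1,\dots,D-1,\\
r_j &= \begin{cases}
  \delta_j, & x_j = 0,\\
  x_j\Bigl(1-\dfrac{1}{c}\sum_{k:\,x_k=0}\delta_k\Bigr), & x_j > 0.
\end{cases}
\end{align*}
```

With D = 2 and x_1 + x_2 = 1, alr(x)_1 = ln(x_1 / (1 - x_1)), which is the logit of x_1. Each of the three logratios takes the log of every part, which is why zeros have to be dealt with first.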

// compositional data analysis

mata :

mata drop cda_*()

// NJC 1 Sept 2008
// rows scaled to sum to 1
real matrix function cda_closure(real matrix X) {
        return(X :/ rowsum(X))
}

// NJC 1 Sept 2008
// ln(all but last column / last column)
real matrix function cda_alr(real matrix X) {
        real scalar c, cm1
        c = cols(X); cm1 = c - 1
        return(ln(X[, (1 .. cm1)]) :- ln(X[, c]))
}

// NJC 1 Sept 2008
// ln(all / row geometric means)
real matrix function cda_clr(real matrix X) {
        return(ln(X) :- mean(ln(X'))')
}

// NJC 1 Sept 2008
// centring
real matrix function cda_centre(real matrix X) {
        real rowvector centre, invcentre
        centre = cda_closure(exp(mean(ln(X))))
        invcentre = cda_closure((1 :/ centre))
        return(cda_closure(X :* invcentre))
}

// NJC 3 Sept 2008
// column geometric means
real matrix function cda_colgmean(real matrix X) {
        return(exp(mean(ln(X))))
}

// NJC 3 Sept 2008
// row geometric means
real matrix function cda_rowgmean(real matrix X) {
        return(exp(mean(ln(X'))'))
}

// NJC 2 Sept 2008
// multiplicative replacement for rounded zeros
real matrix function cda_mrzero(real matrix X, real rowvector delta, | real scalar total) {
        real matrix iszero
        if (total == .) total = 1
        iszero = X :== 0
        return((iszero :* delta) + ((!iszero) :* X :* (1 :- rowsum(iszero :* delta) :/ total)))
}

// NJC 10 Oct 2008
// isometric log-ratio transformation
real matrix function cda_ilr(real matrix X) {
        real scalar c, j
        real matrix Y, lnX
        c = cols(X)
        Y = X[, (1 .. c - 1)]; lnX = ln(X)
        for (j = 1; j < c; j++) {
                Y[, j] = rowsum(lnX[, (1 .. j)]) - j * lnX[, j + 1]
                Y[, j] = (1 / sqrt(j * (j + 1))) * Y[, j]
        }
        return(Y)
}

end
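
A minimal worked example, not part of the original code, of replacing a zero and then taking additive logratios. The matrix X and the replacement values delta are made up for illustration. The expressions are written out inline so that the block runs on its own; they are meant to mirror the bodies of cda_mrzero() (with total left at its default of 1) and cda_alr().

```mata
mata:

// hypothetical 3-part compositions, rows sum to 1; row 2 contains a zero
X = (0.2, 0.3, 0.5 \ 0.6, 0.4, 0)

// hypothetical replacement values, one per column
delta = (0.01, 0.01, 0.01)

// multiplicative replacement, same expression as in cda_mrzero() with total = 1:
// zeros become delta, nonzero parts are shrunk so each row keeps its total
iszero = X :== 0
R = (iszero :* delta) + ((!iszero) :* X :* (1 :- rowsum(iszero :* delta)))
R
rowsum(R)

// additive logratio, as in cda_alr(): ln(parts 1 and 2 / part 3)
Y = ln(R[, (1, 2)]) :- ln(R[, 3])
Y

// without replacement, row 2 comes out missing because ln(0) is missing
ln(X[, (1, 2)]) :- ln(X[, 3])

end
```

If the functions above are already defined in the session, cda_alr(cda_mrzero(X, delta)) should reproduce Y. How large delta should be is a substantive choice that this sketch does not address.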

On Thu, Feb 28, 2013 at 1:19 PM, nick bungy
<[email protected]> wrote:
> Thank you for your responses,
> My thoughts following this discussion are the following:
> 1. Apply a logratio transformation to the data in the short run
> 2. Look into a simplex mixture approach as a longer-term aspiration, given that my data have a very large number of zeros. I noticed the topic was mentioned in the book you kindly linked, Nick, so that will be my first avenue to explore.
> Best,
> Nick
>
> ----------------------------------------
>> Date: Thu, 28 Feb 2013 07:35:23 -0500
>> Subject: Re: st: Proportional Independent Variables
>> From: [email protected]
>> To: [email protected]
>>
>> On Thu, Feb 28, 2013 at 4:19 AM, Nick Cox <[email protected]> wrote:
>> >
>> > 2. For different reasons log and logit transformations might be
>> > considered. There is a very inward-looking literature on compositional
>> > data analysis centred on more exotic transformations tailored to the
>> > problem. The reference I gave earlier is one entry into that.
>>
>> I was going to throw out the same reference. It's not a trivial
>> problem, but a narrow one due to the way it's been written. But the
>> walkaway message of most of it is that the log-ratio transformation is
>> the most reasonable one. With only two components this reduces to the
>> logit, that is, the log-odds. The logic is very similar to the
>> multinomial logit, with the same difficult dependence structure.
>>
>>
>>
>> > 3. The two previous points are often complicated by measured zeros.
>> > There is then a long slow agony about whether they are structural or
>> > sampling zeros and what to do about them. The more components are
>> > measured, the worse this usually gets, whether it is fractions of a
>> > budget spent on different things, or proportions of a material by
>> > elements or compounds or particle size classes, or whatever.
>>
>> Yes, this is a real issue, and unfortunately the transformations used
>> can create huge outlier problems, just like log transforms do when
>> there's a 0 value.

