Statalist



Re: st: Skewness estimates with svyset data


From   Paul Seed <paul.seed@kcl.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Skewness estimates with svyset data
Date   Mon, 17 Nov 2008 12:54:58 +0000

Dear Richard Palmer-Jones, Statalist,

I have not followed this thread from the beginning, so may have missed something important; however, I have a little experience of using the LMS method for growth charts.

In essence, L, M and S are the parameters that define the expected distribution of the outcome at any given age. L is the power of the Box-Cox transformation (the shape parameter), M is the median, and S is the spread (either the standard deviation or the coefficient of variation). A fitted model would include a formula for each of L, M and S (constant, linear, quadratic, fractional polynomial, etc.) depending on age.
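
The role of the three parameters can be made concrete with the z-score formula of the LMS method. What follows is a minimal sketch in Python, assuming S is the coefficient of variation and that L, M and S have already been evaluated at the subject's age; the function names are illustrative and do not come from any package mentioned in this thread.

```python
import math

def lms_zscore(y, L, M, S):
    """z-score of a positive measurement y, given L, M, S at the subject's age."""
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must all be positive")
    if abs(L) < 1e-12:
        # Limiting case L = 0: the Box-Cox transformation becomes the log
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_centile(z, L, M, S):
    """Inverse of lms_zscore: the measurement that corresponds to z-score z."""
    if abs(L) < 1e-12:
        return M * math.exp(S * z)
    base = 1.0 + L * S * z
    if base <= 0:
        raise ValueError("z is outside the range supported by these L and S")
    return M * base ** (1.0 / L)

# Hypothetical parameter values, for illustration only
z = lms_zscore(110.0, L=1.0, M=100.0, S=0.1)
```

With L = 1 the formula reduces to (y/M - 1)/S, so no transformation is applied, and a measurement equal to the median M always has a z-score of 0.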

As far as I know, Cole's method for fitting such models, using penalised quasi-likelihoods, has never been properly implemented in Stata; I had a go, but found the time to convergence was unfeasibly long. Also, I found the -xriml- package by Wright & Thompson that did everything I needed at that time using maximum likelihood & generalised least squares (STB-40 sbe13.3). The regression tables from these include confidence intervals for the L parameter, which can be tested against a null value of 1 (identity transformation or no effect) in the usual way.
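
Testing L against 1 "in the usual way" amounts to a Wald test on the estimated coefficient. A minimal sketch in Python, assuming an approximately normal estimator; the estimate and standard error below are hypothetical numbers, not -xriml- output, and the function name is illustrative.

```python
import math

def wald_test(estimate, std_error, null_value=1.0, z_crit=1.959964):
    """Two-sided Wald z-test and 95% confidence interval for one parameter."""
    z = (estimate - null_value) / std_error
    # Two-sided p-value under a standard normal: 2 * (1 - Phi(|z|))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return z, p_value, ci

# Hypothetical estimate of L and its standard error
z, p_value, ci = wald_test(estimate=0.62, std_error=0.15)
```

If the confidence interval excludes 1, the data are inconsistent with an untransformed normal distribution at that level.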

These packages include a large number of families of distributions that it would be excessive to describe here. I recommend you read the STB articles or consult the authors for more information.

However, -xriml- does not directly deal with repeated measurements over time, or with other features of survey data, which may rule it out for what Richard Palmer-Jones has in mind. I am not sure if Tim Cole's work on the LMS estimation method has anything to offer; I know he has some growth data on measurements repeated at regular intervals.

If anyone were to improve -xriml- so that it could handle repeated measures, I, for one, would be very grateful.


Paul T Seed MSc CStat, Lecturer in Medical Statistics,
tel  (+44) (0) 20 7188 3642, fax (+44) (0) 20 7620 1227
Wednesdays: (+4) (0) 20 7848 4148

paul.seed@kcl.ac.uk, paul.t.seed@gmail.com

King's College London, Division of Reproduction and Endocrinology
St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH




Date: Sun, 16 Nov 2008 18:22:30 +0000
From: "Richard Palmer-Jones" <richard.palmerjones@gmail.com>
Subject: Re: st: Skewness estimates with svyset data

Thanks for this. I had to do something else for a few days, then got
the papers from ILL (our library did not have them) and I corresponded
with Cole and his co-author, who clarified the original papers (Cole
1990, The LMS Method for constructing normalised growth curves,
European Journal of Clinical Nutrition, 44, 45-60, makes things
clear). They also pointed me to the two (MS Excel add-in) programmes
they have published, LMSGrowth and LMSChartmaker (the latter not being
immediately obvious).

LMSChartmaker allows you to input raw height, weight etc., and age
data and compute L, M, and S curves that can be input to LMSGrowth. M
is the median, S the coefficient of variation and L the Box-Cox power
used to transform the indicator variable at each age. From a casual
reading I see that these parameters are constrained to be smoothly
related to their neighbours.

My calculations of LMS suggest that L for adults is not constant at 1
(normal) over ages, but I need to confirm this, even though I see that
the L variable in LMSChartmaker for height is 1. Strange. Weight is
certainly not normal. I agree that nlcom does not seem a reliable way
to calculate skewness (= standardized 3rd moment).
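
For reference, a weighted point estimate of skewness can be written directly from the weighted central moments. This is a minimal sketch in Python, assuming the weights are sampling (probability) weights; it gives a point estimate only, with no design-based standard error, and it is not a reproduction of what any Stata command computes. The function name and the toy data are illustrative.

```python
def weighted_skewness(values, weights):
    """Skewness m3 / m2**1.5 computed from weight-normalised central moments."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * x for x, w in zip(values, weights)) / total
    m2 = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / total
    m3 = sum(w * (x - mean) ** 3 for x, w in zip(values, weights)) / total
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

# Toy data: one heavy observation in the right tail gives positive skewness
skew = weighted_skewness([60.0, 62.0, 65.0, 110.0], [1.0, 1.0, 1.0, 1.0])
```

Under this definition an observation with weight 3 contributes exactly as three observations with weight 1 would, which is the behaviour one expects from frequency-like weighting of the point estimate.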

They also directed me to Rigby and Stasinopoulos, 2005, Generalized
additive models for location, scale and shape, Applied Statistics, 54,
pt. 3, 507-554, for a similar approach with an R suite of programmes,
which I have yet to explore and which might be worth porting to Stata.

As soon as I get time I hope to produce my LMS parameters, and then
"z-scores" using LMSChartmaker, which will go back into Stata. It
should be possible to use the LMS parameters to extend their zanthro
Stata ado file to enable that to be used beyond the age of 20 (USA
flavour) or 23 (UK flavour), but that is some way down the road.

I did a rough work through my data late one night which suggested that
whether one uses the standard (not adjusted for skewness) z-scores of
males and females from USA data, or uses the LMS z-scores I computed
from those data, there is no good reason to think heights of Indian
males increased any faster (in terms of z-scores) than those of
Indian women; in fact, rather the reverse. But this needs more work.

Richard


On Wed, Nov 5, 2008 at 1:22 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
First, I think you need to keep explaining for the benefit of anyone
trying to pick up on this thread that LMS refers to a method devised by
[Timothy J.] Cole and others for handling growth curves. You earlier
gave a reference that was just Cole et al. 2008. Despite a strong hint
earlier from Stas Kolenikov, the further details of that reference are
still outstanding.

One of my dictionaries explains LMS as London Mathematical Society,
London Missionary Society, and London, Midland and Scottish Railway. It
is easy to guess that none of those apply but not so obvious that LMS
here does _not_ mean Least Median of Squares as devised by Rousseeuw, as
many statistically-minded people might imagine.

Rousseeuw, P.J. 1984.  Least median of squares regression.  Journal,
American Statistical Association 79: 871-880.

The more general point, which should be obvious except that many list
members act as if it were not true, is that the list includes people
from several quite different disciplines. Hence if you want to maximise
the readership of a question some explanations help a lot and rarely do
harm.

In terms of what you want to do:

Several people on this list should know much, much more about Cole's
method than I do but they are keeping quiet. I am surprised at the
implication that you need to feed skewness to Cole's method. That is
not, in particular, the case for -colelms- from SSC. I understood that
Cole's method was in essence designed to work well with the possibly
skew distributions that do occur and as such there is no specific need
to prepare the data or satisfy the assumptions of the method, as there
aren't any, except I guess that ages are accurate and size measurement
error negligible.

On the other hand, it may be that the missing reference, Cole et al.
2008,  gives a quite different twist to the method, but then we are back
to my earlier point.

In general ignoring some fraction of data in the tail seems a very bad
idea unless it is obvious that the values concerned are all
untrustworthy. Even then some sensitivity analysis (with outliers vs
without outliers) would seem advisable.

Nick
n.j.cox@durham.ac.uk

Richard Palmer-Jones

Yes, I have been planning to use the LMS method - basically adding the
adult parameters to the childhood ones given there. LMS needs
skewness - hence my interest. I am only interested in the adults older
than 25 (when both males and females have reached their full height)
so complicated smoothing is not necessary.

Yes, NHANES has heavy weighting which makes a considerable difference
to estimates (and false PSUs).

However, since the skewness reported by summarize is positive in
adults I am wondering whether a simpler procedure is to truncate the
parameter for values > 2.5 SD, or to transform to logs, or some such
and work in them. Unfortunately ln(weight) is also skewed.


Stas Kolenikov

To Nick: yes, I've used skewness and kurtosis to test for normality a
bunch of times (and there's Mardia's famous multivariate
generalization that I programmed up :)). But frankly I personally
don't remember seeing confidence intervals on skewness anywhere at
all. Estimation and testing are two related ways of looking at data
with statistics, but with skewness and kurtosis you really estimate
something to see that it is close enough to zero... and sometimes you
don't even estimate a thing and go straight to the test statistic.