Re: st: Skewness estimates with svyset data
"Richard Palmer-Jones" <email@example.com>
Sun, 23 Nov 2008 11:18:36 +0000
I have been a bit distracted looking at R for reasons explained below,
so had not yet replied.
I do not have repeat measures - I am using cross section data on
adults. It would be interesting to have repeat anthropometric measures
on adults (males as well as females) as cross section data are surely
affected by survival.
I have not looked at -xriml- but probably should. Instead I have been
trying to use the GAMLSS package in R (http://www.gamlss.com/) which
is apparently used in the recent WHO nutritional charts (Anthro -
http://www.who.int/childgrowth/software/en/). I am having convergence
problems, by the way.
For heights conventional tests (omninorm, sktest) do not generally
show significant skewness of height (by single year of age, or in 5
year groups) in adults from 20-55, apart from a few which I suppose is
to be expected. However, both LMSChartmaker and GAMLSS seem to
indicate there is. LMSGrowth assumes L = 1, which means no skewness, so
maybe this is a red herring unless I want to work with weight?
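To make the L = 1 point concrete, here is a minimal Python sketch of the LMS z-score as I understand it from Cole's method: z = ((y/M)^L - 1)/(L*S), with the log form ln(y/M)/S as the limit when L = 0. The function name lms_zscore and the values of M, S and y are hypothetical illustrations, not fitted parameters from my data.

```python
import math

def lms_zscore(y, L, M, S):
    """z-score of a measurement y given LMS parameters at one age.

    L: Box-Cox power, M: median, S: coefficient of variation.
    (Hypothetical helper name; the formula is the standard LMS one.)
    """
    if y <= 0 or M <= 0 or S <= 0:
        raise ValueError("y, M and S must be positive")
    if abs(L) < 1e-12:
        # Limiting case L -> 0: the Box-Cox power becomes a log transform
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

# Made-up illustrative values: median height 170 cm, CV 4%, observed 180 cm
M, S, y = 170.0, 0.04, 180.0

# With L = 1 the LMS z-score is just the ordinary z-score with SD = M * S
z_lms = lms_zscore(y, L=1.0, M=M, S=S)
z_plain = (y - M) / (M * S)
print(z_lms, z_plain)
```

So with L fixed at 1 the method makes no adjustment for skewness at all, and any L different from 1 bends the z-scores away from the plain (y - M)/(M*S) values.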
For those who are interested I have adapted Huebler's AutoIt scripts to
run with R from Textpad (or notepad++), which helps from familiarity
with using these editors with Stata.
On Mon, Nov 17, 2008 at 12:54 PM, Paul Seed <firstname.lastname@example.org> wrote:
> Dear Richard Palmer-Jones, Statalist,
> I have not followed this thread from the beginning, so may have missed
> something important;
> however, I have a little experience of using the LMS method for growth
> In its essence, the L, M and S are the parameters that define the expected
> distribution of
> the outcome at any given age. L is the shape transformation for a Box-Cox
> M is median, S is the spread (either standard deviation or coefficient of
> A fitted model would include a formula for L, M, S (constant, linear,
> quadratic, fractional polynomial etc.)
> depending on the age.
> As far as I know, Cole's method for fitting such models, using penalised
> quasi-likelihoods, has never been
> properly implemented in Stata; I had a go, but found the time to convergence
> was unfeasibly long.
> Also, I found the -xriml- package by Wright & Thompson that did everything I
> at that time using maximum likelihood & generalised least squares (STB-40
> sbe13.3). The regression tables from these include confidence intervals for
> the L parameter, which can be
> tested against a null value of 1 (identity transformation or no effect) in
> the usual way.
> These packages include a large number of families of distributions that it
> would be excessive to describe here.
> I recommend you read the STB articles or consult the authors for more
> However, -xriml- does not directly deal with repeated measurements over
> time, or with
> other features of survey data, which may rule it out for what Richard
> Palmer-Jones has
> in mind. I am not sure if Tim Cole's work on the LMS estimation method has
> anything to offer; I
> know he has some growth data on measurements repeated at regular intervals.
> If anyone were to improve -xriml- so that it could handle repeated measures,
> I, for one,
> would be very grateful.
> Paul T Seed MSc CStat, Lecturer in Medical Statistics,
> tel (+44) (0) 20 7188 3642, fax (+44) (0) 20 7620 1227
> Wednesdays: (+4) (0) 20 7848 4148
> email@example.com, firstname.lastname@example.org
> King's College London, Division of Reproduction and Endocrinology
> St Thomas' Hospital, Westminster Bridge Road, London SE1 7EH
>> Date: Sun, 16 Nov 2008 18:22:30 +0000
>> From: "Richard Palmer-Jones" <email@example.com>
>> Subject: Re: st: Skewness estimates with svyset data
>> Thanks for this. I had to do something else for a few days, then got
>> the papers from ILL (our library did not have them) and I corresponded
>> with Cole and his co-author, who clarified the original papers (Cole
>> 1990, The LMS Method for constructing normalised growth curves,
>> European Journal of Clinical Nutrition, 44, 45-60, makes things
>> clear). They also pointed me to the two (MS Excel add-in) programmes
>> they have published, LMSGrowth and LMSChartmaker (the latter not being
>> immediately obvious).
>> LMSChartmaker allows you to input raw height, weight etc., and age
>> data and compute L, M, and S curves that can be input to LMSGrowth. M
>> is the median, S the coefficient of variation and L the Box-Cox power
>> used to transform the indicator variable at each age. From a casual
>> reading I see that these parameters are constrained to be smoothly
>> related to their neighbours.
>> My calculations of LMS suggest that L for adults is not constant at 1
>> (normal) over ages, but I need to confirm this, even though I see that
>> the L variable in LMSChartmaker for height is 1. Strange. Weight is
>> certainly not normal. I agree that nlcom does not seem a reliable way
>> to calculate skewness (= 3rd moment).
>> They also directed me to Rigby and Stasinopoulos, 2005, Generalized
>> additive models for location, scale and shape, Applied Statistics, 54,
>> pt. 3, 507-554, for a similar approach with an R suite of programmes,
>> which I have yet to explore and might be worth porting to Stata.
>> As soon as I get time I hope to produce my LMS parameters, and then
>> "z-scores" using LMSChartmaker, which will go back into Stata. It
>> should be possible to use the LMS parameters to extend their zanthro
>> Stata ado file to enable that to be used beyond the age of 20 (USA
>> flavour) or 23 (UK flavour), but that is some way down the road.
>> I did a rough work through my data late one night which suggested that
>> whether one uses the standard (not adjusted for skewness) z-scores of
>> males and females from USA data, or used the LMS z-scores I computed
>> from those data, there is no good reason to think heights of Indian
>> males increased any faster (in terms of z-scores) than those of
>> Indian women, in fact, rather the reverse. But this needs more work.
>> On Wed, Nov 5, 2008 at 1:22 PM, Nick Cox <firstname.lastname@example.org> wrote:
>>> First, I think you need to keep explaining for the benefit of anyone
>>> trying to pick up on this thread that LMS refers to a method devised by
>>> [Timothy J.] Cole and others for handling growth curves. You earlier
>>> gave a reference that was just Cole et al. 2008. Despite a strong hint
>>> earlier from Stas Kolenikov, the further details of that reference are
>>> still outstanding.
>>> One of my dictionaries explains LMS as London Mathematical Society,
>>> London Missionary Society, and London, Midland and Scottish Railway. It
>>> is easy to guess that none of those apply but not so obvious that LMS
>>> here does _not_ mean Least Median of Squares as devised by Rousseeuw, as
>>> many statistically-minded people might imagine.
>>> Rousseeuw, P.J. 1984. Least median of squares regression. Journal,
>>> American Statistical Association 79: 871-880.
>>> The more general point, which should be obvious except that many list
>>> members act as if it were not true, is that the list includes people
>>> from several quite different disciplines. Hence if you want to maximise
>>> the readership of a question some explanations help a lot and rarely do
>>> In terms of what you want to do:
>>> Several people on this list should know much, much more about Cole's
>>> method than I do but they are keeping quiet. I am surprised at the
>>> implication that you need to feed skewness to Cole's method. That is
>>> not, in particular, the case for -colelms- from SSC. I understood that
>>> Cole's method was in essence designed to work well with the possibly
>>> skew distributions that do occur and as such there is no specific need
>>> to prepare the data or satisfy the assumptions of the method, as there
>>> aren't any, except I guess that ages are accurate and size measurement
>>> error negligible.
>>> On the other hand, it may be that the missing reference, Cole et al.
>>> 2008, gives a quite different twist to the method, but then we are back
>>> to my earlier point.
>>> In general ignoring some fraction of data in the tail seems a very bad
>>> idea unless it is obvious that the values concerned are all
>>> untrustworthy. Even then some sensitivity analysis (with outliers vs
>>> without outliers) would seem advisable.
>>> Richard Palmer-Jones
>>> Yes, I have been planning to use LMS method - basically adding the
>>> adult parameters to the childhood ones given there. LMS needs
>>> skewness - hence my interest. I am only interested in the adults older
>>> than 25 (when both males and females have reached their full height)
>>> so complicated smoothing is not necessary.
>>> Yes, NHANES has heavy weighting which makes a considerable difference
>>> to estimates (and false PSUs).
>>> However, since the skewness reported by summarize is positive in
>>> adults I am wondering whether a simpler procedure is to truncate the
>>> parameter for values > 2.5sd, or to transform to logs, or some such
>>> and work in them. Unfortunately ln(weight) is also skewed.
>>>> Stas Kolenikov
>>>> To Nick: yes, I've used skewness and kurtosis to test for normality a
>>>> bunch of times (and there's a famous Mardia's multivariate
>>>> generalization that I programmed up :)). But frankly I personally
>>>> don't remember seeing confidence intervals on skewness anywhere at
>>>> all. Estimation and testing are two related ways of looking at data
>>>> with statistics, but with skewness and kurtosis you really estimate
>>>> something to see that it is close enough to zero... and sometimes you
>>>> don't even estimate a thing and go straight to the test statistic.
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/