The Stata listserver

Re: st: Can Y be a predicted variable?

From   Tinna <>
Subject   Re: st: Can Y be a predicted variable?
Date   Fri, 9 Sep 2005 15:45:40 -0400

So will I get fried if I do it my proposed way, or will the results
just be difficult to read for non-statisticians?

On 9/9/05, Roger Newson <> wrote:
> At 18:31 09/09/2005, Tina wrote:
> >Dear statalisters,
> >
> >I have a dependent variable in 5 levels (Self-Assessed Health Status
> >from very good to very poor). I am currently assuming a latent
> >continuous variable, but that is problematic for some of my analysis.
> >I have some other measures of health in my data and was wondering if
> >it was appropriate to create a new one that would be continuous. My
> >suggestion would be:
> >
> >1. regress SAHS on other health variables.
> >2. Predict SAHS (let's call it SAHShat) based on the previous regression.
> >3. The new measure would be calculated as an average of SAHS and SAHShat
> >
> >This looks like a good idea to me, but I wonder why I don't see anyone
> >else doing this if it is OK. Those of you that fell off your office
> >chairs in laughter could maybe get back on and explain why not,
> >because it seems a fine idea to me right now.
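For concreteness, the three steps proposed above can be sketched as follows. This is only an illustration in Python with made-up data: the single predictor `other_health` and all values are hypothetical, and a one-predictor least-squares fit stands in for the regression on "other health variables". Note that the averaged measure equals SAHS minus half the regression residual, so it is still tied to the five discrete SAHS levels.

```python
def fit_line(z, y):
    """Ordinary least squares of y on a single predictor z: (intercept, slope)."""
    n = len(z)
    mz = sum(z) / n
    my = sum(y) / n
    slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
             / sum((zi - mz) ** 2 for zi in z))
    return my - slope * mz, slope

# Hypothetical data: SAHS coded 1 (very good) to 5 (very poor),
# plus one other (hypothetical) continuous health measure.
sahs = [1, 2, 2, 3, 4, 5]
other_health = [0.5, 1.0, 2.5, 2.0, 4.5, 4.0]

# Step 1: regress SAHS on the other health variable.
intercept, slope = fit_line(other_health, sahs)

# Step 2: predict SAHS from that regression.
sahs_hat = [intercept + slope * z for z in other_health]

# Step 3: average the observed and predicted values.
new_measure = [(y + yh) / 2 for y, yh in zip(sahs, sahs_hat)]

# Equivalent form: observed SAHS minus half the residual.
residuals = [y - yh for y, yh in zip(sahs, sahs_hat)]
```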
> In general, categorical ordinal outcomes like SAHS are a problem,
> especially if you want to have a parameter estimate that can be understood
> by non-statisticians.
> A possible solution is to use Somers' D, which can be estimated (with
> confidence limits) using the -somersd- package (downloadable from SSC using
> the -ssc- command). Somers' D is defined in terms of Kendall's tau-a, which
> is defined as
> tau(X,Y) = E[sign(X1-X2)sign(Y1-Y2)]
> where (X1,Y1) and (X2,Y2) are sampled independently from the same
> population. Somers' D is defined as
> D(Y|X) = tau(X,Y)/tau(X,X)
> Therefore, Kendall's tau-a is the difference between 2 probabilities,
> namely the probability that the larger of 2 randomly-sampled X-values is
> associated with the larger of the 2 corresponding Y-values and the
> probability that the larger of the 2 X-values is associated with the
> smaller of the 2 Y-values. Somers' D is the difference between the 2
> corresponding conditional probabilities, given that the 2 X-values are not
> equal. Somers' D and Kendall's tau-a are discussed in the manual
> -somersd.pdf-, distributed on SSC with the -somersd- package, and also in
> Newson (2002).
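To make the two definitions above concrete, here is a minimal sketch that computes the sample versions of Kendall's tau-a and Somers' D by enumerating all pairs of observations. It is written in Python purely for illustration and is not how the -somersd- package is implemented; it gives point estimates only (no confidence limits), and the data and variable names are hypothetical.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def tau_a(x, y):
    """Sample Kendall's tau-a: mean of sign(xi-xj)*sign(yi-yj) over all pairs i<j."""
    pairs = list(combinations(range(len(x)), 2))
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return total / len(pairs)

def somers_d(y, x):
    """Sample Somers' D of y with respect to x: tau_a(x, y) / tau_a(x, x)."""
    return tau_a(x, y) / tau_a(x, x)

# Hypothetical data: daily cigarettes and SAHS coded 1 (very good) to 5 (very poor).
cigs = [0, 0, 5, 10, 20, 20]
sahs = [1, 2, 2, 3, 4, 5]

tau = tau_a(cigs, sahs)   # concordant minus discordant pairs, over all 15 pairs
d = somers_d(sahs, cigs)  # same numerator, over the 13 pairs with unequal cigs
```

With these data, 12 of the 15 pairs are concordant and none are discordant, so tau-a is 12/15; two pairs are tied on cigarettes, so Somers' D is 12/13.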
> Tina does not mention the proposed predictor variables in the proposed
> regression model. However, in a multivariate regression model, there is
> usually one predictor X that is really interesting and other predictors
> that are confounders. For instance, we might want to know how daily
> cigarette consumption predicts SAHS, adjusting for confounders such as
> income, access to a car, and other indicators of general standard of
> living. To estimate a Somers' D of SAHS with respect to cigarettes adjusted
> for the confounders, the first step is to define a propensity score for
> cigarette consumption by regressing cigarette consumption with respect to
> confounders and using the predicted level of cigarette consumption
> (calculated using -predict-) as the cigarette propensity score. We can then
> use -xtile- to define a number of cigarette propensity groups from the
> propensity score, and use -somersd- with the -wstrata()- option to estimate
> a Somers' D of SAHS with respect to cigarette consumption stratified by
> cigarette propensity group. This Somers' D measures association between
> SAHS and cigarette consumption in pairs of patients in the same cigarette
> propensity group. If it is high, then we can say that higher cigarette
> consumers have poorer health than lower cigarette consumers with similar
> "cigarette propensity" based on the confounders. In other words, if the
> stratified Somers' D is high, then the poorer health of cigarette smokers
> is not caused by the fact that cigarette smokers are cigarette-prone
> because of their low general standard of living. Some references about
> propensity scores are given in the manual -somersd.pdf-.
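The procedure described above (propensity score, propensity groups, then a Somers' D restricted to pairs in the same group) can be sketched roughly as follows. This is an illustration in Python, not the -somersd- implementation: it assumes a single hypothetical confounder (`income`), a simple least-squares fit in place of the regression step, a rank-based grouping as a rough stand-in for -xtile-, and it returns a point estimate without confidence limits. All data are made up.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fit_line(z, x):
    """Ordinary least squares of x on a single confounder z: (intercept, slope)."""
    n = len(z)
    mz = sum(z) / n
    mx = sum(x) / n
    slope = (sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
             / sum((zi - mz) ** 2 for zi in z))
    return mx - slope * mz, slope

def quantile_groups(scores, n_groups):
    """Split observations into n_groups of near-equal size by rank of score.
    Unlike -xtile-, tied scores may land in different groups."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * n_groups // len(scores)
    return groups

def stratified_somers_d(y, x, strata):
    """Somers' D of y with respect to x, using only pairs in the same stratum."""
    num = den = 0
    for i, j in combinations(range(len(x)), 2):
        if strata[i] != strata[j]:
            continue  # skip pairs from different propensity groups
        sx = sign(x[i] - x[j])
        num += sx * sign(y[i] - y[j])
        den += sx * sx  # counts within-stratum pairs with unequal x
    return num / den

# Hypothetical data: one confounder, the exposure, and SAHS (1 = very good, 5 = very poor).
income = [10, 12, 15, 18, 30, 35, 40, 50]
cigs = [20, 15, 25, 10, 5, 10, 0, 2]
sahs = [4, 3, 5, 3, 2, 3, 1, 2]

# Step 1: propensity score = predicted cigarette consumption given the confounder.
a, b = fit_line(income, cigs)
propensity = [a + b * z for z in income]

# Step 2: group observations by propensity score (two groups here).
groups = quantile_groups(propensity, 2)

# Step 3: Somers' D of SAHS with respect to cigarettes within propensity groups.
d_strat = stratified_somers_d(sahs, cigs, groups)
```

In real data there would normally be several confounders and more propensity groups; the restriction to within-group pairs is the part that corresponds to stratifying by cigarette propensity group.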
> I hope this helps.
> Roger
> References
> Newson R. Parameters behind "nonparametric" statistics: Kendall's tau,
> Somers' D and median differences. The Stata Journal 2002; 2 (1): 45-64.
> Also downloadable from my website at
> --
> Roger Newson
> Lecturer in Medical Statistics
> Department of Public Health Sciences
> Division of Asthma, Allergy and Lung Biology
> King's College London
> 5th Floor, Capital House
> 42 Weston Street
> London SE1 3QD
> United Kingdom
> Tel: 020 7848 6648 International +44 20 7848 6648
> Fax: 020 7848 6620 International +44 20 7848 6620
>   or 020 7848 6605 International +44 20 7848 6605
> Email:
> Website:
> Opinions expressed are those of the author, not the institution.