The Stata listserver

Re: st: Can Y be a predicted variable?

From   Roger Newson
Subject   Re: st: Can Y be a predicted variable?
Date   Fri, 09 Sep 2005 19:53:56 +0100

At 18:31 09/09/2005, Tina wrote:
Dear statalisters,

I have a dependent variable in 5 levels (Self-Assessed Health Status
from very good to very poor). I am currently assuming a latent
continuous variable, but that is problematic for some of my analysis.
I have some other measures of health in my data and was wondering if
it was appropriate to create a new one that would be continuous. My
suggestion would be:

1. regress SAHS on other health variables.
2. Predict SAHS (let's call it SAHShat) based on the previous regression.
3. The new measure would be calculated as an average of SAHS and SAHShat

This looks like a good idea to me, but I wonder why I don't see anyone
else doing this if it is OK. Those of you that fell off your office
chairs in laughter could maybe get back on and explain why not,
because it seems a fine idea to me right now.

In general, categorical ordinal outcomes like SAHS are a problem, especially if you want to have a parameter estimate that can be understood by non-statisticians.

A possible solution is to use Somers' D, which can be estimated (with confidence limits) using the -somersd- package (downloadable from SSC using the -ssc- command). Somers' D is defined in terms of Kendall's tau-a, which is defined as

tau(X,Y) = E[sign(X1-X2)sign(Y1-Y2)]

where (X1,Y1) and (X2,Y2) are sampled independently from the same population. Somers' D is defined as

D(Y|X) = tau(X,Y)/tau(X,X)

Therefore, Kendall's tau-a is the difference between 2 probabilities, namely the probability that the larger of 2 randomly-sampled X-values is associated with the larger of the 2 corresponding Y-values and the probability that the larger of the 2 X-values is associated with the smaller of the 2 Y-values. Somers' D is the difference between the 2 corresponding conditional probabilities, given that the 2 X-values are not equal. Somers' D and Kendall's tau-a are discussed in the manual -somersd.pdf-, distributed on SSC with the -somersd- package, and also in Newson (2002).
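The two definitions above can be checked by hand on a small sample. The following is only a minimal Python sketch of the sample versions of tau-a and Somers' D, computed by enumerating all pairs; it is not the -somersd- implementation and gives no confidence limits. The function names and the toy data (cigarettes per day, and SAHS coded 1 = very good to 5 = very poor) are made up for illustration.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def kendall_tau_a(x, y):
    """Sample Kendall's tau-a: mean of sign(xi-xj)*sign(yi-yj) over all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    total = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return total / len(pairs)

def somers_d(y, x):
    """Sample Somers' D(Y|X) = tau_a(X,Y) / tau_a(X,X)."""
    return kendall_tau_a(x, y) / kendall_tau_a(x, x)

# Hypothetical toy data, not taken from any real study.
cigs = [0, 0, 5, 10, 20, 20]   # X: cigarettes per day
sahs = [1, 2, 2, 3, 3, 5]      # Y: SAHS, 1 = very good ... 5 = very poor

tau_xy = kendall_tau_a(cigs, sahs)   # 11 concordant, 0 discordant, out of 15 pairs
d_yx = somers_d(sahs, cigs)          # same numerator, but only the 13 pairs with unequal X
print(tau_xy, d_yx)
```

In this toy sample 2 of the 15 pairs are tied on X, so tau(X,X) is 13/15 and Somers' D (11/13) is larger than tau-a (11/15): the denominator is what turns tau-a into a difference between conditional probabilities.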

Tina does not mention the predictor variables in the proposed regression model. However, in a multiple regression model, there is usually one predictor X that is really interesting and other predictors that are confounders. For instance, we might want to know how daily cigarette consumption predicts SAHS, adjusting for confounders such as income, access to a car, and other indicators of general standard of living.

To estimate a Somers' D of SAHS with respect to cigarette consumption, adjusted for the confounders, the first step is to define a propensity score for cigarette consumption by regressing cigarette consumption on the confounders and using the predicted level of cigarette consumption (calculated using -predict-) as the cigarette propensity score. We can then use -xtile- to define a number of cigarette propensity groups from the propensity score, and use -somersd- with the -wstrata()- option to estimate a Somers' D of SAHS with respect to cigarette consumption, stratified by cigarette propensity group.

This Somers' D measures association between SAHS and cigarette consumption in pairs of patients in the same cigarette propensity group. If it is high, then we can say that higher cigarette consumers have poorer health than lower cigarette consumers with similar "cigarette propensity" based on the confounders. In other words, if the stratified Somers' D is high, then the poorer health of cigarette smokers is not explained by the fact that cigarette smokers are cigarette-prone because of their low general standard of living. Some references about propensity scores are given in the manual -somersd.pdf-.
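The steps above (propensity regression, propensity groups, within-group Somers' D) can be sketched in Python to show the arithmetic. This is an illustration under stated assumptions, not what -somersd- does internally: it uses a single confounder (income), a hand-coded least-squares line in place of -regress- and -predict-, a simple rank-based grouping in place of -xtile- (which treats ties differently), and it assumes that the stratified Somers' D is the ratio of within-stratum pair sums. All names and data are hypothetical.

```python
from itertools import combinations

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fit_line(z, x):
    """Least-squares regression of x on one confounder z; returns (intercept, slope)."""
    n = len(z)
    zbar, xbar = sum(z) / n, sum(x) / n
    sxz = sum((a - zbar) * (b - xbar) for a, b in zip(z, x))
    szz = sum((a - zbar) ** 2 for a in z)
    slope = sxz / szz
    return xbar - slope * zbar, slope

def quantile_groups(scores, k):
    """Split observations into k roughly equal groups by rank of score
    (ties broken by position, unlike -xtile-)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [0] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = rank * k // len(scores)
    return groups

def stratified_somers_d(y, x, strata):
    """Somers' D(Y|X) using only pairs of observations in the same stratum.
    Assumes at least one within-stratum pair has unequal x."""
    num = den = 0
    for i, j in combinations(range(len(x)), 2):
        if strata[i] != strata[j]:
            continue  # pairs from different propensity groups are ignored
        sx = sign(x[i] - x[j])
        num += sx * sign(y[i] - y[j])
        den += sx * sx
    return num / den

# Hypothetical toy data, not taken from any real study.
income = [10, 12, 15, 18, 30, 35, 40, 50]   # confounder
cigs = [20, 10, 15, 5, 10, 0, 5, 0]         # X: cigarettes per day
sahs = [5, 3, 4, 3, 3, 1, 2, 2]             # Y: SAHS, 1 = very good ... 5 = very poor

a, b = fit_line(income, cigs)                 # step 1: propensity regression
propensity = [a + b * z for z in income]      # predicted cigarette consumption
groups = quantile_groups(propensity, 2)       # step 2: two propensity groups
d = stratified_somers_d(sahs, cigs, groups)   # step 3: within-group Somers' D
print(groups, d)
```

With real data there would be several confounders in the regression and more propensity groups, and the estimate and its confidence limits would come from -somersd- itself.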

I hope this helps.

References

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal 2002; 2 (1): 45-64. Also downloadable from my website at

Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
Division of Asthma, Allergy and Lung Biology
King's College London

5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605

Opinions expressed are those of the author, not the institution.
