# Re: st: Can Y be a predicted variable?

 From Tinna <[email protected]> To [email protected] Subject Re: st: Can Y be a predicted variable? Date Fri, 9 Sep 2005 15:45:40 -0400

```
So will I get fried if I do it my proposed way, or will the results
just be difficult to read for non-statisticians?
Tina

On 9/9/05, Roger Newson <[email protected]> wrote:
> At 18:31 09/09/2005, Tina wrote:
> >Dear statalisters,
> >
> >I have a dependent variable in 5 levels (Self-Assessed Health Status
> >from very good to very poor). I am currently assuming a latent
> >continuous variable, but that is problematic for some of my analysis.
> >I have some other measures of health in my data and was wondering if
> >it was appropriate to create a new one that would be continuous. My
> >suggestion would be:
> >
> >1. regress SAHS on other health variables.
> >2. Predict SAHS (let's call it SAHShat) based on the previous regression.
> >3. The new measure would be calculated as an average of SAHS and SAHShat
> >
> >This looks like a good idea to me, but I wonder why I don't see anyone
> >else doing this if it is OK. Those of you that fell off your office
> >chairs in laughter could maybe get back on and explain why not,
> >because it seems a fine idea to me right now.
>
> In general, categorical ordinal outcomes like SAHS are a problem,
> especially if you want to have a parameter estimate that can be understood
> by non-statisticians.
>
> A possible solution is to use Somers' D, which can be estimated using the
> -somersd- package (downloadable from SSC with the -ssc- command). Somers' D
> is defined in terms of Kendall's tau-a, which is defined as
>
> tau(X,Y) = E[sign(X1-X2)sign(Y1-Y2)]
>
> where (X1,Y1) and (X2,Y2) are sampled independently from the same
> population. Somers' D is defined as
>
> D(Y|X) = tau(X,Y)/tau(X,X)
>
> Therefore, Kendall's tau-a is the difference between 2 probabilities,
> namely the probability that the larger of 2 randomly-sampled X-values is
> associated with the larger of the 2 corresponding Y-values and the
> probability that the larger of the 2 X-values is associated with the
> smaller of the 2 Y-values. Somers' D is the difference between the 2
> corresponding conditional probabilities, given that the 2 X-values are not
> equal. Somers' D and Kendall's tau-a are discussed in the manual
> -somersd.pdf-, distributed on SSC with the -somersd- package, and also in
> Newson (2002).
>
> Tina does not mention the proposed predictor variables in the proposed
> regression model. However, in a multivariate regression model, there is
> usually one predictor X that is really interesting and other predictors
> that are confounders. For instance, we might want to know how daily
> cigarette consumption predicts SAHS, adjusting for confounders such as
> income, access to a car, and other indicators of general standard of
> living. To estimate a Somers' D of SAHS with respect to cigarettes adjusted
> for the confounders, the first step is to define a propensity score for
> cigarette consumption by regressing cigarette consumption with respect to
> confounders and using the predicted level of cigarette consumption
> (calculated using -predict-) as the cigarette propensity score. We can then
> use -xtile- to define a number of cigarette propensity groups from the
> propensity score, and use -somersd- with the -wstrata()- option to estimate
> a Somers' D of SAHS with respect to cigarette consumption stratified by
> cigarette propensity group. This Somers' D measures association between
> SAHS and cigarette consumption in pairs of patients in the same cigarette
> propensity group. If it is high, then we can say that higher cigarette
> consumers have poorer health than lower cigarette consumers with similar
> "cigarette propensity" based on the confounders. In other words, if the
> stratified Somers' D is high, then the poorer health of cigarette smokers
> is not caused by the fact that cigarette smokers are cigarette-prone
> because of their low general standard of living. Some references about
> propensity scores are given in the manual -somersd.pdf-.
>
> I hope this helps.
>
> Roger
>
>
> References
>
> Newson R. Parameters behind "nonparametric" statistics: Kendall's tau,
> Somers' D and median differences. The Stata Journal 2002; 2 (1): 45-64.
> http://phs.kcl.ac.uk/rogernewson/papers.htm
>
>
>
> --
> Roger Newson
> Lecturer in Medical Statistics
> Department of Public Health Sciences
> Division of Asthma, Allergy and Lung Biology
> King's College London
>
> 5th Floor, Capital House
> 42 Weston Street
> London SE1 3QD
> United Kingdom
>
> Tel: 020 7848 6648 International +44 20 7848 6648
> Fax: 020 7848 6620 International +44 20 7848 6620
>   or 020 7848 6605 International +44 20 7848 6605
> Email: [email protected]
> Website: http://phs.kcl.ac.uk/rogernewson/
>
> Opinions expressed are those of the author, not the institution.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
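
To make the definitions in Roger's reply concrete, here is a minimal Python sketch that computes Kendall's tau-a and Somers' D by brute-force enumeration of pairs, with an optional restriction to pairs in the same stratum (the idea behind the "cigarette propensity group" comparison). It is an illustration of the formulas only, not the -somersd- implementation, and it gives no standard errors or confidence intervals. The function names, the toy data, and the coding of SAHS as 1 (very good) to 5 (very poor) are all assumptions made for the example.

```python
from itertools import combinations


def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_a(x, y, strata=None):
    """Sample Kendall's tau-a: the mean of sign(xi - xj) * sign(yi - yj)
    over all pairs of observations. If strata is given, only pairs whose
    members share the same stratum are compared."""
    total = 0
    n_pairs = 0
    for i, j in combinations(range(len(x)), 2):
        if strata is not None and strata[i] != strata[j]:
            continue  # skip between-stratum pairs
        total += sign(x[i] - x[j]) * sign(y[i] - y[j])
        n_pairs += 1
    return total / n_pairs


def somers_d(y, x, strata=None):
    """Sample Somers' D of y with respect to x: tau(x, y) / tau(x, x)."""
    return tau_a(x, y, strata) / tau_a(x, x, strata)


# Hypothetical toy data: daily cigarettes, SAHS (higher = poorer health),
# and a propensity group that would in practice come from grouping the
# predicted cigarette consumption into quantiles.
cigs = [0, 0, 10, 20, 5, 15]
sahs = [1, 2, 3, 4, 2, 5]
group = [1, 1, 1, 2, 2, 2]

print("tau-a:", tau_a(cigs, sahs))
print("Somers' D (all pairs):", somers_d(sahs, cigs))
print("Somers' D (within propensity group):", somers_d(sahs, cigs, group))
```

The unstratified value compares every pair of people; the stratified value compares only people in the same propensity group, which is what lets it be read as an association "adjusted" for the confounders that went into the propensity score. With SAHS coded so that higher means poorer health, a positive D says that the heavier smoker in a pair tends to be the one in poorer health.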