The Stata listserver

Re: st: Can Y be a predicted variable?

From   Constantine Daskalakis
Subject   Re: st: Can Y be a predicted variable?
Date   Fri, 09 Sep 2005 17:18:32 -0400

At 03:45 PM 9/9/2005, Tinna wrote:
So will I get fried if I do it my proposed way, or will the results
just be difficult to read for non-statisticians?

> At 18:31 09/09/2005, Tina wrote:
> >Dear statalisters,
> >
> >I have a dependent variable in 5 levels (Self-Assessed Health Status
> >from very good to very poor). I am currently assuming a latent
> >continuous variable, but that is problematic for some of my analysis.
> >I have some other measures of health in my data and was wondering if
> >it was appropriate to create a new one that would be continuous. My
> >suggestion would be:
> >
> >1. regress SAHS on other health variables.
> >2. Predict SAHS (let's call it SAHShat) based on the previous regression.
> >3. The new measure would be calculated as an average of SAHS and SAHShat
> >
> >This looks like a good idea to me, but I wonder why I don't see anyone
> >else doing this if it is OK. Those of you that fell off your office
> >chairs in laughter could maybe get back on and explain why not,
> >because it seems a fine idea to me right now.

I am not laughing, but I think there are plenty of reasons why this is NOT done.

For one, the "predicted" variable (SAHShat) is not an observed outcome but one that you have "imputed" (via your regression of SAHS on other health variables). How good the prediction is can be debatable. And what beast this "predicted" variable really is can be equally debatable.

Second, it is not clear how you would even regress the ordinal SAHS variable on other variables. Through some ordinal regression model? Then, your predicted values would still be discrete. Through some linear regression model? But if your original SAHS variable were continuous enough for that, you wouldn't worry in the first place.
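
To make the second point concrete, here is a minimal sketch of what an ordered-logit prediction looks like for a 5-level outcome. The cutpoints and linear-predictor values are made up for illustration (they are not estimates from any data), and the function names are hypothetical. The model gives five category probabilities, and the usual "predicted value", the most likely category, is again one of the five levels.

```python
import math

def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def category_probs(xb, cuts):
    """Ordered-logit probabilities, using P(Y <= k) = logistic(cut_k - xb)."""
    cumulative = [logistic(c - xb) for c in cuts] + [1.0]
    # Differences of adjacent cumulative probabilities give P(Y = k).
    return [cumulative[0]] + [
        cumulative[k] - cumulative[k - 1] for k in range(1, len(cumulative))
    ]

def predicted_category(xb, cuts):
    """Most likely category, coded 1..len(cuts)+1."""
    probs = category_probs(xb, cuts)
    return 1 + max(range(len(probs)), key=probs.__getitem__)

# Hypothetical cutpoints: 4 cutpoints define the 5 SAHS levels.
CUTS = [-2.0, -0.5, 0.5, 2.0]

# Hypothetical linear-predictor values for five respondents.
for xb in [-3.0, -1.0, 0.2, 1.0, 3.0]:
    probs = [round(p, 3) for p in category_probs(xb, CUTS)]
    print(xb, predicted_category(xb, CUTS), probs)
```

However the covariates vary, the predicted category only ever takes the values 1 through 5, so it is no more continuous than the original SAHS.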

Third, the fact that those values are imputed rather than observed adds variability that will not be accounted for in your main analyses. Usual methods take Y to be an actually observed outcome.

Fourth, why "average" the observed and predicted? What is the rationale for that and what do you get out of it?
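
As a rough illustration of the fourth point, the sketch below runs the three proposed steps on simulated data, assuming step 1 is an ordinary least-squares regression with an intercept. The data, the variable names, and the single "other health measure" are all invented for the example. Under that assumption the average is simply the observed SAHS with half of its residual removed, and its variance equals Var(SAHS) * (1 + 3*R^2) / 4, which is smaller than the variance of SAHS whenever R^2 < 1.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Made-up data: `other` stands in for another health measure,
# `sahs` for the observed 1-5 self-assessed score.
n = 500
other = [random.gauss(0.0, 1.0) for _ in range(n)]
sahs = [min(5, max(1, round(3 + h + random.gauss(0.0, 1.0)))) for h in other]

def mean(v):
    return sum(v) / len(v)

def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)

# Step 1: least-squares regression of SAHS on the other health measure.
mx, my = mean(other), mean(sahs)
sxy = sum((a - mx) * (b - my) for a, b in zip(other, sahs))
sxx = sum((a - mx) ** 2 for a in other)
slope = sxy / sxx
intercept = my - slope * mx

# Step 2: predicted values (SAHShat).
sahs_hat = [intercept + slope * a for a in other]

# Step 3: the proposed new measure, the average of SAHS and SAHShat.
new_measure = [(b + f) / 2 for b, f in zip(sahs, sahs_hat)]

# In-sample R^2 of the step-1 regression.
r2 = variance(sahs_hat) / variance(sahs)

print("R^2 of step 1:           ", r2)
print("Var(SAHS):               ", variance(sahs))
print("Var(new measure):        ", variance(new_measure))
print("Var(SAHS)*(1 + 3*R^2)/4: ", variance(sahs) * (1 + 3 * r2) / 4)
```

The last two printed values agree up to rounding, which shows that the new measure is a shrunken version of the observed score rather than an independent continuous measurement of health.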

In addition to Roger's suggestions, another avenue might be to actually do a formal latent variable analysis, where you would use SAHS (and possibly other covariates) as proxies of your unobserved latent outcome. I think structural equations come into this but it is not my field.



Constantine Daskalakis, ScD
Assistant Professor,
Thomas Jefferson University, Division of Biostatistics,
211 S. 9th St., Suite 602, Philadelphia, PA 19107
Tel: 215-955-5695
Fax: 215-503-3804