The real reason the regression line does not pass through the conditional means is
that linear regression assumes these conditional means lie on a straight line. In
reality this constraint need not hold. We can relax the assumption by entering the
variable as a series of dummies, one per category. See the example
below:
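*----------------begin example--------------------
sysuse auto, clear
* combine categories 1 and 2 of rep78 with category 3
recode rep78 1/2=3
* observed conditional means of price
tabstat price, stat(mean) by(rep78)
* linear in rep78: predictions are forced onto a straight line
reg price rep78
adjust, by(rep78)
* rep78 as dummies: predictions equal the conditional means
xi: reg price i.rep78
adjust, by(rep78)
*------------------end example-------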
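For a side-by-side comparison, here is a minimal sketch of the same idea that puts the observed means and both sets of predictions in one table. It assumes Stata 11 or later, because it uses factor-variable notation (i.rep78) instead of xi:. The variable names yhat_linear and yhat_dummies are made up for illustration.

```stata
sysuse auto, clear
recode rep78 1/2=3

* linear specification: fitted values lie on a straight line in rep78
regress price rep78
predict yhat_linear

* dummy specification: one indicator per category of rep78
regress price i.rep78
predict yhat_dummies

* observed means next to the two sets of predictions, by category
tabstat price yhat_linear yhat_dummies, stat(mean) by(rep78)
```

In the resulting table the yhat_dummies column should reproduce the observed means of price in every category, while the yhat_linear column generally will not, because the straight-line model has only two parameters to fit three category means.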
Hello,
By definition, the regression line should go through the conditional means of the Y variable at each value of the X variable. But when I compute the conditional means of Y in each category of X, the regression line does not go through these conditional means.
In the statistical literature on regression I have found this explanation: we work with a sample and therefore do not have all observations of the Y variable at each value of the X variable, and this is why the observed conditional means of Y do not correspond to the predicted conditional means given by the regression line.
Is this explanation correct, and is it the only possible one?
