On Tue, Apr 30, 2013 at 10:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The deeper idea, I suggest, is that it is the _definition_ of a
> regression line (function, more generally) that it is the locus of the
> means of the response. On top of that we often build an _assertion_ or
> _assumption_ that that function is linear in the parameters.

Yes, though of course regression can be broader still and target other
conditional quantities, such as quantiles, or, I suppose, the entire
conditional distribution.

> It's important to separate the assumptions of linear models from the
> estimators we happen to use to get at parameters. That the regression
> line goes through the means is not a consequence of using OLS.

Right, it's a consequence of two facts: squared-error loss is minimized
by the sample mean, and when the vector 1 is in the column space of X the
fit preserves the mean. Regression through the origin, or through some
other fixed constant, will not in general go through the sample means.

If you switch loss functions you have, implicitly or explicitly, asked a
different question and are thereby likely to get a different answer.
There's nothing particularly desirable about OLS aside from the fact that
the math for it is "nice". There's a neat little article on this:

R. DeLaubenfels. 2006. The victory of least squares and orthogonality in
statistics. The American Statistician, 60, 315-321.

--
JVVerkuilen, PhD
jvverkuilen@gmail.com
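
A minimal numerical sketch of the two points above, assuming NumPy and simulated data (the seed, sample size, coefficients, and all variable names are illustrative, not from the thread). It checks that a least-squares fit with a constant passes through the sample means while a fit through the origin does not, and that changing the loss changes the minimizer (mean for squared error, median for absolute error).

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; data are simulated
n = 200
x = rng.normal(loc=3.0, scale=1.0, size=n)
y = 5.0 + 1.5 * x + rng.normal(size=n)

# Least squares with a constant: the vector 1 is in the column space of X.
X_const = np.column_stack([np.ones(n), x])
b_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)
fit_const_at_xbar = b_const[0] + b_const[1] * x.mean()

# Least squares through the origin: no constant column.
X_origin = x.reshape(-1, 1)
b_origin, *_ = np.linalg.lstsq(X_origin, y, rcond=None)
fit_origin_at_xbar = b_origin[0] * x.mean()

print("mean of y:                 ", y.mean())
print("fit at mean(x), constant:  ", fit_const_at_xbar)
print("fit at mean(x), origin:    ", fit_origin_at_xbar)

# A different loss asks a different question: for a single location
# parameter c, squared error is minimized at the mean and absolute
# error at the median. A skewed sample makes the two differ visibly.
z = rng.exponential(size=n)

def sq_loss(c):
    return np.sum((z - c) ** 2)

def abs_loss(c):
    return np.sum(np.abs(z - c))

print("squared loss at mean, median: ", sq_loss(z.mean()), sq_loss(np.median(z)))
print("absolute loss at mean, median:", abs_loss(z.mean()), abs_loss(np.median(z)))
```

With the constant included, the fitted value at mean(x) should agree with mean(y) up to floating-point error; the through-origin fit has no such guarantee because its residuals need not sum to zero.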

