Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Multiple linear regression the right approach?

From: Nick Cox; To: "statalist@hsphsun2.harvard.edu"; Subject: Re: st: Multiple linear regression the right approach?; Date: Mon, 24 Jun 2013 20:28:56 +0100

```
This is representative of a fairly large class of questions on
Statalist which treats the list as some kind of Delphic oracle capable
of giving wise statistical advice to those lost and lacking purpose.
History tells us that the oracle's advice was often vague and deeply
ambiguous, and although this wasn't recorded in any of the Greek
history I ever read, presumably the priestesses felt that they didn't
usually have enough information to do any better.

For example, why you think that price might be dichotomised is not
explained, but it sounds like a recipe for discarding much of your
information.

The word "interval" is ambiguous here, as between (a) interval
scale of measurement (why isn't price on a ratio scale?) and (b) defined
only on an interval (in the latter case, what would that be?). But
price itself is necessarily positive so I would expect regression with
some kind of log link to be the best first approximation. Poisson
regression springs to mind and the fact that price is not (presumably)
a count is secondary here, as witness
http://blog.stata.com/tag/poisson-regression/

The fact that price is, on your evidence, not normally distributed is
not compelling. No regression technique using predictors assumes the
marginal distribution of the response to be normal and even
conditional normality of the response is a relatively unimportant
assumption for classical linear regression. If you used Poisson
regression even that would not be assumed.

In a sentence, read Bill Gould's blog entry as above for one possibility.

Nick
njcoxstata@gmail.com

On 24 June 2013 19:46, Simon Hauburger <simonhauburger@gmail.com> wrote:
> Dear potential helpers,
>
> I have a problem figuring out the right regression for my model:
>
> - It has an interval dependent variable (costs in $) that looks
> normally distributed, but according to the Shapiro-Wilk test isn't
> - a number of independent variables which are categorical (scale from
> 1-6) and interval (assets in $)
>
> My first guess was to use a multiple linear regression, but not all of
> the independent variables are linearly related to the dependent
> variable (tested with cprplot lowess), even after having tried the
> common transformation techniques (log, square...)
>
> Any recommendations for my next steps? Keep trying to transform the
> variables and use the multiple linear regression or try an alternative
> method? If so, which method could it be? Logistic regression?
> (Transformation of the dependent variable to a binary variable is
> possible)
>
> I am really confused, statistics will never become my best friend....
```
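As a hedged illustration of the suggestion above (regression with a log link, fitted as Poisson regression even though the response is not a count), here is a minimal Python/NumPy sketch of pseudo-Poisson regression by iteratively reweighted least squares. It is not Stata's implementation. The function name `poisson_irls` and its arguments are hypothetical, the design matrix is assumed to carry a leading column of ones, the response is assumed non-negative with a positive mean, and only point estimates are returned.

```python
import numpy as np


def poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Hypothetical helper: fit E[y | X] = exp(X @ beta) by IRLS.

    y may be any non-negative number (costs, prices), not just counts;
    the Poisson score equations still estimate beta consistently when
    the log-linear mean is correct (pseudo-Poisson / quasi-likelihood).
    Assumes column 0 of X is a constant and y.mean() > 0.
    Returns point estimates only (no standard errors).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start: intercept at log of the mean response, other slopes at zero
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(max_iter):
        eta = X @ beta          # linear predictor
        mu = np.exp(eta)        # fitted mean under the log link
        # Working response and weights for log link + Poisson variance
        z = eta + (y - mu) / mu
        w = mu
        XtW = X.T * w           # X'W, with W = diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Because the variance of a continuous cost variable will not equal its mean, likelihood-based Poisson standard errors should not be trusted; robust (sandwich) standard errors are the usual remedy, and this sketch does not compute them. In Stata itself the corresponding fit would be along the lines of `poisson cost x1 x2, vce(robust)`, where `cost`, `x1` and `x2` are placeholders for the poster's variables.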