
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: Re: st: Automatic fit of distribution

From	Richard Williams <[email protected]>
To	[email protected], [email protected]
Subject	Re: Re: st: Automatic fit of distribution
Date	Thu, 11 Jul 2013 13:04:56 -0500

Changing the subject slightly -- it is often recommended that you examine your data, e.g. do graphs or whatever, run various diagnostics. I am inclined to agree; indeed I always tell people to start with assorted descriptive statistics before launching into their high tech models. However, things like stepwise regression are widely condemned. Again I am inclined to agree, but I have a hard time explaining what exactly the difference is. In both cases, aren't you looking at the data first and using that information to guide your model building? By graphing the data first, couldn't that lead to over-fitting, and run the risk that analysis with different data would lead to different results? If, say, my visual examination or diagnostics have led me to add squared terms or even use a different statistical method, aren't my p values misleading? It seems like a lot of the cautions and concerns raised with stepwise could also be raised for approaches that are considered much more acceptable. My instincts go with the conventional wisdom but I am not sure how I would respond if pressed on these matters.
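
One way to make the p-value worry concrete is a small simulation. The sketch below is in Python rather than Stata and is not taken from the thread; every setting in it (10 pure-noise predictors, 100 observations, 500 replications, a normal approximation to the t test for a correlation) is an assumption chosen for illustration. It compares how often a predictor named in advance is "significant" at the 5% level with how often the predictor picked after looking at the data is.

```python
import math
import random
from statistics import NormalDist


def corr(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for H0: correlation = 0.

    Uses a normal approximation to the t statistic (an assumption made
    here to stay within the standard library).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def simulate(reps=500, n=100, k=10, alpha=0.05, seed=12345):
    """Return (rejection rate for a prespecified predictor,
    rejection rate for the predictor chosen after seeing the data).

    The outcome and all k predictors are independent noise, so every
    rejection is a false positive.
    """
    rng = random.Random(seed)
    fixed_hits = picked_hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        rs = [corr([rng.gauss(0, 1) for _ in range(n)], y) for _ in range(k)]
        # Predictor named before looking at the data: always the first one.
        if p_value(rs[0], n) < alpha:
            fixed_hits += 1
        # Predictor chosen because it looked best in this sample.
        best = max(rs, key=abs)
        if p_value(best, n) < alpha:
            picked_hits += 1
    return fixed_hits / reps, picked_hits / reps


fixed_rate, picked_rate = simulate()
print(f"prespecified predictor: {fixed_rate:.3f}")
print(f"data-selected predictor: {picked_rate:.3f}")
```

With independent noise predictors the second rate should sit well above the nominal 5%, roughly 1 - 0.95^10 in this setup. The sketch covers only automated selection; whether informal graphing behaves the same way is exactly the open question above.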


At 11:29 AM 7/11/2013, David Hoaglin wrote:

Diagnostics are fine, but there is no substitute for looking at the
data (e.g., in well-chosen histograms and quantile-quantile plots).
Programs that rely on the sample skewness and kurtosis will be blind
to mixtures that show more than one mode, and the sensitivity of
sample moments to outliers makes those measures unsuitable for
diagnosing distribution shape.
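
The sensitivity of sample moments to outliers can be seen in a minimal sketch. It is written in Python rather than Stata and is not part of the original message; the sample size of 200, the seed, and the single added value of 15 are arbitrary assumptions. It computes moment-based skewness and kurtosis for a standard normal sample, then again after appending one extreme observation.

```python
import random


def skew_kurt(xs):
    """Moment-based sample skewness and (non-excess) kurtosis."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


rng = random.Random(2013)
clean = [rng.gauss(0, 1) for _ in range(200)]  # normal sample
contaminated = clean + [15.0]                  # same sample plus one outlier

skew_clean, kurt_clean = skew_kurt(clean)
skew_cont, kurt_cont = skew_kurt(contaminated)

print(f"clean:        skewness {skew_clean:.2f}, kurtosis {kurt_clean:.2f}")
print(f"contaminated: skewness {skew_cont:.2f}, kurtosis {kurt_cont:.2f}")
```

One observation out of 201 is enough to move both measures far from their normal-theory values of 0 and 3, although the other 200 points are unchanged. A histogram or quantile-quantile plot would show that single point for what it is.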

Also, the process should take into account whether the data are
continuous or discrete.

David Hoaglin

On Thu, Jul 11, 2013 at 11:45 AM, Ariel Linden. DrPH
<[email protected]> wrote:
> I completely agree with Nick and Maarten that the user should do the work
> required to determine what type of distribution they are dealing with and go
> from there. However, it seems to me that there could be a program that
> "points the user in the right direction" after running a few simple
> diagnostics. For example, there are several programs already available to
> test for normality (i.e., -sktest-, -swilk-, -ksmirnov-). It would be rather
> straightforward to test for a Poisson distribution based on the variance =
> mean. It would get harder as we go to other distributions, or fall between
> choices...
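
The variance = mean idea can be sketched as a dispersion ratio. The code below is in Python rather than Stata and is not from the thread; the helper names, the mean of 4, the sample size of 5000, and the gamma-mixed comparison sample are all assumptions made for illustration. It computes the sample variance divided by the sample mean, which should be near 1 for Poisson data and above 1 for overdispersed counts.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(rng, lam):
    """One Poisson(lam) draw by Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def dispersion_ratio(xs):
    """Sample variance divided by sample mean (about 1 under a Poisson)."""
    return variance(xs) / mean(xs)


rng = random.Random(42)
n = 5000

# Counts that really are Poisson with mean 4.
poisson_counts = [poisson_draw(rng, 4.0) for _ in range(n)]

# Overdispersed counts with the same mean: the Poisson rate itself varies,
# drawn from a gamma with shape 2 and scale 2 (a negative binomial mixture).
mixed_counts = [poisson_draw(rng, rng.gammavariate(2.0, 2.0)) for _ in range(n)]

ratio_poisson = dispersion_ratio(poisson_counts)
ratio_mixed = dispersion_ratio(mixed_counts)

print(f"Poisson counts: variance/mean = {ratio_poisson:.2f}")
print(f"mixed counts:   variance/mean = {ratio_mixed:.2f}")
```

Under a Poisson model, (n - 1) times this ratio is approximately chi-squared with n - 1 degrees of freedom, which turns the comparison into a formal test. A ratio near 1 is still only consistent with a Poisson distribution and does not establish it, which fits the caution that the choice gets harder for other distributions.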
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam
