Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: query on testing uniform distributions
Date: Tue, 1 Nov 2011 09:27:33 +0100

--- Sergio wrote me privately:
> I hope you can help me with the following query.

Such questions should be asked on Statalist and not sent to its members privately. This is not a silly rule; there are good reasons for it, which are listed here: <http://www.stata.com/support/faqs/res/statalist.html#private>.

> I have read your suggestions on testing whether
> observed data follow a uniform distribution:
> <http://www.stata.com/statalist/archive/2010-10/msg00146.html>
>
> and I am a bit puzzled by the results I obtain when
> applying your syntax.
>
> I observe the dates people start their employment spells
> over each tax year and I want to check if these dates are
> distributed uniformly over the year. The dates are in
> numeric format so I observe 364 different numbers for
> each tax year.
>
> If I use the syntax you suggest:
>
> egen n = count(employment_start_dates)
> egen i = rank(employment_start_dates)
> gen hazen = (i - 0.5) / n
> drop n i
>
> quantile hazen , aspect(1) name(quantile, replace)

This graph checks whether the variable hazen is uniformly distributed, which is trivially the case, since hazen is based only on the ranks. I used that graph to spot ties, not to check whether the variable of interest (in your case employment_start_dates) is uniformly distributed. I suspect that in your case you would see 365 little horizontal plateaus on the 45-degree line. These may well be too subtle to see easily in that graph, but given your sample size of almost 3 million observations, I suspect that these ties might matter for your test. If you want to check graphically whether your variable of interest is uniformly distributed, you would type in Stata: -quantile employment_start_dates, aspect(1)-.

> In my case this graph shows values which lie exactly on
> the 45 degree line (a histogram also shows the data are more
> or less uniformly distributed).
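A quantile plot of hazen lies on the 45-degree line whatever the original data look like. The small sketch below illustrates this. It is written in Python rather than Stata and uses only the standard library. The function names `average_ranks` and `hazen_positions` are hypothetical. It assumes tied values get the average of their ranks, as -egen, rank()- does by default. Without ties, the sorted positions are always (k - 0.5)/n. Tied observations collapse onto one repeated value, which shows up as a horizontal plateau.

```python
# Python sketch (assumption: mirrors the Stata lines `egen i = rank(x)` and
# `gen hazen = (i - 0.5) / n`; the function names are hypothetical).


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend `end` over every observation tied with order[pos].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # ranks pos+1 .. end+1 share their mean
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def hazen_positions(values):
    """Hazen plotting positions (i - 0.5) / n based on average ranks."""
    n = len(values)
    return [(r - 0.5) / n for r in average_ranks(values)]


# Strongly skewed data without ties: the positions are still evenly spaced,
# so a quantile plot of them lies on the 45-degree line.
print(sorted(hazen_positions([1, 2, 4, 8, 16, 32, 64, 128])))
# [0.0625, 0.1875, 0.3125, 0.4375, 0.5625, 0.6875, 0.8125, 0.9375]

# Ties: the three tied observations share one position (a plateau in the plot).
print(hazen_positions([1, 1, 1, 2]))
# [0.375, 0.375, 0.375, 0.875]
```

The first call shows that the positions depend only on the ordering of the data, not on its shape. This is why the plot of hazen cannot tell whether employment_start_dates is uniform.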
> However, the output I get
> with the ksmirnov test is
>
> ksmirnov hazen=hazen
>
> One-sample Kolmogorov-Smirnov test against theoretical distribution
>            hazen
>
>  Smaller group       D       P-value  Corrected
>  ----------------------------------------------
>  hazen:              0.0081    0.000
>  Cumulative:        -0.0081    0.000
>  Combined K-S:       0.0081    0.000      0.000
>
> Note: ties exist in dataset;
>       there are 365 unique values out of 2887994 observations.
>
> I understand this means I reject H0 and therefore the finding
> is that my data do not follow a uniform distribution. Can the
> ksmirnov test and the quantile plot produce totally opposite
> results as in my case? Should the case of discrete values
> (my case) be treated differently from the continuous case
> you talk about? Here, I am assuming I have applied your
> syntax correctly. Many thanks for your help, very much
> appreciated.

As I said above, the graph does not test the same thing as the test, so the two can easily lead to different conclusions.

Moreover, there are two types of uniform distribution: a discrete and a continuous uniform distribution. For example, the results of throwing a six-sided die follow a discrete uniform distribution, while the -runiform()- function in Stata produces draws from a continuous uniform distribution. The syntax you used tested against a continuous uniform distribution. However, in your case you would have a discrete uniform distribution with 365 possible values. In what I would call normal-sized samples (say 1,000 to 10,000 observations), I suspect that a continuous uniform distribution would be a perfectly acceptable approximation, but with almost 3 million observations it might make a difference.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
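The Python sketch below illustrates the discrete-versus-continuous point made in the reply. It uses only the standard library, and all function names are hypothetical.

The first part concerns the KS statistic for hazen. It shows that the Kolmogorov-Smirnov distance between hazen and the continuous uniform distribution equals half the share of the largest group of tied values. This assumes ksmirnov handles ties the way the sketch does.

The second part shows one common way to test against the discrete uniform distribution on 365 dates: Pearson's chi-square goodness-of-fit test. The reply itself does not prescribe this test. P-values come from the Wilson-Hilferty normal approximation. The KS critical value uses the usual large-sample approximation 1.36/sqrt(n).

```python
# Python sketch (assumptions: stdlib only; names are hypothetical; the KS
# critical value and chi-square p-value use large-sample approximations).
import math
from collections import Counter
from statistics import NormalDist


def ks_uniform(sample):
    """KS distance between the empirical CDF of `sample` (values in [0, 1])
    and the continuous Uniform(0, 1) CDF F(x) = x, as in `ksmirnov x = x`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and xs[j] == xs[i]:
            j += 1  # xs[i:j] is one group of tied values
        # The empirical CDF jumps from i/n to j/n at xs[i]; check both sides.
        d = max(d, abs(xs[i] - i / n), abs(j / n - xs[i]))
        i = j
    return d


def ks_critical_5pct(n):
    """Common large-sample approximation to the 5% KS critical value."""
    return 1.36 / math.sqrt(n)


def chi2_discrete_uniform(sample, support):
    """Pearson chi-square statistic against a discrete uniform distribution on
    `support`; values in `support` that never occur count as zero."""
    counts = Counter(sample)
    expected = len(sample) / len(support)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in support)


def chi2_upper_tail(x, df):
    """Wilson-Hilferty normal approximation to P(chi-square(df) > x)."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 1 - NormalDist().cdf(z)


# Hypothetical data: 365 dates; date 1 occurs 50 times, every other date 10 times.
counts = [50] + [10] * 364
n = sum(counts)

# Hazen values built group by group: (average rank - 0.5) / n = (start + m/2) / n.
hazen = []
start = 0
for m in counts:
    hazen += [(start + m / 2) / n] * m
    start += m
print(ks_uniform(hazen), max(counts) / (2 * n))  # equal up to rounding

# Chi-square test of the dates themselves against a discrete uniform on 1..365.
dates = [d for d, m in enumerate(counts, start=1) for _ in range(m)]
stat = chi2_discrete_uniform(dates, range(1, 366))
print(stat, chi2_upper_tail(stat, df=364))
```

With n = 2,887,994, the approximation 1.36/sqrt(n) gives a 5% critical value of about 0.0008. Suppose every one of the 365 dates held exactly the same number of spells. The largest tie group would then hold 1/365 of the sample, which gives D = 1/730 ≈ 0.0014 for hazen. That is already above the critical value, so under these approximations the ties alone could make ksmirnov reject at this sample size. On the same assumptions, the reported D = 0.0081 would correspond to one date holding roughly 1.6% of all spells.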

**Follow-Ups**:
- **st: RE: query on testing uniform distributions**, *From:* Sergio Salis <S.Salis@psi.org.uk>
- **Re: st: query on testing uniform distributions**, *From:* Nick Cox <njcoxstata@gmail.com>
