st: RE: RE: query on testing uniform distributions


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: query on testing uniform distributions
Date   Tue, 1 Nov 2011 17:30:26 +0000

It is not clear whether you have been following the entire thread started by Maarten Buis. 

Quite what you want to do is unclear. In principle, -ksmirnov- contains no special way of respecting the discreteness of either the data or any theoretical distribution. In practice, that might not matter with 360+ distinct values. But it's a blunt tool anyway. 

Also, your syntax  

14341+int((14705-14341+1)*runiform())

as you say reveals that you have just a single year's data, namely 

. di %d 14341
07apr1999

. di %d 14705
05apr2000
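The two conversions above can be checked in a do-file, and the endpoints can also be built from calendar components with mdy(). A minimal sketch using only official Stata functions; the // comments give what each line should display:

```stata
* Convert the endpoints from Stata daily dates to calendar dates
display %td 14341          // 07apr1999
display %td 14705          // 05apr2000

* Build the same endpoints from month, day, year
display mdy(4, 7, 1999)    // 14341
display mdy(4, 5, 2000)    // 14705

* Number of distinct days that 14341 + int(365 * runiform()) can produce
display 14705 - 14341 + 1  // 365
```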

For your set-up -ksmirnov- sounds like a red herring, in that it won't tell you much more than -quantile-. In fact, -quantile- is evidently too coarse a tool if you cannot see any fine structure in what it produces. Some suggestions of other tools have already been made in the thread, which is where I started. 

Nick 
n.j.cox@durham.ac.uk 

Sergio Salis

Dear Maarten,

Many thanks for your response. You suggest using

-quantile employment_start_dates, aspect(1)-

to graphically test whether employment_start_dates follow a uniform distribution. (I can see from the graph that it does.)
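A sketch of the suggested quantile plot, run on hypothetical simulated data rather than the real start dates (the seed 2011 and the 1,000 observations are arbitrary assumptions). -quantile- plots the ordered values against the quantiles of a uniform distribution, so uniform data should lie close to its straight reference line:

```stata
clear
set seed 2011
set obs 1000

* Hypothetical simulated data: integer daily dates drawn uniformly
* from 14341 (07apr1999) to 14705 (05apr2000), i.e. 365 possible days
generate long employment_start_date = 14341 + int(365 * runiform())
format employment_start_date %td

* Quantile plot with a square plot region, as suggested in the thread
quantile employment_start_date, aspect(1)
```

With 365 possible days and many observations, each day's ties show up only as tiny flat steps, which is one reason this plot can hide fine structure.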

I understand that to check whether employment_start_dates follow a uniform distribution I can also use the -ksmirnov- test. It is not clear to me what the command -ksmirnov hazel=hazel- actually does, so I would appreciate it if you could explain it further. However, if employment_start_dates were a continuous variable I would use

-ksmirnov employment_start_date=14341+int((14705-14341+1)*runiform())-

where 14341 is the first and 14705 the last day of the tax year of interest. Is this syntax correct if one is dealing with a continuous variable? If so, how can it be changed to deal with a discrete variable, which is my case?
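On the syntax question: in the one-sample form -ksmirnov varname = exp-, Stata's documentation has exp be the theoretical cumulative distribution function evaluated at varname, not a set of random draws. A minimal sketch, assuming start dates are integer daily dates from 14341 to 14705 inclusive (365 days), and again using hypothetical simulated data in place of the real variable (seed and sample size are arbitrary):

```stata
clear
set seed 2011
set obs 1000

* Hypothetical simulated data standing in for the real start dates
generate long employment_start_date = 14341 + int(365 * runiform())

* First and last day of the period, and the number of possible days
local a = 14341
local b = 14705
local n = `b' - `a' + 1

* Continuous uniform on [a, b]: F(x) = (x - a) / (b - a)
ksmirnov employment_start_date = (employment_start_date - `a') / (`b' - `a')

* Discrete uniform on the integers a, ..., b: F(x) = (x - a + 1) / n
ksmirnov employment_start_date = (employment_start_date - `a' + 1) / `n'
```

Neither call adjusts the p-value for the discreteness of the data, so the result is at best an approximation.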
