st: RE: RE: cumulative distribution function (assigned values)

Tue, 3 Jun 2003 16:58:57 +0100

Manuel Kast

> > how can I generate in Stata 8.0 a new variable which contains the
> > values assigned by the observed cumulative distribution of one
> > variable? In other words, I would like to get those values stored
> > in a new variable, that are used by the command "cdf varname" to
> > plot the sample cumulative distribution function of varname.
> >
> > I don't think the command "cumul varname" will work for my case,
> > since my variable contains several observations with the same
> > values, but "cumul varname" assigns different values to these,
> > depending on how they were initially ordered.

Nick Cox

> sort varname
> gen cumul = sum(varname < .)
> by varname: replace cumul = cumul[_N]
> replace cumul = cumul / cumul[_N]

Here is another way to do it, exploiting the fact that ranking and
calculation of cumulative probabilities are sibling problems. (The
messy small details arise from ways of handling ties.) The FAQ at
http://www.stata.com/support/faqs/stat/pcrank.html explores other
connections.

First the code:

egen cumul = rank(varname), field
egen n = count(varname)
replace cumul = (n + 1 - cumul) / n

A way to see this is that for cumulative probabilities we want a
"rank" that looks like this:

data    "rank"
   1       1
   2       2
   3       5
   3       5
   3       5
   4       6
   5       7

Here "rank"(x) is the number of values <= x. This doesn't look like
any of the usual ranks, until you compare it with the field rank,
i.e. the ranking if this were a field event in which the highest
value wins:

data    "rank"    field rank
   1       1           7
   2       2           6
   3       5           3
   3       5           3
   3       5           3
   4       6           2
   5       7           1

from which it is clear that

"rank" + field rank = n + 1

and the rest is immediate.

The advantage of an -egen- approach here is that it easily takes
care of any or all of

-if- or -in- restrictions
doing it -by:-
missing values

Having said all that, there should, in my view, be an option (say
-equal-) to -cumul- which ensures that equal values get equal
probabilities (frequencies) assigned.
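The identity can be checked on the seven-value example above. What follows is only a minimal sketch, assuming a fresh session with nothing in memory that needs keeping; the variable name cumul2 is introduced here purely to hold the result of the sort-based method for comparison.

```stata
* Illustrative data: the seven values from the example above
clear
input varname
1
2
3
3
3
4
5
end

* Field-rank approach: cumulative probability = (n + 1 - field rank) / n
egen cumul = rank(varname), field
egen n = count(varname)
replace cumul = (n + 1 - cumul) / n

* Sort-based approach, stored in cumul2 (a name assumed for this sketch)
sort varname
gen cumul2 = sum(varname < .)
by varname: replace cumul2 = cumul2[_N]
replace cumul2 = cumul2 / cumul2[_N]

* The two methods should agree, up to rounding, on every observation
assert reldif(cumul, cumul2) < 1e-6

list varname cumul cumul2
```

If the reasoning above holds, the three observations tied at 3 all receive 5/7, and the largest value receives 1.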
Nick
n.j.cox@durham.ac.uk

