--- Austin Nichols wrote:
> > I think it's fairly easy to prove via counterexample or simulation
> > that this can easily give the wrong answer. Can you give a
> > reference that supports it?

--- David Radwin <radwin@berkeley.edu> wrote:
> It is true, of course, as with many statistical techniques, that this
> technique may lead you astray. I have not done any simulations
> myself, but I will refer you again to the reference in my original
> posting:
>
> Parker, R. N., & Fenwick, R. (1983). The Pareto curve and its utility
> for open-ended income distributions in survey research. Social
> Forces, Vol. 61, No. 3, 872-885.
> http://www.jstor.org/view/00377732/di010900/01p0014t/0

In the example below I simulate three estimates of the same effect: a
regression on the continuous data (what we would get if we had it),
-intreg- with log-transformed endpoints, and a regression on
log-transformed mid-point scaling. All three methods seem to perform OK.
This doesn't mean that mid-point scaling will always be OK, because
a) I created the data to be well behaved and the model to be appropriate
for that data, and b) as Austin already remarked, this method can get
very sensitive to wrongly chosen values of the lowest and highest
midpoints, and these midpoints are the hardest to choose since they tend
to belong to open-ended intervals.
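Point b) can be checked directly by varying the value assigned to the open top category. What follows is only a rough sketch, assuming the same data-generating process as in the example below. The four candidate top values are arbitrary choices for illustration and are not part of the simulation itself:

```stata
* Sketch: sensitivity of the mid-point estimate to the top-category value.
* Assumption: same data-generating process as the main example;
* the candidate top values in the numlist are illustrative only.
clear
set seed 12345
set obs 500
gen x = _n < 251
gen y = exp(.7*invnorm(uniform()) + 11 + .25*x)
egen cat = cut(y), ///
    at(0, 25000, 50000, 100000, 150000, 200000, 500000, 1e7)

foreach top of numlist 600000 750000 1000000 2000000 {
    gen mid = cat
    recode mid (     0 =  20000) ///
               ( 25000 =  37500) ///
               ( 50000 =  75000) ///
               (100000 = 125000) ///
               (150000 = 175000) ///
               (200000 = 350000) ///
               (500000 = `top')
    gen lnmid = ln(mid)
    quietly reg lnmid x
    display "top midpoint = `top'   b[x] = " %6.4f _b[x]
    drop mid lnmid
}
```

With this well-behaved data only a handful of observations fall in the top category, so the estimate may move very little. In data with a heavier upper tail the choice could matter much more.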
Hope this helps,
Maarten

*---------------- begin example ------------------
set seed 12345
capture program drop sim
program define sim, rclass
    * simulate log-normal income with a true effect of x equal to .25
    drop _all
    set obs 500
    gen x = _n < 251
    gen y = exp(.7*invnorm(uniform())+ 11 + .25*x)
    bys x: sum y

    * lower bound of the income category
    egen cat = cut(y), ///
        at(0, 25000, 50000, 100000, 150000, 200000, 500000, 1e7)

    * upper bound of the income category
    gen cat2 = cat
    recode cat2 (     0 =   25000) ///
                ( 25000 =   50000) ///
                ( 50000 =  100000) ///
                (100000 =  150000) ///
                (150000 =  200000) ///
                (200000 =  500000) ///
                (500000 = 1000000)

    * value assigned to each category for mid-point scoring
    gen mid = cat
    recode mid (     0 =  20000) ///
               ( 25000 =  37500) ///
               ( 50000 =  75000) ///
               (100000 = 125000) ///
               (150000 = 175000) ///
               (200000 = 350000) ///
               (500000 = 750000)

    gen lny    = ln(y)
    gen lncat  = ln(cat+1)
    gen lncat2 = ln(cat2)
    gen lnmid  = ln(mid)

    reg lny x
    return scalar xcont = _b[x]

    intreg lncat lncat2 x
    return scalar xcat = _b[x]

    reg lnmid x
    return scalar xmid = _b[x]
end

simulate cont=r(xcont) cat=r(xcat) mid=r(xmid), reps(10000): sim

twoway kdensity cont || ///
       kdensity cat  || ///
       kdensity mid,    ///
       xline(.25)       ///
       xtitle("effect of x") ///
       ytitle("density")     ///
       legend(order(1 "continuous" ///
                      "data"       ///
                    2 "intreg"     ///
                    3 "mid point"  ///
                      "scoring"))
sum
*------------------ end example -------------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

