--- Austin Nichols wrote:
> > I think it's fairly easy to prove via counterexample or simulation
> > that this can easily give the wrong answer.  Can you give a
> > reference that supports it?
--- David Radwin wrote:
> It is true, of course, as with many statistical techniques, that this
> technique may lead you astray. I have not done any simulations 
> myself, but I will refer you again to the reference in my original 
> posting:
> 
> Parker, R. N., & Fenwick, R. (1983). The Pareto curve and its utility
> for open-ended income distributions in survey research. Social 
> Forces, Vol. 61, No. 3, 872-885. 
> http://www.jstor.org/view/00377732/di010900/01p0014t/0
In the example below I compare three estimates of the effect of x:
OLS on the log of the continuous data (what we would get if we had
it), -intreg- with log-transformed category endpoints, and OLS on
log-transformed midpoints. All three methods seem to perform OK.
This doesn't mean that midpoint scoring will always be OK, because
a) I created the data to be well behaved and the model to be
appropriate for those data, and b) as Austin already remarked, this
method can be very sensitive to badly chosen midpoints for the lowest
and highest categories, and these are the hardest to choose because
those categories tend to be open-ended.
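For the open top category, the Parker and Fenwick reference quoted
above derives the midpoint from a Pareto curve instead of picking it
by hand. Below is a sketch of the commonly used form of that
estimator, as I understand it, assuming incomes above the lower bound
of the second-highest category follow a Pareto distribution. The
notation is mine: L_{k-1} and L_k are the lower bounds of the two
highest categories and f_{k-1} and f_k their frequencies.

```latex
\alpha = \frac{\ln(f_{k-1} + f_k) - \ln(f_k)}{\ln(L_k) - \ln(L_{k-1})},
\qquad
m_k = L_k \, \frac{\alpha}{\alpha - 1} \quad (\alpha > 1)
```

The first expression follows from the Pareto survival function,
P(Y > y) proportional to y^{-\alpha}, applied to the f_{k-1} + f_k
observations above L_{k-1} and the f_k observations above L_k; the
second is the mean of a Pareto distribution above L_k. With the
cut-points in the example below (L_{k-1} = 200000, L_k = 500000) and,
purely hypothetically, three times as many observations in the
200000-500000 category as in the top one, \alpha = ln 4 / ln 2.5, or
about 1.51, and m_k is about 1.47 million: roughly twice the 750000
used there, which shows how far two reasonable-looking choices can be
apart.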
Hope this helps,
Maarten
*---------------- begin example ------------------
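set seed 12345
capture program drop sim
program define sim, rclass
    drop _all
    set obs 500
    gen x = _n < 251

    // true model: ln(y) = 11 + .25*x + e, with e ~ N(0, .7^2)
    gen y = exp(11 + .25*x + rnormal(0, .7))
    bys x: sum y          // quick check when running -sim- by hand

    // lower bound of each income category
    egen cat = cut(y), ///
        at(0, 25000, 50000, 100000, 150000, 200000, 500000, 1e7)

    // upper bound of each category; the top category is treated as
    // open-ended (a missing upper bound is right-censored in -intreg-)
    gen cat2 = cat
    recode cat2 (     0 =  25000)  ///
                ( 25000 =  50000)  ///
                ( 50000 = 100000)  ///
                (100000 = 150000)  ///
                (150000 = 200000)  ///
                (200000 = 500000)  ///
                (500000 =      .)

    // midpoints; the values for the bottom and top categories
    // (20000 and 750000) are a judgment call
    gen mid = cat
    recode mid  (     0 =  20000)  ///
                ( 25000 =  37500)  ///
                ( 50000 =  75000)  ///
                (100000 = 125000)  ///
                (150000 = 175000)  ///
                (200000 = 350000)  ///
                (500000 = 750000)

    gen lny    = ln(y)
    gen lncat  = ln(cat) if cat > 0   // missing lower bound = left-censored
    gen lncat2 = ln(cat2)             // missing for the open top category
    gen lnmid  = ln(mid)

    // 1) OLS on the continuous data (benchmark)
    reg lny x
    return scalar xcont = _b[x]
    // 2) interval regression on the log-transformed category bounds
    intreg lncat lncat2 x
    return scalar xcat = _b[x]
    // 3) OLS on the log-transformed midpoints
    reg lnmid x
    return scalar xmid = _b[x]
end

simulate cont=r(xcont) cat=r(xcat) mid=r(xmid), reps(10000): sim

// sampling distributions of the three estimates; true effect is .25
twoway kdensity cont ||            ///
       kdensity cat  ||            ///
       kdensity mid,               ///
       xline(.25)                  ///
       xtitle("effect of x")       ///
       ytitle("density")           ///
       legend(order(1 "continuous" ///
                      "data"       ///
                    2 "intreg"     ///
                    3 "midpoint"   ///
                      "scoring"))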
sum
*------------------ end example -------------------------
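For reference, -intreg- does not need any midpoints. It treats each
log income as known only to lie between the log endpoints a_i and b_i
of its category (lncat and lncat2 in the example). Assuming
ln(y_i) = x_i\beta + e_i with e_i ~ N(0, \sigma^2), which holds by
construction in -sim- (effect of x = .25, \sigma = .7), it maximizes
the log likelihood below, where \Phi is the standard normal CDF and
x_i\beta includes the constant. A missing upper endpoint replaces the
first \Phi by 1, and a missing lower endpoint replaces the second
\Phi by 0.

```latex
\ln L(\beta, \sigma) = \sum_{i=1}^{n} \ln\left[
  \Phi\!\left(\frac{b_i - x_i\beta}{\sigma}\right)
  - \Phi\!\left(\frac{a_i - x_i\beta}{\sigma}\right)\right]
```

So the open-ended categories enter only through the probability of
falling above (or below) a cut-point, rather than through a guessed
value, although the result still depends on the normality assumption.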
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------