
st: RE: resistant line or median median line


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: resistant line or median median line
Date   Wed, 9 Feb 2005 13:34:20 -0000

There are several slightly different recipes for 
this line. Tukey used similar ideas around the 
time of his Exploratory Data Analysis (1977), 
and there is an older literature going back 
at least to the 1940s. The key point of most
of the recipes I have seen is that they 
are amenable to hand calculation: for modest 
sample sizes, the x and y medians of each group 
can be determined by eye on a scatter plot. 
So I think it's arguable that the 
method has been superseded by quantile regression. 

It is indeed not (guaranteed to be) exactly the 
same as quantile regression. (However, is it 
true that a quantile regression necessarily passes 
through (median of x, median of y)? I doubt it.) 
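One way to look at that question empirically is sketched below. It assumes the auto data shipped with Stata and the variables mpg and weight, chosen only as an example; it checks a single dataset and proves nothing in general.

```stata
* Sketch only: the auto data and the variables mpg, weight are assumed examples
sysuse auto, clear
qreg mpg weight                      // median regression of mpg on weight
summarize weight, detail
local xmed = r(p50)
summarize mpg, detail
local ymed = r(p50)
* compare the fitted value at median(weight) with median(mpg)
display "fit at median x = " _b[_cons] + _b[weight] * `xmed'
display "median of y     = " `ymed'
```

If the two displayed numbers differ, the fitted median regression does not pass through (median of x, median of y) for these data.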

I am not aware of a Stata implementation. 
Still, it is possible to make a hack at one. 

*! NJC 1.0.0 9 February 2005 
program resline
	version 8 
	syntax varlist(min=2 max=2) [if] [in] [, * ] 
	
	quietly { 
		marksample touse
		count if `touse' 
		if r(N) == 0 error 2000 

		tokenize `varlist' 
		args y x 

		* split x into three groups of roughly equal size, coded 0, 1, 2
		tempvar cut 
		egen `cut' = cut(`x') if `touse', group(3) 

		su `y' if `cut' == 0, detail 
		local y0 = r(p50) 

		su `y' if `cut' == 1, detail 
		local y1 = r(p50) 
		
		su `y' if `cut' == 2, detail 
		local y2 = r(p50) 

		su `x' if `cut' == 0, detail 
		local x0 = r(p50) 
		
		su `x' if `cut' == 1, detail 
		local x1 = r(p50) 

		su `x' if `cut' == 2, detail 
		local x2 = r(p50) 

		* slope from the two outer summary points
		local slope = ((`y2') - (`y0')) / ((`x2') - (`x0')) 
		if `slope' == . { 
			di as err "no go: slope indeterminate"
			exit 498 
		} 

		* level: mean of the three y medians, used at x = middle x median
		local intercept = ((`y2') + (`y1') + (`y0')) / 3 
		if `intercept' == . { 
			di as err "no go: intercept indeterminate" 
			exit 498 
		} 
	}	

	di 
	di as txt "slope" "{col 12}" as res %12.3f `slope'
	local b : di %4.3f `slope' 
	di as txt "y summary" "{col 12}" as res %12.3f `intercept' 
	local a : di %4.3f `intercept' 
	local X1 : di %4.3f `x1' 

	twoway function resistant = ///
	`intercept' + `slope' * (x - `x1'), ///
	range(`x') t1(`y' = `a' + `b' * (`x' - `X1')) ///
	|| scatter `y' `x' if `touse', `options'

end 	
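
Read off the code above, and writing (x_j, y_j) for the x and y medians of group j = 0, 1, 2 (notation introduced here only for illustration), the line that this program draws is:

```latex
b = \frac{y_2 - y_0}{x_2 - x_0}, \qquad
a = \frac{y_0 + y_1 + y_2}{3}, \qquad
\hat{y} = a + b\,(x - x_1)
```

So the displayed "y summary" is the height of the line at the middle x median x_1. As far as I know, some textbook versions of the median-median line centre instead on the mean of the three x medians, (x_0 + x_1 + x_2) / 3; that is one of the ways in which the recipes differ.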

e.g. (after -sysuse auto-) resline mpg weight 
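
For the comparison with least squares that was asked about, a minimal sketch follows. It assumes that the program above has been saved as resline.ado (or otherwise defined in the current session) and uses the auto data purely as an example.

```stata
* Sketch only: assumes -resline- as defined above is available
sysuse auto, clear
resline mpg weight      // resistant line: slope, y summary and graph
regress mpg weight      // least-squares line
qreg mpg weight         // median (quantile) regression
```

Note that resline reports the height of the line at the middle x median, not an intercept at x = 0, so its "y summary" is not directly comparable with _cons from regress or qreg; the slopes are.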

Nick 
[email protected] 

Faith Anne
 
> I need to calculate a specific type of line through a two-variable
> dataset. In exploratory data analysis, what I need is called a
> resistant line. In my high school classes, we called it a
> median-median line. The way it's calculated is to divide the data into
> three groups, find the x-median and y-median values (called the
> summary point) for each group, and then use those three summary points
> to determine the line.  The outer two summary points determine the
> slope, and an average of all of them determines the intercept.
> 
> As far as I can tell, this isn't quite the same as the quantile
> regression command, because the resistant line doesn't necessarily go
> through the median of the whole dataset.  In the resistant line
> calculation, you ignore all information besides the summary points, so
> you don't actually take into account the absolute deviations and try
> to minimize them. Someone please correct me if I have misunderstood
> this!
> 
> I'm aware of the pros and cons of this method as compared to least
> squares linear regression, but I am required to do this analysis and
> compare it to least squares. Minitab can do this through its menu of
> EDA commands, but I'm deeply frustrated with Minitab's data management
> and graphing, so I'd really like to know how to do this with Stata.
