
st: RE: graphs, outliers, labels


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: graphs, outliers, labels
Date   Fri, 14 May 2004 19:00:46 +0100

Use -summarize, detail- to see the median and quartiles. 
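As a minimal illustration, using the auto data and mpg purely as
an example, -summarize, detail- also leaves the quartiles behind
in r(), so they can be picked up afterwards:

```stata
sysuse auto, clear
summarize mpg, detail
* the quartiles are left behind as r(p25), r(p50) and r(p75)
display "p25 = " r(p25) "  median = " r(p50) "  p75 = " r(p75)
```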

The adjacent values are the most extreme data points that lie 
within 1.5 IQR (interquartile range) of the nearer quartile. 
I think you might have to re-create those for yourself, as 
-graph box- doesn't seem to leave them in memory. Nor should 
it really, as there could be lots of them. 
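
For a single variable and no groups, that definition can be 
sketched directly. This is only a rough sketch under the 
definition just given, with mpg from the auto data as an 
example; the local macro name iqr is my own choice here. 

```stata
sysuse auto, clear
quietly summarize mpg, detail
* fences lie 1.5 IQR beyond the quartiles
local iqr = r(p75) - r(p25)
local u = r(p75) + 1.5 * `iqr'
local l = r(p25) - 1.5 * `iqr'
* upper adjacent value: largest observation not above the upper fence
quietly summarize mpg if mpg <= `u', meanonly
display "upper adjacent = " r(max)
* lower adjacent value: smallest observation not below the lower fence
quietly summarize mpg if mpg >= `l', meanonly
display "lower adjacent = " r(min)
```

Missing values need no special treatment in this sketch, as 
-summarize- ignores them. 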

With this, 

*! NJC 1.0.0 14 May 2004 
program adjacent 
	version 8 
	syntax varname(numeric) [if] [in] [ , by(varlist) ] 

	quietly { 
		marksample touse 
		count if `touse' 
		if r(N) == 0 error 2000 

		if "`by'" == "" { 
			tempvar by 
			tempname bylabel 
			gen byte `by' = 1 
			local label "`varlist'"
			label def `bylabel' 1 `"  "' 
			label val `by' `bylabel' 
		} 
		else { 
			markout `touse' `by', strok 
			local label "`by'" 
		} 	

		tempvar group upper lower 
		egen `group' = group(`by') if `touse', label
		label var `group' "`label'" 
		
		gen `upper' = . 
		gen `lower' = . 
		su `group', meanonly 	
		
		forval i = 1/`r(max)' { 
			* quartiles for this group 
			su `varlist' if `group' == `i', detail 
			* fences lie 1.5 IQR beyond the quartiles 
			local u = r(p75) + (3/2) * (r(p75) - r(p25)) 
			local l = r(p25) - (3/2) * (r(p75) - r(p25)) 
			* upper adjacent: largest value not above upper fence 
			su `varlist' if `group' == `i' /// 
				& `varlist' <= `u', meanonly  
			replace `upper' = r(max) if `group' == `i' 
			* lower adjacent: smallest value not below lower fence 
			su `varlist' if `group' == `i' ///
				& `varlist' >= `l', meanonly  
			replace `lower' = r(min) if `group' == `i' 
		} 

		label var `upper' "upper adjacent"
		label var `lower' "lower adjacent"
	} 	

	tabdisp `group' if `touse', c(`lower' `upper') 
end 	
	
and a few tests, 

. sysuse auto, clear
(1978 Automobile Data)

. adjacent mpg

------------------------------------------
      mpg | lower adjacent  upper adjacent
----------+-------------------------------
        . |             12              35
------------------------------------------

. adjacent mpg, by(foreign) 

------------------------------------------
  foreign | lower adjacent  upper adjacent
----------+-------------------------------
 Domestic |             12              30
  Foreign |             14              35
------------------------------------------

. adjacent mpg, by(foreign rep78) 

-------------------------------------------
foreign    |
rep78      | lower adjacent  upper adjacent
-----------+-------------------------------
Domestic 1 |             18              24
Domestic 2 |             14              24
Domestic 3 |             12              28
Domestic 4 |             14              28
Domestic 5 |             30              34
 Foreign 3 |             21              26
 Foreign 4 |             21              28
 Foreign 5 |             17              41
-------------------------------------------

I seem to get the same values as do the box
plot routines. Note that adjacent values 
need not be unique. More testing advisable. 
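
If the aim is to look at the outside values themselves, one 
possible sketch is to flag observations beyond the fences 
within each group. The variable names p25, p75 and outside 
are just illustrative, and I am assuming that -egen, pctile()- 
and -summarize, detail- agree on the quartiles, which is worth 
checking. Run this from a do-file, as it uses /// to continue 
a line. 

```stata
sysuse auto, clear
* group-wise quartiles
bysort foreign: egen p25 = pctile(mpg), p(25)
bysort foreign: egen p75 = pctile(mpg), p(75)
* outside values lie more than 1.5 IQR beyond the nearer quartile;
* missings would count as very large, so exclude them explicitly
gen byte outside = (mpg > p75 + 1.5 * (p75 - p25) | ///
	mpg < p25 - 1.5 * (p75 - p25)) & !missing(mpg)
list make foreign mpg if outside
```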

Nick 
n.j.cox@durham.ac.uk 

daphna
 
> I am using the graph commands in Stata for the first time and am 
> running into some confusion.
> 
> Is there any way I can get labels on my box plot graphs?
> 
> Specifically, I am using the box command.  I would like to see the 
> values of the median, 25th percentile, 75th percentile, etc.  Most 
> importantly, I want to see/know the values of the top and bottom 
> cut-off lines.  How do I find these values?
> 
> I am interested in analyzing the outliers or outside values, but I 
> am not able to see what the specific lower and upper cut-off values 
> are.
> 
