Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: qnorm


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: qnorm
Date   Tue, 6 Mar 2012 12:18:17 +0000

Corrected code below if anyone is interested. Let me stress what is embedded below, that this is a helper file that assumes use of -qplot- (SJ). It is, despite the name, nothing to do with Stata's official -qnorm-. (So, when this emerges eventually on SSC, it may well have a different name.) 

Nick 
n.j.cox@durham.ac.uk 

I pushed this a bit further. This is just a helper program that calculates variables defining an envelope; it deliberately stops short of the graphics, which -qplot- (SJ) can do for you. A help file will follow in due course. 

*! 1.0.0 NJC 5 March 2012
program qnormenv 
	version 9    
	syntax varname(numeric) [if] [in], GENerate(str) [ reps(int 100) level(int 95) ] 
	
	marksample touse
	qui count if `touse' 
	if r(N) == 0 error 2000 
	
	tokenize "`generate'" 
	if "`2'" == "" | "`3'" != "" { 
		di as err "two names required in generate()"
		exit 198 
	}
	
	confirm new var `generate' 

	mata : _qnormenv("`varlist'", "`touse'", "`generate'", `reps', `level') 
end 
	
mata: 

void _qnormenv(
string scalar varname, 
string scalar tousename, 
string rowvector newnames, 
real scalar reps, 
real scalar level) 
{ 	
	
real matrix compare 	
real colvector y 	
real scalar mean, sd, n, l1, l2, u1, u2 
	
st_view(y = ., ., varname, tousename)
mean = mean(y)
sd = sqrt(variance(y)) 
n = rows(y) 
compare = J(n, 0, .)

for (j = 1; j <= reps; j++) { 
	compare = compare, sort(rnormal(n, 1, mean, sd), 1) 
}

level1 = (100 - level) / 200 
level2 = (100 + level) / 200 
l1 = floor(reps * level1)
l2 = ceil(reps * level1)
u1 = floor(reps * level2)
u2 = ceil(reps * level2)

envelope = J(n, 2, .) 
for (i = 1; i <= n; i++) { 	
	x = sort(compare[i,]', 1) 
	envelope[i,] = ((x[l1] + x[l2])/2, (x[u1] + x[u2])/2)  
} 

newnames = tokens(newnames) 
(void) st_addvar("float", newnames) 
st_store(., newnames, tousename, envelope) 

}

end

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 05 March 2012 15:24
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: st: qnorm

The approach in Maarten's program is to generate a number of random samples and show the lot as replicates. 

Here as an alternative is some example code for individual 95% confidence intervals for each plotted point. -qplot- used at the end is from SJ. The code isn't smart about missing values, but it could easily be made smarter. I also guess the code could be shortened in the middle. 

sysuse auto, clear

mata 
y = sort(st_data(., "mpg"), 1) 
mean = mean(y)
sd = sqrt(variance(y)) 
n = rows(y) 

compare = J(n, 0, .)

for (j = 1; j <= 100; j++) { 
	compare = compare, sort(rnormal(n, 1, mean, sd), 1) 
}

envelope = J(n, 2, .) 
for (i = 1; i <= n; i++) { 	
	x = sort(compare[i,]', 1) 
	envelope[i,] = ((x[2] + x[3])/2, (x[97] + x[98])/2)  
} 

names = tokens("_lower _upper") 
(void) st_addvar("float", names) 
st_store(., names, envelope) 

end 

qplot mpg _lower _upper, ms(O i i) c(.  J J) legend(off) ytitle("`: var label mpg'") 

Nick 
n.j.cox@durham.ac.uk 

Maarten Buis

On Mon, Mar 5, 2012 at 10:03 AM, Nick Cox wrote:
> 1. Simulate several samples from a distribution with the same mean and
> standard deviation (or more generally an appropriate mean and standard
> deviation) and use the resulting portfolio of plots in assessing what
> kind of variability is to be expected.

An easy way to do so is to use the -margdistfit- package, which you
can install by typing in Stata -ssc install margdistfit-. The default
is actually to first sample the mean and the standard deviation from
its sampling distribution and than sample a new variable with those
sampled means and standard deviation. I suspect that this makes sense
in most cases, though I also suspect that it won't matter much. If you
want to do exactly what Nick proposes you can add the -noparsamp-
option.

Here is an example of what such a graph would look like:

*-------- begin example -----------
sysuse auto, clear
reg mpg
margdistfit, qq name(qq)
margdistfit, pp name(pp)
margdistfit, hangroot name(hangr)
margdistfit, cumul name(cumul)
*--------- end example ------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index