Statalist



RE: st: Computing local variance


From   Maarten Buis <[email protected]>
To   stata list <[email protected]>
Subject   RE: st: Computing local variance
Date   Fri, 20 Feb 2009 22:57:23 +0000 (GMT)

--- Benjamin Villena Roldan wrote:
> I'm dealing with the following problem. I have two continuous variables
> X and Y. I'm trying to do the following:
> 1. Sort the data using X
> 2. For each observation of X, I compute the local variance of Y by a
> nearest neighborhood approach. I take the 2k closest observations to an
> observation X[i], i.e. using observations between X[i-k+1] and X[i+k].
> 3. I'm implementing this approach by using a forvalue loop such as
<snip>
> So, I have two questions/problems about this code
> 1. I need to do the same procedure several times and it is very
> time-consuming. Is there a way to speed up the execution? How much time
> would I gain if I implement a similar code in C++?
> 2. There are missing observations in X and Y, how can I restrict the sort
> command to deal with nonmissing values of both variables. A simple
> answer is to do 
> -keep if X!=. & Y!=.
> Can I do it without dropping data?

You don't have to move to C++; you can use Mata instead. The big 
advantage of Mata is that it plays well with Stata. In the example 
below I define a Mata function -nneigh()- that computes the local 
standard deviation. It takes four arguments, in this order: 

o the variable on which you sort (X in your example), 
o the variable whose variance you want (Y in your example), 
o a variable that is 1 for observations you want to include in the 
  analysis and 0 for observations you want to ignore (for example 
  because X or Y is missing; this answers your question 2)
o the name of the new variable that is to be created (SD_Y in your
  example)

All these names need to be surrounded by quotes. You define this 
function at the top of your do-file, and whenever you need to create 
these local standard deviations you sort the data on X and type

mata nneigh("x","y","touse","sd_y")

where the arguments have the appropriate names relevant to your situation.

Hope this helps,
Maarten

*----------------- begin example ---------------------
sysuse auto, clear
gen touse = !missing(price, mpg)
sort price
clear mata
mata
void nneigh(string scalar x,
            string scalar y, 
            string scalar touse, 
            string scalar sd_y) {
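	data = .
	st_view(data,.,(x,y), touse)      // column 1 = x, column 2 = y
	k=ceil(rows(data)^0.5/2)          // half-width of the window
	K = rows(data)-k                  // last row with a full window
	res = J(rows(data),1,.)           // missing where the window does not fit
	for(i=k; i<=K; i++) {
		k0 = i - k + 1                // first row of the window
		k1 = i + k                    // last row of the window
		// local sd of y (column 2) over the 2k rows k0..k1
		res[i,1] = sqrt(variance(data[|k0,2\k1,2|]))
	}
	idx = st_addvar("float", sd_y)
	st_store(.,idx,touse,res)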
}
end

mata nneigh("price","mpg","touse","sd_mpg")
*--------------------- end example ----------------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
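
The function above fixes the window half-width internally, at 
k = ceil(sqrt(N)/2), where N is the number of included observations. 
If you would rather pass k yourself (the "2k closest observations" in 
the question), something like the sketch below might do. The names 
-nneigh_k()- and sd_mpg5 are made up for this illustration and are not 
part of the example above. The sketch assumes that k is a positive 
integer no larger than N/2 and that the data are already sorted on X. 
The last two commands cross-check one window against -summarize-.

*----------------- begin example ---------------------
sysuse auto, clear
gen byte touse = !missing(price, mpg)
sort price
* drop an earlier definition, if any (nneigh_k() is a hypothetical name)
capture mata: mata drop nneigh_k()
mata
void nneigh_k(string scalar y,
              string scalar touse,
              string scalar sd_y,
              real scalar k) {
	real matrix yv
	real colvector res
	real scalar i, n

	// view on y for rows with touse!=0, in the current sort order
	st_view(yv=., ., y, touse)
	n = rows(yv)
	res = J(n, 1, .)       // stays missing where the window does not fit
	for (i=k; i<=n-k; i++) {
		// sd of y over the 2k rows i-k+1..i+k
		res[i] = sqrt(variance(yv[|i-k+1,1 \ i+k,1|]))
	}
	st_store(., st_addvar("double", sd_y), touse, res)
}
end

mata nneigh_k("mpg", "touse", "sd_mpg5", 5)

* with k = 5 the window for observation 5 is observations 1 to 10
summarize mpg in 1/10
assert reldif(sd_mpg5[5], r(sd)) < 1e-8
*--------------------- end example ----------------------------

As in the function above, the first k-1 and the last k included 
observations keep a missing value, because a full window of 2k 
observations does not fit there.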

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


      



