RE: st: Computing local variance

From: Maarten Buis
To: stata list
Subject: RE: st: Computing local variance
Date: Fri, 20 Feb 2009 22:57:23 +0000 (GMT)

```
--- Benjamin Villena Roldan wrote:
> I'm dealing with the following problem. I have two continuous variables
> X and Y. I'm trying to do the following:
> 1. Sort the data using X
> 2. For each observation of X, I compute the local variance of Y by a
> nearest neighborhood approach. I take the 2k closest observations to an
> observation X[i], i.e. using observations between X[i-k+1] and X[i+k].
> 3. I'm implementing this approach by using a forvalue loop such as
<snip>
> 1. I need to do the same procedure several times and it is very
> time-consuming. Is there a way to speed up the execution? How much time
> would I gain if I implement a similar code in C++?
> 2. There are missing observations in X and Y, how can I restrict the sort
> command to deal with nonmissing values of both variables. A simple
> -keep if X!=. & Y!=.
> Can I do it without dropping data?

You don't have to move to C++; you can use Mata instead. The big
advantage of Mata is that it plays well with Stata. In the example
below I have defined a Mata function -nneigh()- which computes the
local standard deviation. It takes 4 arguments, which are in order:

o the variable on which you sort (X in your example),
o the variable whose variance you want (Y in your example),
o a variable which is 1 for observations you want to include in the
analysis and 0 for observations you want to ignore (for example because
they have missing values on either X or Y, i.e. this answers your
question 2)
o the name of the new variable that is to be created (SD_Y in your
example)

All these names need to be surrounded by quotes. You define this
function at the top of your do-file, and whenever you need to create
these local standard deviations you type

mata nneigh("x","y","touse","sd_y")

where the arguments have the appropriate names relevant to your situation.

Hope this helps,
Maarten

*----------------- begin example ---------------------
sysuse auto, clear
gen touse = !missing(price, mpg)
sort price
clear mata
mata
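void nneigh(string scalar x,
            string scalar y,
            string scalar touse,
            string scalar sd_y) {
    // view on x (column 1) and y (column 2), restricted to touse==1
    data = .
    st_view(data,.,(x,y), touse)
    // k is the half-width: each window holds 2k observations
    k=ceil(rows(data)^0.5/2)
    K = rows(data)-k
    res = J(rows(data),1,.)
    for(i=k; i<=K; i++) {
        k0 = i - k + 1
        k1 = i + k
        // standard deviation of y (column 2) over rows k0 to k1
        res[i,1] = sqrt(variance(data[|k0,2\k1,2|]))
    }
    // create the new variable and fill it where touse==1
    idx = st_addvar("double", sd_y)
    st_store(.,idx,touse,res)
}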
end

mata nneigh("price","mpg","touse","sd_mpg")
*--------------------- end example ----------------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

```
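To see what the window X[i-k+1] to X[i+k] does, it may help to run the same loop on a small matrix typed directly into Mata. This is only a sketch: the six rows of toy data and the choice k = 1 are assumptions made for illustration and are not part of the original example. Column 1 plays the role of the sort variable and column 2 the variable whose local standard deviation is wanted.

```stata
mata
// toy data, already sorted on column 1 (assumed values, for illustration)
data = (1, 10 \ 2, 12 \ 3, 11 \ 4, 15 \ 5, 14 \ 6, 18)

k = 1                    // half-width, so each window holds 2k = 2 rows
n = rows(data)
res = J(n, 1, .)         // rows without a full window stay missing

for (i = k; i <= n - k; i++) {
    // rows i-k+1 to i+k of column 2
    res[i] = sqrt(variance(data[|i - k + 1, 2 \ i + k, 2|]))
}

res
end
```

With this indexing the last k rows never get a full window, so their result stays missing. The same holds for the first k-1 rows, although with k = 1 there are none.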