
From: Maarten buis <maartenbuis@yahoo.co.uk>
To: stata list <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Computing local variance
Date: Fri, 20 Feb 2009 22:57:23 +0000 (GMT)

--- Benjamin Villena Roldan wrote:
> I'm dealing with the following problem. I have two continuous variables
> X and Y. I'm trying to do the following:
> 1. Sort the data using X
> 2. For each observation of X, I compute the local variance of Y by a
> nearest neighborhood approach. I take the 2k closest observations to an
> observation X[i], i.e. using observations between X[i-k+1] and X[i+k].
> 3. I'm implementing this approach by using a forvalue loop such as
<snip>
> So, I have two questions/problems about this code
> 1. I need to do the same procedure several times and it is very
> time-consuming. Is there a way to speed up the execution? How much time
> would I gain if I implement a similar code in C++?
> 2. There are missing observations in X and Y, how can I restrict the sort
> command to deal with nonmissing values of both variables. A simple
> answer is to do
> -keep if X!=. & Y!=.
> Can I do it without dropping data?

You don't have to move to C++; you can use Mata instead. The big
advantage of Mata is that it plays well with Stata. In the example
below I have defined a Mata function -nneigh()- which computes the
local standard deviation (square it if you need the variance). It
takes 4 arguments, which are, in order:

o the variable on which you sort (X in your example),
o the variable whose local standard deviation you want (Y in your
  example),
o a variable which is 1 for observations you want to include in the
  analysis and 0 for observations you want to ignore (for example
  because they have missing values on either X or Y, i.e. this answers
  your question 2),
o the name of the new variable that is to be created (SD_Y in your
  example).

All these names need to be surrounded by quotes. You define this
function at the top of your do-file, and whenever you need to create
these local standard deviations you type

    mata nneigh("x","y","touse","sd_y")

where the arguments have the appropriate names relevant to your
situation.
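As a cross-check on the windowing rule in the question (observations X[i-k+1] through X[i+k] after sorting on X), here is a minimal Python sketch of the same idea. It is not part of the original reply. The function name `local_sd`, the use of `None` for missing values, and the default half-width `k = ceil(sqrt(n)/2)` are illustrative assumptions, and the standard deviation uses the n-1 divisor.

```python
import math
import statistics


def local_sd(x, y, k=None):
    """Local SD of y over the 2k nearest neighbours in x-order (a sketch).

    Pairs with a missing (None) x or y are skipped and get None back,
    so nothing has to be dropped from the input.
    """
    # Indices of usable observations, ordered by x (the "sort" step).
    order = sorted(
        (i for i in range(len(x)) if x[i] is not None and y[i] is not None),
        key=lambda i: x[i],
    )
    n = len(order)
    if k is None:
        # Assumed default half-width: ceil(sqrt(n) / 2).
        k = math.ceil(math.sqrt(n) / 2)
    result = [None] * len(x)
    if k < 1:
        return result
    # 1-based positions i = k .. n-k have a full window i-k+1 .. i+k.
    for i in range(k, n - k + 1):
        # The same window expressed as 0-based indices: i-k .. i+k-1.
        window = [y[order[j]] for j in range(i - k, i + k)]
        result[order[i - 1]] = statistics.stdev(window)  # sample SD (n-1)
    return result
```

Calling `local_sd(x, y)` returns a list as long as the input, with `None` wherever the pair is incomplete or a full window of 2k observations is not available. Observations near either end of the sorted data therefore stay undefined.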
Hope this helps,
Maarten

*----------------- begin example ---------------------
sysuse auto, clear
gen touse = !missing(price, mpg)
sort price

clear mata
mata
void nneigh(string scalar x,
            string scalar y,
            string scalar touse,
            string scalar sd_y)
{
    // view on x (column 1) and y (column 2), restricted to touse==1
    data = .
    st_view(data, ., (x, y), touse)

    // half-width of the window; each window holds 2k observations
    k = ceil(rows(data)^0.5/2)
    K = rows(data) - k
    res = J(rows(data), 1, .)

    for (i=k; i<=K; i++) {
        k0 = i - k + 1
        k1 = i + k
        // column 2 is y; column 1 would give the local sd of x
        res[i,1] = sqrt(variance(data[|k0,2 \ k1,2|]))
    }

    idx = st_addvar("float", sd_y)
    st_store(., idx, touse, res)
}
end

mata nneigh("price","mpg","touse","sd_mpg")
*--------------------- end example ----------------------------
(For more on how to use examples I sent to the Statalist, see
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
