
From: "David M. Drukker" <ddrukker@stata.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: calculating nearest neighbors; looping back to the beginning of observations
Date: Thu, 11 Oct 2007 09:54:54 -0500 (CDT)

Sarah Cohodes <sarah.cohodes@gmail.com> asked how to calculate a summary statistic of a variable for the k nearest neighbors.

Austin Nichols <austinnichols@gmail.com> replied with a good solution.

We put the minindex() function into Mata to handle problems like Sarah's. Below I outline a possible solution method using the minindex() function in Mata.

The two advantages of this solution are that it is fast and that the minindex() function returns exactly what you want: a vector of the indices of the smallest distances.

I have appended a version of the code for a problem like Sarah's below. The code

1) simulates some data;
2) copies the variables into Mata vectors; and
3) for each observation
   a) finds the vector of indices of the closest observations,
   b) extracts the vector of the closest observations from y, and
   c) calculates the mean of the closest observations in y.

To illustrate how the code works, ind, y[ind], and mean(y[ind]) are displayed. In adapting this code for her own use, Sarah could remove these display statements.

To keep it simple, the code ignores ties. Tied distances would expand the number of indices returned by minindex(), as discussed in help mata minindex().
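As a minimal sketch of what ties do, the toy example below uses made-up distances with repeated values. It assumes the behavior described in help mata minindex(): each row of w refers to one distinct value, giving its starting position in ind and the number of tied elements. The names d, k, and ind_k are only for illustration.

```stata
mata:
d = (3 \ 2 \ 3 \ 2 \ 3 \ 3)     // toy distances with ties (made-up values)
ind = .
w = .
minindex(d, 2, ind, w)          // ask for the 2 smallest distinct values
rows(ind)                       // ties return more than 2 indices
w                               // per distinct value: start in ind, number tied
k = 2
ind_k = ind[|1 \ k|]            // keep only the first k indices
ind_k                           // which tied index survives is arbitrary
end
```

Truncating ind like this is only one possible way to handle ties; averaging over all tied observations by using w is another.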

I hope that this helps.

--David

ddrukker@stata.com

---------------------------Begin example code----------------------------------

version 10
clear all
set seed 12345
set obs 1000
gen x1 = uniform()
gen x2 = uniform()
gen y = invnormal(uniform()) + x1^2 + x2^2

mata:
x1 = st_data(., "x1")   // put x1 variable into x1 vector
x2 = st_data(., "x2")   // put x2 variable into x2 vector
y  = st_data(., "y")    // put y variable into y vector
n  = rows(x1)
ind = .                 // initialize ind vector
w   = .                 // initialize w vector

// loop over observations
// I am working over first 3 observations for illustration purposes
// change the 3 to n for the full problem
for(i=1; i<=3; ++i) {
        // calculate distance for i(th) observation
        d = sqrt((x1:-x1[i]):^2 + (x2:-x2[i]):^2)

        // put vector of minimum indices into ind,
        // if no ties ignore w, if ties use w to handle ties
        minindex(d, 5, ind, w)

        // display ind
        "ind is "
        ind

        // display corresponding values from y
        "y extract is "
        y[ind]

        // calculate mean of appropriate values of y
        mean(y[ind])
}
end

---------------------------End example code----------------------------------
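
For the full problem, one possible way to adapt the example is to wrap the loop in a Mata function and store the means back into a Stata variable. This is only a sketch under stated assumptions: knn_mean() and ybar_nn are hypothetical names, not built-ins; the sketch assumes there are more than k observations and no missing values; and it assumes an observation should not count as its own neighbor. In the example above, the distance from observation i to itself is 0, so i is always among its own closest observations.

```stata
version 10
clear all
set seed 12345
set obs 1000
gen x1 = uniform()
gen x2 = uniform()
gen y = invnormal(uniform()) + x1^2 + x2^2
gen ybar_nn = .                  // hypothetical name for the result variable

mata:
// knn_mean() is a hypothetical helper, not a built-in Mata function
real colvector knn_mean(real colvector x1, real colvector x2, real colvector y, real scalar k)
{
        real scalar    i, n
        real colvector d, m, ind
        real matrix    w

        n = rows(x1)
        m = J(n, 1, .)
        for (i=1; i<=n; ++i) {
                d = sqrt((x1:-x1[i]):^2 + (x2:-x2[i]):^2)
                // assumption: exclude observation i itself by making it the
                // farthest point; drop this line to keep the original behavior
                d[i] = max(d) + 1
                minindex(d, k, ind, w)
                // ties can return more than k indices; keep the first k
                m[i] = mean(y[ind[|1 \ k|]])
        }
        return(m)
}

st_store(., "ybar_nn", knn_mean(st_data(., "x1"), st_data(., "x2"), st_data(., "y"), 5))
end

summarize ybar_nn
```

Because the loop computes a full distance vector for every observation, the work grows with the square of the number of observations.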
