



st: Re: Identify 5 closest observations of a variable


From   Gordon Hughes <G.A.Hughes@ed.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: Identify 5 closest observations of a variable
Date   Tue, 18 Sep 2012 10:39:12 +0100

This seems to be a slightly bizarre exercise, and I doubt that there is any simple solution. However, for a brute-force method, consider the following.

Assume that you have sorted the data on region and var1 (as in your example). Because the data are sorted, observation t and its 5 closest observations form a window of 6 consecutive observations. What you are trying to do is (a) to identify the window s ... s+5 (s <= t <= s+5) for which sum{|var1[j] - var1[t]|, j = s .. s+5} is minimised, and (b) to calculate (sum{var2[s .. s+5]} - var2[t])/5, which is the average of var2 over the other 5 observations. You need to write the code to process each region separately and make sure that you handle the special cases at the beginning and end of each regional sample correctly: for t=1 (within the region), s must be 1; for t=N (the last observation), s must be N-5; and so on.
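To make the window logic concrete, here is a minimal sketch in Python rather than Stata. It is only an illustration of the brute-force idea, not tested Stata code. It assumes the values passed in belong to a single region, that var1 is already sorted in ascending order, and that the window is chosen by the smallest sum of absolute differences in var1. The function name and the use of None for regions with fewer than 6 observations are my own choices.

```python
def mean_var2_of_5_nearest(var1, var2):
    """For one region, with var1 sorted ascending, return for each
    observation t the mean of var2 over its 5 closest observations.

    Hypothetical helper: regions with fewer than 6 observations
    get None for every observation.
    """
    n = len(var1)
    if n < 6:
        return [None] * n

    result = []
    for t in range(n):
        best_cost, best_s = None, None
        # The window s..s+5 must contain t and stay inside the region,
        # which handles the special cases at the start and end.
        first_s = max(0, t - 5)
        last_s = min(t, n - 6)
        for s in range(first_s, last_s + 1):
            # Total distance in var1 from observation t to the window.
            cost = sum(abs(var1[j] - var1[t]) for j in range(s, s + 6))
            if best_cost is None or cost < best_cost:
                best_cost, best_s = cost, s
        # Sum var2 over the window, drop observation t, average the rest.
        total = sum(var2[best_s:best_s + 6]) - var2[t]
        result.append(total / 5)
    return result
```

With ties in var1 the sketch simply keeps the first window it finds, so you would need to decide separately how ties should be treated. Each region would be passed through the function on its own.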

It is not difficult but it is tedious to program because of all the special cases, so it will only be worthwhile if your dataset is large enough. You should document carefully what you have done because there is a large risk of making mistakes and/or being unable to replicate your results.

Gordon Hughes
g.a.hughes@ed.ac.uk

================================

Dear Statalisters,

The data below show three variables: region, var1 and var2. For each
observation in a given region, I want the 5 closest observations based
on var1 (not counting the observation in question). I basically need
the average value of var2 for the 5 observations that are identified.
I don't have any missing values in my data for any of the three
variables. I can also confirm that I have a few regions with fewer
than 6 observations each; these regions will be ignored. I am using
Stata 12.

Thanks,

Joe
