Sandra.Lange@whu.edu

statalist@hsphsun2.harvard.edu

st: Nearest neighbor distance

Tue, 23 Aug 2011 17:27:03 +0000

I would like to modify the code of the Stata command 'nearest' to identify the closest neighbor (from a defined set of observations) for specific observations in a panel data set. I work with an unbalanced sample of firms that spans about 20 years. The dataset contains the portfolio of subsidiaries of each firm in each year and consists of over 100,000 observations (one observation = one subsidiary of a firm in one year). In addition, several country characteristics were merged into the dataset.

Below is an excerpt to give an impression of what the data look like (the columns 'nearest' and 'nearest_id' are still empty):

firm_id  unit_id  year  status  country  countryname   pdi  idv  mas  uai  subyears  nearest  nearest_id
100      15       1990  U       215      Japan         54   46   95   92   2
100      44       1990  I       235      Russia        93   39   36   95   0
100      4        1990  U       404      Belgium       65   75   54   94   3
100      46       1990  I       408      Germany       35   67   66   65   0
100      18       1990  U       408      Germany       35   67   66   65   4
100      2        1990  U       408      Germany       35   67   66   65   4
100      38       1990  I       434      Switzerland   34   68   70   58   0
100      15       1991  U       215      Japan         54   46   95   92   3
100      44       1991  U       235      Russia        93   39   36   95   1
100      4        1991  U       404      Belgium       65   75   54   94   4
100      46       1991  U       408      Germany       35   67   66   65   7
100      18       1991  U       408      Germany       35   67   66   65   7
100      2        1991  U       408      Germany       35   67   66   65   7
100      38       1991  U       434      Switzerland   34   68   70   58   1
100      54       1991  I       429      Poland        68   60   64   93   0
100      53       1991  I       429      Poland        68   60   64   93   0
100      51       1991  I       430      Portugal      63   27   31   104  0
...
101      181      1985  U       215      Japan         54   46   95   92   1
101      150      1985  U       236      Saudi-Arabia  80   38   52   68   1
101      146      1985  U       237      Singapur      74   20   48   8    1
101      140      1985  U       404      Belgium       65   75   54   94   2
101      155      1985  U       408      Germany       35   67   66   65   3
101      83       1985  U       408      Germany       35   67   66   65   3
101      84       1985  U       408      Germany       35   67   66   65   3
101      133      1985  U       411      France        68   71   43   86   2
101      147      1985  U       411      France        68   71   43   86   2
101      222      1985  I       438      Spain         34   51   42   86   0
...

More precisely, this is what I would like to do:

1. For each observation with status 'I' (Investment), I am looking for the closest country in terms of cultural dimensions (pdi, idv, mas, uai) in the firm's existing portfolio (observations with status 'U'). I suppose I could use the code for 'nearest', but I would probably have to change it slightly: 'nearest' finds the closest neighbor among all N observations, whereas I am looking for the closest neighbor within a subset, which would somehow have to be specified as the existing portfolio (all subsidiary-year observations with status == U).

- Is it possible to modify the code of the command 'nearest' for that in the first place? Does someone have a suggestion?

- How should I deal with the fact that I have multiple dimensions in the code of the command 'nearest'? I want to use the Kogut & Singh index to calculate the distance based on these four dimensions. At some point I would have to indicate that, but I do not know where.

2. A slight modification of 1.: for each observation with status 'I' (Investment), I am looking for the closest country (in the firm's existing portfolio) in terms of cultural dimensions (pdi, idv, mas, uai) AND subyears. If subyears < 5, the country should not qualify for selection as the closest neighbor. In that case the second closest neighbor should be chosen and checked for subyears >= 5. If it fails as well, the third closest neighbor should be checked, and so on.

I appreciate your input!

Thanks,
Sandra
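
To make the request concrete, the matching logic in points 1 and 2 can be sketched outside Stata. The following is only a minimal Python sketch under stated assumptions, not a modification of 'nearest'. It assumes that the Kogut & Singh index is the mean, over the four dimensions, of the squared difference divided by that dimension's variance; that the variance is taken across the distinct countries in the rows supplied (in practice it would presumably come from the full country list); and that "existing portfolio" means the status 'U' rows of the same firm in the same year. All function and variable names are hypothetical.

```python
from statistics import variance

DIMS = ("pdi", "idv", "mas", "uai")
FIELDS = ("firm_id", "unit_id", "year", "status", "countryname") + DIMS + ("subyears",)

# Firm 100 in 1991, copied from the data excerpt in the post.
RAW = [
    (100, 15, 1991, "U", "Japan", 54, 46, 95, 92, 3),
    (100, 44, 1991, "U", "Russia", 93, 39, 36, 95, 1),
    (100, 4, 1991, "U", "Belgium", 65, 75, 54, 94, 4),
    (100, 46, 1991, "U", "Germany", 35, 67, 66, 65, 7),
    (100, 18, 1991, "U", "Germany", 35, 67, 66, 65, 7),
    (100, 2, 1991, "U", "Germany", 35, 67, 66, 65, 7),
    (100, 38, 1991, "U", "Switzerland", 34, 68, 70, 58, 1),
    (100, 54, 1991, "I", "Poland", 68, 60, 64, 93, 0),
    (100, 53, 1991, "I", "Poland", 68, 60, 64, 93, 0),
    (100, 51, 1991, "I", "Portugal", 63, 27, 31, 104, 0),
]
rows = [dict(zip(FIELDS, r)) for r in RAW]


def dimension_variances(rows):
    """Sample variance of each dimension across distinct countries.

    Assumption: the countries present in `rows` are the reference set.
    """
    countries = list({r["countryname"]: r for r in rows}.values())
    return {d: variance([c[d] for c in countries]) for d in DIMS}


def ks_distance(a, b, var):
    """Kogut & Singh style distance: mean of variance-scaled squared differences."""
    return sum((a[d] - b[d]) ** 2 / var[d] for d in DIMS) / len(DIMS)


def nearest_in_portfolio(rows, var, min_subyears=0):
    """Map each 'I' row to the closest 'U' country of the same firm and year.

    Rows with subyears < min_subyears are excluded before ranking, which gives
    the same result as walking down the ranked list until one qualifies.
    Ties go to the first candidate encountered.
    """
    result = {}
    for inv in (r for r in rows if r["status"] == "I"):
        portfolio = [
            u for u in rows
            if u["status"] == "U"
            and u["firm_id"] == inv["firm_id"]
            and u["year"] == inv["year"]
            and u["subyears"] >= min_subyears
        ]
        key = (inv["firm_id"], inv["unit_id"], inv["year"])
        if portfolio:
            best = min(portfolio, key=lambda u: ks_distance(inv, u, var))
            result[key] = best["countryname"]
        else:
            result[key] = None  # no qualifying neighbor in the portfolio
    return result


var = dimension_variances(rows)
closest = nearest_in_portfolio(rows, var)                              # point 1
closest_experienced = nearest_in_portfolio(rows, var, min_subyears=5)  # point 2
```

On these ten rows, point 1 matches the two Poland investments to Belgium and the Portugal investment to Russia. With the subyears >= 5 rule of point 2, only Germany qualifies, so all three investments are matched to Germany.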

