AW: st: Nearest neighbor distance

From	"Lange, Sandra" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	AW: st: Nearest neighbor distance
Date	Wed, 24 Aug 2011 09:56:00 +0000

Dear Nick,
Thank you for your reply.
You are definitely right: I should have indicated the source of the program I refer to.
This does indeed sound like a bigger issue. However, I can at least use the code from 'nearest' as a starting point for writing code that solves the problem I stated.
The Kogut & Singh index is a composite measure of the distance between countries on Hofstede's cultural dimensions.
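
For readers who, like Nick, have not met the index: it is usually written as the average of variance-scaled squared differences over the cultural dimensions. The notation below is chosen for this note and is not taken from the thread. I_{ia} is country a's score on dimension i (here pdi, idv, mas, uai) and V_i is the variance of dimension i across the countries in the sample.

```latex
KS_{ab} = \frac{1}{4} \sum_{i=1}^{4} \frac{\left(I_{ia} - I_{ib}\right)^{2}}{V_i}
```

Kogut and Singh (1988) measured distance from one reference country; the pairwise form above is the common generalization and is assumed here.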



-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Wednesday, August 24, 2011 9:20 AM
To: [email protected]
Subject: Re: st: Nearest neighbor distance

-nearest- is a user-written program from SSC. You are asked to identify where user-written programs you refer to come from.

Your problem is both similar to and very different from that of
-nearest- and you would need to rewrite -nearest-. I wouldn't call that a slight modification.

-nearest- is indifferent to ties and to whether nearest neighbours are reflexive, i.e. A is the nearest neighbour of B, and also vice versa.
These could be bigger issues with your kind of data.

I have no idea what the Kogut and Singh index is.

Nick

On Tue, Aug 23, 2011 at 6:27 PM, Lange, Sandra <[email protected]> wrote:
> I would like to modify the code of the Stata command 'nearest' to identify the closest neighbor (from a defined set of observations) for specific observations in a panel data set.
> I work with an unbalanced sample of firms that covers a period of about 20 years.
> The dataset contains each firm's portfolio of subsidiaries in each year and consists of over 100,000 observations (one observation = one subsidiary of a firm in one year). In addition, several country characteristics were merged into the dataset. Below is an excerpt to give an impression of what the data look like:
> firm_id  unit_id  year  status  country  countryname   pdi  idv  mas  uai  subyears  nearest  nearest_id
> 100      15       1990  U       215      Japan         54   46   95   92   2
> 100      44       1990  I       235      Russia        93   39   36   95   0
> 100      4        1990  U       404      Belgium       65   75   54   94   3
> 100      46       1990  I       408      Germany       35   67   66   65   0
> 100      18       1990  U       408      Germany       35   67   66   65   4
> 100      2        1990  U       408      Germany       35   67   66   65   4
> 100      38       1990  I       434      Switzerland   34   68   70   58   0
> 100      15       1991  U       215      Japan         54   46   95   92   3
> 100      44       1991  U       235      Russia        93   39   36   95   1
> 100      4        1991  U       404      Belgium       65   75   54   94   4
> 100      46       1991  U       408      Germany       35   67   66   65   7
> 100      18       1991  U       408      Germany       35   67   66   65   7
> 100      2        1991  U       408      Germany       35   67   66   65   7
> 100      38       1991  U       434      Switzerland   34   68   70   58   1
> 100      54       1991  I       429      Poland        68   60   64   93   0
> 100      53       1991  I       429      Poland        68   60   64   93   0
> 100      51       1991  I       430      Portugal      63   27   31   104  0
> ...
> 101      181      1985  U       215      Japan         54   46   95   92   1
> 101      150      1985  U       236      Saudi-Arabia  80   38   52   68   1
> 101      146      1985  U       237      Singapur      74   20   48   8    1
> 101      140      1985  U       404      Belgium       65   75   54   94   2
> 101      155      1985  U       408      Germany       35   67   66   65   3
> 101      83       1985  U       408      Germany       35   67   66   65   3
> 101      84       1985  U       408      Germany       35   67   66   65   3
> 101      133      1985  U       411      France        68   71   43   86   2
> 101      147      1985  U       411      France        68   71   43   86   2
> 101      222      1985  I       438      Spain         34   51   42   86   0
> ...
>
> More precisely, this is what I would like to do:
>
> 1. For each observation with status 'I' (investment), I am looking for the closest country in terms of cultural dimensions (pdi, idv, mas, uai) in the firm's existing portfolio (observations with status 'U'). I suppose I could use the code for 'nearest', but I would probably have to change it slightly: 'nearest' finds the closest neighbor among all N observations, whereas I am looking for the closest neighbor within a subset, which should somehow be specified as the existing portfolio (all subsidiary-year observations with status == "U").
> - Is it possible to modify the code of 'nearest' for this in the first place? Does someone have a suggestion?
> - How should I deal with the fact that I have multiple dimensions in the code of 'nearest'? I want to use the Kogut & Singh index for calculating the distance based on these four dimensions. At some point I would have to indicate that, but I do not know where.
>
> 2. A slight modification of 1.: for each observation with status 'I' (investment), I am looking for the closest country (in the firm's existing portfolio) in terms of cultural dimensions (pdi, idv, mas, uai) AND subyears. If subyears < 5, the country should not qualify as the closest neighbor. In that case the second-closest neighbor should be chosen and checked for subyears >= 5; failing that, the third-closest neighbor should be examined, and so on.
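
The two steps in the question can be sketched as a brute-force search. This is an illustration in Python, not Stata, and it is not a modification of -nearest-. The function names `kogut_singh` and `nearest_in_portfolio` and the parameter `min_subyears` are hypothetical. The sketch assumes that the "existing portfolio" means the status 'U' rows of the same firm in the same year, and that the variances are taken over the countries present in the toy data; a real analysis would presumably use the full set of countries. Step 2 is handled by dropping ineligible 'U' rows before the search, which gives the same result as walking down the ranked neighbors until one has enough subyears. Ties are broken by the lowest unit_id, an arbitrary choice that only makes the result deterministic.

```python
from statistics import pvariance

DIMS = ("pdi", "idv", "mas", "uai")
FIELDS = ("firm_id", "unit_id", "year", "status", "countryname") + DIMS + ("subyears",)

# Firm 100, year 1990, copied from the excerpt in the question.
rows = [dict(zip(FIELDS, r)) for r in [
    (100, 15, 1990, "U", "Japan",       54, 46, 95, 92, 2),
    (100, 44, 1990, "I", "Russia",      93, 39, 36, 95, 0),
    (100, 4,  1990, "U", "Belgium",     65, 75, 54, 94, 3),
    (100, 46, 1990, "I", "Germany",     35, 67, 66, 65, 0),
    (100, 18, 1990, "U", "Germany",     35, 67, 66, 65, 4),
    (100, 2,  1990, "U", "Germany",     35, 67, 66, 65, 4),
    (100, 38, 1990, "I", "Switzerland", 34, 68, 70, 58, 0),
]]

# Assumption: variance of each dimension over the distinct countries in the toy data.
countries = {r["countryname"]: r for r in rows}
variances = {d: pvariance([c[d] for c in countries.values()]) for d in DIMS}


def kogut_singh(a, b, variances):
    """Mean of variance-scaled squared differences over the four dimensions."""
    return sum((a[d] - b[d]) ** 2 / variances[d] for d in DIMS) / len(DIMS)


def nearest_in_portfolio(rows, variances, min_subyears=0):
    """Map each 'I' row to its closest eligible 'U' row of the same firm and year.

    Keys are (firm_id, unit_id, year) of the 'I' row; values are
    (unit_id of the nearest 'U' row, distance), or None if no 'U' row qualifies.
    """
    result = {}
    for inv in rows:
        if inv["status"] != "I":
            continue
        candidates = [
            u for u in rows
            if u["status"] == "U"
            and u["firm_id"] == inv["firm_id"]
            and u["year"] == inv["year"]
            and u["subyears"] >= min_subyears
        ]
        key = (inv["firm_id"], inv["unit_id"], inv["year"])
        if not candidates:
            result[key] = None
            continue
        # Sort key: distance first, then unit_id to break ties deterministically.
        best = min(candidates, key=lambda u: (kogut_singh(inv, u, variances), u["unit_id"]))
        result[key] = (best["unit_id"], kogut_singh(inv, best, variances))
    return result


step1 = nearest_in_portfolio(rows, variances)                  # question 1
step2 = nearest_in_portfolio(rows, variances, min_subyears=5)  # question 2
```

On these seven rows, `step1` pairs the Russian investment with the Belgian unit, while every entry of `step2` is None because no 'U' row in 1990 has subyears >= 5. With over 100,000 observations the double loop would be slow; grouping rows by firm and year first would limit each search to that firm-year's portfolio.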

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
