# Re: st: Weighted Euclidean distances with panel data

From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Weighted Euclidean distances with panel data
Date: Wed, 9 Sep 2009 14:56:46 -0400

```
James Cross <[email protected]>:
You are just computing a quadratic form, with weights on the diagonal
of a weighting matrix, which you can do in a number of ways. Until you
specify how to treat missing values and where the weights come from,
it is hard to offer specific advice.  But try:

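* example data: proposal 00032, actor 1 (issue 2's reference point,
* n/a in the original data, is entered as 1 here)
clear
input prop  i a p   r
00032 1 1 100 0
00032 2 1 50  1
00032 3 1 100 100
00032 4 1 100 0
end
* vector of differences, then the quadratic form x'Ax
g diff=p-r
mkmat diff, mat(x)
* identity weighting matrix, i.e. unweighted; weights would go on the diagonal
mat A=I(4)
mat d=x'*A*x
* same sum of squares using -egen- in long format
g d=(p-r)^2
egen wd=sum(d), by(a)
* and again using -generate- after -reshape- to wide format
qui reshape wide p r d*, i(a) j(i)
g red=(p1-r1)^2+(p2-r2)^2+(p3-r3)^2+(p4-r4)^2
* all three results should agree
l wd red
mat li d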
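
* A minimal sketch of the weighted case, under two assumptions that do
* not come from the original question: a hypothetical salience variable
* w (made-up values), and issues with a missing position or reference
* point simply dropped from the sum.
clear
input prop i a p r w
32 1 1 100 0   .5
32 2 1 50  .   .2
32 3 1 100 100 .2
32 4 1 100 0   .1
end
* weight times squared difference; a missing p or r gives a missing wsq
g wsq=w*(p-r)^2
* egen's total() ignores missing values, so issue 2 drops out here
egen wd2=total(wsq), by(prop a)
* the weighted Euclidean distance is the square root of the quadratic form
g wdist=sqrt(wd2)
l prop a i wsq wd2 wdist
* same quadratic form in matrix terms, on complete rows only
keep if !missing(p, r)
g diff=p-r
mkmat diff, mat(xw)
mkmat w, mat(wt)
* diag() places the weight vector on the diagonal of a square matrix
mat W=diag(wt)
mat q=xw'*W*xw
* q should agree with wd2 for this proposal and actor
mat li q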

On Wed, Sep 9, 2009 at 12:16 PM, James Cross<[email protected]> wrote:
> The actual calculation for each element in the vector I am looking to
> produce is simply the reference point minus the position. Interesting
> that you suggest it might be possible using -reshape-. I am not sure
> if I can go down this route, as the next step, once I have produced the
> vector, is to multiply it by its transpose and a diagonal matrix of
> dimensional saliencies (this is part of the formula for calculating
> weighted Euclidean distances over multiple dimensions). I have not
> used Mata before so I think I had better start familiarising myself with
> it!
>
> Thanks,
> James
>
> 2009/9/9 Austin Nichols <[email protected]>:
>> James Cross<[email protected]> :
>> This would be fairly easy to program in Mata, but you can also
>> -reshape- to wide format and calculate the distances using -generate-
>> if you know how you are going to treat missing values.  Once you can
>> specify the actual calculations for your example, it will be easier to
>> specify a method.
>>
>> On Wed, Sep 9, 2009 at 9:12 AM, James Cross<[email protected]> wrote:
>>> Hi all,
>>>
>>> I have a large panel dataset which contains information on different
>>> actor positions on different issues (dimensions) within different
>>> legislative proposals. Each actor position has a saliency score
>>> associated with the position by which I hope to weight the importance
>>> of the issue/dimension to that actor.  In essence, I am trying to
>>> calculate the weighted Euclidean distances between each actor's
>>> position and a reference point.
>>>
>>> In order to do this I first need to create submatrices of the dataset,
>>> structured as row vectors, that contain the distances between the two
>>> points of interest for each actor for each issue/dimension for each
>>> proposal. That is, I should end up with a row vector for each actor of
>>> distances between the actor's position and the reference point. The
>>> number of elements in this row vector is determined by the number of
>>> issues in each proposal in the panel data. While I can do this for
>>> each observation individually, I am wondering if it is possible to get
>>> Stata to do it automatically to save me the effort.
>>>
>>> The resulting vector will then need to be multiplied by its transpose
>>> and a diagonal matrix of issue salience for each actor but that cannot
>>> be done until I have created the individual actor distance vectors.
>>> There is also an issue with missing data in that sometimes the
>>> reference point will be missing and some actors will not have
>>> positions on all of the issues/dimensions.
>>>
>>> The data is structured as follows:
>>>
>>>
>>> proposal  issue  actor  position  ref point
>>> 04163     1      1      0         0
>>> 04163     2      1      0         0
>>> 04163     1      2      0         0
>>> 04163     2      2      0         0
>>> 00032     1      1      100       0
>>> 00032     2      1      50        n/a
>>> 00032     3      1      100       100
>>> 00032     4      1      100       0
>>> 00032     1      2      100       0
>>> 00032     2      2      0         n/a
>>> 00032     3      2      0         100
>>> 00032     4      2      0         0
>>> 00032     1      3      40        0
>>> 00032     2      3      100       n/a
>>> 00032     3      3      100       100
>>> 00032     4      3      100       0
>>>
>>> The resulting vector would look like this for proposal 00032 actor 1:
>>> [100 n/a 0 100], and for proposal 04163 actor 1: [0 0].
>>>
>>> I am not sure if this is even possible in Stata or if it is, how much
>>> programming is involved.
>>>
>>> Any suggestions welcome.