From: wgould@stata.com (William Gould, Stata)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Mata Structures
Date: Mon, 08 May 2006 13:11:59 -0500

Uli and Magdalena Luniak <luniak@wz-berlin.de> asked about structures
in Mata.

They ran into a bug.  Had they coded a little more efficiently, they never
would have run into it, but that does not excuse the bug.

Here's what they did:  They had a vector of structures:  v[1] was the
first struct, v[2] was the second.  They filled in a third structure, mypoint,
and then stored mypoint in v[1]:

v[1] = mypoint.

All went well.  They then filled in mypoint with a different set of values,
and coded

v[2] = mypoint.

That worked well, too, except that v[1] also changed, and it changed to be
the same as v[2], namely, mypoint!

Uli and Magdalena made no errors; Mata did.  Rather than storing a copy of
mypoint in v[1], and then later, a copy of mypoint in v[2], Mata mistakenly
stored mypoint itself in v[1] and v[2].  v[1], v[2], and mypoint all became
the same object.
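
The sequence just described can be sketched as a small check.  This is only an illustration, not code from the original question:  the definition of struct point is not shown in this post, so two real scalar members a and b are assumed from how they are used further down, and the function name shares_storage() is hypothetical.

```mata
mata:
// Assumed definition; the post does not show the original one
struct point {
        real scalar a
        real scalar b
}

// Hypothetical helper: returns 1 if storing into v[2] also changed v[1]
real scalar function shares_storage()
{
        struct point vector  v
        struct point scalar  mypoint

        v = point(2)

        mypoint.a = 1
        mypoint.b = 1
        v[1] = mypoint          // should store a copy of mypoint

        mypoint.a = 2
        mypoint.b = 2
        v[2] = mypoint          // should store a second, separate copy

        // with correct copy semantics v[1].a is still 1
        return(v[1].a == 2)
}
end
```

On a Mata with the bug described here, the function would be expected to return 1; once copies are made correctly, it should return 0.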

I have just examined this bug in detail.  It occurs when the RHS is a
structure and the LHS is an element of a structure vector or matrix, i.e.,
statements of the form,

v[i] = mypoint

v[i,j] = mypoint

It does *NOT* occur when the LHS is a scalar,

v = mypoint

Until the bug is fixed, the workaround is to make the copy that Mata forgot
to make:

Rather than code
        v[i] = mypoint
code
        v[i] = copyof(mypoint)

and rather than code
        v[i,j] = mypoint
code
        v[i,j] = copyof(mypoint)

where function copyof() is coded

        transmorphic copyof(transmorphic original)
        {
                transmorphic         copy

                copy = original
                return(copy)
        }
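
As a sketch of how the workaround fits into a loop that fills a structure vector:  the struct definition below is an assumption (two real scalar members, a and b), and the function name fillpoints() is hypothetical.  Only copyof() itself is taken from this post.

```mata
mata:
// Assumed definition; the post does not show the original one
struct point {
        real scalar a
        real scalar b
}

// copyof() as given above: forces Mata to make the copy
transmorphic copyof(transmorphic original)
{
        transmorphic         copy

        copy = original
        return(copy)
}

// Hypothetical example function
struct point vector function fillpoints(real vector seq)
{
        real scalar          i, n
        struct point vector  v
        struct point scalar  mypoint

        n = length(seq)
        v = point(n)
        for (i=1; i<=n; i++) {
                mypoint.a = seq[i]
                mypoint.b = seq[i]
                v[i] = copyof(mypoint)  // store a copy, not mypoint itself
        }
        return(v)
}
end
```

Each v[i] then holds its own copy, so refilling mypoint on the next pass through the loop should leave the earlier elements alone.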

In Uli's and Magdalena's case, they have a second alternative.  They can make
their code more efficient and not provoke the bug.  Their original code reads,

        struct point vector function help(real vector seq)
        {
                real scalar length
                length = length(seq)
                struct point vector v
                v = point(length)
                real scalar i
                struct point scalar mypoint
                for (i=1; i<=length; i++) {
                        mypoint.a=seq[i]
                        mypoint.b=seq[i]
                        v[i] = mypoint
                }
                return(v)
        }

I prefer all the declarations up top.  It is just a matter of style, and not
even good style vs. bad style, but indulge me, and let me change their code to
my preferred style before getting to my point:

        struct point vector function help(real vector seq)
        {
                real scalar          i
                real scalar          length
                struct point vector  v
                struct point scalar  mypoint

                length = length(seq)
                v      = point(length)
                for (i=1; i<=length; i++) {
                        mypoint.a=seq[i]
                        mypoint.b=seq[i]
                        v[i] = mypoint
                }
                return(v)
        }

Style aside, a more efficient version of their code reads,

        struct point vector function help(real vector seq)
        {
                real scalar          i
                real scalar          length
                struct point vector  v

                length = length(seq)
                v      = point(length)
                for (i=1; i<=length; i++) {
                        v[i].a = seq[i]
                        v[i].b = seq[i]
                }
                return(v)
        }

Did you know you could do that?  Refer to v[i].a and v[i].b?  On the
left or on the right?

Pretend v[i] had a third element, a vector c.  Then you could refer to
v[i].c[j] and v[j].c[i] (which would be different things).
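
A minimal sketch of that third-element case follows.  The struct name point3, the choice of a real rowvector for c, and the function name demo_members() are all hypothetical and chosen only for illustration.

```mata
mata:
// Hypothetical struct: point with an added vector member c
struct point3 {
        real scalar     a
        real scalar     b
        real rowvector  c
}

// Hypothetical demonstration function
void function demo_members()
{
        real scalar           i
        struct point3 vector  v

        v = point3(2)
        for (i=1; i<=2; i++) {
                v[i].a = i
                v[i].b = 10*i
                v[i].c = (100*i, 100*i + 1)     // each element gets its own c
        }

        v[1].c[2] = -1          // member element on the left-hand side

        // member elements on the right-hand side; these are different things
        printf("%g %g\n", v[1].c[2], v[2].c[1])
}
end
```

Here v[1].c[2] is the second element of c in the first struct, and v[2].c[1] is the first element of c in the second struct.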

I know, I'm changing the subject.  We will fix the bug, but it will not be in
the next executable update.  It will be in the one after that.

-- Bill
wgould@stata.com
