# Re: st: Mata: Recursive structures and pointers

 From wgould@stata.com (William Gould, Stata) To statalist@hsphsun2.harvard.edu Subject Re: st: Mata: Recursive structures and pointers Date Thu, 18 May 2006 11:03:06 -0500

Uli Kohler and Magdalena Luniak, <kohler@wz-berlin.de> and
<luniak@wz-berlin.de>, want to implement "list processing" in Mata.
I put "list processing" in quotes only because some of you may not have
heard of the term.  In programming, a list is a set of elements, linked
one to the other.  In the jargon of programming, Uli and Magdalena
want to create a forward linked list.  This is a great *ADVANCED* programming
problem.

Uli and Magdalena included code and basically asked, "What's wrong?".
I have only glanced at their code, got the gist of it, and then set about
writing my own.  I apologize because that means more work for them, but
less for me.  I hope that I will answer all of their questions but, if I
do not, I ask them to point me in the right direction.

Statement of problem
Solution without explanation
Result of execution
Explanation

Statement of problem
--------------------

As an exercise, create a forward linked list of a set of n points.
Let vector seq contain real values.  The points to be linked are
(seq[1], seq[1), (seq[2], seq[2]), ...

Solution without explanation
----------------------------

mata:

mata set matastrict on

struct listelem {
real scalar                              a, b
pointer(struct listelem scalar) scalar   next
}

struct listelem scalar trythis(real vector seq)
{
real scalar                              i
struct listelem scalar                   first
pointer(struct listelem scalar) scalar   p

first.a = first.b = seq[1]

p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p       = p->next
p->a = p->b = seq[i]
}

return(first)
}

void displaylist(struct listelem scalar li)
{
real scalar                              i
pointer(struct listelem scalar) scalar   p

p = &li
for (i=0; p!=NULL; i++) {
printf("%g.  (%g, %g)\n", i, p->a, p->b)
p = p->next
}
}

void testit()
{
struct listelem scalar    first

first = trythis((1,2,3,4))

displaylist(first)
}

testit()
end

Result of execution
-------------------

: testit()
0.  (1, 1)
1.  (2, 2)
2.  (3, 3)
3.  (4, 4)

Explanation
-----------

First, my solution is for demonstration purposes only.  It has all the
elements to understand and manipulate linked lists, but it is not
organized the way I would organize real code, where I would want easy-to-use
subroutines to perform common tasks for me.  I'll come back to that later.

My definition of the structure -listelem- is the same as U&M's:

struct listelem {
real scalar                              a, b
pointer(struct listelem scalar) scalar   next
}

Elements -a- and -b- contain the data, and element -next- points to
(links to) the next element.  -next- is NULL when there is no next element.

My program to define the list for a given vector -seq- is

struct listelem scalar trythis(real vector seq)
{
real scalar                              i
struct listelem scalar                   first
pointer(struct listelem scalar) scalar   p

first.a = first.b = seq[1]

p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p       = (p->next)
p->a = p->b = seq[i]
}

return(first)
}

In this program, -first- is the head of the list.  The lines

first.a = first.b = seq[1]

define it.  The next part defines the remaining members:

p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p       = p->next
p->a = p->b = seq[i]
}

In this loop, p is a pointer to the last structure defined.  I obtain the next
structure by coding

p->next = &(listelem())

and then I reset p to point to that and define it,

p       = p->next
p->a = p->b = seq[i]

The only tricky thing here is the line

p->next = &(listelem())

When I first coded it, I omitted the parantheses,

p->next = &listelem()

and my program did not work.  I was, for a time, convinced there was a bug
in Mata.  There was no bug, however.  Without the parens, the line stores
in p->next the address of the function listelem().  I wanted to record the
address of the result recorded by the function.

Understand the logic:

set p to point to the last structure defined
for (i=2; i<=length(seq); i++) {
set p->next to point to a new listelem structure
fill in the new listelem structure
set p = p->next
}

The line

p->next = &(listelem())

is not only tricky sytactically, the line

set p->next to point to a new listelem structure

is tricky conceptually.  Every element of the list must be a unique structure
with its own location in memory.  It would not do to use a preexisting
structure scalar for p->next, because in that case, we would be using the
same structure over and over again.

That went by too fast.  Let's pretend I made that error.  Let's pretend

p = &first
for (i=2; i<=length(seq); i++) {
p->next = &second
p       = p->next
p->a = p->b = seq[i]
}

where second was declared as a structure listelem scalar.  Now let's
trace the loop.  The first time, i==2.  We store the address of -second-
in p->next.  Say that is 0x50010.  We fill in the structure at 0x50010.
Now we move to i=3, and we are working on p=0x50010.  We fill in the address
of second in p->next.  That is 0x50010 (the same structure!).  We keep
doing this, and we end up with a linked list that is the same structure
over and over again:

first    ->  second  -> +
|              |
+--------------+

when what we wanted was

first -> second -> third -> fourth -> <nothing>

I hate things like

p->next = &(listelem())

because I forget to do them.

I already told you the above is not how I would have written this problem
for production code.  What I would have done is introduce an easy and
safe to use subroutine:

pointer(struct listelem scalar) scalar void insert_into_list(
pointer(struct listelem scalar) scalar where,
struct listelem scalar new)
{
struct listelem scalar     copy

copy  = new

if (el != NULL) {
copy.next  = el->next
el->next   = &copy
}
return(&copy)
}

To use this routine, the user codes

loc = insert_into_list(where, newstruct)

where -where- is a pointer and -newstruct- is the new structure istself.

insert_into_list() first copies -new- into -copy-, a struct listlem scalar.
I know I warned against using structure scalars in pointer contexts, but
what I meant was to warn against using them repeatedly, such as in a loop.
Using them once is perfectly safe.  Here "once" means "once per invocation of
the function" because variables declared inside a function are guaranteed by
Mata to be new on each on each invocation.

Anyway, the point of the line

copy  = new

is to guard against the CALLER of insert_into_list() using the same structure
over and over again, each time with different values.

The next neat thing about insert_into_list() is that -where- can be NULL
or a structure pointer.  I can use insert_into_list() to create the head
of the list, or a subsequent element.

The final neat thing about insert_into_list() is that, if -where- is a
structure pointer, it can be the last structure (add onto the tail of the
list) or a middle structure (add into the middle of the list).
(By the way, I do not bother to code -copy.next = NULL- because I know that
Mata already filled in all of its SCALAR values with missing, and missing for
a pointer is NULL.)

copy_into_list() returns the point at which -newstruct- is inserted.
In some cases, the caller will want that, and in others, the caller won't.
In those cases, the caller can code

(void) insert_into_list(where, newstruct)

Now with this neat and safe routine, I can rewrite trythis():

struct listelem scalar trythis(real vector seq)
{
real scalar                              i
struct listelem scalar                   el
pointer(struct listelem scalar) scalar   head, p

el.a = el.b = seq[1]

for (i=2; i<=length(el); i++) {
el.a = el.b = sew[i]
p = insert_into_list(p, el)
}