Uli Kohler and Magdalena Luniak, <kohler@wz-berlin.de> and
<luniak@wz-berlin.de>, want to implement "list processing" in Mata.
I put "list processing" in quotes only because some of you may not have
heard of the term. In programming, a list is a set of elements, linked
one to the other. In the jargon of programming, Uli and Magdalena
want to create a forward linked list. This is a great *ADVANCED* programming
problem.
Uli and Magdalena included code and basically asked, "What's wrong?".
I have only glanced at their code, got the gist of it, and then set about
writing my own. I apologize because that means more work for them, but
less for me. I hope that I will answer all of their questions but, if I
do not, I ask them to point me in the right direction.
My comments appear under the headings,
Statement of problem
Solution without explanation
Result of execution
Explanation
Statement of problem
--------------------
As an exercise, create a forward linked list of a set of n points.
Let vector seq contain real values. The points to be linked are
(seq[1], seq[1), (seq[2], seq[2]), ...
Solution without explanation
----------------------------
mata:
mata set matastrict on
struct listelem {
real scalar a, b
pointer(struct listelem scalar) scalar next
}
struct listelem scalar trythis(real vector seq)
{
real scalar i
struct listelem scalar first
pointer(struct listelem scalar) scalar p
first.a = first.b = seq[1]
p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p = p->next
p->a = p->b = seq[i]
}
return(first)
}
void displaylist(struct listelem scalar li)
{
real scalar i
pointer(struct listelem scalar) scalar p
p = &li
for (i=0; p!=NULL; i++) {
printf("%g. (%g, %g)\n", i, p->a, p->b)
p = p->next
}
}
void testit()
{
struct listelem scalar first
first = trythis((1,2,3,4))
displaylist(first)
}
testit()
end
Result of execution
-------------------
: testit()
0. (1, 1)
1. (2, 2)
2. (3, 3)
3. (4, 4)
Explanation
-----------
First, my solution is for demonstration purposes only. It has all the
elements to understand and manipulate linked lists, but it is not
organized the way I would organize real code, where I would want easy-to-use
subroutines to perform common tasks for me. I'll come back to that later.
My definition of the structure -listelem- is the same as U&M's:
struct listelem {
real scalar a, b
pointer(struct listelem scalar) scalar next
}
Elements -a- and -b- contain the data, and element -next- points to
(links to) the next element. -next- is NULL when there is no next element.
My program to define the list for a given vector -seq- is
struct listelem scalar trythis(real vector seq)
{
real scalar i
struct listelem scalar first
pointer(struct listelem scalar) scalar p
first.a = first.b = seq[1]
p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p = (p->next)
p->a = p->b = seq[i]
}
return(first)
}
In this program, -first- is the head of the list. The lines
first.a = first.b = seq[1]
define it. The next part defines the remaining members:
p = &first
for (i=2; i<=length(seq); i++) {
p->next = &(listelem())
p = p->next
p->a = p->b = seq[i]
}
In this loop, p is a pointer to the last structure defined. I obtain the next
structure by coding
p->next = &(listelem())
and then I reset p to point to that and define it,
p = p->next
p->a = p->b = seq[i]
The only tricky thing here is the line
p->next = &(listelem())
When I first coded it, I omitted the parantheses,
p->next = &listelem()
and my program did not work. I was, for a time, convinced there was a bug
in Mata. There was no bug, however. Without the parens, the line stores
in p->next the address of the function listelem(). I wanted to record the
address of the result recorded by the function.
Understand the logic:
set p to point to the last structure defined
for (i=2; i<=length(seq); i++) {
set p->next to point to a new listelem structure
fill in the new listelem structure
set p = p->next
}
The line
p->next = &(listelem())
is not only tricky sytactically, the line
set p->next to point to a new listelem structure
is tricky conceptually. Every element of the list must be a unique structure
with its own location in memory. It would not do to use a preexisting
structure scalar for p->next, because in that case, we would be using the
same structure over and over again.
That went by too fast. Let's pretend I made that error. Let's pretend
my code read:
p = &first
for (i=2; i<=length(seq); i++) {
p->next = &second
p = p->next
p->a = p->b = seq[i]
}
where second was declared as a structure listelem scalar. Now let's
trace the loop. The first time, i==2. We store the address of -second-
in p->next. Say that is 0x50010. We fill in the structure at 0x50010.
Now we move to i=3, and we are working on p=0x50010. We fill in the address
of second in p->next. That is 0x50010 (the same structure!). We keep
doing this, and we end up with a linked list that is the same structure
over and over again:
first -> second -> +
| |
+--------------+
when what we wanted was
first -> second -> third -> fourth -> <nothing>
I hate things like
p->next = &(listelem())
because I forget to do them.
I already told you the above is not how I would have written this problem
for production code. What I would have done is introduce an easy and
safe to use subroutine:
pointer(struct listelem scalar) scalar void insert_into_list(
pointer(struct listelem scalar) scalar where,
struct listelem scalar new)
{
struct listelem scalar copy
copy = new
if (el != NULL) {
copy.next = el->next
el->next = ©
}
return(©)
}
To use this routine, the user codes
loc = insert_into_list(where, newstruct)
where -where- is a pointer and -newstruct- is the new structure istself.
insert_into_list() first copies -new- into -copy-, a struct listlem scalar.
I know I warned against using structure scalars in pointer contexts, but
what I meant was to warn against using them repeatedly, such as in a loop.
Using them once is perfectly safe. Here "once" means "once per invocation of
the function" because variables declared inside a function are guaranteed by
Mata to be new on each on each invocation.
Anyway, the point of the line
copy = new
is to guard against the CALLER of insert_into_list() using the same structure
over and over again, each time with different values.
The next neat thing about insert_into_list() is that -where- can be NULL
or a structure pointer. I can use insert_into_list() to create the head
of the list, or a subsequent element.
The final neat thing about insert_into_list() is that, if -where- is a
structure pointer, it can be the last structure (add onto the tail of the
list) or a middle structure (add into the middle of the list).
(By the way, I do not bother to code -copy.next = NULL- because I know that
Mata already filled in all of its SCALAR values with missing, and missing for
a pointer is NULL.)
copy_into_list() returns the point at which -newstruct- is inserted.
In some cases, the caller will want that, and in others, the caller won't.
In those cases, the caller can code
(void) insert_into_list(where, newstruct)
Now with this neat and safe routine, I can rewrite trythis():
struct listelem scalar trythis(real vector seq)
{
real scalar i
struct listelem scalar el
pointer(struct listelem scalar) scalar head, p
el.a = el.b = seq[1]
head = insert_into_list(NULL, el)
p = head
for (i=2; i<=length(el); i++) {
el.a = el.b = sew[i]
p = insert_into_list(p, el)
}
return(*head)
}
-- Bill
wgould@stata.com
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/