Statalist The Stata Listserver

Re: st: mata courses?

From (William Gould, Stata)
Subject   Re: st: mata courses?
Date   Wed, 19 Apr 2006 09:52:41 -0500

David Airey <david.airey@Vanderbilt.Edu> asked, 

> [...] Will Stata offer Mata courses, and if so, what kind of prerequisite
> knowledge will be required?  [...]

Probably.  Yes.  I'm the one writing it and I keep going back and forth
on whether to make it a course first and then a book, or jump directly 
to the book.  As you can tell, I'm not as far along as I should be.

Prerequisites will be Stata do-file (not ado-file) programming.
NC-152 is *NOT* required; NC-151 would be more than sufficient, and 
the minimum is somewhere between NC-101 and NC-151.

In the meantime, I am writing a column on Mata in the Stata Journal, which 
I hope helps, and I'm answering questions here on Statalist, and I'm trying
to give more than the minimum answer.  

I invite questions on my Statalist answers, even when they might be considered
silly, such as "I saw you used corr(Variance(X,1)).  What is Variance() and
what is the 1 doing there?"

The details of Mata are well documented in the manual and the on-line help.
What is missing is motivation and application.

Mata serves three purposes in Stata:  

    1.  Once you get the hang of it, there are some problems (such as 
        Marcello Pagano's pairwise correlation problem) for which 
        Mata provides the easiest solution.  It is important to realize
        that Mata can be used interactively (no real programming required),
        and that it can be combined with Stata, and with Stata's older 
        -matrix- programming language.

    2.  Mata is a full-fledged matrix programming language, with the emphasis
        on matrix.

    3.  Mata is a full-fledged programming language, and to heck with 

I have listed these in the order of importance to most users.  For us here at
StataCorp, the order is 3-2-1.  We intend to implement most future additions
to Stata using Mata.  I would say all, but I know there will be an exception.
If I could say all, then that would mean that finally, users would have
complete equality with developers here at StataCorp.  That has been a
long-term goal.

In terms of (3), Mata will be as important a development in Stata as ado-files
were.  You have already seen some of the payoff:  Commands -adoupdate- and
-hsearch- would never have happened were it not for Mata.

But I want to emphasize (1) and (2), and especially (1).  In terms of (1), 
Mata will not change your life, but it will make it easier.  Let me give 
one example.

I have data,

        . tabulate cat

                cat |      Freq.     Percent        Cum.
                  1 |         60       62.50       62.50
                  2 |         23       23.96       86.46
                  3 |          6        6.25       92.71
                  4 |          7        7.29      100.00
              Total |         96      100.00

I have theory that says half the data should be in category 1, half 
the remainder in category 2, half again in 3, and the remainder in 4.
That is, the expected counts in the cells are (48, 24, 12, 12).  

Can I reject at the 5% level that the data is from the distribution 
suggested by theory?

The chi-squared test is easy enough to perform, but look around and you will
find nothing in Stata proper that will answer that question.  That's absurd,
but true.  Look around more and you'll find user-written routines.  Nick Cox,
I believe, has written one.

The chi-squared statistic is simple enough conceptually; it is

                         4    (obs_i-exp_i)^2
            chi^2(3) =  Sum   ---------------
                        i=1        exp_i

Which in this case is (60-48)^2/48 + (23-24)^2/24 + ...

In Mata, we could calculate thusly, 

        : obs = (60\ 23\ 6\ 7)

        : exp = (48\ 24\ 12\ 12)

        : sum( (obs-exp):^2 :/ exp )

        : chi2tail(3, 8.125)
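
For readers new to the colon operators: -:^- and -:/- work element by element,
so (obs-exp):^2 :/ exp is a 4 x 1 column holding each category's contribution,
and -sum()- adds them up.  A minimal sketch, written as do-file lines so that
the // comments are safe to use; the name contrib is illustrative and is not
part of the calculation above:

```mata
mata:
obs = (60\ 23\ 6\ 7)
exp = (48\ 24\ 12\ 12)

// squared deviation of each cell, divided by that cell's expected count
contrib = (obs-exp):^2 :/ exp
contrib                 // displays the 4 x 1 column of contributions

// the total of the contributions is the test statistic
sum(contrib)
end
```

The four contributions are 3, 1/24, 3, and 25/12, and their total is the 8.125
passed to -chi2tail()- above.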

That's a pretty easy solution.  The worst part of it was entering the data,
but we can solve that:

        . tabulate cat, matcell(obs)
          (output omitted)

        . mata:
        --------------------------------------- mata (type end to exit) -----
        : obs = st_matrix("obs")

        : exp = (48\ 24\ 12\ 12)

        : sum( (obs-exp):^2 :/ exp )

        : end

The -matcell()- option of -tabulate- saved the counts as a Stata matrix.
-st_matrix()- grabbed the Stata matrix and saved it as a Mata matrix.
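
The trip also works in the other direction.  The sketch below assumes you want
the result back in Stata; it uses -st_numscalar()- to store Mata values as
Stata scalars.  The scalar names gof_chi2 and gof_p and the Mata variable stat
are illustrative, and the -matrix- line is only a stand-in for
-tabulate cat, matcell(obs)- so that the fragment runs without the data:

```mata
* stand-in for:  tabulate cat, matcell(obs)
matrix obs = (60\ 23\ 6\ 7)

mata:
obs  = st_matrix("obs")                  // Stata matrix obs -> Mata matrix obs
exp  = (48\ 24\ 12\ 12)
stat = sum( (obs-exp):^2 :/ exp )

// hand the results back to Stata as scalars (names are illustrative)
st_numscalar("gof_chi2", stat)
st_numscalar("gof_p", chi2tail(rows(obs)-1, stat))
end

display "chi2(3) = " gof_chi2 "    p = " gof_p
```

Taking the degrees of freedom from rows(obs)-1 rather than typing 3 keeps the
fragment correct if the number of categories changes.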

We can also change our "program" to calculate the expected number.  Rather 
than counting on our fingers and then typing 

        : exp = (48\ 24\ 12\ 12)

we could do the following:

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8 \ N-N/2-N/4-N/8)

or

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8)
        : exp = exp \ N-sum(exp)

and then our entire solution would be, 

        . tabulate cat, matcell(obs)

        . mata:
        : obs = st_matrix("obs")

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8)
        : exp = exp \ N-sum(exp)

        : sum( (obs-exp):^2 :/ exp )

        : end
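
If the calculation is needed more than once, the same steps can be wrapped in
a Mata function.  What follows is a sketch under stated assumptions, not a
built-in: gof_chi2() is a hypothetical name, and it takes the theoretical cell
probabilities rather than the expected counts, so the theory above becomes
(1/2, 1/4, 1/8, 1/8).  It also returns the p-value, which the interactive
session leaves to a separate -chi2tail()- call.

```mata
mata:
// gof_chi2() is a hypothetical helper, not part of Mata.
// obs: column of observed counts;  p: column of theoretical probabilities
real rowvector gof_chi2(real colvector obs, real colvector p)
{
    real colvector expected
    real scalar    stat, df

    expected = sum(obs) :* p                         // expected counts under the theory
    stat     = sum( (obs-expected):^2 :/ expected )  // chi-squared statistic
    df       = rows(obs) - 1
    return((stat, df, chi2tail(df, stat)))           // statistic, d.f., p-value
}

// the counts from the example, typed in directly
gof_chi2((60\ 23\ 6\ 7), (1/2 \ 1/4 \ 1/8 \ 1/8))
end
```

For the data above this returns a statistic of 8.125 on 3 degrees of freedom
and a p-value below 0.05.  After -tabulate cat, matcell(obs)-, the first
argument could instead be st_matrix("obs").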

Forgive the long aside.  The point of the above is that Mata is worth 
learning and that you do not have to be a superprogrammer to use it.

I apologize for the delay in the NetCourse/Book.  In the meantime, I recommend
my column in the SJ.  I think I might even use the example above as part of
the next one.  

In the meantime, the statements and functions of Mata are indeed powerful, 
and it is useful to read about colon-operators and sum().  To find the first
help file, type -help mata-, then click on [M-2], then click on op_colon.
To find sum(), type -help mata-, click on [M-4], then on utility, then on 

-- Bill