Statalist The Stata Listserver



RE: st: mata courses?


From   "Kallimanis, Bellinda" <bkallimanis@fmhi.usf.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: mata courses?
Date   Wed, 19 Apr 2006 11:28:33 -0400

Hi, 

Thank you William for the Mata example. I too have been wondering what
I'm missing out on by not using Mata. Seeing an example such as the one
provided has certainly provided me with motivation to give the Mata
manual another go. 

Thank you!!

Bellinda


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William
Gould, Stata
Sent: Wednesday, April 19, 2006 10:53 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: mata courses?

David Airey <david.airey@Vanderbilt.Edu> asked, 

> [...] Will Stata offer Mata courses, and if so, what kind of prerequisite
> knowledge will be required?  [...]

Probably.  Yes.  I'm the one writing it and I keep going back and forth
on whether to make it a course first and then a book, or jump directly 
to the book.  As you can tell, I'm not as far along as I should be.

Prerequisites will be Stata do-file (not ado-file) programming.
NC-152 is *NOT* required; NC-151 would be more than sufficient, and 
the minimum is somewhere between NC-101 and NC-151.

In the meantime, I am writing a column on Mata in the Stata Journal, which
I hope helps, and I'm answering questions here on Statalist, and I'm trying
to give more than the minimum answer.

I invite questions on my Statalist answers, even when they might be considered
silly, such as "I saw you used corr(Variance(X,1)).  What is Variance() and
what is the 1 doing there?"

The details of Mata are well documented in the manual and the on-line help.
What is missing is motivation and application.

Mata serves three purposes in Stata:

    1.  Once you get the hang of it, there are some problems (such as 
        Marcello Pagano's pairwise correlation problem) for which 
        Mata provides the easiest solution.  It is important to realize
        that Mata can be used interactively (no real programming required),
        and that it can be combined with Stata, and with Stata's older 
        -matrix- programming language.

    2.  Mata is a full-fledged matrix programming language, with the emphasis
        on matrix.

    3.  Mata is a full-fledged programming language, and to heck with 
        matrix.

I have listed these in the order of importance to most users.  For us here at
StataCorp, the order is 3-2-1.  We intend to implement most future additions
to Stata using Mata.  I would say all, but I know there will be an exception.
If I could say all, then that would mean that finally, users would have
complete equality with developers here at StataCorp.  That has been a
long-term goal.

In terms of (3), Mata will be as important a development in Stata as ado-files
were.  You have already seen some of the payoff:  Commands -adoupdate- and
-hsearch- would never have happened were it not for Mata.

But I want to emphasize (1) and (2), and especially (1).  In terms of (1),
Mata will not change your life, but it will make it easier.  Let me give
one example.

I have data,

        . tabulate cat

                cat |      Freq.     Percent        Cum.
        ------------+-----------------------------------
                  1 |         60       62.50       62.50
                  2 |         23       23.96       86.46
                  3 |          6        6.25       92.71
                  4 |          7        7.29      100.00
        ------------+-----------------------------------
              Total |         96      100.00

I have a theory that says half the data should be in category 1, half 
the remainder in category 2, half again in 3, and the remainder in 4.
That is, the expected counts in the cells are (48, 24, 12, 12).  

Can I reject at the 5% level that the data is from the distribution 
suggested by theory?

The chi-squared test is easy enough to perform, but look around and you will
find nothing in Stata proper that will answer that question.  That's absurd,
but true.  Look around more and you'll find user-written routines.  Nick Cox,
I believe, has written one.

The chi-squared statistic is simple enough conceptually; it is

                         4    (obs_i-exp_i)^2
            chi^2(3) =  Sum   ---------------
                        i=1        exp_i

Which in this case is (60-48)^2/48 + (23-24)^2/24 + ...

In Mata, we could calculate it thusly, 

        : obs = (60\ 23\ 6\ 7)

        : exp = (48\ 24\ 12\ 12)

        : sum( (obs-exp):^2 :/ exp )
          8.125

        : chi2tail(3, 8.125)
          .0434977514
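
To answer the 5% question directly, the p-value above can be set beside the
critical value.  The following is a minimal sketch in do-file form (no
prompts); the variable name expd is an assumption, chosen only to avoid
reusing the name of Mata's exp() function, and the values in the comments
follow from the counts above.

```mata
mata:
obs  = (60 \ 23 \ 6 \ 7)
expd = (48 \ 24 \ 12 \ 12)

// per-cell contributions to the statistic: 3, 1/24, 3, 25/12
(obs - expd):^2 :/ expd

// 5% critical value of chi-squared with 3 degrees of freedom, about 7.815
invchi2tail(3, 0.05)
end
```

Because 8.125 exceeds the critical value (equivalently, .0435 is below .05),
the distribution suggested by theory is rejected at the 5% level, though not
by much.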

That's a pretty easy solution.  The worst part of it was entering the data,
but we can solve that:


        . tabulate cat, matcell(obs)
          (output omitted)

        . mata:
        --------------------------------------- mata (type end to exit) -----
        : obs = st_matrix("obs")

        : exp = (48\ 24\ 12\ 12)

        : sum( (obs-exp):^2 :/ exp )
          8.125

        : end
	
        ---------------------------------------------------------------------

The -matcell()- option of -tabulate- saved the counts as a Stata matrix.
-st_matrix()- grabbed the Stata matrix and saved it as a Mata matrix.
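
The traffic can also run the other way.  As a sketch (assuming the statistic
should be handed back to Stata; the scalar name chi2stat and the Mata
variable expd are hypothetical), st_numscalar() stores a Mata value in a
Stata scalar so that ordinary Stata commands can use it afterward:

```mata
tabulate cat, matcell(obs)

mata:
obs  = st_matrix("obs")
expd = (48 \ 24 \ 12 \ 12)
// store the statistic in a Stata scalar named chi2stat
st_numscalar("chi2stat", sum((obs - expd):^2 :/ expd))
end

display "chi2(3) = " scalar(chi2stat) "   p = " chi2tail(3, scalar(chi2stat))
```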

We can also change our "program" to calculate the expected number.  Rather
than counting on our fingers and then typing 


        : exp = (48\ 24\ 12\ 12)

we could do the following:

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8 \ N-N/2-N/4-N/8)

or

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8)
        : exp = exp \ N-sum(exp)

and then our entire solution would be, 
	
        . tabulate cat, matcell(obs)

        . mata:
        : obs = st_matrix("obs")

        : N = sum(obs)
        : exp = (N/2 \ N/4 \ N/8)
        : exp = exp \ N-sum(exp)

        : sum( (obs-exp):^2 :/ exp )

        : end
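
If the same test is needed more than once, the steps above can be wrapped in
a Mata function.  What follows is only a sketch: the function name gofchi2()
and its argument names are hypothetical, the expected proportions are passed
in rather than hard-coded, and the degrees of freedom are taken to be the
number of cells minus one.

```mata
mata:
// returns (chi-squared statistic, p-value) for observed counts obs
// and hypothesized cell proportions p (which should sum to 1)
real rowvector gofchi2(real colvector obs, real colvector p)
{
    real colvector expd
    real scalar    stat

    expd = sum(obs) :* p
    stat = sum((obs - expd):^2 :/ expd)
    return((stat, chi2tail(rows(obs) - 1, stat)))
}
end

tabulate cat, matcell(obs)
mata: gofchi2(st_matrix("obs"), (1/2 \ 1/4 \ 1/8 \ 1/8))
```

For the counts tabulated earlier, this should report 8.125 and a p-value of
about .0435.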

Forgive the long aside.  The point of the above is that Mata is worth 
learning and that you do not have to be a superprogrammer to use it.

I apologize for the delay in the NetCourse/Book.  In the meantime, I recommend
my column in the SJ.  I think I might even use the example above as part of
the next one.

In the meantime, the statements and functions of Mata are indeed powerful,
and it is useful to read about colon-operators and sum().  To find the first
help file, type -help mata-, then click on [M-2], then click on op_colon.
To find sum(), type -help mata-, click on [M-4], then on utility, then on
sum().
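
As a quick taste of what those help files cover, here is a small sketch (the
vectors a and b are made up for illustration) contrasting colon operators,
which work element by element, with ordinary matrix multiplication:

```mata
mata:
a = (1 \ 2 \ 3)
b = (4 \ 5 \ 6)

a :* b      // element-by-element product: (4 \ 10 \ 18)
a' * b      // ordinary matrix product (inner product): 32
a :^ 2      // each element squared: (1 \ 4 \ 9)
sum(a)      // sum of all elements: 6
end
```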

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



