Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: Merge in Mata


From   Enzo Coviello <enzo.coviello@tin.it>
To   statalist@hsphsun2.harvard.edu
Subject   st: Merge in Mata
Date   Sat, 29 Dec 2012 11:48:57 +0100

Dear Users,

some month ago I submitted a similar question to the Stata Tech support. The reply has been very useful.
See below.

Enzo

***Question*******
I would like to merge the file containing the patient data set with the file containing
the survival probability of a reference population (population mortality
file). The population mortality file is usually stratified by sex, age and calendar year:

   | sex   _year   _age   survprob |
   |-------------------------------|
   |   1    2000     50     .99376 |
   |   1    2000     51     .99318 |
   |   2    2000     50     .99751 |
   |   2    2000     51     .99744 |

     and so on
   +-----------------------------

In this file sex, _year and _age identifies just one observation
Assume that we import this data set in the Mata matrix X.

With some simplification the patient data set looks like this:

   | id    _year    _age   sex    _d    _t0   _t |
   |----------------------------------------------|
   |   1    2000      50     1      1     1.5   2 |
   |   2    2000      54     2      0     1.5   2 |
   |   3    2000      50     2      1     1.5   2 |
   |   4    2000      78     1      1     1.5   2 |
     and so on
   +-----------------------------

In this file the values of sex, _year and _age can correspond to several
observations. We have to find all combinations of sex, _year and _age occurring in the
patient data file in just one record of the mortality data file.
Assume that we import the patient data set in the Mata matrix P.

Starting from the patient data file we have to do a m:1 merge with the
population mortality file.This way we attribute to each patient its expected survival based on the
mortality of a reference population.

   | id    _year    _age   sex    _d    _t0   _t    survprob |
   |----------------------------------------------------------|
   |   1    2000      50     1      1     1.5   2      .99376 |
   |   2    2000      54     2      0     1.5   2   something |
   |   3    2000      50     2      1     1.5   2      .99751 |
   |   4    2000      78     1      1     1.5   2   something |
     and so on
   +-----------------------------

Question.
How can we do in Mata the appropriate m:1 merge between matrix P and
matrix X by making use of the key variables sex _year and _age?

***End of question*******


***Reply from the Tech Support*******

I prefer to do my merging in Stata because the -merge- command is very easy to
setup and it has some very good error checking. If you would like to perform
this type of operation in Mata the following is an example of how you could go
about it:

--begin log--

. mata:
------------------------------------------------- mata (type end to exit) ------
: A
                 1             2             3             4
    +---------------------------------------------------------+
  1 |            1          2000            50   .9937599897  |
  2 |            1          2000            51   .9931799769  |
  3 |            2          2000            50    .997510016  |
  4 |            2          2000            51   .9974399805  |
    +---------------------------------------------------------+

: B
          1      2      3      4      5      6      7
    +--------------------------------------------------+
  1 |     1   2000     50      1      1    1.5      2  |
  2 |     2   2000     54      2      0    1.5      2  |
  3 |     3   2000     50      2      1    1.5      2  |
  4 |     4   2000     78      1      1    1.5      2  |
    +--------------------------------------------------+

: D = B,J(rows(B),1,.)

: for (i=1;i<=rows(A);i++) {

    match = rowsum(B[.,(4,2,3)]:==A[i,(1,2,3)]):==3
    rowmatch = select(1::rows(B),match)
    if (rows(rowmatch)>0) {
       D[rowmatch,8] = J(rows(rowmatch),1,A[i,4])
    }
}

: D
                 1             2             3             4             5
    +-----------------------------------------------------------------------
  1 |            1          2000            50             1             1
  2 |            2          2000            54             2             0
  3 |            3          2000            50             2             1
  4 |            4          2000            78             1             1
    +-----------------------------------------------------------------------
                 6             7             8
     -------------------------------------------+
  1            1.5             2   .9937599897  |
  2            1.5             2             .  |
  3            1.5             2    .997510016  |
  4            1.5             2             .  |
     -------------------------------------------+

: end
--------------------------------------------------------------------------------

--end of log--


***End of Reply*******


--
Enzo Coviello
Epidemiology Unit - Cancer Registry ASL BT
Piazza Umberto 1
76121 BARLETTA (BT)
Italy
mobile +39 347 5016016
tel +39 0883 577329
fax +39 0883 577288
Home +39 0883 695055

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index