st: RE: MSTDIZE

From: "Nick Cox"
To:
Subject: st: RE: MSTDIZE
Date: Thu, 8 Apr 2004 10:44:36 +0100

```The help for -mstdize- contains a detailed worked example,
which shows that, to use -mstdize-, Donnell needs a different data
structure from the one he has: one observation per cell of the
two-way table. Each row total must be repeated for every cell in
that row, and similarly each column total for every cell in that
column.

What's more, -mstdize- expects frequencies, not proportions.
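
* A sketch (not taken from the -mstdize- help) of the structure -mstdize-
* wants, applied to Donnell's 2 x 2 example and assuming -mstdize- is
* installed. The proportions are multiplied by a hypothetical total of 100
* so that they read as frequencies; the names inc, hsize, rt, ct and fit04
* are illustrative. There is one observation per cell, and each 2004 row
* (column) target is repeated in every cell of that row (column).
clear
input freq inc hsize rt ct
55 1 1 60 65
12 1 2 60 35
20 2 1 40 65
13 2 2 40 35
end
mstdize freq rt ct, by(inc hsize) generate(fit04)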

Here's the worked example, modernised to avoid old-fashioned syntax:

. input freq age status

          freq        age     status
  1. 1306    1        1
  2. 83      1        2
  3. 0       1        3
  4. 619     2        1
  5. 765     2        2
  6. 3       2        3
  7. 263     3        1
  8. 1194    3        2
  9. 9       3        3
 10. 173     4        1
 11. 1372    4        2
 12. 28      4        3
 13. 171     5        1
 14. 1393    5        2
 15. 51      5        3
 16. 159     6        1
 17. 1372    6        2
 18. 81      6        3
 19. 208     7        1
 20. 1350    7        2
 21. 108     7        3
 22. 1116    8        1
 23. 4100    8        2
 24. 2329    8        3
 25. end

. gen rt = .
(24 missing values generated)

. tokenize 1412 1402 1450 1541 1681 1532 1662 7644

. qui forval i = 1/8 {
2.         replace rt = ``i'' if  age == `i'
3. }

. gen ct = .
(24 missing values generated)

. tokenize 3988 11702 2634

. qui forval i = 1/3 {
2.         replace ct = ``i'' if status == `i'
3. }

. list

     +------------------------------------+
     | freq   age   status     rt      ct |
     |------------------------------------|
  1. | 1306     1        1   1412    3988 |
  2. |   83     1        2   1412   11702 |
  3. |    0     1        3   1412    2634 |
  4. |  619     2        1   1402    3988 |
  5. |  765     2        2   1402   11702 |
     |------------------------------------|
  6. |    3     2        3   1402    2634 |
  7. |  263     3        1   1450    3988 |
  8. | 1194     3        2   1450   11702 |
  9. |    9     3        3   1450    2634 |
 10. |  173     4        1   1541    3988 |
     |------------------------------------|
 11. | 1372     4        2   1541   11702 |
 12. |   28     4        3   1541    2634 |
 13. |  171     5        1   1681    3988 |
 14. | 1393     5        2   1681   11702 |
 15. |   51     5        3   1681    2634 |
     |------------------------------------|
 16. |  159     6        1   1532    3988 |
 17. | 1372     6        2   1532   11702 |
 18. |   81     6        3   1532    2634 |
 19. |  208     7        1   1662    3988 |
 20. | 1350     7        2   1662   11702 |
     |------------------------------------|
 21. |  108     7        3   1662    2634 |
 22. | 1116     8        1   7644    3988 |
 23. | 4100     8        2   7644   11702 |
 24. | 2329     8        3   7644    2634 |
     +------------------------------------+

. mstdize freq rt ct , by(age status)

-------------------------------------
          |          status
      age |       1        2        3
----------+--------------------------
        1 | 1325.27    86.73     0.00
        2 |  615.56   783.39     3.05
        3 |  253.94  1187.18     8.88
        4 |  165.13  1348.55    27.32
        5 |  173.41  1454.71    52.87
        6 |  147.21  1308.12    76.67
        7 |  202.33  1352.28   107.40
        8 | 1105.16  4181.04  2357.81
-------------------------------------
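
* The textbook way to get such estimates is IPF (raking): alternately
* rescale rows and columns until both sets of margins match their targets.
* A minimal hand-rolled sketch on Donnell's 2 x 2 example, scaled to a
* hypothetical total of 100. This illustrates the technique and is not
* the code inside -mstdize-. The names fit, rowsum and colsum are made
* up, and 100 sweeps is an arbitrary but ample choice. The fitted cells
* come out at roughly 44.96, 15.04, 20.04 and 19.96.
clear
input freq inc hsize rt ct
55 1 1 60 65
12 1 2 60 35
20 2 1 40 65
13 2 2 40 35
end
generate double fit = freq
quietly forvalues iter = 1/100 {
    * scale each row of the table so that it sums to its target rt
    bysort inc: egen double rowsum = total(fit)
    replace fit = fit * rt / rowsum
    * then scale each column so that it sums to its target ct
    bysort hsize: egen double colsum = total(fit)
    replace fit = fit * ct / colsum
    drop rowsum colsum
}
list inc hsize freq fit, noobs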

There is a matrix version of -mstdize- called
-mstdizem- in the -matodd- package on SSC.

Alan Agresti explains how to use generalized
linear model software to get such estimates
in his "Categorical Data Analysis" text. As
I recall, the key is to use offsets.
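
* One reading of the offset approach, sketched here and not taken from
* Agresti. Fit a Poisson log-linear model with row and column main
* effects to pseudo-counts y that carry the 2004 margins, using ln(freq)
* from the 2000 table as the offset. The fitted values then have the form
* freq*a_i*b_j and reproduce the margins of y, which is the IPF solution.
* Here y = rt*ct/100 is one convenient choice with the right margins. The
* data are Donnell's 2 x 2 example scaled to a hypothetical total of 100.
* The i. prefix needs Stata 11 or later. Zero cells would get a missing
* offset and drop out of the fit, whereas IPF keeps them at zero.
clear
input freq inc hsize rt ct
55 1 1 60 65
12 1 2 60 35
20 2 1 40 65
13 2 2 40 35
end
generate y = rt * ct / 100
generate lnfreq = ln(freq)
glm y i.inc i.hsize, family(poisson) link(log) offset(lnfreq)
* mu includes the offset by default, so these are fitted cell counts
predict double fitglm, mu
list inc hsize freq fitglm, noobs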

Nick
n.j.cox@durham.ac.uk

Donnell Butler
>
> I am trying to update a 2000 two-way table using 2004 one-way
> information. I wanted to do it using IPF (iterative proportional
> fitting). I soon learned that Nick Cox created a program (MSTDIZE)
> that may be useful. However, I am obviously not framing the data
> correctly to obtain the desired goal.
>
> Here is a simplified version of the dilemma:
>
> (1) Imagine a two-way table of proportions:
>        HSize00     1     2   Totals
> Inc00  1         .55   .12      .67
> Inc00  2         .20   .13      .33
> Totals           .75   .25     1.00
>
> (2) Imagine two one-way tables to be used to update the two-way table:
> Inc04    HSize04
> 1  .60   1  .65
> 2  .40   2  .35
>
> (3) To attempt MSTDIZE, I have entered the data into Stata as follows:
>
> Inc00  Hsize00  IbyH00  Inc04  HSize04
> 0.67   0.75     0.55    0.60   0.65
> 0.33   0.25     0.12    0.40   0.35
>                 0.20
>                 0.13
>
> (4) So in Stata the data looks like this:
>  . list
>
>      +--------------------------------------------------+
>      | igroup00   hsize00   ibyh00   igroup04   hsize04 |
>      |--------------------------------------------------|
>   1. |      .33       .22      .13         .4       .35 |
>   2. |      .67       .78      .56         .6       .65 |
>   3. |        .         .      .19          .         . |
>   4. |        .         .      .12          .         . |
>      +--------------------------------------------------+
>
> (5) When I run MSTDIZE, this is the output:
> . mstdize ibyh00 igroup04 hsize04, by(igroup00 hsize00) generate (ibyh04)
>
> ----------------------
>           |  Hsize00
>  Igroup00 |  .22   .78
> ----------+-----------
>       .33 | 0.35
>       .67 |       0.65
> ----------------------
> (2 missing values generated)
>
> (6) Well, that is not what I hoped for. I was hoping for a new table
> (ibyh04) with 4 observations (new row1/column1, new r1c2, new r2c1,
> and new r2c2). Instead, I ended up with 2 observations that were
> exactly the same as hsize04.
>
> (7) This is probably a case of not really understanding what MSTDIZE
> is designed to do. Does anyone have any suggestions on how I can get
> Stata (via MSTDIZE or another means) to obtain an IPF adjusted ibyh04
> (two-way table updated from original two-way and two one-ways)?
>
> (8) And, as a bonus: I gathered those proportions from the
> tab hsize igroup, cell command. So, does anyone know how I could
> easily turn those relative frequencies into a variable? I do know
> that tab hsize igroup, cell matcell (matname) produces a 2x2 matrix
> of actual (not relative) frequencies. What I don't know is how to get
> relative frequencies into that matrix. Or, how could I transfer or use
> matrices in a simpler way than what I did by hand above, transcribing
> the frequencies into Excel to generate variables for a new data set?
>
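
* For the bonus question, a sketch using the auto data shipped with Stata.
* Here rep78 and foreign stand in for hsize and igroup. Divide the matcell()
* matrix by the table total r(N) to get cell proportions, then let -svmat-
* turn the columns of the result into variables P1, P2, ... held in the
* first rows of the data. The names ntot, F and P are illustrative.
* -contract- is another route: it reduces the data to one observation per
* cell with a frequency variable, which is also the layout -mstdize- wants.
sysuse auto, clear
tabulate rep78 foreign, matcell(F)
local ntot = r(N)
matrix P = F / `ntot'
matrix list P
svmat double P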

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```