st: RE: MSTDIZE


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: MSTDIZE
Date   Thu, 8 Apr 2004 10:44:36 +0100

The help for -mstdize- contains a detailed worked example, 
which shows that Donnell needs a different data structure 
from what he has to make use of -mstdize-. Each row total 
must be repeated for every cell in that row, and similarly 
for columns.  

What's more, -mstdize- expects frequencies, not proportions. 
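
To make that layout concrete for Donnell's 2 x 2 example, here is a
minimal sketch, written in Python rather than Stata purely for
illustration. The multiplier N = 100 is a hypothetical sample size,
made up only to turn proportions into frequencies, and the names
to_long, freq, rt and ct are labels for this sketch (rt and ct echo
the row-total and column-total variables in the worked example).

```python
# Hypothetical sample size used only to turn proportions into frequencies.
N = 100

# Donnell's 2000 table of proportions: rows are Inc00, columns are HSize00.
table00 = [[0.55, 0.12],
           [0.20, 0.13]]

# 2004 one-way margins, also given as proportions.
inc04 = [0.60, 0.40]
hsize04 = [0.65, 0.35]


def to_long(table, row_targets, col_targets, n):
    """Return one record per cell: (freq, row, col, rt, ct).

    Each target row total is repeated for every cell in that row,
    and each target column total for every cell in that column.
    """
    records = []
    for i, row in enumerate(table):
        for j, p in enumerate(row):
            records.append((
                round(p * n, 6),               # cell frequency
                i + 1,                         # row category
                j + 1,                         # column category
                round(row_targets[i] * n, 6),  # target row total
                round(col_targets[j] * n, 6),  # target column total
            ))
    return records


long_data = to_long(table00, inc04, hsize04, N)
for rec in long_data:
    print(rec)
```

The result has four records, one per cell, which is the shape that
the -by()- variables plus the two totals variables need to have.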

Here's the worked example modernised to avoid use of the old-fashioned
-for- (see concurrent thread). 

. input freq age status

          freq        age     status
  1. 1306    1        1
  2. 83      1        2 
  3. 0       1        3
  4. 619     2        1 
  5. 765     2        2
  6. 3       2        3
  7. 263     3        1
  8. 1194    3        2
  9. 9       3        3
 10. 173     4        1
 11. 1372    4        2
 12. 28      4        3
 13. 171     5        1
 14. 1393    5        2
 15. 51      5        3
 16. 159     6        1
 17. 1372    6        2
 18. 81      6        3
 19. 208     7        1
 20. 1350    7        2
 21. 108     7        3
 22. 1116    8        1
 23. 4100    8        2
 24. 2329    8        3
 25. end

. gen rt = . 
(24 missing values generated)

. tokenize 1412 1402 1450 1541 1681 1532 1662 7644

. qui forval i = 1/8 { 
  2.         replace rt = ``i'' if  age == `i' 
  3. } 

. gen ct = .
(24 missing values generated)

. tokenize 3988 11702 2634 

. qui forval i = 1/3 { 
  2.         replace ct = ``i'' if status == `i' 
  3. } 

. list 

     +------------------------------------+
     | freq   age   status     rt      ct |
     |------------------------------------|
  1. | 1306     1        1   1412    3988 |
  2. |   83     1        2   1412   11702 |
  3. |    0     1        3   1412    2634 |
  4. |  619     2        1   1402    3988 |
  5. |  765     2        2   1402   11702 |
     |------------------------------------|
  6. |    3     2        3   1402    2634 |
  7. |  263     3        1   1450    3988 |
  8. | 1194     3        2   1450   11702 |
  9. |    9     3        3   1450    2634 |
 10. |  173     4        1   1541    3988 |
     |------------------------------------|
 11. | 1372     4        2   1541   11702 |
 12. |   28     4        3   1541    2634 |
 13. |  171     5        1   1681    3988 |
 14. | 1393     5        2   1681   11702 |
 15. |   51     5        3   1681    2634 |
     |------------------------------------|
 16. |  159     6        1   1532    3988 |
 17. | 1372     6        2   1532   11702 |
 18. |   81     6        3   1532    2634 |
 19. |  208     7        1   1662    3988 |
 20. | 1350     7        2   1662   11702 |
     |------------------------------------|
 21. |  108     7        3   1662    2634 |
 22. | 1116     8        1   7644    3988 |
 23. | 4100     8        2   7644   11702 |
 24. | 2329     8        3   7644    2634 |
     +------------------------------------+

. mstdize freq rt ct , by(age status)

-------------------------------------
          |          status          
      age |    1        2        3   
----------+--------------------------
        1 | 1325.27    86.73     0.00
        2 |  615.56   783.39     3.05
        3 |  253.94  1187.18     8.88
        4 |  165.13  1348.55    27.32
        5 |  173.41  1454.71    52.87
        6 |  147.21  1308.12    76.67
        7 |  202.33  1352.28   107.40
        8 | 1105.16  4181.04  2357.81
-------------------------------------
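
For anyone curious about what the fitting involves, here is a minimal
sketch of generic iterative proportional fitting, again in Python for
illustration only. It is a textbook version written under my own
assumptions (the function name ipf, the tolerance and the iteration
cap are all made up here), not the code inside -mstdize-, and I have
not compared its printed results with the table above.

```python
def ipf(freq, row_targets, col_targets, tol=1e-8, max_iter=1000):
    """Iterative proportional fitting of a two-way table to target margins."""
    fit = [[float(x) for x in row] for row in freq]
    n_rows, n_cols = len(fit), len(fit[0])
    for _ in range(max_iter):
        # Scale each row so that it sums to its target row total.
        for i in range(n_rows):
            s = sum(fit[i])
            if s > 0:
                fit[i] = [x * row_targets[i] / s for x in fit[i]]
        # Scale each column so that it sums to its target column total.
        for j in range(n_cols):
            s = sum(fit[i][j] for i in range(n_rows))
            if s > 0:
                for i in range(n_rows):
                    fit[i][j] *= col_targets[j] / s
        # Columns now match their targets, so only the rows need checking.
        worst = max(abs(sum(fit[i]) - row_targets[i]) for i in range(n_rows))
        if worst < tol:
            break
    return fit


# Frequencies and target totals copied from the worked example.
freq = [[1306, 83, 0], [619, 765, 3], [263, 1194, 9], [173, 1372, 28],
        [171, 1393, 51], [159, 1372, 81], [208, 1350, 108],
        [1116, 4100, 2329]]
rt = [1412, 1402, 1450, 1541, 1681, 1532, 1662, 7644]
ct = [3988, 11702, 2634]

fitted = ipf(freq, rt, ct)
for row in fitted:
    print(" ".join(f"{x:10.2f}" for x in row))
```

Note that a cell with zero frequency stays at zero, because each step
only multiplies cells by a scaling factor.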

There is a matrix version of -mstdize- called 
-mstdizem- in the -matodd- package on SSC. 

Alan Agresti explains how to use generalized 
linear model software to get such estimates 
in his text "Categorical Data Analysis". As 
I recall, the key is to use offsets. 

Nick 
n.j.cox@durham.ac.uk 

Donnell Butler
> 
> I am trying to update a 2000 two-way table using 2004 one-way
> information. I wanted to do it using IPF (iterative proportional
> fitting). I soon learned that Nick Cox created a program (MSTDIZE)
> that may be useful. However, I am obviously not framing the data
> correctly to obtain the desired goal.
> 
> Here is a simplified version of the dilemma:
> 
> (1) Imagine a two-way table of proportions:
>         HSize00   1     2   Totals
> Inc00      1     .55   .12    .67
> Inc00      2     .20   .13    .33
> Totals           .75   .25   1.00
> 
> (2) Imagine two one-way tables to be used to update the two-way table:
> Inc04    HSize04
> 1  .60   1  .65
> 2  .40   2  .35
> 
> (3) To attempt MSTDIZE, I have entered the data into Stata as follows:
> 
> Inc00  Hsize00  IbyH00  Inc04  HSize04
> 0.67   0.75     0.55    0.60   0.65
> 0.33   0.25     0.12    0.40   0.35
>                 0.20
>                 0.13
> 
> (4) So in Stata the data looks like this:
>  . list
> 
>      +--------------------------------------------------+
>      | igroup00   hsize00   ibyh00   igroup04   hsize04 |
>      |--------------------------------------------------|
>   1. |      .33       .22      .13         .4       .35 |
>   2. |      .67       .78      .56         .6       .65 |
>   3. |        .         .      .19          .         . |
>   4. |        .         .      .12          .         . |
>      +--------------------------------------------------+
> 
> (5) When I run MSTDIZE, this is the output:
> . mstdize   ibyh00 igroup04 hsize04, by(igroup00 hsize00) generate (ibyh04)
> 
> ----------------------
>           |  Hsize00
>  Igroup00 |  .22   .78
> ----------+-----------
>       .33 | 0.35
>       .67 |       0.65
> ----------------------
> (2 missing values generated)
> 
> (6) Well, that is not what I hoped for. I was hoping for a new table
> (ibyh04) with 4 observations (new row1/column1, new r1c2, new r2c1,
> and new r2c2). Instead, I ended up with 2 observations that were
> exactly the same as hsize04.
> 
> (7) This is probably a case of not really understanding what MSTDIZE
> is designed to do. Does anyone have any suggestions on how I can get
> Stata (via MSTDIZE or another means) to obtain an IPF adjusted ibyh04
> (two-way table updated from original two-way and two one-ways)?
> 
> (8) And, as a bonus. I gathered those proportions from tab hsize
> igroup, cell command. So, if anyone knows how I could easily turn
> those relative frequencies into a variable. I do know that the tab
> hsize igroup, cell matcell (matname) produces a 2x2 matrix of actual
> (not relative) frequencies. What I don't know is how to get relative
> frequencies in that matrix? Or, how to transfer or use matrices in a
> simpler way than what I did by hand above by transcribing the
> frequencies into excel for new data set variable generation?
> 

