Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: "table" showing summary of ~100 ternary variables?


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: "table" showing summary of ~100 ternary variables?
Date   Sat, 30 Oct 2010 15:13:47 -0500

On Oct 30, 2010, at 12:40 PM, Michael Costello wrote:
I have about 100 ternary variables (0=incorrect, 1=correct, 2=No Response, .=Missing) and I would like to get a table of the responses that looks like this:

-------------------------------------------------------
|         |     Correct   Incorrect     No Response   .
----------+--------------------------------------------
     Var1 |          2          21             4     21
     Var2 |          8          19            13     18
     Var3 |         30          19             4     19
     Var4 |         18          21            47     22
     Var5 |         11          27             8     30

Maybe I'd even like to add in a proportion of correct or ratio of correct to incorrect into the table.

Is there a function to do this or something reasonable similar?


I expect that there is probably a user-written program to do this, but it's not too difficult to do from first principles. We'll start by generating a dataset similar to what you've described:


    set obs 100
    gen id = _n
    lab def mylab 1 "Correct" 2 "Incorrect" 3 "No response" 4 "Missing"
    set seed 123456789
    forv i = 1/5 {
        gen byte y`i' = cond(runiform()<0.95, ceil(runiform()*3), 4)
        lab val y`i' mylab
    }
    replace y5 = 2 if y5==1


This generates 5 variables y1-y5, each taking values 1, 2, 3 or 4. You'll notice I've used 4 for the "missing" values here, only because that'll give you more flexibility for where the corresponding column appears in the final table (i.e., if we left missing values as ".", we wouldn't be able to place a summary column after that one). As you can see, each variable takes values 1-3 with probability 1/3 each, and is missing in 5% of cases. Note that I've also modified y5 so that it doesn't contain any correct responses, because you want to make sure your code can handle such cases.

Now, the first trick is to reshape your data into long form:


    reshape long y, i(id) j(Var)


Note that we could have used -stack- here instead, and in fact, that would have been more convenient if our variables weren't named systematically as they are here. Since we want to add a column for the proportion of correct responses, we'll add a corresponding observation for each variable whose values we'll fill in later (if you wanted to add multiple summary columns, you could add additional observations here):


    set obs `=c(N) + 1'
    replace y = 5 if _n == _N
    lab def mylab 5 "Prop. correct", add


Next, we'll generate our cell counts by using -collapse-, but first we'll use -fillin- to make sure that all of our cells are represented (even if their observed counts are zero):


    fillin Var y
    collapse (count) cnt=id, by(Var y)


Note that we are also using -fillin- here to propagate the observation we added above to hold the proportion of correct responses across all of the variables.

Now, we'll compute the proportion correct (as a proportion of values 1-3) for each variable:


    egen correct = max((y==1)*cnt), by(Var)
    egen nonmiss = sum(inlist(y,1,2,3)*cnt), by(Var)
    bys Var (y): replace cnt = correct[1] / nonmiss[1] if y == 5


And finally, we can use -tabdisp- to create our table:


    . tabdisp Var y if !mi(Var), c(cnt) format(%9.2g)

--------------------------------------------------------------------
         |                             y
Var | Correct Incorrect No response Missing Prop. correct ----- +-------------------------------------------------------------- 1 | 34 37 25 4 . 35 2 | 26 35 33 6 . 28 3 | 25 32 40 3 . 26 4 | 32 27 35 6 . 34 5 | 0 62 34 4 0 --------------------------------------------------------------------


where I've used my text editor to narrow some of the columns so that the table doesn't get wrapped by people's mailers. Of course, there are several ways we might embellish this -- this merely illustrates one possible strategy for achieving the desired result.


-- Phil

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index