Re: st: "table" showing summary of ~100 ternary variables?

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: "table" showing summary of ~100 ternary variables?
Date	Sat, 30 Oct 2010 15:13:47 -0500

On Oct 30, 2010, at 12:40 PM, Michael Costello wrote:

I have about 100 ternary variables (0=incorrect, 1=correct, 2=NoResponse, .=Missing) and I would like to get a table of the responses that looks like this:
-------------------------------------------------------
|         |     Correct   Incorrect     No Response   .
----------+--------------------------------------------
     Var1 |          2          21             4     21
     Var2 |          8          19            13     18
     Var3 |         30          19             4     19
     Var4 |         18          21            47     22
     Var5 |         11          27             8     30
Maybe I'd even like to add in a proportion of correct or ratio of correct to incorrect into the table.
Is there a function to do this or something reasonably similar?

I expect that there is probably a user-written program to do this, but it's not too difficult to do from first principles. We'll start by generating a dataset similar to what you've described:



    set obs 100
    gen id = _n
    lab def mylab 1 "Correct" 2 "Incorrect" 3 "No response" 4 "Missing"
    set seed 123456789
    forv i = 1/5 {
        gen byte y`i' = cond(runiform()<0.95, ceil(runiform()*3), 4)
        lab val y`i' mylab
    }
    replace y5 = 2 if y5==1

This generates 5 variables y1-y5, each taking values 1, 2, 3 or 4. You'll notice I've used 4 for the "missing" values here, only because that'll give you more flexibility for where the corresponding column appears in the final table (i.e., if we left missing values as ".", we wouldn't be able to place a summary column after that one). As you can see, each variable is missing in 5% of cases and otherwise takes each of the values 1-3 with equal probability (0.95/3, or about 0.317, each). Note that I've also modified y5 so that it doesn't contain any correct responses, because you want to make sure your code can handle such cases.
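To see what the -cond()- expression does, here is a minimal sketch that draws a single variable on a larger sample and tabulates it. The sample size of 10000 is an arbitrary choice for illustration, not something from the example above.

```stata
clear
set obs 10000
set seed 123456789
* With probability 0.95 draw a value from {1,2,3}; otherwise use code 4
gen byte y = cond(runiform()<0.95, ceil(runiform()*3), 4)
* Each of 1-3 should appear in roughly 31.7% of rows, and 4 in roughly 5%
tab y
```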


Now, the first trick is to reshape your data into long form:


    reshape long y, i(id) j(Var)

Note that we could have used -stack- here instead, and in fact, that would have been more convenient if our variables weren't named systematically as they are here. Since we want to add a column for the proportion of correct responses, we'll add a corresponding observation for each variable whose values we'll fill in later (if you wanted to add multiple summary columns, you could add additional observations here):
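As a rough sketch of the -stack- alternative, suppose the items had unsystematic names. The names reading, item_b and q17 below are hypothetical, as is the helper variable one.

```stata
clear
set obs 100
set seed 123456789
* Hypothetical, unsystematically named items, coded 1-4 as above
foreach v in reading item_b q17 {
    gen byte `v' = cond(runiform()<0.95, ceil(runiform()*3), 4)
}
* Stack the three items into a single variable y; _stack records
* which item (1, 2 or 3, in the order listed) each row came from
stack reading item_b q17, into(y) clear
rename _stack Var
* -stack- keeps only y and _stack, so create a variable to count later
gen byte one = 1
```

With this route there is no id variable afterwards, so the later -collapse- would count one instead of id, and the value label may need to be reattached to y with -label values-.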



    set obs `=c(N) + 1'
    replace y = 5 if _n == _N
    lab def mylab 5 "Prop. correct", add

Next, we'll generate our cell counts by using -collapse-, but first we'll use -fillin- to make sure that all of our cells are represented (even if their observed counts are zero):



    fillin Var y
    collapse (count) cnt=id, by(Var y)

Note that we are also using -fillin- here to propagate the observation we added above to hold the proportion of correct responses across all of the variables.
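To make the role of -fillin- concrete, here is a small toy dataset (three made-up rows, not the data above) in which the combination Var 2, y 1 never occurs. -fillin- adds that cell with id missing, so -collapse (count)- reports it with a count of zero rather than dropping it.

```stata
clear
input Var y id
1 1 1
1 2 2
2 2 1
end
* Adds the missing (Var=2, y=1) combination; _fillin==1 marks the new row
fillin Var y
list, sepby(Var)
* (count) counts nonmissing id values, so the filled-in cell gets cnt = 0
collapse (count) cnt=id, by(Var y)
list, sepby(Var)
```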

Now, we'll compute the proportion correct (as a proportion of values 1-3) for each variable:



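    * Number correct per variable (counts are >= 0, so max picks out the y==1 cell)
    egen correct = max((y==1)*cnt), by(Var)
    * Number of non-missing responses (codes 1-3) per variable
    egen nonmiss = total(inlist(y,1,2,3)*cnt), by(Var)
    * Store the proportion in the placeholder cell (y==5) of each variable
    bys Var (y): replace cnt = correct[1] / nonmiss[1] if y == 5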
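As a quick check of the arithmetic, the counts reported for Var 1 in the final table (34 correct, 37 incorrect, 25 no response) give a proportion of about 0.35:

```stata
* Proportion correct = correct / (correct + incorrect + no response)
display 34 / (34 + 37 + 25)
* A variable with no correct responses, such as y5, gives exactly 0
display 0 / (0 + 62 + 34)
```

A variable with no responses coded 1-3 at all would give 0/0, which Stata evaluates to missing.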


And finally, we can use -tabdisp- to create our table:


    . tabdisp Var y if !mi(Var), c(cnt) format(%9.2g)

--------------------------------------------------------------------
         |                             y
     Var |   Correct  Incorrect  No response  Missing  Prop. correct
---------+----------------------------------------------------------
       1 |        34         37           25        4            .35
       2 |        26         35           33        6            .28
       3 |        25         32           40        3            .26
       4 |        32         27           35        6            .34
       5 |         0         62           34        4              0
--------------------------------------------------------------------

where I've used my text editor to narrow some of the columns so that the table doesn't get wrapped by people's mailers. Of course, there are several ways we might embellish this -- this merely illustrates one possible strategy for achieving the desired result.
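One such embellishment, sketched here under assumptions rather than taken from the steps above, is a second summary column for the ratio of correct to incorrect that the original question mentions. The code 6, the label "Ratio C:I" and the variable incorrect are illustrative choices; the sketch reruns the whole pipeline so that it can be pasted into a do-file on its own.

```stata
clear
set obs 100
gen id = _n
lab def mylab 1 "Correct" 2 "Incorrect" 3 "No response" 4 "Missing"
set seed 123456789
forv i = 1/5 {
    gen byte y`i' = cond(runiform()<0.95, ceil(runiform()*3), 4)
    lab val y`i' mylab
}
replace y5 = 2 if y5==1
reshape long y, i(id) j(Var)
* Two placeholder observations: code 5 (proportion) and code 6 (ratio)
set obs `=c(N) + 2'
replace y = 5 if _n == _N - 1
replace y = 6 if _n == _N
lab def mylab 5 "Prop. correct" 6 "Ratio C:I", modify
fillin Var y
collapse (count) cnt=id, by(Var y)
egen correct   = total((y==1)*cnt), by(Var)
egen incorrect = total((y==2)*cnt), by(Var)
egen nonmiss   = total(inlist(y,1,2,3)*cnt), by(Var)
replace cnt = correct / nonmiss   if y == 5
* The ratio is missing when a variable has no incorrect responses
replace cnt = correct / incorrect if y == 6
tabdisp Var y if !mi(Var), c(cnt) format(%9.2g)
```

Any further summary column follows the same pattern: one more placeholder observation, one more label, and one more -replace-.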



-- Phil

