From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: graphing estimates and confidence intervals
Date: Fri, 10 May 2013 13:52:23 +0100

Two recent threads both centred on graphical display of estimates together with confidence intervals: The start points were

http://www.stata.com/statalist/archive/2013-05/msg00293.html

http://www.stata.com/statalist/archive/2013-05/msg00310.html

This post is intended mainly as a kind of broad-brush overview of the question. It also adds some detail omitted from those threads. In turn, naturally, please comment if I miss anything of importance or interest.

The main idea is that while estimates can be plotted easily with -twoway scatter- or -graph dot- you are in practice going to find it difficult to show confidence intervals directly other than by -twoway rcap-. (It's only convention that might inhibit you from using -twoway rspike- instead.) It follows that you need to focus on using -twoway-. Bluntly, -graph dot- (or -graph bar- for those so inclined) is a dead end here.

There are two broad strategies.

1. You can build your own command by assembling a composite -twoway- call using -scatter- for the point estimates and -rcap- for the intervals.

This can be combined, with increasing difficulty, with showing different results for different groups on one or more levels. An example to explain levels here: using sex as a classifier gives one level and using race or region or both would add one or two more levels.

With one level you will presumably just want to plot your grouping variable on one of the axes. With two or more levels, using -by()- is the easiest approach to add an extra level of classification, but just adding spacing can be as or more effective. Sometimes with -by()- there is too much scaffolding and too much loss of real estate.

If you have any group variable that is string, things are easier if you -encode- it or use -egen, group()- to produce an equivalent numeric variable with value labels.

2. Alternatively, you can look for a command that does all that for you.
The commands differ in whether they expect that you already have the estimates (point and interval) or they will undertake to do that calculation for you. The more standard the calculation, the more likely that a canned command already exists.

-serrbar- is an old official command which doesn't do much but may match simple needs. My impression is that it is little known, but that may be because it is little mentioned, and that in turn because it is of little use.

-dotplot- is an official command which supports display of mean +/- SD. It's worth knowing that, but it's unlikely to be what you want under this heading.

-ciplot- is an oldish user-written command (SSC, Nick Cox). Its basic idea is to call up -ci- repeatedly and then plot the results. There is support for multiple groups and multiple variables. If it doesn't go as far as you want, the bad news is that I have no interest in developing it, but it's more flexible than any official command I can recall. For example,

    sysuse auto
    ciplot foreign, binomial jeffreys by(rep78)

shows how you can reach through to -ci-.

-stripplot- (SSC, Nick Cox) was mentioned in recent posts. Its display of confidence intervals is based on exactly the same idea as -ciplot-, to call up -ci- for the calculations. Its philosophy is to show the raw data too, although nothing beyond an ectoplasmic sense of my mild disapproval stops you suppressing the data display with e.g. -ms(none)-.

-eclplot- (SSC also SJ, Roger Newson) is another user-written command, and one characteristically well thought out, documented and maintained. It's not competing because it is focused on a different case, in which you already have estimates and confidence limits to hand; other programs of Roger's are of much help in assembling and analysing such results.

I want to flag strongly the scope for using -statsby- in this territory, which I wrote up in

SJ-10-1 gr0045  . . . . . . . . . . . . . Speaking Stata: The statsby strategy
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q1/10   SJ 10(1):143--151                                (no commands)
        demonstrates the use of statsby to prepare a reduced
        dataset for subsequent graphing

.pdf freely available at http://www.stata-journal.com/sjpdf.html?articlenum=gr0045

Confidence intervals are a major example. (That paper was inspired by a single throw-away remark by Vince Wiggins. It was one of many occasions in which deciding to write about something made me aware of something in Stata I was underestimating.)

I would also like to mention a general discussion of graphical technique in

SJ-8-2  gr0034  . . . . . . . . . .  Speaking Stata: Between tables and graphs
        (help labmask, seqvar if installed) . . . . . . . . . . . . N. J. Cox
        Q2/08   SJ 8(2):269--289
        outlines techniques for producing table-like graphs

.pdf freely available at http://www.stata-journal.com/sjpdf.html?articlenum=gr0034

Nick
njcoxstata@gmail.com
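
As a minimal sketch of the first strategy above, combined with -statsby-, the following do-file reduces the data to one observation per group and then overlays -twoway rcap- and -twoway scatter-. Everything specific here is an assumption made for illustration and is not taken from the threads: the auto data, mpg as the response, rep78 as the single grouping level, and 95% intervals based on the t distribution. The limits are computed by hand from -summarize- results so that the sketch does not depend on the syntax of -ci-.

```stata
* Sketch only: strategy 1 (composite -twoway- call) prepared with -statsby-.
* Assumed for illustration: auto data, mpg by rep78, 95% t-based intervals.
sysuse auto, clear

* Reduce to one observation per group of rep78, holding mean, SD and count.
* Observations with missing rep78 are left out by default.
statsby mean=r(mean) sd=r(sd) n=r(N), by(rep78) clear: summarize mpg

* Confidence limits: mean +/- t quantile * standard error.
generate double se = sd / sqrt(n)
generate double lb = mean - invttail(n - 1, 0.025) * se
generate double ub = mean + invttail(n - 1, 0.025) * se

* Intervals with -rcap-, point estimates with -scatter-, groups on the x axis.
twoway rcap lb ub rep78 || scatter mean rep78,               ///
    legend(off) xtitle("Repair record 1978")                 ///
    ytitle("Mean mileage (mpg) with 95% CI")
```

Swapping -rcap- for -rspike- changes only the style of the interval. A second level of classification could be added with -by()- on the -twoway- call, provided the same variable is also named in the by() option of -statsby-.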