
st: RE: -ciplot- and multiple by options

From   "Nick Cox" <>
To   <>
Subject   st: RE: -ciplot- and multiple by options
Date   Mon, 8 Oct 2007 19:48:25 +0100

The Statalist FAQ advises, as good practice, saying where
non-official commands come from. -ciplot- is a user-written
command on SSC. 

The help for -ciplot- starts like this: 

Plots of confidence intervals 

        ciplot varlist [if exp] [in range] [weight] [, by(byvar)
                 missing ci_options [horizontal | vertical]
                 rcapopts(rcap_options) scatter_options]


    -ciplot- produces a display of means and confidence intervals.  Means
    are shown by point symbols and intervals by capped bars.  -ci- is
    used for the calculations.  -aweights- and -fweights- are allowed; see
    help on -weights-.


    -by()- defines a grouping variable, which is treated as categorical,
        not measured.


    scatter_options refers to options of graph twoway scatter.

and inside -ciplot- there is a standard -syntax- statement 

	syntax varlist(numeric) [if] [in] [aweight fweight]        ///
	[ , BY(varname) LEVel(integer $S_level) Poisson BINomial   ///
	Exposure(varname) EXAct Jeffreys Wilson Agresti Total      ///
	Total2(str asis) MISSing INCLusive                         ///
	YTItle(str) XTItle(str)                                    ///
	HORizontal VERTical RCAPopts(str asis) plot(str asis) * ]

As far as -ciplot- is concerned -by()- needs a varname (not varlist), 
as Sergiy reports. 

So why can he get away with specifying two -by()- options? It
is not in general illegal to specify an option twice, and as from 
Stata 8 much use is made of that in graphics. 

In this case, what happens with two -by()- options is that the 
first becomes the -by()- specified explicitly in the -syntax- and
the second gets smuggled past the border wrapped up in the 
wildcard *. 
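
A minimal sketch of that mechanism, for anyone who wants to watch it
happen. -bydemo- is a hypothetical throwaway program, not part of
-ciplot-; it just copies the relevant part of the -syntax- statement
and echoes what ends up where.

```stata
* Hypothetical demo program (not part of -ciplot-): one explicit
* by() option plus the wildcard *, as in the -syntax- statement above.
capture program drop bydemo
program define bydemo
    version 9
    syntax varlist(numeric) [ , BY(varname) * ]
    * Whatever -syntax- matched to the explicit by() option
    display "explicit by(): `by'"
    * Whatever was left over and swept up by the wildcard
    display "wildcard *   : `options'"
end

sysuse auto, clear
bydemo price, by(rep78) by(foreign)
```

If the account above is right, the first line displayed should name
rep78 and the second should carry the remaining -by(foreign)-, which
is what then gets handed on to -graph-.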

-ciplot, by()- was intended to echo -ci, by()- when the latter
was part of -ci-'s standard syntax. (Now the overt syntax is 
to use -by ...: ci- but a -by()- option is still allowed.) 
This option never had a graphical role. 

As the help above implies, -ciplot- does two main things: 
it fires up -ci- to get confidence intervals and then fires
up -graph- to plot them. If you go with the auto data 

ciplot price, by(rep78) by(foreign)

then -by(rep78)- is used in calculating 
confidence intervals and -by(foreign)- is used
in plotting them. The results will look odd 
and in effect you will get a silly answer 
to a silly question. 

I don't feel especially inclined, however, to regard this as a 
bug, as any user doing this is trying something way 
beyond the documentation and should expect the 
possibility of strange consequences. There may be nested
categorical variables for which this is
an undocumented way of getting a reasonable graph. 

I suspect that -ciplot- has climbed to the end 
of its branch of the evolutionary tree. A more
versatile command is -stripplot- from SSC. 
Alternatively, use -egen, group()- before
using -ciplot-. 
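
A sketch of the -egen, group()- route with the auto data, assuming
-ciplot- has been installed from SSC. The variable name repfor is
made up for the example.

```stata
* One-off install of the user-written command from SSC:
* ssc install ciplot

sysuse auto, clear

* Combine the two categorical variables into a single grouping
* variable; the label option builds value labels from the originals.
* repfor is a hypothetical name chosen for this example.
egen repfor = group(rep78 foreign), label

* A single by() variable, which is what -ciplot- documents
ciplot price, by(repfor)
```

Observations with missing rep78 get a missing repfor, as -egen,
group()- leaves out missings unless asked otherwise, so they drop
out of the graph unless -ciplot, missing- is specified.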


Sergiy Radyakin
> In -ciplot- I can not specify multiple variables in the "by()" option,
> but I can specify multiple "by()" options each with one variable. I
> wonder if ciplot is supposed to work like this?
> The results look quite strangely [Stata v9.2, Windows]:
>    sysuse auto
>    ciplot price, by(foreign) by(rep78)
> or even like this:
>    ciplot price, by(rep78) by(foreign)
> Is there any deep statistical motivation for not allowing such graphs?
> or is it just bug?
> In any case the above graphs look misleading, when confronted 
> with the data:
>    table rep78 foreign, c(mean price sd price)
> Am I missing something?
