
Re: st: RE: -distplot- of multiple variables with variable used as symbol


From   Roger Harbord <Roger.Harbord@bristol.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: -distplot- of multiple variables with variable used as symbol
Date   Wed, 04 Dec 2002 12:39:22 -0000

--On 04 December 2002 12:07 +0000 Nick Cox <n.j.cox@durham.ac.uk> wrote:

Roger Harbord
I've encountered some strange behaviour in Nick Cox's program -distplot-
published in STB-51.  It won't let me plot the distribution functions of
more than one variable AND use another variable to label the points,
although I can do either alone.

I'll use the automobile data to demonstrate so others can try to replicate
the problem, although there's little purpose to the commands in this
context - with my real data I'm comparing two distributions graphically and
hoped to label the points so I can identify outliers at the same time
(mainly for presentation purposes):

-------------------------------------------
. use "C:\Program Files\StataSE\auto.dta"
(1978 Automobile Data)

. distplot length, c(J) s([rep78])

. * (worked fine)

. distplot length displacement, c(JJ) s(oo)

. * (also worked fine)

. distplot length displacement, c(JJ) s([rep78]o)
variable rep78 not found
r(111);

. * Just to check I've got the syntax right in the s() option :
. graph length displacement weight, c(JJ) s([rep78]o)

. * (worked fine though a mess of a graph!)

. * describe my setup :

. which distplot
c:\ado\stbplus\d\distplot.ado
*! version 1.5.0 NJC 24 March 1999        [STB-51: gr41]
-------------------------------------------

Anyone any idea what's going wrong here?  That error message seems most
bizarre to me - how can Stata suddenly not be able to find rep78?

No mystery here: although this behaviour is not
documented, it is a straightforward consequence
of the method used, as may be seen by looking at
the code.

-distplot- temporarily -preserve-s and then restructures
the data when you want a plot of more than one variable.
The rationale for doing this was, presumably, to allow
a plot of more than 20 variables. Occasionally one has
a large bundle of variables all of the same kind and
it is desired to see a spaghetti plot showing their
collective pattern.

The side-effect, however, is that some variables in the
data set may disappear from sight while the graph is being produced.
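
The same kind of side-effect can be reproduced outside -distplot-. The
sketch below is an illustration only and assumes nothing about how
-distplot- actually restructures the data: it uses the official -stack-
command, which keeps only the stacked variables, as a stand-in. The path
to auto.dta is the one from Roger's session and may need adjusting.

```stata
* illustration only: -stack- is a stand-in, not necessarily what -distplot- does
use "C:\Program Files\StataSE\auto.dta", clear
preserve
* stack the two variables into one new variable y; other variables are dropped
stack length displacement, into(y) clear
* rep78 is no longer in memory, so this check should fail with r(111)
capture confirm variable rep78
display _rc
restore
* after -restore- the original data, including rep78, are back
confirm variable rep78
```

While the restructured data are in memory, anything that refers to rep78,
such as a symbol option, has nothing to find.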

A work-around would be for me to add an option specifying
variables which must be carried along, or for me to
parse the -sy()- argument and automatically identify
variables which the user clearly needs for the purpose.

I'll put that on a to-think-about list.

Nick
n.j.cox@durham.ac.uk

So a possible work-around for users such as me in the meantime would be to
-reshape long- and use the by() option to -distplot- instead of multiple
variables.

Just tried this and it seems to work fine.
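
A minimal sketch of that work-around with the automobile data might look
like the following. The names id, y and which are made up for the
illustration, and the options Roger actually passed to -distplot- are not
shown in the thread, so the last line only illustrates by() and leaves
c() and s() out.

```stata
use "C:\Program Files\StataSE\auto.dta", clear
keep length displacement rep78
* hypothetical names: id identifies cars, y1 and y2 share the stub y
gen id = _n
rename length y1
rename displacement y2
* one row per car per variable; which = 1 for length, 2 for displacement
reshape long y, i(id) j(which)
* compare the two distributions by group instead of by variable
distplot y, by(which)
```

Because rep78 is constant within id, -reshape long- carries it along into
the long data, so it is still present when the graph is drawn.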

Thanks,
Roger.
