The Stata listserver

st: RE: -distplot- of multiple variables with variable used as symbol

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: -distplot- of multiple variables with variable used as symbol
Date   Wed, 4 Dec 2002 12:07:31 -0000

Roger Harbord
> I've encountered some strange behaviour in Nick Cox's program -distplot-
> published in STB-51.  It won't let me plot the distribution functions of
> more than one variable AND use another variable to label the points,
> although I can do either alone.
> I'll use the automobile data to demonstrate so others can try to replicate
> the problem, although there's little purpose to the commands in this
> context - with my real data I'm comparing two distributions graphically and
> hoped to label the points so I can identify outliers at the same time
> (mainly for presentation purposes):
> -------------------------------------------
> . use "C:\Program Files\StataSE\auto.dta"
> (1978 Automobile Data)
> . distplot length, c(J) s([rep78])
> . * (worked fine)
> . distplot length displacement, c(JJ) s(oo)
> . * (also worked fine)
> . distplot length displacement, c(JJ) s([rep78]o)
> variable rep78 not found
> r(111);
> . * Just to check I've got the syntax right in the s() option :
> . graph length displacement weight, c(JJ) s([rep78]o)
> . * (worked fine though a mess of a graph!)
> . * describe my setup :
> . which distplot
> c:\ado\stbplus\d\distplot.ado
> *! version 1.5.0 NJC 24 March 1999        [STB-51: gr41]
> Anyone any idea what's going wrong here?  That error message seems most
> bizarre to me - how can Stata suddenly not be able to find rep78 ??

No mystery here: although this behaviour is not 
documented, it is a straightforward consequence 
of the method used, as may be seen by looking at 
the code. 

-distplot- temporarily -preserve-s and then restructures 
the data when you want a plot of more than one variable. 
The rationale for doing this was, presumably, to allow 
a plot of more than 20 variables. Occasionally one has 
a large bundle of variables all of the same kind and 
wants a spaghetti plot showing their collective pattern. 

The side-effect, however, is that some variables in the 
data set may disappear from sight while the graph is being produced. 
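
The following is a minimal sketch of how a restructuring step can make a 
variable vanish. It assumes the restructuring behaves like -stack-, which 
is a stand-in chosen for illustration; the actual -distplot- code may do 
something different. It also assumes a Stata version that has -sysuse- 
(the original example used -use- with a full path instead). 

```stata
* Sketch only: -stack- is an assumed stand-in for whatever
* restructuring -distplot- performs internally.
sysuse auto, clear
preserve

* Stack the two plotted variables into one long variable.
* -stack- keeps only the new variable and _stack.
stack length displacement, into(value) clear
describe

* rep78 is no longer in memory, so any reference to it fails
capture confirm variable rep78
display "return code for rep78: " _rc

* The original data, including rep78, come back here
restore
```

Between -preserve- and -restore-, a symbol specification such as 
s([rep78]o) has nothing to refer to, which is consistent with the 
"variable rep78 not found" message in the log above. 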

A work-around would be for me to add an option specifying 
variables which must be carried along, or for me to 
parse the -sy()- argument and automatically identify 
variables which the user clearly needs for the purpose. 
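
As a rough illustration of the "carry along" idea, done by hand rather 
than inside -distplot-, the labelling variable can be stacked together 
with each plotted variable. This is a sketch under assumptions: -stack- 
is used as the restructuring command, the names value, marker and cumprob 
are hypothetical, and _n/_N is only one of several possible definitions 
of cumulative probability (-distplot- may use another). 

```stata
* Sketch only: manual restructuring that keeps the label variable.
sysuse auto, clear
preserve

* Pair each plotted variable with rep78 so the label survives stacking
stack length rep78 displacement rep78, into(value marker) clear

* Cumulative probability within each original variable
* (_stack is 1 for length, 2 for displacement)
drop if missing(value)
sort _stack value
by _stack: generate cumprob = _n / _N

list _stack value marker cumprob in 1/5
restore
```

With the data in this form, cumprob can be plotted against value 
separately for each level of _stack, using marker as the point label, 
with whatever graph syntax the installed Stata version provides. 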

I'll put that on a to-think-about list. 

[email protected] 
