Statalist



st: RE: RE: Histogram, by(var, total)


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: Histogram, by(var, total)
Date   Mon, 8 Jun 2009 10:29:05 +0100

I should add that the treatment of missing values deserves some
consideration.

The code below includes observations with missing values on the -by()-
variable in the total category, but excludes them from the categories
shown separately. 

If you wanted to exclude observations with missing values from the
total, you could specify 

expand 2 if !missing(foreign) 
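
In context, that line replaces the plain -expand 2- in the example quoted
below. A minimal sketch follows, in which a few values of foreign are set
to missing purely for illustration (the auto data have none on that
variable):

```stata
sysuse auto, clear
preserve
* illustration only: create some missing values on the -by()- variable
replace foreign = . in 1/5
* first observation number of the copies that -expand- will append
local which = _N + 1
* copy only the observations with nonmissing foreign
expand 2 if !missing(foreign)
replace foreign = -1 in `which'/L
label def origin -1 "Total", add
histogram mpg, by(foreign)
restore
```

Because -expand- appends the copies at the end of the dataset, the range
`which'/L still picks out exactly the new observations, however many were
copied.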

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 07 June 2009 18:15
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Histogram, by(var, total)

I don't know a direct way to do this, but some trickery produces the
same result. It is best explained by example. 

sysuse auto, clear 
preserve 
local which = _N + 1 
expand 2 
replace foreign = -1 in `which'/L 
label def origin -1 "Total", add 
histogram mpg, by(foreign) 
restore 

Key points: 

1. -expand 2- doubles the dataset. The second half that is a copy of the
first half is to be used to work out the "Total" category. 

2. If the -by()- variable is integer with value labels, the extra
observations should be assigned an integer value for the -by()- that is
_lower_ than any other observed. You then need to define an appropriate
value label. (In this case, I know that the other values are 0 and 1.) 

3. You do _not_ then specify the -total- suboption, as you are using
your own subterfuge to replicate it. 

4. -preserve- and -restore- are optional, but note that without them the
doubled dataset stays in memory, which is a major change to the dataset.

Note that Stata in no sense "knows" that the extra category is a total
category, but that shouldn't matter. 
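
Point 2 relies on knowing the observed values and the name of the value
label. As a sketch only, both can be looked up rather than typed in. This
assumes the -by()- variable is integer-valued and already has a value
label attached; the macro names lbl and total are arbitrary.

```stata
sysuse auto, clear
preserve
* name of the value label attached to the -by()- variable ("origin" here)
local lbl : value label foreign
* one less than the smallest observed value, so that "Total" sorts first
summarize foreign, meanonly
local total = r(min) - 1
local which = _N + 1
expand 2
replace foreign = `total' in `which'/L
label define `lbl' `total' "Total", add
histogram mpg, by(foreign)
restore
```

With the auto data this reproduces the example above, since the smallest
value of foreign is 0 and the extra category is therefore -1.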

Now what would be done if the -by()- variable were string? At first
sight, we have a problem here because "Total" would not necessarily sort
first in a set of alphanumeric categories. We could use some label like
"All observations" but what then if we have "Aardvarks" as a category? 

Here is a better trick (not to rule out the possibility of an even
better trick):

sysuse auto, clear 
preserve 
decode foreign, gen(Foreign) 
local which = _N + 1 
expand 2 
replace Foreign = " Total " in `which'/L 
histogram mpg, by(Foreign) 
restore

The -decode- is just to produce an example with an appropriate string
variable. In practice it will exist already. Notice the two small parts
to the trick: 

(a) Putting a space before the "Total" makes it more likely to sort to
the beginning of any set of categories. The space " " is a character
too. 

(b) Putting a space afterwards ensures that the "Total" is still centred
on the graph (if you care about that). 

But we need not worry too much about the string case. If you can't get
the order you want, map the strings to integers with value labels. 
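
One way to do that mapping, offered as a sketch rather than the only
route, is -encode-, which numbers the distinct strings 1, 2, ... in
alphabetical order and creates a value label named after the new
variable. The variable name group is arbitrary, and the -decode- is again
just a stand-in for a string variable that would already exist.

```stata
sysuse auto, clear
preserve
* stand-in for an existing string variable
decode foreign, gen(Foreign)
* integers 1, 2, ... with the strings as value labels (label named "group")
encode Foreign, gen(group)
local which = _N + 1
expand 2
* 0 is lower than any value that -encode- assigns
replace group = 0 in `which'/L
label define group 0 "Total", add
histogram mpg, by(group)
restore
```

Since -encode- starts numbering at 1, the value 0 is always free for the
"Total" category and no padding with spaces is needed.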

Naturally, nothing here is distinctive to histograms. 
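
For instance, the same doubled dataset can feed any graph command that
accepts -by()-. A sketch with a scatter plot in place of the histogram:

```stata
sysuse auto, clear
preserve
local which = _N + 1
expand 2
replace foreign = -1 in `which'/L
label def origin -1 "Total", add
* only the graph command differs from the histogram example
scatter mpg weight, by(foreign)
restore
```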

Nick 
n.j.cox@durham.ac.uk 

*From:* Thoma, Marie E.

I would like to use the "histogram yvar, by(xvar, total)" command to
produce a histogram of the total and stratified variable.  However, in
Stata, it places the "total" graph as the last graph and I would like to
have it as the first graph (before the stratified graphs).

Does anyone know how to change this either using this command or another
way to accomplish this same layout?




