
st: RE: graph the percent missing fairly efficiently

From   "Nick Cox" <>
To   <>
Subject   st: RE: graph the percent missing fairly efficiently
Date   Fri, 15 Nov 2002 17:17:58 -0000

> I want to produce a report for all variables of the percent
> of observations
> that are missing.  I know about codebook and inspect, and
> 'nmissing' (from the
> web).  However, if possible, what is wanted is a percent of missing
> observations for each variable.   I could use 'tabulate'
> but ideally we want a
> pie chart showing the number missing.    For this I could
> use 'graph' and
> 'pie' but to make 'pie' work, I think I need to turn all
> the missings [recode
> or mvencode] to a recognizable coded missing like '-99' or
> something similar.
> This is one option, but perhaps there is another option
> (that won't require
> variable recoding or generation, or if it does, certain
> solutions are fairly
> efficient).   Perhaps some variant of egen or egenodd is
> applicable, or
> perhaps the matter is simplified with a different type of
> graph chart.

Richard Goldstein has suggested various tabular outputs.

You are aware of -nmissing- (STB-49, STB-60). One possibility
is to scoop up the output of that and show it in a graph:
producing percents from the counts is naturally easy.

I am not clear what kind of pie graph you want, but
in any case I suspect there are better displays.
(Eyeballing 100 pies simultaneously is fairly ineffective.)

Also how many variables have you got? 10s 100s 1000s 10000s?
Presumably not many, as it is difficult to think of
a display that stays readable unless the number
of variables is small. However, one possibility is to
omit all variables for which no values are missing.
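
As a minimal sketch of that idea, the loop below collects the names of
variables that have at least one missing value. The local macro name
-hasmiss- is just an illustrative choice, not anything from -nmissing-.

```stata
* build a list of variables with at least one missing value
local hasmiss
foreach v of varlist * {
	quietly count if missing(`v')
	if r(N) > 0 {
		local hasmiss `hasmiss' `v'
	}
}
* show which variables would be kept in the display
display "`hasmiss'"
```

The resulting list could then be used in place of the full variable list
in whatever table or graph is produced.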

On the assumption that there are no fewer observations
than variables, two quick graphs can be knitted in this
way, assuming no variables called -missing- -present-
or -varname-:

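* count the variables before adding any new ones
qui d, s
local nvars = r(k)
unab vars : *
* holders for the results: one observation per original variable
gen missing = .
gen str1 varname = ""
local i = 1
qui foreach v of var `vars' {
	replace varname = "`v'" in `i'
	count if missing(`v')
	replace missing = r(N) in `i'
	local i = `i' + 1
}
* convert counts to percents
replace missing = 100 * missing / _N
gen present = 100 - missing
* first graph: all variables; second: only those with some values missing
hbar present missing in 1/`nvars', l(varname) xla(0(10)100) border
hbar present missing if missing in 1/`nvars', l(varname) xla(0(10)100)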

Watch for mailers wrapping the two lines with -hbar-.

-hbar- is not in official Stata. If you don't have
it you need to install it from SSC.
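
For readers on a later Stata with official -graph hbar- (to my knowledge,
Stata 8 onwards), a rough equivalent that does not need -hbar- might look
like the sketch below. This swaps in a different command from the one
used above, and the auto dataset is only an assumption for illustration.

```stata
* illustrative data only; substitute your own dataset
sysuse auto, clear
unab vars : *
local nvars : word count `vars'
gen missing = .
gen str32 varname = ""
local i = 1
foreach v of local vars {
	* percent of observations missing on this variable
	quietly count if missing(`v')
	quietly replace missing = 100 * r(N) / _N in `i'
	quietly replace varname = "`v'" in `i'
	local ++i
}
gen present = 100 - missing
* one stacked horizontal bar per variable, present plus missing
graph hbar (asis) present missing in 1/`nvars', over(varname) stack
```

Adding -if missing > 0- before -in- would restrict the graph to variables
with some values missing, as in the second -hbar- call.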

