
st: RE: "dirty trick": metadata in extended missing value labels

From   "Nick Cox" <>
To   <>
Subject   st: RE: "dirty trick": metadata in extended missing value labels
Date   Mon, 16 Nov 2009 15:27:56 -0000

The Stata philosophy here, as indeed often elsewhere, is that it
provides low-level tools so that you can use them as you wish for a
variety of higher level purposes. 

In particular, Stata clearly has no concept of a variable that should be
displayed in percent terms. That's entirely a user preference. 

To me, the natural low-level tool here for recording such a preference
is that of characteristics. What you did worked for you, but I'd rather
reserve missing value support for missing values. Positively, using
characteristics is an example of attaching information to the variable,
exactly as you wish. 

You could define a characteristic like this:

char foreign[showpc] "percent"

and then, in a loop, condition on whether that characteristic is set:

foreach v of var <varlist> {
	if "`: char `v'[showpc]'" == "percent" {
		<whatever>
	}
	else {
		<whatever else>
	}
}
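Applied to the two graphs in your message, this might look as follows. It is a sketch only: the characteristic names showpc and fmt are my own arbitrary choices (Stata attaches no meaning to them), and the default format %2.1f is borrowed from your code.

```stata
sysuse auto, clear

* Record display preferences as characteristics (names are arbitrary)
char foreign[showpc] "percent"
char foreign[fmt] "%3.1f"
char length[fmt] "%3.0f"

foreach v of var length foreign {
	* Bar-label format: use the recorded one, else a default
	local fmt : char `v'[fmt]
	if "`fmt'" == "" local fmt "%2.1f"
	local lab : variable label `v'

	* Work on a temporary copy so the original variable is untouched
	tempvar y
	if "`: char `v'[showpc]'" == "percent" {
		generate `y' = 100 * `v'
		local what "Percentage"
	}
	else {
		generate `y' = `v'
		local what "Mean"
	}
	graph bar `y', over(rep78) blabel(bar, format(`fmt')) ///
		ytitle("`what'") title("`lab' (`what')")
	drop `y'
}
```

Characteristics are saved with the dataset, so the preferences travel with the data just as the value-label trick does.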

Note that it is not an error to refer to a non-existent characteristic.
It is just treated as if it were an empty string. So, I don't need to
define this characteristic when I don't need it. 
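A quick check of that behaviour, assuming no characteristic called showpc has been defined for mpg:

```stata
sysuse auto, clear

* mpg has no showpc characteristic; referring to it is not an error
local pc : char mpg[showpc]
display "showpc for mpg: [`pc']"

* The test for "percent" simply fails, so the default branch runs
if "`pc'" == "percent" display "display mpg as a percentage"
else display "no preference recorded: display mpg as a mean"
```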


The following is a "dirty" but, at least for me, useful trick:

I produce a lot of graphs in batch mode, and the layout usually needs a
lot of tweaking (titles, labels, formats). So I have had to write many
foreach loops with 6, 7, 8 or more parallel lists to specify the
individual layout parameters.

One example: a metric variable (length; strictly speaking an integer in
auto.dta) and a 0/1 variable (foreign):

sysuse auto, clear
foreach x of var length foreign {
	graph bar `x', over(rep78) blabel(bar)
	sleep 2000
}

The mean of a 0/1 variable is a proportion, which is better displayed as
a percentage. To make it look good, I would multiply foreign by 100,
label the axis "Percentage", write "Percentage" into the title and set
the label format to %3.1f.
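Done by hand for foreign alone, those steps might look like this sketch (the variable name pcforeign and the title text are illustrative):

```stata
sysuse auto, clear

* Hand-tuned display for the 0/1 variable foreign:
* rescale to percent, retitle, and set the bar-label format
generate pcforeign = 100 * foreign
graph bar pcforeign, over(rep78) blabel(bar, format(%3.1f)) ///
	ytitle("Percentage") title("Foreign cars by repair record (Percentage)")
```

Every further variable needs its own copy of these lines, or another entry in each parallel list, which is the bookkeeping the trick below tries to avoid.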

It would be nice to be able to attach such display information to the
variable, so one could take these meta-parameters from the dataset
instead of specifying them by hand each time. There seems to be no
regular way to do so. 
As a workaround, value labels for extended missing values (.a, .b, .c,
..., .z) came to mind, since they can be defined for any numeric
variable. I never use more than .a or .b, so why not store some
information in the value label of a missing value I never use, such as
.l?
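The trick rests on two facts that can be checked in isolation: a value label may attach text to .l even though no observation takes that value, and the text can be read back with the label extended macro function. A minimal sketch on auto.dta, where foreign already has a value label:

```stata
sysuse auto, clear

* Add an entry for .l to whatever value label foreign already uses
local lbl : value label foreign
label define `lbl' .l "p31", add

* Read the stored text back; no observation of foreign is ever .l
local how : label `lbl' .l
display "metadata for foreign: `how'"
```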

The following code stores some basic information on the type of display
and the label format in value label .l of each numeric variable. Two
graphs are then produced using this information.

*** Create example dataset
sysuse auto, clear
*** Create metadata codes
foreach var of varlist _all {
	// only for numeric vars: skip string variables such as make
	local type : type `var'
	if !inlist("`type'", "byte", "int", "long", "float", "double") continue
	local form "m21" // default: display as mean, label format 2.1

	*** Example 1: m: display as mean, label format 3.0
	if inlist("`var'", "length") == 1 local form "m30"

	*** Example 2: p: display as percentage, format 3.1
	if inlist("`var'", "foreign") == 1 local form "p31"

	*** Each var gets a new value label (templbl`var'). Existing value
	*** labels are copied.
	local lbl`var' : value label `var'
	if "`lbl`var''" == "" {
		cap label drop templbl`var'
		label define templbl`var' .l "`form'"
		label values `var' templbl`var'
	}
	else {
		cap label drop templbl`var'
		label copy `lbl`var'' templbl`var'
		label define templbl`var' .l "`form'", add
		label values `var' templbl`var'
	}
}
*** Now set up the graphs:
local varover "rep78"
local varlab : variable label `varover'
foreach var of varlist length foreign {
	local varlab2 : variable label `var'
	local how : label templbl`var' .l // ".l" label content into a local
	capture drop xvar
	gen xvar = `var' // xvar is used in the graph so the original var is not altered
	if substr("`how'",1,1) == "p" replace xvar = `var' * 100 // multiply by 100 if var is displayed as percentage
	if substr("`how'",1,1) == "m" local value "Mean" // label for titles
	if substr("`how'",1,1) == "p" local value "Percentage" // label for titles
	local form = "%" + substr("`how'",2,1) + "." + substr("`how'",3,1) + "f" // label format
	graph bar xvar, over(`varover') title("`varlab2' over `varlab' (`value')") ///
		ytitle("`value'") blabel(bar, format(`form'))
	sleep 2000
	drop xvar
}
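The repeated substr() calls above could be gathered into one small program. A sketch, assuming the same three-character code (type letter, width digit, decimals digit); decodemeta is a hypothetical name, not an official Stata command:

```stata
* decodemeta is a hypothetical helper, not a Stata command.
* It splits a code such as "p31" into a title word, a scaling factor
* and a %w.df display format, all returned in r().
capture program drop decodemeta
program define decodemeta, rclass
	args code
	local kind = substr("`code'", 1, 1)   // m = mean, p = percentage
	local w = substr("`code'", 2, 1)      // format width
	local d = substr("`code'", 3, 1)      // decimal places
	return local value = cond("`kind'" == "p", "Percentage", "Mean")
	return local factor = cond("`kind'" == "p", "100", "1")
	return local format "%`w'.`d'f"
end

decodemeta p31
return list
```

Inside the graph loop, decodemeta `how' would then supply r(value), r(factor) and r(format) in place of the separate substr() lines. With one digit each for width and decimals, the codes cannot express formats wider than %9.9f.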
