How can I collapse my dataset and keep the same variable labels?
Title
Keeping the same variable labels with collapse
Author
Nicholas J. Cox, Durham University, UK
Date
September 2002, updated February 2003
Question
I frequently use collapse on my datasets and find it frustrating that
I lose all my variable labels as they are replaced by something like
(mean) varname. Is it possible to save the old variable labels
so that they can be attached to the new variables?
Answer
collapse replaces the dataset in memory with a new dataset of group statistics. As you report, it has its own idea of suitable variable labels for the new variables.
However, it is easy to save your variable labels before collapse so
that they can be used afterwards, with or without modification.
Copy variable labels before collapse
A systematic way to do this is with a foreach loop.
. foreach v of var * {
.     local l`v' : variable label `v'
.     if `"`l`v''"' == "" {
.         local l`v' "`v'"
.     }
. }
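As a quick check of the loop above, here is a minimal sketch run as a do-file on the auto dataset that ships with Stata (loaded with sysuse auto), which is assumed here only as example data. The variable ratio is hypothetical, created because it has no label and so exercises the fallback to the variable name.

```stata
* Example data shipped with Stata (assumed for illustration)
sysuse auto, clear

* Hypothetical unlabeled variable: generate attaches no variable label
generate ratio = price / weight

* Copy each variable label into a local named l<varname>,
* falling back to the variable name when the label is empty
foreach v of var * {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* Show one labeled and one unlabeled case
display `"`lprice'"'
display `"`lratio'"'
```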
What this does, for each variable in the dataset, is copy its variable label to a local macro named l followed by the variable name, so that the label of price, for example, ends up in local lprice. If there is no variable label, we use the variable name instead. For more information on the commands used, see the online help for foreach and macro, or see the tutorial in Cox (2002).
* is a wildcard for all variables in the current dataset; for other
ways of abbreviating variable lists, see [U] 11.4.1 Lists of existing
variables.
Attach the saved labels after collapse
In the simplest case, the new variables all have the same names as their
originals. After collapse, you can then just use the old labels:
. foreach v of var * {
.     label var `v' "`l`v''"
. }
The relabeling must be done in the same context as the copying of labels, that is, in the same interactive session or within the same do-file or program, because local macros disappear when that context ends.
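Putting the two steps together, the following do-file is a minimal sketch of the whole round trip. It assumes the auto dataset and an illustrative choice of statistics (means of price, mpg, and weight by foreign); neither comes from the question. Because copying, collapsing, and relabeling all happen in one do-file, the local macros are still defined when they are needed.

```stata
* Example data shipped with Stata (assumed for illustration)
sysuse auto, clear

* 1. Save each variable's label (or its name, if unlabeled) in a local
foreach v of var * {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* 2. Collapse; the new variables keep the names of their originals
collapse (mean) price mpg weight, by(foreign)

* 3. Reattach the saved labels to the variables that remain
foreach v of var * {
    label var `v' "`l`v''"
}

describe
```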
Normally, it matters little if the new dataset contains fewer variables than the original. The only cost is a set of local macros holding labels that are never used. If you want to avoid that, replace the wildcard * with an explicit list of variable names, such as
. foreach v of var a b c d e {
.     local l`v' : variable label `v'
.     if `"`l`v''"' == "" {
.         local l`v' "`v'"
.     }
. }
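Here is a sketch of the explicit-list approach, assuming the auto dataset and a hypothetical local macro keep that holds the variables to be collapsed. Storing the list once and reusing it for copying, collapsing, and relabeling keeps the three commands consistent with each other, and no macros are created for variables that collapse drops.

```stata
* Example data shipped with Stata (assumed for illustration)
sysuse auto, clear

* Hypothetical list of the variables that should survive collapse
local keep price mpg weight

* Save labels only for those variables
foreach v of var `keep' {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

collapse (mean) `keep', by(foreign)

* Reattach the saved labels to the collapsed variables
foreach v of var `keep' {
    label var `v' "`l`v''"
}
```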
Variations on this situation are easy to manage. Suppose, for example, your convention is to name variables containing means with the prefix mean and those containing medians with the prefix med. Then you could use substr() to strip the prefix before referring to the saved macro,
. foreach v of var mean* {
.     local o = substr("`v'",5,.)
.     label var `v' "mean of `l`o''"
. }
. foreach v of var med* {
.     local o = substr("`v'",4,.)
.     label var `v' "median of `l`o''"
. }
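assuming that no other variables have names beginning with these prefixes. Because mean has four characters, the original name starts at position 5; because med has three, it starts at position 4.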
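A sketch of the prefix convention in action, assuming the auto dataset. The names meanprice, meanmpg, medprice, and medmpg are hypothetical, chosen to follow the mean/med convention described above; the newvar=oldvar syntax of collapse creates the prefixed names directly.

```stata
* Example data shipped with Stata (assumed for illustration)
sysuse auto, clear

* Save labels (or names) before collapse, as before
foreach v of var * {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* Hypothetical prefixed names for the new variables
collapse (mean) meanprice=price meanmpg=mpg ///
         (median) medprice=price medmpg=mpg, by(foreign)

* Strip the 4-character prefix "mean" to recover the original name
foreach v of var mean* {
    local o = substr("`v'", 5, .)
    label var `v' "mean of `l`o''"
}

* Strip the 3-character prefix "med"
foreach v of var med* {
    local o = substr("`v'", 4, .)
    label var `v' "median of `l`o''"
}
```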
Reference
- Cox, N. J. 2002. Speaking Stata: How to face lists with fortitude. Stata Journal 2: 202–222.