
How can I collapse my dataset and keep the same variable labels?

Title   Keeping the same variable labels with collapse
Author Nicholas J. Cox, Durham University, UK

Question

I frequently use collapse on my datasets and find it frustrating that I lose all my variable labels as they are replaced by something like (mean) varname. Is it possible to save the old variable labels so that they can be attached to the new variables?

Answer

collapse replaces the dataset in memory with a new dataset of group statistics. As reported, it has its own idea of suitable variable labels for the new variables.

However, it is easy to save your variable labels before collapse so that they can be used afterwards, with or without modification.

Copy variable labels before collapse

A systematic way to do this is with a foreach loop.

 . foreach v of var * {
 .     local l`v' : variable label `v'
 .     if `"`l`v''"' == "" {
 .         local l`v' "`v'"
 .     }
 . }

For each variable in the dataset, this loop copies the variable label to a local macro whose name is l followed by the variable name (for example, lprice for a variable price). If there is no variable label, the variable name is stored instead. For more information on the commands used, see [P] foreach and [P] macro, or see the tutorial in Cox (2020).

* is a wildcard for all variables in the current dataset; for other ways of abbreviating variable lists, see [U] 11.4.1 Lists of existing variables.
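As a minimal sketch of a narrower variable list, the loop below saves labels only for a range of adjacent variables. It assumes the auto dataset shipped with Stata (loaded with sysuse auto), in which price through weight are stored consecutively; neither the dataset nor the range is part of the original question. The display commands simply show what ended up in two of the macros.

```stata
* Assumption: the auto example dataset, used here only for illustration
sysuse auto, clear

* price-weight means every variable from price through weight in dataset order
foreach v of varlist price-weight {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* Inspect two of the saved labels
display `"`lprice'"'
display `"`lmpg'"'
```

Because a range depends on the order of variables in the dataset, an explicit list of names is safer if that order might change.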

Attach the saved labels after collapse

In the simplest case, the new variables all have the same names as their originals. After collapse, you can then just use the old labels:

 . foreach v of var * {
 .     label var `v' `"`l`v''"'
 . }

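This relabeling must be done in the same do-file or the same interactive session as the loop that saved the labels. Local macros exist only within the do-file, program, or interactive session that defines them, and they disappear when it ends.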
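Putting the two loops together, a complete run might look like the following sketch. It assumes the auto dataset shipped with Stata and an arbitrary choice of collapse specification (means of price and mpg by foreign); both are illustrative only. Run it as a single do-file so that the local macros are still defined when the second loop executes.

```stata
* Assumption: the auto example dataset, used here only for illustration
sysuse auto, clear

* Step 1: save every variable label in a local macro named l<varname>
foreach v of varlist * {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* Step 2: collapse; price and mpg are now labeled "(mean) price" and "(mean) mpg"
collapse (mean) price mpg, by(foreign)

* Step 3: reattach the saved labels to the variables that remain
foreach v of varlist * {
    label variable `v' `"`l`v''"'
}

* Check the result
describe
```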

Normally, it matters little if the new dataset contains fewer variables than the original. The only cost is creating and storing local macros for variable labels that are never used afterward. If desired, you can avoid this by replacing the wildcard * with an explicit list of variable names, such as

 . foreach v of var a b c d e {
 .     local l`v' : variable label `v'
 .     if `"`l`v''"' == "" {
 .         local l`v' "`v'"
 .     }
 . }

Variations on this situation are easy to manage. Suppose, for example, your convention was to name variables containing means with a prefix mean and those containing medians with a prefix med. Then you could use usubstr() to strip the prefix before referring to the saved macro,

 . foreach v of var mean* {
 .     local o = usubstr("`v'",5,.)
 .     label var `v' `"mean of `l`o''"'
 . }

 . foreach v of var med* {
 .     local o = usubstr("`v'",4,.)
 .     label var `v' `"median of `l`o''"'
 . }

assuming that no other variables have names beginning with these prefixes.
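To see the prefix convention end to end, here is a sketch that again assumes the auto dataset shipped with Stata. The new variable names (meanprice, meanmpg, medprice, medmpg) are hypothetical and chosen only to follow the mean and med prefixes described above.

```stata
* Assumption: the auto example dataset, used here only for illustration
sysuse auto, clear

* Save the labels of the variables to be summarized
foreach v of varlist price mpg {
    local l`v' : variable label `v'
    if `"`l`v''"' == "" {
        local l`v' "`v'"
    }
}

* Hypothetical naming convention: prefix mean for means, med for medians
collapse (mean) meanprice=price meanmpg=mpg ///
    (median) medprice=price medmpg=mpg, by(foreign)

* Strip the 4-character prefix "mean" to recover the original name
foreach v of varlist mean* {
    local o = usubstr("`v'", 5, .)
    label variable `v' `"mean of `l`o''"'
}

* Strip the 3-character prefix "med" to recover the original name
foreach v of varlist med* {
    local o = usubstr("`v'", 4, .)
    label variable `v' `"median of `l`o''"'
}

describe
```

The starting position passed to usubstr() is always one more than the length of the prefix, so a different prefix needs a different second argument.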

Reference

Cox, N. J. 2020. Speaking Stata: Loops, again and again. Stata Journal 20: 999–1015.