Note: This FAQ is for users of Stata 5, an older version of Stata. It
is not relevant for more recent versions.
Stata 5: Why does reshape give a too-many-variables error?
Title: Stata 5: Avoiding “too many variables” when using reshape
Author: James Hardin, StataCorp
Date: January 1996; updated January 1998
You type
. reshape groups year 90-95
. reshape vars inc
. reshape cons id sex age race hgt wgt shoesize hatsize ms kids addr
too many variables
r(103);
reshape cons allows a maximum of 10 variables to be specified, and
there are 11 in the above example. This problem has been resolved in the
updated version of reshape, which also has a simpler syntax and other
new features. The updated version still understands the old reshape
syntax, so your existing do-files and ado-files will continue to work.
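For comparison, here is a minimal sketch of what the same reshape might look like with the updated command. It assumes the updated reshape is installed and that it follows the i() and j() syntax documented for later Stata releases, so check the help file of the version you have. In that syntax, variables that are not named are treated as constant within id, so no separate list of the 11 variables should be needed.

```stata
* Sketch only: assumes the updated reshape with the i()/j() syntax.
* Long to wide: inc becomes inc90, ..., inc95, one observation per id.
reshape wide inc, i(id) j(year)

* Wide back to long.
reshape long inc, i(id) j(year)
```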
Nevertheless, here is how to work around the limitation in the old version
of reshape:
Step 1. Create demogs.dta containing the demographic variables.
. use yourdata
. keep id sex age race hgt wgt shoesize hatsize ms kids addr
. sort id
. quietly by id: keep if _n==1
. save demogs, replace
The quietly by id: keep if _n==1 is necessary only if your data are
in the long form, but it will not hurt in any case.
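Keeping the first observation per id silently discards any later values that differ. If your data are in the long form, a quick check along the following lines can confirm that the demographic variables really are constant within id before you run Step 1. This is only a sketch: it checks three of the variables, and you would repeat the assert line for the others.

```stata
* Sketch only: run on long-form data before the "keep if _n==1" in Step 1.
use yourdata, clear
sort id

* Each assert stops with an error if the variable changes within an id.
quietly by id: assert sex==sex[1]
quietly by id: assert race==race[1]
quietly by id: assert age==age[1]
```

A variable such as age is worth checking in particular, because it may have been recorded separately in each year.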
In the above example, we assume that variable id is enough to
uniquely identify each subject (that is, each observation of the
wide-form data). If two variables are required (e.g., hospital-id and
patient-id), substitute those two variable names for id.
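As an illustration of the two-variable case, Step 1 might look like the sketch below. The names hospid and patid are hypothetical stand-ins for the hospital and patient identifiers; use whatever names your dataset has. The same pair would then replace id in the reshape cons, sort, and merge commands of the later steps.

```stata
* Sketch only: hospid and patid are hypothetical variable names.
use yourdata, clear
keep hospid patid sex age race hgt wgt shoesize hatsize ms kids addr

* Sort on both identifiers and keep one record per hospital-patient pair.
sort hospid patid
quietly by hospid patid: keep if _n==1
save demogs, replace
```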
Step 2. Reload the original data and reshape using "reshape cons id"
. use yourdata
. reshape groups year 90-95
. reshape vars inc
. reshape cons id
. reshape wide        (or reshape long)
That is, reshape normally, but note the shorter reshape cons
statement.
Step 3. Merge the demographic data
. sort id
. merge id using demogs
. keep if _merge==3
. drop _merge
In these lines, we merge back in the nonvarying characteristics from
demogs.dta.
The keep if _merge==3 is not really necessary, but we recommend it.
In the solution as given, keep if _merge==3 will do nothing because
_merge must be 3. If you form demogs.dta one day, however, and then
reshape on a subset of your data another day, the keep if
_merge==3 is important.
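If you want to see what keep if _merge==3 would discard, you can inspect _merge first. The following sketch would go after the merge command in Step 3 and before the keep. The listing of unmatched ids is an optional extra, not part of the workaround itself.

```stata
* Sketch only: run after "merge id using demogs", before "keep if _merge==3".
* _merge is 1 for observations found only in the data in memory,
* 2 for observations found only in demogs.dta, and 3 for matched observations.
tabulate _merge

* Optional: list the ids that did not match before dropping them.
list id _merge if _merge!=3
```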
That is all there is to dealing with this problem.
Why the problem arises, if you care
One of the steps performed internally by reshape is a match merge,
using the reshape cons variables to match observations. While Stata
allows you to sort on any number of variables, the maximum number of
key variables in a match merge is 10. Hence, the limitation.
The reshape cons variables include those variables that (1) uniquely
identify the subjects, and (2) do not vary within subject but that you want
carried along. To fix the problem on our end, we should change
reshape so that list (1) is given by some new reshape id
command, which would continue to be limited to 10 variables, and
list (2) is given by reshape cons and is not limited.