Note: This FAQ is for users of Stata 5, an older version of Stata. It is not relevant for more recent versions.
Title: Stata 5: Avoiding “too many variables” when using reshape
Author: James Hardin, StataCorp
Date: January 1996; updated January 1998
. reshape groups year 90-95
. reshape vars inc
. reshape cons id sex age race hgt wgt shoesize hatsize ms kids addr
too many variables
r(103);
reshape cons allows a maximum of 10 variables to be specified, and there are 11 in the above example. This problem has been resolved in the updated version of reshape. The new version of reshape also has a simpler syntax and other nice features. In addition, the new version of reshape understands the old reshape syntax, so your prior do-files and ado-files will still work.
Nevertheless, here is how to work around the limitation in the old version of reshape:
. use yourdata
. keep id sex age race hgt wgt shoesize hatsize ms kids addr
. sort id
. quietly by id: keep if _n==1
. save demogs, replace
The quietly by id: keep if _n==1 is necessary only if your data are in the long form, but it will not hurt in any case.
In the above example, we assume that variable id is enough to uniquely identify each observation. If two variables are required (e.g., hospital-id and patient-id), substitute those two variable names for id.
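For the two-variable case, the first step might look like the sketch below, written as do-file lines without the dot prompt. The names hospid and patid are hypothetical stand-ins for the hospital-id and patient-id variables; the other variables are those of the example above.

```stata
* Sketch only: hospid and patid are assumed names for the two identifiers
use yourdata
keep hospid patid sex age race hgt wgt shoesize hatsize ms kids addr
* by requires the data to be sorted on both identifying variables
sort hospid patid
* keep one observation per hospital-patient combination
quietly by hospid patid: keep if _n==1
save demogs, replace
```

The same two names would then replace id wherever it appears in the later reshape cons, sort, and merge commands.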
. reshape groups year 90-95
. reshape vars inc
. reshape cons id
. reshape wide        (or: reshape long)
That is, reshape normally, but note the shorter reshape cons statement.
. sort id
. merge id using demogs
. keep if _merge==3
. drop _merge
In these lines, we merge back in the nonvarying characteristics from demogs.dta.
The keep if _merge==3 is not really necessary, but we recommend it. In the solution as given, keep if _merge==3 will do nothing because _merge must be 3. If you form demogs.dta one day, however, and then reshape on a subset of your data another day, the keep if _merge==3 is important.
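If you want to see whether keep if _merge==3 will actually drop anything, one option is to tabulate _merge before the keep. This is a minimal sketch in do-file form, assuming the reshaped data are in memory and demogs.dta was saved sorted by id as above.

```stata
sort id
merge id using demogs
* _merge: 1 = only in the reshaped data, 2 = only in demogs.dta, 3 = in both
tabulate _merge
* drop subjects that do not appear in both datasets
keep if _merge==3
drop _merge
```

When demogs.dta was built from the full data and you reshaped only a subset, the subjects left out of the subset show up with _merge equal to 2, and the keep removes them.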
That is all there is to dealing with this problem.
One of the steps performed internally by reshape is a match merge, using the reshape cons variables to match observations. While Stata allows you to sort on any number of variables, the maximum number of key variables in a match merge is 10. Hence, the limitation.
The reshape cons variables include those variables that (1) uniquely identify the subjects and (2) do not vary within subject but that you want carried along. To fix the problem on our end, we should change reshape so that list (1) is given with some new reshape id command, which would continue to be limited to 10 variables, and list (2) is given by reshape cons and is not limited.