Re: st: create many variables at once
on 22/7/02 9:01 PM, firstname.lastname@example.org wrote:
> I have 19 variables that start with the prefix me_ and 19 variables that
> start with the prefix ppg_ . They all have other letters after the
> underscore, actually the same for each group of 19.
> I have to create the following 19 new variables: newvars=(me_
> * var1)+(ppg_ * var1) for each observation in the data set (510)
> Is there a way for me to do this in one step. If not, can you please tell me
The fact that the 19 pairs of variables can be identified is a bonus; a
variable me_xyz is paired with the variable ppg_xyz. So the quick way of
doing this is
1. Verify that the variables are in the same order. If not, issue the
-aorder- command to get them correctly ordered.
2. Use the -for- command to define two lists of variables. By default, the
lists are called X and Y (in capitals). There are ways of imposing your own
naming conventions, but you don't need that for a simple problem.
. for var me_* \ var ppg_* : gen X_Y = (X * var1) + (Y * var1)
The -for- part of the command defines two variable lists. These will be
called X and Y, so where these letters appear in the command subsequently,
Stata will substitute in the variable names.
The -generate- (abbreviated to -gen-) command takes the first variable in list
X and the first in list Y and generates a new variable. Then it takes the
second item in each list and makes another new variable, and so on until the
lists are exhausted.
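For anyone who would rather not depend on the two lists being in the same
order, here is a rough sketch of the same pairing done with -foreach-, one of
the cousins of -for-. It matches each me_ variable to its ppg_ partner by the
shared suffix. The toy data and the new_ prefix are my own assumptions for
illustration; only me_, ppg_ and var1 come from the question.

```stata
* Toy data for illustration only (not the poster's data)
clear
set obs 3
generate var1  = _n
generate me_a  = 1
generate ppg_a = 2
generate me_b  = 3
generate ppg_b = 4

* Loop over the me_ variables and find each ppg_ partner by suffix
foreach v of varlist me_* {
    * strip the 3-character prefix "me_" to recover the shared suffix
    local suffix = substr("`v'", 4, .)
    * new_ is a hypothetical prefix for the combined variables
    generate new_`suffix' = (`v' * var1) + (ppg_`suffix' * var1)
}
```

Because the partner is looked up by name, the loop stops with an error if a
ppg_ variable is missing, rather than silently pairing the wrong variables.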
The only slight problem is what to call the new variables so we can tell
which of the original variables they are derived from.* I used the original
variable names to build the new variable names. Stata understands X_Y as 'the
current item in list X, an underscore, and then the current item in list Y'.
So if the first pair of variables is me_xyz and ppg_xyz, then the new variable
will be me_xyz_ppg_xyz.
I could have defined a list of new variable names, but I'm lazy. Instead, I
would probably extend the command to label the new variables as they are
created (remember that -for- will execute multiple commands).
. for var me_* \ var ppg_* : gen X_Y = (X * var1) + (Y * var1) \ lab var X_Y "X and Y combined"
The slosh (\) introduces a second command to be executed. This labels the
new variable (X_Y) with some text that identifies which variables were
combined. Again, Stata will read inside the variable label and substitute
the list items. So with me_xyz and ppg_xyz, Stata will interpret the command as
lab var me_xyz_ppg_xyz "me_xyz and ppg_xyz combined"
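The generate-then-label pattern can also be sketched with -foreach-, which
makes the name and label substitution explicit. This is only an illustration
under my own assumptions: the toy data are invented, and the pairing is done
by the shared suffix rather than by list position.

```stata
* Toy data for illustration only (not the poster's data)
clear
set obs 2
generate var1    = 2
generate me_xyz  = 1
generate ppg_xyz = 3

foreach v of varlist me_* {
    * strip the 3-character prefix "me_" to recover the shared suffix
    local suffix = substr("`v'", 4, .)
    * build the combined variable, named after both parents
    generate `v'_ppg_`suffix' = (`v' * var1) + (ppg_`suffix' * var1)
    * label it so its parentage is visible in -describe-
    label variable `v'_ppg_`suffix' "`v' and ppg_`suffix' combined"
}

describe me_xyz_ppg_xyz
```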
Every minute you spend getting to know the -for- command and its cousins
translates into an extra day of your life spent drinking cappuccinos in the
*Note: The phrase "so we can tell which of the original variables they are
derived from" ends on a preposition. For those who find this syntactically
ugly, we could rewrite it as "so we can tell which of the original variables
they are derived from, pudding face".
Ronan M Conroy (email@example.com)
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2329)
And now, Mr President, how about the global alliance against climate change?