
Re: st: create many variables at once


From   Ronan Conroy <rconroy@rcsi.ie>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: create many variables at once
Date   Tue, 23 Jul 2002 09:38:21 +0100

on 22/7/02 9:01 PM, owner-statalist@hsphsun2.harvard.edu at
owner-statalist@hsphsun2.harvard.edu wrote:

> I have 19 variables that start with the prefix  me_  and 19 variables that
> start with the prefix  ppg_ . They all have other letters after the
> underscore, actually the same for each group of 19.
> 
> I have to create the following 19 new variables:        newvars=(me_
> * var1)+(ppg_ * var1) for each observation in the data set (510)
> Is there a way for me to do this in one step. If not, can you please tell me
> how?

The fact that the 19 pairs of variables can be identified is a bonus; a
variable me_xyz is paired with the variable ppg_xyz. So the quick way of
doing this is

1. Verify that the two groups of variables are in the same order in the
dataset. If not, issue the -aorder- command, which sorts the variables
alphabetically, to get them correctly ordered.

2. Use the -for- command to define two lists of variables. By default, the
lists are called X and Y (in capitals). There are ways of imposing your own
naming conventions, but you don't need that for a simple problem.

. for var me_* \ var ppg_* : gen X_Y = (X * var1) + (Y * var1)

The -for- part of the command defines two variable lists. These will be
called X and Y, so where these letters appear in the command subsequently,
Stata will substitute in the variable names.

The -generate- (abbreviated to -gen-) command takes the first variable in list
X and the first in list Y and generates a new variable. Then it takes the
second items and makes another new variable.
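Because the pairing is purely positional, it can be worth confirming that the
two lists line up before running the command. The following is only a sketch,
assuming the prefixes are exactly me_ and ppg_ and that your Stata has
-forvalues-; the macro names (melist, ppglist, sa, sb) are made up for
illustration.

```stata
* Expand both wildcards in dataset order, as -for- would see them.
unab melist  : me_*
unab ppglist : ppg_*
local n : word count `melist'

forvalues i = 1/`n' {
    local a : word `i' of `melist'
    local b : word `i' of `ppglist'
    * Strip the prefixes: "me_" is 3 characters, "ppg_" is 4.
    local sa = substr("`a'", 4, .)
    local sb = substr("`b'", 5, .)
    * Complain about any position where the suffixes differ.
    if "`sa'" != "`sb'" {
        display as error "pair `i' mismatched: `a' and `b'"
    }
}
```

If this prints nothing, the i-th me_ variable and the i-th ppg_ variable share
a suffix throughout, and the positional pairing is safe.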

The only slight problem is what to call the new variables so we can tell
which of the original variables they are derived from.* I used the original
variable names to build the new variable names. Stata understands X_Y as
'the first item in the list, an underscore, and then the second item'. So if
the first pair of variables is me_xyz and ppg_xyz, then the new variable will
be called me_xyz_ppg_xyz.
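To make the substitution concrete: for the illustrative pair me_xyz and
ppg_xyz (names used here only as an example), the -generate- command that
-for- ends up running should look like this once X and Y are filled in:

```stata
* X replaced by me_xyz, Y replaced by ppg_xyz
gen me_xyz_ppg_xyz = (me_xyz * var1) + (ppg_xyz * var1)
```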

I could have defined a list of new variable names, but I'm lazy. Instead, I
would probably extend the command to label the new variables as they are
created (remember that -for- will execute multiple commands).

. for var me_* \ var ppg_* : gen X_Y = (X * var1) + (Y * var1) \ lab var X_Y "X and Y combined"

(Type this as a single line; it is too long to show here without wrapping.)

The slosh (\) introduces a second command to be executed. This labels the
new variable (X_Y) with some text that identifies which variables were
combined. Again, Stata will read inside the variable label and substitute
the list items. So with me_xyz and ppg_xyz, Stata will interpret the command
as meaning 

lab var me_xyz_ppg_xyz "me_xyz and ppg_xyz combined"
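If your version of Stata has -foreach-, roughly the same result can be
sketched with a loop that pairs the variables by name rather than by position,
so the order of the variables no longer matters. This is an alternative to the
-for- command above, not a transcription of it; it assumes every me_ variable
has a ppg_ partner with the same suffix, and the macro names v and s are
arbitrary.

```stata
foreach v of varlist me_* {
    * Everything after the 3-character prefix "me_".
    local s = substr("`v'", 4, .)
    * Same arithmetic and naming as the -for- version.
    generate `v'_ppg_`s' = (`v' * var1) + (ppg_`s' * var1)
    label variable `v'_ppg_`s' "`v' and ppg_`s' combined"
}
```

The loop stops with an error at the first me_ variable that lacks a matching
ppg_ variable, which is itself a useful check on the data.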

Every minute you spend getting to know the -for- command and its cousins
translates into an extra day of your life spent drinking cappuccinos in the
sunshine. 


*Note: The phrase "so we can tell which of the original variables they are
derived from" ends on a preposition. For those who find this syntactically
ugly, we could rewrite it as "so we can tell which of the original variables
they are derived from, pudding face".


Ronan M Conroy (rconroy@rcsi.ie)
Lecturer in Biostatistics
Royal College of Surgeons
Dublin 2, Ireland
+353 1 402 2431 (fax 2329)

--------------------
And now, Mr President, how about the global alliance against climate change?