Stata: Data Analysis and Statistical Software



Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?

From	Antoine Terracol <[email protected]>
To	[email protected]
Subject	Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?
Date	Thu, 17 Jun 2010 21:04:16 +0200

If you just need summary statistics, you could use -bysort-:

bysort x1 x2 x3 x4... : su age
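A fuller sketch of this suggestion, assuming the five variables x1 to x5 and age from the question quoted below. The -collapse- step and the names n, mean_age, sd_age, min_age and max_age are illustrative additions, not part of the original reply; they may help if the statistics are wanted as data rather than as printed output.

```stata
* Sketch only: assumes x1-x5 and age exist, as in the quoted question.
* -bysort- sorts the data once, then runs -summarize- within each
* observed combination of the categorical variables.
bysort x1 x2 x3 x4 x5: summarize age

* Optional (illustrative): keep the statistics as a dataset with one
* row per observed combination. The /// continuation needs a do-file.
preserve
collapse (count) n=age (mean) mean_age=age (sd) sd_age=age ///
         (min) min_age=age (max) max_age=age, by(x1 x2 x3 x4 x5)
list, sepby(x1)
restore
```

Only combinations that actually occur in the data are reported, so nothing needs to be enumerated by hand.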

Antoine

On 17/06/2010 14:51, Tiago V. Pereira wrote:

Dear statalisters,

I am working on some Stata code, and I need some advice.

I have n categorical variables, each taking the value 0, 1 or 2.  My
objective is to summarize a continuous variable (say, age) for every possible
combination of these categorical variables.


For example, suppose I have 5 categorical variables (x1, x2, x3, x4 and x5):


sum age if x1==0&x2==0&x3==0&x4==0&x5==0

then

sum age if x1==0&x2==0&x3==0&x4==0&x5==1

then

sum age if x1==0&x2==0&x3==0&x4==0&x5==2

and so forth.


What I am doing is the following: (1) I generate a string from the
categorical variables

egen combination = concat(x1 x2 x3 x4 x5)

(2) convert it to numeric

encode combination, generate(y)

and (3) loop over the values of the new variable y to summarize the continuous
variable

forvalues i = 1/`some_max_value' {

sum age if y==`i'

}

This naïve solution works very well for small samples (_N<1000) and a small
number of categorical variables (5 to 7). But when I need to work with a
larger sample and a larger number of categorical variables, this code is
highly inefficient (i.e., slow).
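One likely reason for the slowdown is that each pass of the loop evaluates its -if- condition over every observation, so the work grows with the number of combinations times _N. A minimal sketch of one alternative, assuming the same variables x1 to x5 and age: egen's group() function builds the numeric identifier directly, without the string and -encode- steps. The local macro name ngroups is only illustrative.

```stata
* Sketch only: assumes x1-x5 and age exist, as described above.
* group() numbers the observed combinations 1, 2, ... directly,
* so no string variable and no -encode- step are needed.
egen long y = group(x1 x2 x3 x4 x5)

* The highest group id equals the number of observed combinations.
summarize y, meanonly
local ngroups = r(max)
display "observed combinations: `ngroups'"

* One sorted pass instead of one full scan of the data per group.
bysort y: summarize age
```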

Do you have any suggestions to make this procedure faster in larger data
sets?

Thanks in advance!

Tiago




*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

