Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,

From   Antoine Terracol
Subject   Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,
Date   Thu, 17 Jun 2010 22:04:43 +0200


You would have to define a -byable- -program-, such as:

capture program drop mymean
program define mymean, byable(recall)
syntax varname
marksample touse
su `varlist' if `touse'
* _byindex() returns the number of the current by-group
local a= _byindex()
scalar mean`a'=r(mean)
end

sysuse auto, clear
bysort foreign rep78 : mymean price
scalar dir
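The stored scalars can be matched back to their combinations; the following is only a sketch. It repeats the program so that it runs on its own, and it assumes that -egen group(), missing- numbers the groups in the same order as -bysort-, which should hold for numeric by-variables. The variable name grp is hypothetical.

```stata
* Same by-able program as above, with -summarize- run quietly
capture program drop mymean
program define mymean, byable(recall)
    syntax varname
    marksample touse
    quietly summarize `varlist' if `touse'
    local a = _byindex()
    scalar mean`a' = r(mean)
end

sysuse auto, clear
bysort foreign rep78 : mymean price

* Hypothetical helper variable: number the by-groups in sort order,
* keeping missing values of rep78 as their own group
egen grp = group(foreign rep78), missing

* Compare the scalar stored for group 1 with a direct calculation
quietly summarize price if grp == 1
display "direct: " r(mean) "   stored: " scalar(mean1)
```

Scalars are used rather than locals because a local defined inside the program is not visible to the caller once the program exits.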


On 17/06/2010 15:47, Tiago V. Pereira wrote:
Thanks, Antoine!

But for each combination, I want to save a local containing the r(mean).
Is it possible to do that using -bysort-?


Dear statalisters,

I am working on some Stata code, and I need some advice.

I have n categorical variables that take the values 0, 1 or 2.  My
objective is to summarize a continuous variable (say, age) by all possible
combinations of these categorical variables.

For example, suppose I have 5 categorical variables (x1, x2, x3, x4 and x5):

sum age if x1==0&x2==0&x3==0&x4==0&x5==0


sum age if x1==0&x2==0&x3==0&x4==0&x5==1


sum age if x1==0&x2==0&x3==0&x4==0&x5==2

and so forth.

What I am doing is the following: (1) I generate a string of the
categorical variables

egen combination = concat(x1 x2 x3 x4 x5)

(2) convert them to numeric

encode combination, generate(y)

and (3) loop over the values of the new variable y to summarize the continuous variable:

forvalues i = 1/`some_max_value' {
sum age if y==`i'
}

This naïve solution works very well for small samples (_N<1000) and a small
number of categorical variables (5 to 7). But when I need to work with a
larger sample and a larger number of categorical variables, this code is
highly inefficient (i.e., slow).

Do you have any suggestions to make this procedure faster in larger data sets?

Thanks in advance!
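One way to avoid both the string step and the loop with an -if- qualifier is sketched below. It is not necessarily what was proposed in the thread. It assumes simulated data with five categorical variables coded 0/1/2; the names mean_age, mean_age2 and n are hypothetical. -egen group()- builds one numeric id per observed combination, and -bysort- or -collapse- then computes all the group means without rescanning the data for each group.

```stata
* Simulated data (assumption: 5 categorical variables coded 0/1/2)
clear
set seed 12345
set obs 10000
forvalues j = 1/5 {
    generate byte x`j' = floor(3*runiform())
}
generate age = 20 + floor(60*runiform())

* One numeric id per observed combination, no concat/encode needed
egen long y = group(x1 x2 x3 x4 x5)

* Attach the group mean to every observation
bysort y : egen double mean_age = mean(age)

* Or reduce the data to one row per combination
preserve
collapse (mean) mean_age2 = age (count) n = age, by(x1 x2 x3 x4 x5)
list in 1/5
restore
```

The loop in the original approach evaluates the -if- condition over all observations once per combination, so its cost grows with the number of combinations times _N; the sorted approach touches each observation only a few times.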

