Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Antoine Terracol <terracol@univ-paris1.fr>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,

Date: Thu, 17 Jun 2010 23:23:59 +0200

Antoine

On 17/06/2010 23:17, Antoine Terracol wrote:

This might not be the most efficient way, but it works:

    capture program drop mymean
    program define mymean, byable(recall)
        syntax varname
        marksample touse
        local groupname ""
        foreach var of local _byvars {
            local a =`var'[_byn1()]
            if "`a'"=="." {
                local a "missing"
            }
            local groupname "`groupname'`a'"
        }
        su `varlist' if `touse'
        scalar mean`groupname'=r(mean)
    end

    scalar drop _all
    sysuse auto, clear
    bysort foreign rep78 : mymean price
    scalar dir

Antoine

On 17/06/2010 16:56, Tiago V. Pereira wrote:

Thank you so much again, Antoine! Yes, this is a very efficient way! However, I could not figure out how to save the combination of the categorical variables that a specific meanX refers to. For example, the commands

    sysuse auto, clear
    bysort foreign rep78 : mymean price
    scalar dir

show a list of scalars containing the mean of the ith combination, but I don't know whether mean10 refers to the combination "foreign = Foreign, rep78 = 4" or "foreign = Foreign, rep78 = 5". [Actually, in this specific case I do, if I look at each value in the output.]

Nevertheless, assuming a very large number of categorical variables (n > 10), I cannot write a loop and say that mean2451 refers to the combination x1==0 x2==0 x3==2 x4==0 x5==2 and x6==1. I want to summarize the mean of this combination in group 1 and generate a separate variable for group 2. For example:

    bysort x1 x2 x3 x4 x5 x6 : mymean score if group==1

(yes, -mymean- needs further amendments to have this option)

    replace score = mean2451 if group==2&x1==0&x2==0&x3==2&x4==0&x5==2&x6==1

So, in this case I know that mean2451 comes from the combination x1==0&x2==0&x3==2&x4==0&x5==2&x6==1 in group 1, and I replace its value for all subjects in group 2 having an identical combination. This is getting tough, but if you have any additional tips, I will be very grateful! Thanks again.
Tiago

Tiago,

You would have to define a -byable- -program-, such as:

    capture program drop mymean
    program define mymean, byable(recall)
        syntax varname
        marksample touse
        su `varlist' if `touse'
        local a= _byindex()
        scalar mean`a'=r(mean)
    end

    sysuse auto, clear
    bysort foreign rep78 : mymean price
    scalar dir

Antoine

On 17/06/2010 15:47, Tiago V. Pereira wrote:

Thanks, Antoine! But for each combination, I want to save a local containing the r(mean). Is it possible to do that using -bysort-?

Tiago

---------------

Dear statalisters,

I am working on some Stata code, and I need some advice. I have n categorical variables that take the values 0, 1 or 2. My objective is to summarize a continuous variable (say, age) by all possible combinations of these categorical variables. For example, suppose I have 5 categorical variables (x1, x2, x3, x4 and x5):

    sum age if x1==0&x2==0&x3==0&x4==0&x5==0

then

    sum age if x1==0&x2==0&x3==0&x4==0&x5==1

then

    sum age if x1==0&x2==0&x3==0&x4==0&x5==2

and so forth. What I am doing is the following:

(1) I generate a string of the categorical variables

    egen combination = concat(x1 x2 x3 x4 x5)

(2) convert it to numeric

    encode combination, gen(y)

and loop over the values of the new variable y to summarize the continuous variable

    forvalues i = 1/`some_max_value' {
        sum age if y==`i'
    }

This naïve solution works very well for small samples (_N<1000) and a small number of categorical variables (5 to 7). But when I need to investigate a larger sample with a larger number of categorical variables, this code is highly inefficient (i.e., slow). Do you have any suggestions to make this procedure faster in larger data sets? Thanks in advance!
Tiago

* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
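A different route, not the one proposed in the thread: Stata can build one id per observed combination with -egen, group()- and compute every by-group mean in a single sorted pass with -by: egen mean()-, which avoids both the -concat()-/-encode- step and the loop of -summarize- calls. The sketch below is a minimal illustration under stated assumptions: the auto data stand in for Tiago's data (foreign and rep78 play the role of x1-x6, price the role of score), and the -group- variable built from mpg is hypothetical, made up only for this demonstration. It has not been timed or checked against Tiago's data.

```stata
* Sketch: every per-combination mean in one pass, no loop over combinations.
* ASSUMPTIONS: the auto data stand in for Tiago's data; foreign and rep78
* play the role of x1-x6 and price the role of score. The -group- variable
* below is hypothetical and exists only for this illustration.
sysuse auto, clear

* Hypothetical two-group split (illustration only)
generate byte group = 1 + (mpg > 20)

* (1) One numeric id per observed combination, replacing concat() + encode.
*     The -missing- option gives missing rep78 its own id instead of a
*     missing id.
egen long combo = group(foreign rep78), missing

* (2) Mean of price within each combination, all combinations at once
bysort combo: egen double mean_all = mean(price)

* (3) Mean computed from group 1 only, copied to every observation with the
*     same combination (group 2 included). Observations outside group 1
*     contribute missing values, which -egen mean()- ignores.
bysort combo: egen double mean_g1 = mean(cond(group == 1, price, .))

* (4) Tiago's last step: give group-2 observations the group-1 mean of their
*     combination. Combinations absent from group 1 keep mean_g1 missing and
*     are left unchanged here.
generate double price_new = price
replace price_new = mean_g1 if group == 2 & !missing(mean_g1)

* (5) Alternatively, one row per combination via -collapse-
preserve
collapse (mean) mean_price = price (count) n = price, by(foreign rep78)
list, sepby(foreign)
restore
```

With the variables from the thread, step (1) would become something like -egen long combo = group(x1 x2 x3 x4 x5 x6), missing-. Because the means are stored as variables next to x1-x6, the combination each mean belongs to can be read off the data directly instead of being decoded from a scalar name such as mean2451.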

**References**:

- **Re: RE: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>
- **Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* Antoine Terracol <terracol@univ-paris1.fr>
- **Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>
- **Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* Antoine Terracol <terracol@univ-paris1.fr>
