

From: "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,

Date: Thu, 17 Jun 2010 17:56:51 +0300 (BRT)

Thank you so much again, Antoine!

Yes, this is a very efficient way! However, I could not figure out how to save the combination of the categorical variables that a specific meanX refers to. For example, the commands

    sysuse auto, clear
    bysort foreign rep78 : mymean price
    scalar dir

show a list of scalars containing the mean of the ith combination, but I don't know whether mean10 refers to the combination "foreign = Foreign, rep78 = 4" or "foreign = Foreign, rep78 = 5". [Actually I do in this specific case if I take a look at each value from the output.]

Nevertheless, assuming a very large number of categorical variables (n>10), I cannot write a loop and say that mean2451 refers to the combination x1==0, x2==0, x3==2, x4==0, x5==2 and x6==1.

I want to summarize the mean of this combination in group 1 and generate a separate variable for group 2. For example:

    bysort x1 x2 x3 x4 x5 x6 : mymean score if group==1
    * yes, -mymean- needs further amendments to have this option
    replace score = mean2451 if group==2&x1==0&x2==0&x3==2&x4==0&x5==2&x6==1

So, in this case I know that mean2451 comes from the combination x1==0&x2==0&x3==2&x4==0&x5==2&x6==1 from group 1, and I replace its value for all subjects from group 2 having an identical combination.

This is getting tough, but if you have any additional tips, I will be really very grateful!

Thanks again.

Tiago

> Tiago,
>
> You would have to define a -byable- -program-, such as:
>
> capture program drop mymean
> program define mymean, byable(recall)
> syntax varname
> marksample touse
> su `varlist' if `touse'
> local a= _byindex()
> scalar mean`a'=r(mean)
> end
>
> sysuse auto, clear
> bysort foreign rep78 : mymean price
> scalar dir
>
> Antoine
>
> On 17/06/2010 15:47, Tiago V. Pereira wrote:
>> Thanks, Antoine!
>>
>> But for each combination, I want to save a local containing the
>> r(mean).
>> Is it possible to do that using -bysort-?
>>
>> Tiago
>>
>>
>> ---------------
>> Dear statalisters,
>>
>> I am working on a Stata code, and I need some advice.
>>
>> I have n categorical variables that assume values equal to 0, 1 or 2.
>> My objective is to summarize a continuous variable (say, age) by all
>> possible combinations of these categorical variables.
>>
>> For example, suppose I have 5 categorical variables (x1, x2, x3, x4 and
>> x5):
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==0
>>
>> then
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==1
>>
>> then
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==2
>>
>> and so forth.
>>
>> What I am doing is the following: (1) I generate a string of the
>> categorical variables
>>
>> egen combination = concat(x1 x2 x3 x4 x5)
>>
>> (2) convert it to numeric
>>
>> encode combination, generate(y)
>>
>> and (3) loop over the values of the new variable y to summarize the
>> continuous variable
>>
>> forvalues i = 1/`some_max_value' {
>>
>> sum age if y==`i'
>>
>> }
>>
>> This naïve solution works very well for small samples (_N<1000) and a
>> small number of categorical variables (5 to 7). But when I need to
>> investigate a larger sample with a larger number of categorical
>> variables, this code is highly inefficient (i.e. slow).
>>
>> Do you have any suggestions to make this procedure faster in larger data
>> sets?
>>
>> Thanks in advance!
>>
>> Tiago
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
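One possible way to avoid tracking which scalar belongs to which combination is to keep the group 1 mean in a variable rather than in scalars, so that each row carries the mean for its own combination. The following is only a minimal sketch under stated assumptions, not the approach proposed in this thread: it uses the auto data with foreign and rep78 standing in for x1-x6 and price standing in for score, and the names group, combo, mean_g1, price2 and first are made up for illustration.

```stata
* Sketch only: -group- is a made-up two-group split and the new
* variable names (combo, mean_g1, price2, first) are illustrative.
sysuse auto, clear
generate byte group = 1 + mod(_n, 2)

* One integer id per observed combination of the categorical variables
* (rows with a missing rep78 get a missing combo by default)
egen long combo = group(foreign rep78), label

* Mean of price among group 1 only, computed within each combination
* and stored on every row of that combination, group 2 rows included
bysort foreign rep78: egen double mean_g1 = mean(cond(group == 1, price, .))

* Copy the group 1 mean to the group 2 rows sharing the combination
generate double price2 = price
replace price2 = mean_g1 if group == 2 & !missing(mean_g1)

* Show which id corresponds to which combination, once per combination
egen byte first = tag(combo)
list combo foreign rep78 mean_g1 if first, noobs
```

With more categorical variables, the same pattern would apply by listing them all after -bysort- and inside group(). Because the mean sits on the same row as the values that define the combination, no lookup from a scalar name such as mean2451 back to its combination is needed.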

**Follow-Ups**:

- **Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* Antoine Terracol <terracol@univ-paris1.fr>
- **RE: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

**References**:

- **Re: RE: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* "Tiago V. Pereira" <tiago.pereira@mbe.bio.br>
- **Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,** *From:* Antoine Terracol <terracol@univ-paris1.fr>
