


Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,

From	"Tiago V. Pereira" <[email protected]>
To	[email protected]
Subject	Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?,
Date	Thu, 17 Jun 2010 17:56:51 +0300 (BRT)

Thank you so much again, Antoine!

Yes, this is very efficient! However, I could not figure out how to
record which combination of the categorical variables a specific
meanX refers to.

For example, the commands

 sysuse auto, clear
 bysort foreign rep78 : mymean price
 scalar dir

show a list of scalars, each holding the mean of the ith combination, but
I don't know whether mean10 refers to the combination "foreign = Foreign,
rep78 = 4" or "foreign = Foreign, rep78 = 5".


[Actually, in this specific case I can tell by comparing each value with
the output.]
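
One possible way to see which combination each mean belongs to is to reduce the data to one row per combination, so that the by-values sit next to the mean. This is only a sketch and not what -mymean- does; the names mean_price and n are hypothetical. As with -bysort-, observations with missing rep78 form their own combination.

```stata
sysuse auto, clear

* One row per (foreign, rep78) combination, holding the mean of price
* and the number of nonmissing price values in that combination
collapse (mean) mean_price = price (count) n = price, by(foreign rep78)

* Each listed row shows the combination alongside its mean
list, sepby(foreign) noobs
```

Because -collapse- replaces the data in memory, it may be worth wrapping it in -preserve- and -restore- if the original observations are still needed.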

Nevertheless, with a very large number of categorical variables (n>10),
I cannot write a loop that tells me that mean2451 refers to the
combination x1==0, x2==0, x3==2, x4==0, x5==2 and x6==1. I want to compute
the mean of this combination in group 1 and generate a separate variable
for group 2.


For example:

bysort x1 x2 x3 x4 x5 x6 : mymean score if group==1

* yes, -mymean- needs further amendments to accept the -if- qualifier

replace score = mean2451 if group==2&x1==0&x2==0&x3==2&x4==0&x5==2&x6==1


So, in this case I know that mean2451 comes from the combination
x1==0&x2==0&x3==2&x4==0&x5==2&x6==1 in group 1, and I replace the score
of all subjects in group 2 who have an identical combination with its value.
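
A minimal sketch of one way to do this without numbered scalars is shown below. It assumes simulated toy data with only three categorical variables; x1-x3, group and score mirror the names used above, and mean_g1 is a hypothetical variable name. The idea is to compute the group-1 mean within each combination with -egen- and store it on every row of that combination, so no lookup from meanX back to a combination is needed.

```stata
* Hypothetical toy data: x1-x3 take values 0, 1 or 2; group is 1 or 2
clear
set seed 12345
set obs 1000
forvalues j = 1/3 {
    generate byte x`j' = floor(3 * runiform())
}
generate byte group = 1 + (runiform() > 0.5)
generate double score = rnormal()

* Mean of score over group-1 observations only, computed within each
* combination and stored on all rows of that combination;
* cond() turns group-2 scores into missing, which mean() ignores
bysort x1 x2 x3: egen double mean_g1 = mean(cond(group == 1, score, .))

* Give each group-2 observation the group-1 mean of its own combination
replace score = mean_g1 if group == 2
```

A combination with no group-1 observations gets a missing mean_g1, so the group-2 scores in that combination become missing as well; whether that is the desired behavior is a design choice.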


This is getting tough, but if you have any additional tips, I would be
very grateful!

Thanks again.

Tiago

> Tiago,
>
> You would have to define a -byable- -program-, such as:
>
> capture program drop mymean
> program define mymean, byable(recall)
> syntax varname
> marksample touse
> su `varlist' if `touse'
> local a= _byindex()
> scalar mean`a'=r(mean)
> end
>
> sysuse auto, clear
> bysort foreign rep78 : mymean price
> scalar dir
>
> Antoine
>
> On 17/06/2010 15:47, Tiago V. Pereira wrote:
>> Thanks, Antoine!
>>
>> But for each combination, I want to save a local containing the
>> r(mean). Is it possible to do that using -bysort-?
>>
>> Tiago
>>
>>
>> ---------------
>> Dear statalisters,
>>
>> I am working on some Stata code, and I need some advice.
>>
>> I have n categorical variables that take the values 0, 1 or 2. My
>> objective is to summarize a continuous variable (say, age) by all
>> possible combinations of these categorical variables.
>>
>>
>> For example, suppose I have 5 categorical variables (x1, x2, x3, x4
>> and x5):
>>
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==0
>>
>> then
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==1
>>
>> then
>>
>> sum age if x1==0&x2==0&x3==0&x4==0&x5==2
>>
>> and so forth.
>>
>>
>> What I am doing is the following: (1) I generate a string of the
>> categorical variables
>>
>> egen combination = concat(x1 x2 x3 x4 x5)
>>
>> (2) convert them to numeric
>>
>> encode combination, generate(y)
>>
>> and loop over the values of the new variable y to summarize the
>> continuous variable
>>
>> forvalues i = 1/`some_max_value' {
>>
>> sum age if y==`i'
>>
>> }
>>
>> This naïve solution works very well for small samples (_N<1000) and a
>> small number of categorical variables (5 to 7). But when I need to
>> work with a larger sample and a larger number of categorical
>> variables, this code is highly inefficient (i.e. slow).
>>
>> Do you have any suggestions to make this procedure faster in larger
>> data sets?
>>
>> Thanks in advance!
>>
>> Tiago
>>
>>
>>
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>