Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: How to make a code faster - alternatives to egen var = concat(vars) ?
From: Antoine Terracol <[email protected]>
To: [email protected]
Date: Thu, 17 Jun 2010 23:23:59 +0200
Actually, it works as long as the group variables are all non-negative integers.
Otherwise you would have to deal with the "." in decimal values (and the "-" in
negative ones), since neither character is allowed in a scalar name.
Doable, but a bit tedious...
Antoine
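One possible way to deal with the "." is to build the name as a string and pass it through strtoname(), which replaces characters that are not legal in a Stata name with "_". The following is only a sketch under assumptions: the program name mymean2, the "mean" prefix and the underscore separator are illustrative choices rather than part of the original posts, and strtoname() must be available in the Stata release in use. It is a variant of the -mymean- program quoted below.

```stata
capture program drop mymean2
program define mymean2, byable(recall)
    syntax varname
    marksample touse
    * Start the name with a letter prefix, then append one value per by-variable
    local groupname "mean"
    foreach var of local _byvars {
        * _byn1() is the first observation number of the current by-group
        local a = `var'[_byn1()]
        local groupname "`groupname'_`a'"
    }
    * strtoname() maps illegal characters such as "." and "-" to "_"
    local groupname = strtoname("`groupname'")
    quietly summarize `varlist' if `touse', meanonly
    scalar `groupname' = r(mean)
end

scalar drop _all
sysuse auto, clear
* headroom takes values such as 1.5 and 2.5, so the raw values contain "."
bysort foreign headroom : mymean2 price
scalar dir
```

Because names are limited to 32 characters, this only suits a handful of by-variables, and non-integer values that are not exactly representable as floats would yield long, unwieldy names.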
On 17/06/2010 23:17, Antoine Terracol wrote:
This might not be the most efficient way, but it works:
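capture program drop mymean
program define mymean, byable(recall)
    syntax varname
    marksample touse
    * Build a name suffix from the by-variables' values in the current group
    local groupname ""
    foreach var of local _byvars {
        * _byn1() is the first observation number of the current by-group
        local a = `var'[_byn1()]
        if "`a'"=="." {
            local a "missing"
        }
        local groupname "`groupname'`a'"
    }
    su `varlist' if `touse'
    * Store the group mean in a scalar named after the combination
    scalar mean`groupname' = r(mean)
end
scalar drop _all
sysuse auto, clear
bysort foreign rep78 : mymean price
scalar dir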
Antoine
On 17/06/2010 16:56, Tiago V. Pereira wrote:
Thank you so much again, Antoine!
Yes, this is a very efficient way! However, I could not figure out how I
can save the combination of the categorical variables that a specific
meanX refers to.
For example, the commands
sysuse auto, clear
bysort foreign rep78 : mymean price
scalar dir
show a list of scalars containing the mean of the ith combination, but I
don't know whether mean10 refers to the combination "foreign = Foreign,
rep78 = 4" or "foreign = Foreign, rep78 = 5".
[Actually, in this specific case I can tell by looking at each value in
the output.]
Nevertheless, assuming a very large number of categorical variables
(n>10), I cannot write a loop and say that mean2451 refers to the
combination x1==0 x2==0 x3==2 x4==0 x5==2 and x6==1. I want to summarize
the mean of this combination in group 1 and generate a separate variable
for group 2.
For example:
bysort x1 x2 x3 x4 x5 x6 : mymean score if group==1
* yes, -mymean- needs further amendments to allow this
replace score = mean2451 if group==2 & x1==0 & x2==0 & x3==2 & ///
    x4==0 & x5==2 & x6==1
So, in this case I know that mean2451 comes from the combination
x1==0 & x2==0 & x3==2 & x4==0 & x5==2 & x6==1 in group 1, and I replace
score with its value for all subjects in group 2 that have an identical
combination.
This is getting tough, but if you have any additional tips, I would be
very grateful!
Thanks again.
Tiago
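The task described above (give each group-2 observation the group-1 mean of its own combination) may not need one scalar per combination at all. A minimal sketch, assuming variables named score, group and x1-x6 as in the message; the toy data and the variable name m1 are made up purely for illustration:

```stata
* Toy data standing in for the real data set (hypothetical values)
clear
set seed 12345
set obs 2000
forvalues j = 1/6 {
    generate byte x`j' = floor(3*runiform())
}
generate byte group = 1 + (runiform() < 0.5)
generate double score = rnormal()

* Mean of score among group-1 observations within each combination:
* cond() turns score into missing outside group 1, and mean() skips missings
bysort x1 x2 x3 x4 x5 x6 : egen double m1 = mean(cond(group == 1, score, .))

* Copy that mean to the group-2 observations with the same combination
replace score = m1 if group == 2
```

If a combination has no group-1 observations, m1 is missing there and the -replace- sets score to missing; adding "& !missing(m1)" to the condition would keep the original values instead.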
Tiago,
You would have to define a -byable- -program-, such as:
capture program drop mymean
program define mymean, byable(recall)
    syntax varname
    marksample touse
    su `varlist' if `touse'
    * _byindex() is the index of the by-group currently being processed
    local a = _byindex()
    scalar mean`a' = r(mean)
end
sysuse auto, clear
bysort foreign rep78 : mymean price
scalar dir
Antoine
On 17/06/2010 15:47, Tiago V. Pereira wrote:
Thanks, Antoine!
But for each combination, I want to save a local containing the r(mean).
Is it possible to do that using -bysort-?
Tiago
---------------
Dear statalisters,
I am working on some Stata code, and I need some advice.
I have n categorical variables that take the values 0, 1 or 2. My
objective is to summarize a continuous variable (say, age) by all possible
combinations of these categorical variables.
For example, suppose I have 5 categorical variables (x1, x2, x3, x4 and x5):
sum age if x1==0&x2==0&x3==0&x4==0&x5==0
then
sum age if x1==0&x2==0&x3==0&x4==0&x5==1
then
sum age if x1==0&x2==0&x3==0&x4==0&x5==2
and so forth.
What I am doing is the following: (1) I generate a string from the
categorical variables
egen combination = concat(x1 x2 x3 x4 x5)
(2) convert it to numeric
encode combination, generate(y)
and (3) loop over the values of the new variable y to summarize the
continuous variable
forvalues i = 1/`some_max_value' {
    sum age if y==`i'
}
This naïve solution works very well for small samples (_N<1000) and a small
number of categorical variables (5 to 7). But when I need to investigate
a larger sample with a larger number of categorical variables, this code is
highly inefficient (i.e., slow).
Do you have any suggestions to make this procedure faster in larger data
sets?
Thanks in advance!
Tiago
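The loop above evaluates an -if- condition over the whole data set once per combination, so its cost grows with both _N and the number of combinations. A hedged sketch of two standard alternatives that avoid the string step and the loop, assuming variables x1-x5 and age as in the message (the toy data and the names y, mean_age, avg_age and n are invented for illustration):

```stata
* Toy data standing in for the real data set (hypothetical values)
clear
set seed 2010
set obs 5000
forvalues j = 1/5 {
    generate byte x`j' = floor(3*runiform())
}
generate age = 20 + floor(50*runiform())

* One numeric identifier per observed combination, without concat or encode
egen long y = group(x1 x2 x3 x4 x5)

* Mean of age within each combination, stored on every observation
bysort y : egen double mean_age = mean(age)

* Alternatively, reduce the data to one row per combination
preserve
collapse (mean) avg_age = age (count) n = age, by(x1 x2 x3 x4 x5)
list in 1/5
restore
```

The first variant keeps the original observations and attaches the mean to each of them; the second yields a compact table of combinations with their means and counts.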
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/