Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: atribute values between lines of a variable/standardize data
Date: Wed, 30 Mar 2011 13:25:44 +0100

Here are three ways to do it.

1.

egen mean_y = mean(y / (year == 2003 | year == 2004)), by(group)

2.

egen mean_y = mean(cond(year == 2003 | year == 2004, y, .)), by(group)

3.

egen mean_y = mean(y) if year == 2003 | year == 2004, by(group)
bysort group (mean_y) : replace mean_y = mean_y[1]

Let's take these backwards:

#3 "spreads" the non-missing results of the first command to replace
missings. It hinges delicately on the sort order: sorting non-missings
to first position in each group is needed so that we can use the first
observation in each group. (Stata sorts numeric missings after all
non-missing values, so sorting on mean_y within group does this.)

#2 and #1 hinge on the fact that Stata ignores missings in calculating
means, but in the absence of an -if- condition assigns those means to
all observations in each group.

#2 and #1 also hinge on the fact that -egen, mean()- can take
_expressions_, which can be (much) more complicated than variable
names.

#1 is a trick I stumbled on a few weeks ago, and it appears not to be
widely known. The trick is that dividing by zero produces missing,
which is exactly what is needed when it happens: the condition
evaluates to 0 outside 2003 and 2004, so y / 0 is missing there, and
to 1 inside those years, so y / 1 is just y.

#1 will be written up as a Tip for the Stata Journal.

Nick

On Wed, Mar 30, 2011 at 12:44 PM, Lucas Ferreira Mation
<lucasmation@gmail.com> wrote:

> My data is divided into several groups, with many observations in
> each. For each group, I want to "standardize" my data based on a
> specific subset of observations (in this case, divide the actual
> values of Y by the means of a specific subgroup of Y). How can I do
> that?
> In the example below, for each group, I need to "standardize" the
> values of Y based on the average of Y of the years 2003 and 2004. I
> managed to create such means for those observations, but I don't know
> how to extend that value to the rest of the observations of that
> subgroup.
>
> "
> input year str20 group Y
> 2001 G1 57
> 2002 G1 61
> 2003 G1 54
> 2004 G1 60
> 2005 G1 64
> 2001 G2 1543
> 2002 G2 1700
> 2003 G2 1532
> 2004 G2 1659
> 2005 G2 1800
> end
> egen denominator=mean(Y) if(year==2003 | year==2004), by(group)
> *this creates the desired mean (denominator for the "standardization")
> *but only for the observations in years 2003 and 2004.
> *how do I attribute that to the rest of the observations in that
> group? Having this, I would run:
> gen Y_standardized=Y/denominator
>
> "
> My actual data is quite long, with many groups and many observations
> (months) per group.
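
The three approaches in the reply can be checked against the example data from the original question. What follows is a minimal sketch, not part of either post. It assumes the poster's variable names (Y, group, year) rather than the lowercase y used in the reply, since Stata names are case-sensitive, and the names mean1, mean2 and mean3 are hypothetical, chosen only so that the three results can sit side by side.

```stata
clear
input year str20 group Y
2001 G1 57
2002 G1 61
2003 G1 54
2004 G1 60
2005 G1 64
2001 G2 1543
2002 G2 1700
2003 G2 1532
2004 G2 1659
2005 G2 1800
end

* Method 1: outside 2003/2004 the divisor is 0, so the result is
* missing and is ignored by mean()
egen mean1 = mean(Y / (year == 2003 | year == 2004)), by(group)

* Method 2: cond() returns Y in the base years and missing otherwise
egen mean2 = mean(cond(year == 2003 | year == 2004, Y, .)), by(group)

* Method 3: compute under -if-, then spread the result within group;
* missings sort last, so the first observation holds the mean
egen mean3 = mean(Y) if year == 2003 | year == 2004, by(group)
bysort group (mean3) : replace mean3 = mean3[1]

* The three results should agree in every observation
assert mean1 == mean2 & mean2 == mean3

* The standardization the original poster wanted
gen Y_standardized = Y / mean1
list, sepby(group)
```

With these data the base-period means are (54 + 60) / 2 = 57 for G1 and (1532 + 1659) / 2 = 1595.5 for G2. Note that method 3 changes the sort order of the data within each group, so re-sort by group and year afterwards if the original order matters.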