
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: atribute values between lines of a variable/standardize data


From   Lucas Ferreira Mation <[email protected]>
To   [email protected]
Subject   Re: st: atribute values between lines of a variable/standardize data
Date   Wed, 30 Mar 2011 10:17:00 -0300

Dear Nick,
great solution. Once again, thank you very much.
Lucas

On Wed, Mar 30, 2011 at 9:25 AM, Nick Cox <[email protected]> wrote:
>
> Here are three ways to do it.
>
> 1.
>
> egen mean_y = mean(y / (year == 2003 | year == 2004)), by(group)
>
> 2.
>
> egen mean_y = mean(cond(year == 2003 | year == 2004, y, .)), by(group)
>
> 3.
>
> egen mean_y = mean(y) if year == 2003 | year == 2004, by(group)
> bysort group (mean_y) : replace mean_y = mean_y[1]
>
> Let's take these backwards:
>
> #3 "spreads" the non-missing results of the first command to replace
> missings. It hinges delicately on the sort order: sorting non-missings
> to first position in each group is needed so that we can use the first
> observation in each group.
>
> #2 and #1 hinge on the fact that Stata ignores missings in calculating
> means, but in the absence of an -if- condition assigns those means to
> all observations in each group.
>
> #2 and #1 also hinge on the fact that -egen, mean()- can take
> _expressions_, which can be (much) more complicated than variable
> names.
>
> #1 is a trick I stumbled on a few weeks ago, and it appears not to be
> widely known. The trick is that dividing by zero produces missing,
> which is exactly what is needed when it happens.
>
> #1 will be written up as a Tip for the Stata Journal.
>
> Nick
>
> On Wed, Mar 30, 2011 at 12:44 PM, Lucas Ferreira Mation
> <[email protected]> wrote:
> >
> > My data is divided into several groups, with many observations in
> > each. For each group, I want to "standardize" my data based on a
> > specific subset of observations (in this case, divide the actual
> > values of Y by the means of a specific subgroup of Y). How can I do
> > that?
> > In the example below, for each group, I need to "standardize" the
> > values of Y based on the average of Y of the years 2003 and 2004. I
> > managed to create such means for those observations, but I don't know
> > how to extend that value to the rest of the observations of that
> > subgroup.
> >
> > "
> > input year str20 group Y
> > 2001 G1  57
> > 2002 G1  61
> > 2003 G1  54
> > 2004 G1  60
> > 2005 G1  64
> > 2001 G2  1543
> > 2002 G2  1700
> > 2003 G2  1532
> > 2004 G2  1659
> > 2005 G2  1800
> > end
> > egen denominator=mean(Y) if(year==2003 | year==2004), by(group)
> > *this creates the desired mean(denominator for the "standardization")
> > *but only for the observations in years 2003 and 2004.
> > *how do I attribute that to the rest of the observations in that
> > group? Having this, I would run:
> > gen Y_standardized=Y/denominator
> >
> > "
> > My actual data are quite long, with many groups and many observations
> > (months) per group.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
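
As a worked check, the three approaches above can be run on the example data from the original question. This is only a sketch under a few assumptions: the variable is written Y (capital) to match the -input- statement, since Stata variable names are case-sensitive, and the names mean_Y1, mean_Y2 and mean_Y3 are made up here for illustration. On this data all three should give 57 for G1 (the mean of 54 and 60) and 1595.5 for G2 (the mean of 1532 and 1659).

```stata
clear
input year str20 group Y
2001 G1  57
2002 G1  61
2003 G1  54
2004 G1  60
2005 G1  64
2001 G2  1543
2002 G2  1700
2003 G2  1532
2004 G2  1659
2005 G2  1800
end

* Method 1: the condition is 1 in 2003-2004 and 0 otherwise, so
* Y / condition is Y in those years and missing (division by zero) elsewhere
egen mean_Y1 = mean(Y / (year == 2003 | year == 2004)), by(group)

* Method 2: cond() passes Y through in 2003-2004 and returns missing otherwise
egen mean_Y2 = mean(cond(year == 2003 | year == 2004, Y, .)), by(group)

* Method 3: compute the mean only where the -if- holds, then copy it to
* the other observations; missings sort last, so [1] is non-missing
egen mean_Y3 = mean(Y) if year == 2003 | year == 2004, by(group)
bysort group (mean_Y3) : replace mean_Y3 = mean_Y3[1]

* All three should agree in every observation
assert mean_Y1 == mean_Y2
assert mean_Y2 == mean_Y3

* The standardization asked for in the original question
gen Y_standardized = Y / mean_Y1

sort group year
list year group Y mean_Y1 Y_standardized, sepby(group)
```

Methods 1 and 2 need only one line each and do not depend on sort order, which is probably why they are the easier ones to reuse on a long dataset with many groups.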


