Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: atribute values between lines of a variable/standardize data

From	Nick Cox <njcoxstata@gmail.com>
To	statalist@hsphsun2.harvard.edu
Subject	Re: st: atribute values between lines of a variable/standardize data
Date	Wed, 30 Mar 2011 13:25:44 +0100

Here are three ways to do it.

1.

egen mean_y = mean(y / (year == 2003 | year == 2004)), by(group)

2.

egen mean_y = mean(cond(year == 2003 | year == 2004, y, .)), by(group)

3.

egen mean_y = mean(y) if year == 2003 | year == 2004, by(group)
bysort group (mean_y) : replace mean_y = mean_y[1]

Let's take these in reverse order:

#3 "spreads" the non-missing results of the first command to replace
missings. It hinges delicately on the sort order: Stata sorts numeric
missings after all non-missing values, so sorting on mean_y within each
group puts a non-missing result in first position, and we can then copy
the first observation's value to the rest of the group.
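
A minimal sketch of #3 run end to end on the example data from the
original question. Note that the data use uppercase Y, and Stata
variable names are case-sensitive, so y in the commands above becomes Y
here. The name Y_standardized is taken from the question; the final
-sort- and -list- are only there to inspect the result.

```stata
clear
input year str20 group Y
2001 G1 57
2002 G1 61
2003 G1 54
2004 G1 60
2005 G1 64
2001 G2 1543
2002 G2 1700
2003 G2 1532
2004 G2 1659
2005 G2 1800
end

* Mean over 2003-2004 only; all other years get missing
egen mean_y = mean(Y) if year == 2003 | year == 2004, by(group)

* Within each group, sort on mean_y so that non-missing values come first
* (numeric missing sorts after every non-missing number), then copy the
* first value to every observation in the group
bysort group (mean_y) : replace mean_y = mean_y[1]

* Standardize by the 2003-2004 group mean
gen Y_standardized = Y / mean_y

sort group year
list, sepby(group)
```

With these data mean_y should be 57 for G1, that is (54 + 60)/2, and
1595.5 for G2, that is (1532 + 1659)/2.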

#2 and #1 hinge on the fact that Stata ignores missings in calculating
means, but in the absence of an -if- condition assigns those means to
all observations in each group.

#2 and #1 also hinge on the fact that -egen, mean()- can take
_expressions_, which can be (much) more complicated than variable
names.

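#1 is a trick I stumbled on a few weeks ago, and it appears not to be
widely known. The condition evaluates to 1 when true and 0 when false.
Dividing by 1 leaves y unchanged, while dividing by zero produces
missing, which is exactly what is needed for those observations to be
ignored.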
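
A small sketch to make the mechanics of #1 and #2 visible on the
example data from the original question (uppercase Y there). The
variables in_base, Y_base, mean_div and mean_cond are names made up for
this illustration. The -assert- simply checks that the two approaches
agree in every observation.

```stata
clear
input year str20 group Y
2001 G1 57
2002 G1 61
2003 G1 54
2004 G1 60
2005 G1 64
2001 G2 1543
2002 G2 1700
2003 G2 1532
2004 G2 1659
2005 G2 1800
end

* A logical expression evaluates to 1 (true) or 0 (false)
gen byte in_base = year == 2003 | year == 2004

* Y / 1 returns Y; Y / 0 returns missing, so other years drop out
gen Y_base = Y / in_base

* Method 1 (division) and method 2 (cond()) side by side
egen mean_div  = mean(Y / (year == 2003 | year == 2004)), by(group)
egen mean_cond = mean(cond(year == 2003 | year == 2004, Y, .)), by(group)

* Both should give the same mean in every observation
assert mean_div == mean_cond

sort group year
list year group Y in_base Y_base mean_div, sepby(group)
```

Y_base is non-missing only in 2003 and 2004, yet mean_div is filled in
for every year because no -if- qualifier restricts the -egen- call.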

#1 will be written up as a Tip for the Stata Journal.

Nick

On Wed, Mar 30, 2011 at 12:44 PM, Lucas Ferreira Mation
<lucasmation@gmail.com> wrote:

> My data is divided into several groups, with many observations in
> each. For each group, I want to "standardize" my data based on a
> specific subset of observations (in this case, divide the actual
> values of Y by the means of a specific subgroup of Y). How can I do
> that?
> In the example below, for each group, I need to "standardize" the
> values of Y based on the average of Y of the years 2003 and 2004. I
> managed to create such means for those observations, but I don't know
> how to extend that value to the rest of the observations of that
> subgroup.
>
> "
> input year str20 group Y
> 2001 G1  57
> 2002 G1  61
> 2003 G1  54
> 2004 G1  60
> 2005 G1  64
> 2001 G2  1543
> 2002 G2  1700
> 2003 G2  1532
> 2004 G2  1659
> 2005 G2  1800
> end
> egen denominator=mean(Y) if(year==2003 | year==2004), by(group)
> *this creates the desired mean(denominator for the "standardization")
> *but only for the observations in years 2003 and 2004.
> *how do I attribute that to the rest of the observations in that
> *group? Having this, I would run:
> gen Y_standardized=Y/denominator
>
> "
> My actual data set is quite long, with many groups and many observations
> (months) per group.
