Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: RE: Calculating variable-averages of time-spans (laid out case by case via variables)


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: RE: Calculating variable-averages of time-spans (laid out case by case via variables)
Date   Fri, 11 Mar 2011 14:57:20 +0000

Nice in style, but I don't think this solves the problem as originally formulated, which requires a solution that does not assume regular spacing in time. It looks modifiable to meet that, though.

Nick 
[email protected] 

Robert Picard

Here's another one-liner:

bysort idcode (year): egen m = ///
   sum(((wks_work + wks_work[_n-1]) / 2) * (year == burnoutyear))
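One way the one-liner above might be adapted to irregular spacing is sketched below on a small invented dataset (the values and the second person's burnout year are made up for illustration). The egen documentation advises against explicit subscripting with _n inside egen, so this sketch does the subscripting in -generate- and uses egen only to spread the result. It assumes that a person whose burnout year has no observation exactly one calendar year earlier should get a missing result; other conventions are possible.

```stata
clear
input idcode year wks_work burnoutyear
1 70 27 73
1 72 51 73
1 73  3 73
2 71 14 75
2 73 21 75
2 75 30 75
end

* Step 1: pairwise average, computed with generate so that [_n-1]
* reliably refers to the previous year within each person.
* Nonmissing only in the burnout year, and only when the previous
* observation is exactly one calendar year earlier.
bysort idcode (year): gen pairavg = (wks_work + wks_work[_n-1]) / 2 ///
    if year == burnoutyear & year[_n-1] == year - 1

* Step 2: spread that single value to every observation of the person.
egen m = max(pairavg), by(idcode)
drop pairavg
```

Here person 1 has observations for both 72 and 73, so m is their average; person 2 has no observation for 74, so m stays missing rather than silently averaging 75 with 73.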

On Fri, Mar 11, 2011 at 2:25 PM, Nick Cox <[email protected]> wrote:
> Here is another way to do it:
>
>
> egen mean_workload = mean(cond((year == burnoutyear) | (year == burnoutyear - 1), wks_work, .)), by(idcode)
>
> Nick
> [email protected]
>
> Nick Cox
>
> I have no idea what a "burnout" is here, but I guess I don't need to know. Just curious, though....
>
> The context is that in this dataset -idcode- and -year- are joint identifiers.
>
> You want to identify pairs of observations that (1) have the same -idcode- and (2) are the burnout-year or the one before:
>
> gen tag = (year == burnoutyear) | (year == (burnoutyear - 1))
>
> Then you need to average within those groups
>
> egen mean_workload = mean(wks_work) if tag, by(idcode)
>
> and spread to all values within that -idcode-
>
> bysort idcode (mean_workload) : replace mean_workload = mean_workload[1]
>
> This would work too (two lines instead of three)
>
> gen tag = (year == burnoutyear) | (year == (burnoutyear - 1))
> egen mean_workload2 = mean(wks_work/tag) , by(idcode)
>
> That's slightly cute or perverse, according to taste. If you divide by 0, the result is missing and will be ignored by -egen, mean()-. Dividing by 1 manifestly leaves values as they are.
>
> A possible reduction to one line follows, as an exercise!
>
> Note that what you asked for was the mean across all the observations that satisfied the criteria. You didn't spell out to Stata that you wanted the calculation done separately by -idcode- (as above).
>
> Looking at the data suggests that something else was wrong too with what you asked. Note that -egen- doesn't guarantee to keep the same -sort- order within its operations, just to return the data to the same -sort- order as when it started. So, it is unwise to assume otherwise.
>
> I note that the mean of a sum is that sum, not the mean of the constituent values.
>
> Nick
> [email protected]
>
> Wolfgang Feudenheim
>
> I am currently working on an analysis of economic data in OECD-countries. For each country, I separately fixed a key-year. For this specific year and the two preceding years I want to read out averages of economic indicators such as "GDP/capita" etc.
>
> In the following, I try to illustrate my problems with the help of the example dataset
> "National Longitudinal Survey.  Young Women 14-26 years of age in 1968" by pretending I was interested in the average workload before the occurrence of a burnout (sorry, couldn't make up any more positive scenario ...). Unfortunately, the time data is not available on a year-by-year basis but in irregular steps. Therefore, I just observe one specific year and its preceding year. I am running my analysis on
>
> -Stata/IC 11.1 for Mac (64-bit Intel)
> -Born 04 Nov 2010
>
> Here is the code:
>
> use http://www.stata-press.com/data/r11/nlswork.dta, clear
> *Add Burnout-Values to Dataset*
> gen burnoutyear=.
> replace burnoutyear=73 if idcode==1
> replace burnoutyear=72 if idcode==2
>
> *Generate Variable for all observations of one person (idcode) that presents the average of weeks worked in burnout-year and*
> *burnout-preceding year*
> egen avworkload_b=mean(wks_work[_n]+wks_work[_n-1]) if (year==burnoutyear)&(idcode[_n]==idcode[_n-1])
>
>
> The problems that occur are the following:
>
> 1. For both "idcode==1" and "idcode==2", the wrong result, namely "25", is displayed. The average values should however be "27" and "17.5".
>
> 2. The variable "avworkload_b" is only inserted into the dataset for the year indicated by "burnoutyear" for the respective "idcode". I want to have this value displayed for all years of each "idcode".
>

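The tag-and-average recipes quoted in this thread can be compared side by side. The following is a minimal sketch on a small invented dataset (the rows are made up for illustration, not taken from nlswork.dta), assuming as in the thread that idcode and year jointly identify observations and that "preceding year" means the calendar year before burnoutyear.

```stata
clear
input idcode year wks_work burnoutyear
1 70 27 73
1 72 51 73
1 73  3 73
2 71 14 72
2 72 21 72
2 75 30 72
end

* Tag the burnout year and the calendar year before it.
gen tag = (year == burnoutyear) | (year == burnoutyear - 1)

* Three-step version: average within tagged rows, then spread the
* result to all rows of the same idcode (missings sort last).
egen mean_workload = mean(wks_work) if tag, by(idcode)
bysort idcode (mean_workload): replace mean_workload = mean_workload[1]

* Two-line version: dividing by tag == 0 yields missing, which
* -egen, mean()- ignores; dividing by 1 leaves values unchanged.
egen mean_workload2 = mean(wks_work / tag), by(idcode)

* One-line version with cond(), which needs no tag variable.
egen mean_workload3 = mean(cond((year == burnoutyear) | (year == burnoutyear - 1), wks_work, .)), by(idcode)
```

All three variables should agree within each idcode. None of them uses explicit subscripts inside egen, so none depends on the sort order egen uses internally.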