Stata The Stata listserver

Re: st: average value among differing numbers of variables

From   Ernest Berkhout <>
Subject   Re: st: average value among differing numbers of variables
Date   Thu, 17 Jul 2003 11:31:52 +0200

That does indeed look like a rather complex problem, for which I don't have a golden key. The problem with your present setup is that, for instance, 'egen rmean(day1-day2)' will not work: the range day1-day2 covers every variable stored between the two, so flag1 is in that list as well.

My approach would be to first reshape the data to long form (which is more convenient for most data manipulation issues), so that you have the variables day (which should better be renamed 'temperature' or something), flag, an indicator for day 1 to 31, and an indicator for a/b/c.
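A minimal sketch of that reshape, assuming a row identifier called id (a hypothetical name; any variable that uniquely identifies a row would do) and only five days of made-up numbers in place of a2, b3, c5 and so on:

```stata
* Toy data shaped like the example rows (5 days only; the values are invented)
clear
input id day1 str1 flag1 day2 str1 flag2 day3 str1 flag3 day4 str1 flag4 day5 str1 flag5
1 0 "s" 6 "a" 0 "s" 0 "s" 9 "a"
2 0 "s" 0 "s" 3 "a" 4 ""  5 ""
3 1 ""  0 "s" 0 "s" 0 "s" 8 "a"
end

* day and flag are the stubs; j() stores the numeric suffix (1 to 5 here)
reshape long day flag, i(id) j(daynum)

* rename the measurement so it no longer looks like a day counter
rename day temperature

sort id daynum
list, sepby(id)
```

Each original row becomes one block of records, one record per day, with the value and its flag side by side.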

From there it should be possible to construct a grouping variable that identifies each set of records whose temperature has to be filled in, together with the record holding the accumulated value from which it is calculated. In the second row of your example, day1, day2 and day3 belong together in one group, day4 is a group on its own and day5 is a group on its own. Let your grouping variable assign a value of 1 to the first 3 records (day1, day2, day3), a value of 2 to the record of day4, etc.
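One way such a grouping variable might be built, sketched on data already in long form. The names id, daynum, newgrp and groupvar are assumptions, and the rule used (a record starts a new group unless the record before it is flagged "s") is my reading of the flags, not something checked against the real data:

```stata
* Two example rows already in long form (the values are invented)
clear
input id daynum temperature str1 flag
1 1 0 "s"
1 2 6 "a"
1 3 0 "s"
1 4 0 "s"
1 5 9 "a"
2 1 0 "s"
2 2 0 "s"
2 3 3 "a"
2 4 4 ""
2 5 5 ""
end

* a record opens a new group if it is the first of its row,
* or if the previous record was not an "s" (accumulating) day
bysort id (daynum): gen byte newgrp = (_n == 1) | (flag[_n-1] != "s")

* the running sum of the group starts numbers the groups within each row
by id: gen groupvar = sum(newgrp)

list, sepby(id)
```

For id 2 this gives groupvar 1, 1, 1, 2, 3, as described above. An accumulation that is still running at the end of a row (an "s" with no closing "a") is not handled here.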

From there you could work something out using the 'by groupvar:' construct and explicit subscripting, as in "by groupvar: replace temperature=temperature[_N] if temperature<." You might want to convert accumulated temperatures to averages before this.
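A sketch of that last step, assuming groupvar has already been built and numbers the groups within each id (both names are assumptions). It is a variation on the command quoted above: it divides by the group size _N in the same step, rather than converting to averages beforehand:

```stata
* Long-form data with the grouping variable already in place (values invented)
clear
input id daynum temperature str1 flag groupvar
2 1 0 "s" 1
2 2 0 "s" 1
2 3 3 "a" 1
2 4 4 ""  2
2 5 5 ""  3
3 1 1 ""  1
3 2 0 "s" 2
3 3 0 "s" 2
3 4 0 "s" 2
3 5 8 "a" 2
end

* within each group the last record holds the accumulated value;
* spread it evenly over the _N records of the group
bysort id groupvar (daynum): replace temperature = temperature[_N] / _N

list, sepby(id)
```

Groups of a single record are divided by 1 and so keep their value. If the wide layout is needed again afterwards, the data could be reshaped back once the helper variables have been dropped.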

There are probably a lot of other ways to tackle the problem, but I think the reshape-to-long is the most important thing here.

At 06:07 17-7-2003, Radu Ban wrote:

Dear listers,

This is a data management question. The data that I'm looking at (daily U.S. weather) has the following structure.

day1  flag1  day2  flag2  day3  flag3  day4  flag4  day5  flag5  ...  day31  flag31
0     s      a2    a      0     s      0     s      a5    a      ...  a31
0     s      0     s      b3    a      b4           b5           ...  b31
c1           0     s      0     s      0     s      c5    a      ...  c31

the "s" flag means that the measured element (say inches of rain) is accumulated over those days, which are assigned a 0 value, and the accumulated amount is reported in the day flagged with "a". i would like to replace the 0 value for the accumulation days with the average of the accumulated value over those days.

given the notations above, specifically, i would like to replace 0, 0, b3 (in the second row) with b3/3; 0, 0, 0, c5 (in the third row) with c5/4, and so on. note that, as in the first row there can be more than one accumulation series per row.

i figured out that each type of accumulation, a_ij (starting at day i and ending at day j), must be identified, so that in the end i can use:

forval j = 2/31 {
    forval i = 1/`j' {
        egen daymean = rmean(day`i'-day`j') if a_`i'`j' == 1
        replace day`i' = daymean
        drop daymean
    }
}

but i'm not sure how to define all a_ij
Ernest Berkhout
SEO Amsterdam Economics
University of Amsterdam

Room 3.08
Roetersstraat 29
1018 WB Amsterdam
The Netherlands

tel.:+ 31 20 525 1657
fax:+ 31 20 525 1686
A statistician: someone who insists
on being certain about uncertainty
