
From: Ernest Berkhout <ernestb@seo.fee.uva.nl>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: average value among differing numbers of variables
Date: Thu, 17 Jul 2003 11:31:52 +0200

That does indeed look like a rather complex problem, for which I don't have a golden key. The problem with your present setup is that, for instance, 'egen rmean(day1-day2)' will not work, because the range day1-day2 also includes flag1.
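To see why, here is a minimal sketch on a hypothetical one-observation dataset with only two days (the values and the local macro name `expanded` are made up for illustration). Stata expands a range such as day1-day2 by the order of variables in the dataset, so the interleaved flag variable is swept in; listing the day variables explicitly avoids that. Current Stata documents the row-mean egen function as `rowmean()`.

```stata
* Hypothetical toy data: day/flag pairs stored in interleaved order
clear
set obs 1
generate day1 = 0
generate str1 flag1 = "s"
generate day2 = 6
generate str1 flag2 = "a"

* Expand the range the way egen would see it: the flag variable is included
unab expanded : day1-day2
display "`expanded'"

* Naming the numeric variables explicitly keeps the string flag out
egen daymean = rowmean(day1 day2)
list
```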

My approach would be to first reshape the data to long form (which is more convenient for most data-manipulation tasks), so that you have the variables day (which would better be renamed 'temperature' or something similar), flag, an indicator for day 1 to 31, and an indicator for a/b/c.
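A minimal sketch of the reshape, assuming hypothetical toy data with three rows identified by a made-up variable `station` (a, b, c), only five days instead of 31, and invented values that follow the s/a pattern of the question. The names `amount` (used instead of 'temperature', since the example is rain) and `daynum` are also illustrative.

```stata
* Hypothetical wide data: 3 stations, 5 days (the real data has 31)
clear
input str1 station day1 str1 flag1 day2 str1 flag2 day3 str1 flag3 day4 str1 flag4 day5 str1 flag5
"a" 0 "s" 6 "a" 0 "s" 0 "s" 9 "a"
"b" 0 "s" 0 "s" 6 "a" 2 ""  3 ""
"c" 1 ""  0 "s" 0 "s" 0 "s" 8 "a"
end

* One record per station-day; the day number goes into daynum
reshape long day flag, i(station) j(daynum)

* Give the measurement and the day index clearer names
rename day amount
rename daynum day
list, sepby(station)
```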

From there it should be possible to construct a group variable that identifies each set of records whose values should be filled in, together with the record holding the accumulated value from which they are calculated. In your example, the second row has a group in which day1, day2 and day3 belong together, day4 is a group on its own, and so is day5. Let your grouping variable assign a value of 1 to the first 3 records (day1, day2, day3), a value of 2 to the record of day4, etc.
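One possible way to build such a grouping variable, sketched on hypothetical long-form toy data (variable names `station`, `day`, `flag`, `newgrp` and `grp` are illustrative): a new group starts on a station's first day and on any day that follows a day not flagged "s". Running sums of that start indicator then number the groups consecutively across the whole dataset.

```stata
* Hypothetical long data: flags only, 3 stations x 5 days
clear
input str1 station day str1 flag
"a" 1 "s"
"a" 2 "a"
"a" 3 "s"
"a" 4 "s"
"a" 5 "a"
"b" 1 "s"
"b" 2 "s"
"b" 3 "a"
"b" 4 ""
"b" 5 ""
"c" 1 ""
"c" 2 "s"
"c" 3 "s"
"c" 4 "s"
"c" 5 "a"
end

* 1 on the first day of a station or right after a non-"s" day, else 0
by station (day): generate byte newgrp = (_n == 1) | (flag[_n-1] != "s")

* Running sum turns the start markers into consecutive group numbers
generate grp = sum(newgrp)
list station day flag grp, sepby(station)
```

For station b this gives days 1 to 3 one group, then day 4 and day 5 separate groups, as described above.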

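From there you could work something out using the 'by groupvar:' construct and explicit subscripting, as in 'by groupvar: replace temperature=temperature[_N] if temperature<.', with the records sorted by day within each group so that [_N] is the record holding the accumulated value. You might want to convert the accumulated values to averages (dividing by the number of records in the group) before this.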
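A minimal sketch of that last step, assuming hypothetical long-form toy data (names `station`, `day`, `amount`, `flag`, `grp`, `avg` and all values are made up). It rebuilds the grouping variable, then divides each group's accumulated total by the group size and spreads the result over every day of the group. It assumes each accumulation run closes with an "a" record in the same row; a run of "s" days at the end of the month without an "a" is simply left untouched.

```stata
* Hypothetical long data: 3 stations x 5 days
clear
input str1 station day amount str1 flag
"a" 1 0 "s"
"a" 2 6 "a"
"a" 3 0 "s"
"a" 4 0 "s"
"a" 5 9 "a"
"b" 1 0 "s"
"b" 2 0 "s"
"b" 3 6 "a"
"b" 4 2 ""
"b" 5 3 ""
"c" 1 1 ""
"c" 2 0 "s"
"c" 3 0 "s"
"c" 4 0 "s"
"c" 5 8 "a"
end

* Group id: a new group starts on a station's first day or after a non-"s" day
by station (day): generate byte newgrp = (_n == 1) | (flag[_n-1] != "s")
generate grp = sum(newgrp)

* The last day of a group carries the total; if it is flagged "a",
* compute the per-day average for every record in that group
by grp (day): generate double avg = amount[_N] / _N if flag[_N] == "a"

* Overwrite the zeros (and the total itself) with the average
replace amount = avg if !missing(avg)
drop newgrp avg
list station day amount flag, sepby(station)
```

To return to one record per station, 'reshape wide amount flag, i(station) j(day)' should restore the original layout with the averaged values.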

There are probably many other ways to tackle the problem, but I think reshaping to long form is the most important step here.

At 06:07 17-7-2003, Radu Ban wrote:

Dear listers,

This is a data management question. The data that I'm looking at (daily U.S. weather) has the following structure.

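| day1 | flag1 | day2 | flag2 | day3 | flag3 | day4 | flag4 | day5 | flag5 | ... | day31 | flag31 |
|------|-------|------|-------|------|-------|------|-------|------|-------|-----|-------|--------|
| 0    | s     | a2   | a     | 0    | s     | 0    | s     | a5   | a     | ... | a31   |        |
| 0    | s     | 0    | s     | b3   | a     | b4   |       | b5   |       | ... | b31   |        |
| c1   |       | 0    | s     | 0    | s     | 0    | s     | c5   | a     | ... | c31   |        |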

The "s" flag means that the measured element (say, inches of rain) is accumulated over those days, which are assigned a value of 0, and the accumulated amount is reported on the day flagged with "a". I would like to replace the 0 values for the accumulation days with the average of the accumulated value over those days.

Specifically, given the notation above, I would like to replace 0, 0, b3 (in the second row) with b3/3; 0, 0, 0, c5 (in the third row) with c5/4; and so on. Note that, as in the first row, there can be more than one accumulation series per row.

I figured out that each accumulation series a_ij (starting on day i and ending on day j) must be identified, so that in the end I can use:

    forval j = 2/31 {
        forval i = 1/`j' {
            egen daymean = rmean(day`i'-day`j') if a_`i'`j' == 1
            replace day`i' = daymean
            drop daymean
        }
    }

But I'm not sure how to define all the a_ij.

Ernest Berkhout, SEO Amsterdam Economics, University of Amsterdam, Room 3.08, Roetersstraat 29, 1018 WB Amsterdam, The Netherlands. tel.: +31 20 525 1657, fax: +31 20 525 1686, http://www.seo.nl

A statistician: someone who insists on being certain about uncertainty
