[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: average value among differing numbers of variables |

Date |
Thu, 17 Jul 2003 12:56:25 +0100 |

Radu Ban > This is a data management question. The data that I'm > looking at (daily > U.S. weather) has the following structure. > > day1 flag1 day2 flag2 day3 flag3 day4 > flag4 day5 flag5 ... day31 flag31 > 0 s a2 a 0 s 0 > s a5 a a31 > 0 s 0 s b3 a b4 > b5 b31 > c1 0 s 0 s 0 > s c5 a c31 > > the "s" flag means that the measured element (say inches of > rain) is > accumulated over those days, which are assigned a 0 value, and the > accumulated amount is reported in the day flagged with "a". > i would like to > replace the 0 value for the accumulation days with the > average of the > accumulated value over those days. > > given the notations above, specifically, i would like to > replace 0, 0, b3 > (in the second row) with b3/3; 0, 0, 0, c5 (in the third > row) with c5/4, > and so on. note that, as in the first row there can be more > than one > accumulation series per row. > > i figured out that each type of accumulation, a_ij(starting > at day i ending > at day j) must be identified, so that in the end i can use: > > forval j = 2/31 { > forval i = 1/`j' { > egen daymean = rmean(day`i'-day`j') if a_`i'`j' == 1 > replace day`i' = daymean > drop daymean > } > } > > but i'm not sure how to define all a_ij > Ernest Berkhout That looks indeed like a rather complex problem, for which i don't have a golden key. The problem with your present setup is that for instance 'egen rmean(day1-day2) will not work because flag1 is in that list as well. My approach would be to first reshape the data to long form (which is more comfortable to most datamanipulation issues), so that you have the variables day (which should better be renamed 'temperature' or something), flag, an indicator for day 1 to 31, and an indicator for a/b/c. From there it should be possible to construct some group-variable that indicates each group of records where temperature should be filled in and their accumulated value from which it should be calculated. In your example on the second row you have a group where day1, day2 and day3 belong together, day4 is a group and day5 is a group. Let your grouping variable assign a value of 1 to the first 3 records (day1, day2, day3), a value of 2 to the record of day4, etc. From there you could work somethinh out using the 'by groupvar:' construct and explicit subscripting, as in "by groupvar: replace temperature=temperature[_N] if temperature<." You might want to convert accumulated temperatures to averages before this. There are probably a lot of other ways to tackle the problem, but i think the reshape-to-long is the most important thing here. NJC >>> I endorse Ernest's suggestion. The existence of the -egen, r*()- functions is just about the only exception to the Stata principle that doing it long is easier than doing it wide. I presume that each observation is a month's worth of rainfall data. I set on one side the small issue of what happens if the month contains 28, 29, 30 days, probably no issue. I created something with I think is the same structure like this: set obs 100 forval i = 1/31 { gen str flag`i' = cond(uniform() < 0.7, "s", "a") gen day`i' = cond(flag`i' == "s", 0, int(10*uniform())) } You need a unique identifier if you don't have one gen id = _n and then it's a common-or-garden -reshape- reshape long day flag , i(id) The variable names are now a little unclear, so I clean up rename day rainfall rename _j day Now the averaging is to be done for each block of observations ending with a "s" flag. I look at -tsspell-, which is my favourite tool for making spells, Hogwarts training notwithstanding, and the nearest equivalent is an option for identifying spells based on something happening in the _first_ observation of each spell (like a birth, earthquake, eruption, election, Stata release...). That will serve fine so long as we look _backwards_ through time: gen seq = - _n tsset seq -- the -tsset- being necessary for -tsspell- to apply -- tsspell, fcond(flag == "a") -tsspell- leaves behind three variables, of which _spell is of direct use here: egen mean = mean(rainfall), by(_spell) Now reverse time to its proper order sort id day The variables created by -tsspell- may have other uses. -tsspell- is on SSC. Nick n.j.cox@durham.ac.uk P.S. 1 inch = 25.4 mm. I suspect the USA is the only country still measuring rainfall in inches. * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: average value among differing numbers of variables***From:*Radu Ban <rban@nber.org>

- Prev by Date:
**Re[2]: st: How to do trend test of OR?** - Next by Date:
**st: xtreg fixed effect with time trend; constant** - Previous by thread:
**Re: st: average value among differing numbers of variables** - Next by thread:
**st: xtreg fixed effect with time trend; constant** - Index(es):

© Copyright 1996–2015 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |