Radu Ban <[email protected]> writes, 
> This is a data management question. The data that I'm looking at (daily U.S.
> weather) has the following structure.
>
>   day1 flag1 day2 flag2 day3 flag3 day4 flag4 day5 flag5 ... day31 flag31
>      0     s   a2     a    0     s    0     s   a5     a       a31  
>      0     s    0     s   b3     a   b4         b5             b31
>     c1          0     s    0     s    0     s   c5     a       c31
>
> the "s" flag means that the measured element (say inches of rain) is
> accumulated over those days, which are assigned a 0 value, and the
> accumulated amount is reported in the day flagged with "a". i would like to
> replace the 0 value for the accumulation days with the average of the
> accumulated value over those days.
>
> given the notations above, specifically, i would like to replace 0, 0, b3
> (in the second row) with b3/3; 0, 0, 0, c5 (in the third row) with c5/4, and
> so on. note that, as in the first row there can be more than one
> accumulation series per row.
I do now know what the solution to this problem is yet -- I will -- but I 
do know these kinds of problems are easier viewed the long way:
      orig_obs   i    day   flag
             1   1      0      s
             1   2     a2      a
             1   3      0      s
             1   4      0      s
             1   5     a5      a
           ...
             1  31    a31      
             2   1      0      s
             2   2      0      s
             2   3     b3      a
             2   4      0      s
             2   5     c5      a
           ...
             2  31    c31
           etc.
-reshape- can make the data look like that -- we'll worry about the details 
later.
I am unsure from Radu's description whether variable flag ever contains
anything on that "a" and "s".  Radu sort of implies " " is also possible, 
in which case the measurement would be for that day.  If so, I could 
view that as an "a" observation:  One accumulates the single day and 
divides by one.  So just in case there are any blanks, 
        . replace flag = "a" if flag=="s"
and then, just to verify that these data are as they have been explained to 
be, let's verify that flag is now always "s" or "a":
        . assert flag=="s" | flag=="a"
Now the problem is getting easier:
    1.  We start accumulation at the first observation for each orig_obs 
        group.
    2.  We continue accumulation up until we see an "a".
    3.  Then, we start accumulation again.
So let's add a new variable begin that will record 1 every time an accumulation
begins:
        . by orig_obs: gen begin = cond(_n==1 | flag[_n-1]=="a", 1, 0)
Now our dataset looks like, 
      orig_obs   i    day   flag   begin
             1   1      0      s       1
             1   2     a2      a       0
             1   3      0      s       1
             1   4      0      s       0
             1   5     a5      a       0
           ...
             1  31    a31      a       <could be 0 or 1>
             2   1      0      s       1
             2   2      0      s       0
             2   3     b3      a       0
             2   4      0      s       1
             2   5     c5      a       0
           ...
             2  31    c31      a       <could be 0 or 1>
Understand what I did:  I merely created a variable equal to 1 marking 
the beginning of each group in which we need to distribute sum.  If I now 
sum variable begin, I will have group numbers:
        . gen group = sum(begin)
The data set now looks like:
      orig_obs   i    day   flag   begin    group
             1   1      0      s       1        1
             1   2     a2      a       0        1
             1   3      0      s       1        2
             1   4      0      s       0        2
             1   5     a5      a       0        2
           ...
             1  31    a31      a       ?       10  <- i just made up 10
             2   1      0      s       1       11
             2   2      0      s       0       11
             2   3     b3      a       0       11
             2   4      0      s       1       12
             2   5     c5      a       0       12
           ...
             2  31    c31      a       ?        ?
           etc.
Now the problem is easy:  within group, replace all the observations 
with the value of the last observation in the group, divided by the 
number of observations in the group:
        . sort group
        . by group: gen newday = day[_N]/_N
I have my solution.  Actually, I could have typed 
         . by group: replace day = day[N]/_N
but I want to -list- the result and make sure it looks right to me.  Then 
I can replace day:
         . replace day = newday
         . drop newday
Now all I need to do is switch the data back to being in the wide format.
So here is the complete solution:
        . gen orig_obs = _n
        . reshape long day flag, i(i)
        . replace flag = "a" if flag=="s"
        . assert  flag=="s" | flag=="a"
        . sort orig_obs i
        . by orig_obs: gen begin = cond(_n==1 | flag[_n-1]=="a", 1, 0)
        . gen group = sum(begin)
        . sort group
        . by group: gen newday = day[_N]/_N
	. list                                <-- look to make sure right
        . replace day = newday
        . drop newday group
        . reshape wide
-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/