Notice: On April 23, 2014, Statalist moved from an email list to a forum.

Re: st: taking the average of duplicate observations

From   Nick Cox
Subject   Re: st: taking the average of duplicate observations
Date   Fri, 3 May 2013 12:41:40 +0100

A traditional and still standard method of measuring rainfall is once
daily by measuring the depth of water accumulated in a gauge (which
may be zero, naturally). For centuries, somebody did the measurement.

An automated alternative now often seen is that a bucket is emptied
when full and the time of emptying recorded.

Your data does not sound like either form and from what you say its
quality is dubious.

There is no problem about including URLs in a Statalist post. (Look
again to see URLs automatically included in _every_ posting, at the
bottom.) The thread you alluded to appears to start at


On 3 May 2013 12:24, Michael Tekle Palm wrote:

> Sorry about the inexactness of my post; I was afraid the inclusion of a URL would alert spam filters or the like.
> -collapse- certainly did the trick, thank you so much. It also shouldn't have any other implications, since there are no other variables in this data set, so it works out elegantly.
> I can understand your puzzlement. I am using monthly rainfall data from a developing country where unfortunately and inexplicably, for a few stations, some months have several differing rainfall outcomes. Had it been daily data, I would probably have assumed that there had been multiple measurement times per day, but it being monthly data, my conclusion was that it was down to some kind of input error. I appreciate your comment and will definitely try to investigate more closely.

>> Your reference to another post lacks a URL, nor can we comment on code
>> that you don't show us, but there is a one-word solution: -collapse-.
>>
>> collapse rainfall, by(station year month)
>>
>> But I've worked a lot with rainfall data, and I'm puzzled at what you
>> are doing here. If these are daily data, the convention is to use
>> totals, not means. -collapse- can do that too.

On 3 May 2013 11:48, Michael Tekle Palm wrote:

>> > I have observations with identical time values but different outcome values. Instead of dropping all but the first observations for every two/three duplicates, I want to calculate and replace with the average of the observations, and then drop the duplicates.
>> >
>> > So my data is on rainfall for a given location, disaggregated by year and month, e.g.:
>> >
>> > Station | Year | Month | Rainfall
>> > --------|------|-------|---------
>> > 1       | 1980 | 1     | 5
>> > 1       | 1980 | 1     | 3
>> > 1       | 1980 | 2     | 4
>> > 1       | 1980 | 3     | 8
>> > 1       | 1980 | 3     | 1
>> >
>> >
>> > So for each duplicate by station year month, I would like to calculate the average value for the rainfall outcomes, use this value and drop all duplicates. I think the solution suggested in this ["RE: st: questions about duplicate observations"] Statalist reply may work, but I wasn't quite able to make it work.
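
For readers following the thread, here is a minimal sketch of the -collapse- approach suggested above, applied to the five example rows. The variable names (station, year, month, rainfall) are assumed from the table header, and the -duplicates report- step is an optional check added here for illustration; it is not part of the original reply.

```stata
* Example rows as posted in the thread (variable names are assumed)
clear
input station year month rainfall
1 1980 1 5
1 1980 1 3
1 1980 2 4
1 1980 3 8
1 1980 3 1
end

* Optional check: how many station-year-month groups hold surplus records?
duplicates report station year month

* Replace each group by its mean; (mean) is the default statistic,
* so this is equivalent to: collapse rainfall, by(station year month)
collapse (mean) rainfall, by(station year month)

* One row per station-year-month remains
list, sepby(station year)
```

With these rows the result should be one observation per month, with rainfall equal to 4, 4 and 4.5. If totals rather than means were wanted, as the reply notes is the convention for daily data, -collapse (sum)- with the same by() option would do that instead. Note that -collapse- replaces the data in memory, so any variables not named in the command or in by() are dropped.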
