
# Re: st: manual weighted average variable in panel data set

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: manual weighted average variable in panel data set
Date: Fri, 16 Nov 2012 09:22:21 +0000

```
Those curious about a shorter version could think about

. bysort id : gen meanc = sum(conc * resid_time) / sum(resid_time)
. by id : replace meanc = meanc[_N]

Nick

On Fri, Nov 16, 2012 at 2:36 AM, hind lazrak <hindstata@gmail.com> wrote:
> The best things are the simplest, aren't they?
> Thanks for the help!
>
> Best,
> Hind
>
> On Thu, Nov 15, 2012 at 4:56 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> No loops are needed.
>>
>> . l id conc resid_time
>>
>>      +--------------------------+
>>      |    id    conc   resid_~e |
>>      |--------------------------|
>>   1. | 20059   15.96        380 |
>>   2. | 20059   21.17        100 |
>>   3. | 20059   18.07        480 |
>>   4. | 20060      30        181 |
>>   5. | 20060   16.68        292 |
>>      |--------------------------|
>>   6. | 20061   23.78        269 |
>>   7. | 20061   18.07        103 |
>>      +--------------------------+
>>
>> . bysort id : gen sumw = sum(resid_time)
>>
>> . by id : gen sumwc = sum(conc * resid_time)
>>
>> . by id : gen meanc = sumwc[_N] / sumw[_N]
>>
>> . l
>>
>>      +-------------------------------------------------------+
>>      |    id    conc   resid_~e   sumw      sumwc      meanc |
>>      |-------------------------------------------------------|
>>   1. | 20059   15.96        380    380     6064.8   17.55771 |
>>   2. | 20059   21.17        100    480     8181.8   17.55771 |
>>   3. | 20059   18.07        480    960    16855.4   17.55771 |
>>   4. | 20060      30        181    181       5430   21.77708 |
>>   5. | 20060   16.68        292    473   10300.56   21.77708 |
>>      |-------------------------------------------------------|
>>   6. | 20061   23.78        269    269    6396.82   22.19901 |
>>   7. | 20061   18.07        103    372    8258.03   22.19901 |
>>      +-------------------------------------------------------+
>>
>> This code is longer than efficiency requires, but it shows what I
>> believe you want. You can also generate sumwc / sumw, the running
>> weighted mean, if you wish.
>>
>> I didn't try to follow your code, but you're missing how -sum()- does
>> cumulative sums.
>>
>> See also _gwtmean from SSC (David Kantor). (I don't agree with his
>> advice to make it a substitute for Stata's code for -gmean()-.)
>>
>> Nick
>>
>> On Fri, Nov 16, 2012 at 12:33 AM, hind lazrak <hindstata@gmail.com> wrote:
>>
>>> I have a panel data set with the following variables: id, resid_time,
>>> and conc. The repeated observations vary from 1 to 5 for each
>>> individual (id). Each observation has a time period (resid_time)
>>> during which a pollutant concentration occurs (conc).
>>> Here is an excerpt of the data
>>>
>>> list    id conc resid_time counter in 1/7, sepby(id)
>>>
>>>      +------------------------------------+
>>>      |    id    conc   resid_~e   counter |
>>>      |------------------------------------|
>>>   1. | 20059   15.96        380         1 |
>>>   2. | 20059   21.17        100         2 |
>>>   3. | 20059   18.07        480         3 |
>>>      |------------------------------------|
>>>   4. | 20060      30        181         1 |
>>>   5. | 20060   16.68        292         2 |
>>>      |------------------------------------|
>>>   6. | 20061   23.78        269         1 |
>>>   7. | 20061   18.07        103         2 |
>>>      +------------------------------------+
>>>
>>> I need to create a variable using resid_time and conc that computes
>>> the time-weighted average concentration for each ID.
>>>
>>> The steps that I took were to create a variable product (equal to
>>> resid_time * conc), and then I have been trying to come up with a loop
>>> that would do the following:
>>> for each person and each observation, compute the time-weighted
>>> average concentration = (sum of product / total resid_time) up until
>>> the time at which the observation occurs.
>>>
>>> Here's my code:
>>> ***************************************
>>> bysort id: gen counter = _n
>>> bysort id: gen product= resid_time*conc
>>>
>>> bysort id: gen time = resid_time if _n==1 | counter[_N]==1
>>> bysort id: gen twa= conc if _n==1 | counter[_N]==1
>>> qui su counter
>>> forval i=1/`r(max)' {
>>> bysort id: replace product= product+ product[_n-`i'] if _n!=1
>>> bysort id: replace time= resid_time+ resid_time[_n-`i'] if _n=1
>>> }
>>> bysort id: gen twa = product/time
```
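
For anyone who wants to try the approach end to end, here is a minimal sketch that rebuilds the seven example rows from the thread and computes both the running time-weighted mean (what the original question describes) and the per-id overall mean (what the replies produce). The variable names `obsno`, `run_twa`, `sumwc_all`, `sumw_all`, and `meanc2` are made up for this sketch and do not appear in the thread; the `egen` cross-check is an alternative shown only for comparison.

```stata
* Sketch only: rebuild the seven example rows shown in the thread.
clear
input long id double conc int resid_time
20059 15.96 380
20059 21.17 100
20059 18.07 480
20060 30    181
20060 16.68 292
20061 23.78 269
20061 18.07 103
end

* Record the original row order so the running sums are reproducible.
gen long obsno = _n
sort id obsno

* Running time-weighted mean, up to and including each observation.
* sum() returns a cumulative (running) sum, restarted within each id by -by-.
by id: gen double run_twa = sum(conc * resid_time) / sum(resid_time)

* Overall time-weighted mean per id: the last running value in each group.
by id: gen double meanc = run_twa[_N]

* Cross-check against group totals from egen (hypothetical variable names).
egen double sumwc_all = total(conc * resid_time), by(id)
egen double sumw_all  = total(resid_time), by(id)
gen double meanc2 = sumwc_all / sumw_all
assert reldif(meanc, meanc2) < 1e-12

list id conc resid_time run_twa meanc, sepby(id)
```

The `meanc` column should match the values listed in the thread (17.55771, 21.77708, 22.19901). One design point to keep in mind: `sum()` treats missing values as zero, so if `conc` can be missing while `resid_time` is not, the denominator would still count that row's weight; in that case the weights would need to be restricted to rows where both variables are observed.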