Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Average of last two previous observations conditional on another variable value


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Average of last two previous observations conditional on another variable value
Date   Mon, 24 Mar 2014 14:06:37 +0000

I struggled with this mightily until I got a better way to think about it.

What's clear is that a loop over observations is the solution of last
resort, although long-term readers of this list should be able to
recall that that doesn't rule out such a solution being offered....

The trick is that the two previous values you want are easy to
identify *once you have a dataset that contains only positive values
for -dummy-*.

In your example -id- is redundant given -year- and is manifestly not a
panel identifier. So, I faked a panel identifier for a single panel.
The code should work with an equivalent real panel identifier.

. clear

. input id      year      dummy      y        average

            id       year      dummy          y    average
  1. 1      1990         0           10           .
  2. 2      1991         0           11           .
  3. 3      1992         1           12           .
  4. 4      1993         1           13           .
  5. 5      1994         0           14          12.5
  6. 6      1995         0           15          12.5
  7. 7      1996         1           16          12.5
  8. 8      1997         1            17         14.5
  9. 9      1998         1            18         16.5
 10. 10    1999          0           19          17.5
 11. end

. gen panelid = 42

. save mymaster
file mymaster.dta saved

. keep if dummy
(5 observations deleted)

. bysort panelid (year) : gen time = _n

. tsset panelid time
       panel variable:  panelid (strongly balanced)
        time variable:  time, 1 to 5
                delta:  1 unit

. gen work = (y + L.y)/2
(1 missing value generated)

. keep panelid year work

. merge 1:1 panelid year using mymaster

    Result                           # of obs.
    -----------------------------------------
    not matched                             5
        from master                         0  (_merge==1)
        from using                          5  (_merge==2)

    matched                                 5  (_merge==3)
    -----------------------------------------

. tsset panelid year
       panel variable:  panelid (strongly balanced)
        time variable:  year, 1990 to 1999
                delta:  1 unit

. replace work = L.work if missing(work)
(3 real changes made)

. gen newaverage = L.work
(4 missing values generated)

. drop _merge work

. list

     +-------------------------------------------------------+
     | year   panelid   id   dummy    y   average   newave~e |
     |-------------------------------------------------------|
  1. | 1990        42    1       0   10         .          . |
  2. | 1991        42    2       0   11         .          . |
  3. | 1992        42    3       1   12         .          . |
  4. | 1993        42    4       1   13         .          . |
  5. | 1994        42    5       0   14      12.5       12.5 |
     |-------------------------------------------------------|
  6. | 1995        42    6       0   15      12.5       12.5 |
  7. | 1996        42    7       1   16      12.5       12.5 |
  8. | 1997        42    8       1   17      14.5       14.5 |
  9. | 1998        42    9       1   18      16.5       16.5 |
 10. | 1999        42   10       0   19      17.5       17.5 |
     +-------------------------------------------------------+

For convenience here is the code as a block, with commentary.

* set up example
clear
input id      year      dummy      y        average
1      1990         0           10           .
2      1991         0           11           .
3      1992         1           12           .
4      1993         1           13           .
5      1994         0           14          12.5
6      1995         0           15          12.5
7      1996         1           16          12.5
8      1997         1            17         14.5
9      1998         1            18         16.5
10    1999          0           19          17.5
end

* we need a panel identifier, here just for a single panel
gen panelid = 42
save mymaster

* reduce the dataset to just -dummy==1- and -tsset- with pseudotime
keep if dummy
bysort panelid (year) : gen time = _n
tsset panelid time

* we want the average of this y and the previous y: works w/ panels too
gen work = (y + L.y)/2

* now -merge- back and -tsset- as usual
keep panelid year work
merge 1:1 panelid year using mymaster
tsset panelid year

* fill in gaps; it's just carry forward
replace work = L.work if missing(work)

* finishing tape in sight; mine's a red wine
gen newaverage = L.work
drop _merge work
list

Nick
[email protected]


On 24 March 2014 13:15, Irene Ferrari <[email protected]> wrote:
> Dear Statalisters,
>
> I am a new statalister and I would greatly appreciate if you could
> give me advice on something which seems trivial to explain, less
> trivial to translate in Stata code.
>
> I have a panel dataset, and I would like to perform the average
> (variable "average") of the last two previous in time observations of
> "y" whose corresponding dummy variable ("dummy") is equal to 1.
>
> Example:
>
> --------------------------------------------------------------
> id      year      dummy      y        average
> 1      1990         0           10           .
> 2      1991         0           11           .
> 3      1992         1           12           .
> 4      1993         1           13           .
> 5      1994         0           14          12.5
> 6      1995         0           15          12.5
> 7      1996         1           16          12.5
> 8      1997         1            17         14.5
> 9      1998         1            18         16.5
> 10    1999          0           19          17.5
> --------------------------------------------------------------
>
> So, for example, when computing the average for year 1996, I need the
> programme to search back the two previous y such that the
> corresponding dummy is equal to 1, which are y==12 and y==13, and then
> compute the average.
>
> Here is the latest code (of many) I was trying, with no success (it
> seems stuck in an infinite loop...):
>
> replace average = .
> local k=1
> local r=0
> local N = _N
> local z=1
> forvalues i = 1/`N' {
>                     while `k'<=2 {
>                           if dummy[`i'-`z'] == 1 {
>                                                             local r=
> `r'+ y[`i'-`z']
>                                                    local k=`k'+1
>                                                    local z=`z'+1
> }
>                            else {
>                                                        local z=`z'+1
> }
> }
>    replace average=(`r'/2) in `i'
>    local k=1
>             local r=0
>             local z=1
> }
>
>
> I have never needed to perform complicated programming in Stata, so it
> is totally possible I am missing something obvious.
> Thanks for your consideration.
>
>
> Irene Ferrari
> PhD Student - Department of Economics
> University of Bologna
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index