Statalist



Re: st: running sum restarting after missing value


From   n j cox <n.j.cox@durham.ac.uk>
To   n j cox <n.j.cox@durham.ac.uk>
Subject   Re: st: running sum restarting after missing value
Date   Fri, 20 Jul 2007 15:09:41 +0100

Just as a follow-up: Given that you have -tsset- your data

.  tsset id year
       panel variable:  id, 1 to 1
        time variable:  year, 1984 to 1989

and installed -tsspell- from SSC, then spells are
defined by -var1- being non-missing:

. tsspell , cond(!missing(var1))

This automatically produces three variables, by default
_seq, _spell, _end.

. l

     +---------------------------------------------------+
     | id   year   var1   sum_var   _seq   _spell   _end |
     |---------------------------------------------------|
  1. |  1   1984      1         1      1        1      0 |
  2. |  1   1985      1         2      2        1      0 |
  3. |  1   1986      1         3      3        1      1 |
  4. |  1   1987      .         .      0        0      0 |
  5. |  1   1988      1         1      1        2      0 |
     |---------------------------------------------------|
  6. |  1   1989      1         2      2        2      1 |
     +---------------------------------------------------+

You could then type

. bysort id _spell (year) : gen sum_var1 = sum(var1)
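
If what is wanted is instead the spell total repeated on every row of a spell, as in the table in the original question quoted below, something like the following might serve. This is only a sketch: it assumes -tsspell- has already created -_spell- as above, and -total_var1- is an illustrative name, not one used elsewhere in this thread.

```stata
* spell total copied to every observation in the spell;
* observations outside any spell (_spell == 0) are left missing
bysort id _spell : egen total_var1 = total(var1) if _spell > 0
```

With the example data this should put 3 on the 1984-1986 rows, missing on 1987, and 2 on 1988-1989.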

n j cox wrote:


This is confusing. You say you want a running sum, but your example
does not show one. I am going to trust your initial statement and
ignore your example.

-sum()- is trained to ignore missings. Usually this is a feature,
but not for your purposes.

One take on this is that you evidently regard runs of non-missing values
as distinct spells. Identifying such spells explicitly would enable
you to do something like this:

by id spell : gen sum_var1 = sum(var1)

and with one bound you are then home free.
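
As a minimal sketch of that idea, spells of non-missing values could be marked by hand along the following lines. The variable names -id-, -year- and -var1- come from the example; -begin- and -spell- are illustrative names assumed here, not variables created by any command mentioned in this thread.

```stata
* flag the first observation of each run of non-missing values within id
* (var1[_n-1] is missing for the first observation of each id)
bysort id (year) : gen byte begin = !missing(var1) & missing(var1[_n-1])

* number the spells within id; observations with missing var1 get spell 0
by id (year) : gen spell = cond(missing(var1), 0, sum(begin))

* running sum restarted in each spell, left missing outside spells
bysort id spell (year) : gen sum_var1 = sum(var1) if spell > 0
```

The -if spell > 0- qualifier matters because -sum()- treats missing as zero, so without it the observations with missing -var1- would receive 0 rather than missing.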

-tsspell- on SSC is one user-written tool for working with spells. I took a step back from that to explain, in excruciating detail, how to do it all (and more) from first principles in

Cox, N.J. 2007. Identifying spells. Stata Journal 7(2): 249-265.

In this particular problem there are various direct ways of
restarting, without identifying spells of non-missing values as
distinct spells. Here's one:

gen sum_var1 = .

bysort id (year) : replace sum_var1 = ///
    cond(missing(var1[_n-1]), var1, var1 + sum_var1[_n-1])

For sources of information on -cond()-, type -search cond()-.

In words, if the previous one is missing, the sum becomes the present value; otherwise add the present value to the sum so far. Missings
will map to missings on this rule.

As usual, note that -replace- entails a previous -generate-.
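
To check the rule against the example data, a self-contained do-file along these lines could be run. It is a sketch only: the data are typed in from the example in the original question, and the -///- continuation assumes the commands are run from a do-file rather than typed interactively.

```stata
clear
input id year var1
1 1984 1
1 1985 1
1 1986 1
1 1987 .
1 1988 1
1 1989 1
end

gen sum_var1 = .

* restart the sum whenever the previous value of var1 is missing
bysort id (year) : replace sum_var1 = ///
    cond(missing(var1[_n-1]), var1, var1 + sum_var1[_n-1])

list, sepby(id)
```

Working through by hand, -sum_var1- should come out as 1, 2, 3, missing, 1, 2. This relies on -replace- working through the observations in order, so that sum_var1[_n-1] has already been updated when the current observation is processed.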

Any connection between this work on "identifying spells" and the work of any other past or present students of Hogwarts Academy is allusive, elusive and illusory.

By the way, it seems that you have yet to read to the end of the
Statalist FAQ:

http://www.stata.com/support/faqs/res/statalist.html#spell

Nick
n.j.cox@durham.ac.uk

Erasmo Giambona wrote:

I am trying to create a running sum using the sum function by group.
My problem is that I would like STATA to restart summing again after
each missing value and match the total with all previous observations.
For example:

id year var1
1 1984 1
1 1985 1
1 1986 1
1 1987 .
1 1988 1
1 1989 1

My output should look like

id year var1 sum_var1
1 1984 1 3
1 1985 1 3
1 1986 1 3
1 1987 . .
1 1988 1 2
1 1989 1 2

I tried the following, but it doesn't get me what I need.

by id: gen sum_var1=sum(var1) if var1!=.


