Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: var gen based on history

From   "Nick Cox" <>
To   <>
Subject   st: RE: var gen based on history
Date   Sun, 13 Jul 2003 17:40:28 +0100


> I have variables Category, Date and Amount and I need to
> generate two new
> variables described as follows.
> minAmount = smallest Amount in Category as of Date
> maxAmount = largest Amount in Category as of Date
> Example:
> Category		Date		Amount	min	max
> 1			1/1/2001	50		50	50
> 2			1/1/2001	100		100	100
> 1			1/2/2001	52		50	52
> 2			1/2/2001	98		98	100
> 1			1/3/2001	47		47	52
> 2			1/3/2001	105		98	105
> 1			1/4/2001	55		47	55
> 2			1/4/2001	93		93	105
> I've looked, but I can't come up with the right gen or egen
> commands and
> I've not yet tackled programming. I'm hoping someone can
> spot some tricky
> way of accomplishing this.

Solution 1. Use -record()- from -egenmore-

The -egen- function -record()- written by Kit Baum and
myself is aimed at this problem:

record(exp) [ , by(varlist) min order(varlist) ] produces the maximum
    (with min the minimum) value observed "to date" of the specified
    Thus record(wage), by(id) order(year) produces the maximum wage so
    far in worker's career, calculations being separate for each id
    records being determined within each id in year order. Although
    explanation and example here refer to dates, nothing in record()
    its use to data ordered in time. If not otherwise specified with
    and/or order(), records are determined with respect to the current
    order of observations. No special action is required for missing
    as internally record() uses either the max() or the min()
    both of which return results of missing only if all values are

In fact, the write-up is longer than the code:

*! 1.2.1 CFB/NJC 8 Oct 2001
* 1.2.0 CFB/NJC 8 Oct 2001
* 1.1.0 CFB 06 Oct 2001
program define _grecord
	version 6.0
	syntax newvarname =/exp [if] [in] [, BY(varlist) ORDER(varlist) MIN ]
	tempvar touse obsno
	local op = cond("`min'" == "min", "min", "max")
	quietly {
		mark `touse' `if' `in'
		gen `typlist' `varlist' = `exp' if `touse'
		gen long `obsno' = _n
		sort `touse' `by' `order' `obsno'
		by `touse' `by': /*
	*/ replace `varlist' = `op'(`varlist',`varlist'[_n-1]) if `touse'

This function is part of the -egenmore- package on SSC. You
can install it by

. ssc inst egenmore

Solution 2: interactively from first principles

Let's back up and tackle your problem from first principles.

We want to do calculations separately -by category:-
and naturally given -category- the appropriate -sort- order
is by -date-.

Also a simple, and perfectly adequate, starting point is that
the very first value observed is the minimum and the maximum
observed so far, so that is an appropriate initialisation:

bysort category (date) : gen min = amount[1]
by category : gen max = amount[1]

On that second line, we could, for symmetry, have

bysort category (date): gen max = amount[1]

but we know that the data are -sort-ed as we wish,
because we just arranged that.

A principle used here is that under the aegis of -by:-
subscripts are interpreted _within_ groups.

As each new value comes in, it is a new minimum if it is
less than the previous minimum, and a new maximum if it
is greater than the previous maximum.

by category : replace min = min(amount, min[_n-1])
by category : replace max = max(amount, max[_n-1])

And that's it, in essence.

What could go wrong, or what could complicate matters:

1. Missing values. Actually, this code is robust
against missing values. That's explained in the
write-up for -record()- above. In fact, I
could have initialised to

bysort category (date) : gen min = .
by category : gen max = .

or indeed to anything else.

2. You want to do this for some subset defined
by -if- and -in-. You could also do that interactively,
but it will be easier to use -record()-.

3. You want to do this for some expression, not
just a single variable. Same comment.


canned solutions:

egen min = record(amount), by(category) order(date) min
egen max = record(amount), by(category) order(date)

from first principles:

bysort category (date) : gen min = amount[1]
by category : gen max = amount[1]
by category : replace min = min(amount, min[_n-1])
by category : replace max = max(amount, max[_n-1])

John Hennen's solution

John Hennen posted a solution too:

 sort date category
 capture drop order
  gen byte order = _n
 capture drop min
 capture drop max
 sort category date
  foreach X of new min max {
    by category : gen `X' = amount[1]
 qui by category : replace min = amount if amount < min[_n-1]
 qui by category : replace min = min[_n-1] if min > min[_n-1]
 qui by category : replace max = amount if amount > max[_n-1]
 qui by category : replace max = max[_n-1] if max < max[_n-1] & _n >1
 sort order

This is almost the same idea, but not quite.

The code for the maximum will perpetuate any missing values:
after the first missing value is observed, that's the
record from now on. This is because missings count
as higher than any non-missing. In practice, this
might not be a problem, but watch out.


*   For searches and help try:

© Copyright 1996–2022 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index