[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: var gen based on history |

Date |
Sun, 13 Jul 2003 17:40:28 +0100 |

Webb.Bayard > I have variables Category, Date and Amount and I need to > generate two new > variables described as follows. > > minAmount = smallest Amount in Category as of Date > maxAmount = largest Amount in Category as of Date > > > Example: > Category Date Amount min max > 1 1/1/2001 50 50 50 > 2 1/1/2001 100 100 100 > 1 1/2/2001 52 50 52 > 2 1/2/2001 98 98 100 > 1 1/3/2001 47 47 52 > 2 1/3/2001 105 98 105 > 1 1/4/2001 55 47 55 > 2 1/4/2001 93 93 105 > > I've looked, but I can't come up with the right gen or egen > commands and > I've not yet tackled programming. I'm hoping someone can > spot some tricky > way of accomplishing this. Solution 1. Use -record()- from -egenmore- ========================================== The -egen- function -record()- written by Kit Baum and myself is aimed at this problem: record(exp) [ , by(varlist) min order(varlist) ] produces the maximum (with min the minimum) value observed "to date" of the specified exp. Thus record(wage), by(id) order(year) produces the maximum wage so far in worker's career, calculations being separate for each id and records being determined within each id in year order. Although explanation and example here refer to dates, nothing in record() restricts its use to data ordered in time. If not otherwise specified with by() and/or order(), records are determined with respect to the current order of observations. No special action is required for missing values, as internally record() uses either the max() or the min() function, both of which return results of missing only if all values are missing. In fact, the write-up is longer than the code: *! 1.2.1 CFB/NJC 8 Oct 2001 * 1.2.0 CFB/NJC 8 Oct 2001 * 1.1.0 CFB 06 Oct 2001 program define _grecord version 6.0 syntax newvarname =/exp [if] [in] [, BY(varlist) ORDER(varlist) MIN ] tempvar touse obsno local op = cond("`min'" == "min", "min", "max") quietly { mark `touse' `if' `in' gen `typlist' `varlist' = `exp' if `touse' gen long `obsno' = _n sort `touse' `by' `order' `obsno' by `touse' `by': /* */ replace `varlist' = `op'(`varlist',`varlist'[_n-1]) if `touse' } end This function is part of the -egenmore- package on SSC. You can install it by . ssc inst egenmore Solution 2: interactively from first principles =============================================== Let's back up and tackle your problem from first principles. We want to do calculations separately -by category:- and naturally given -category- the appropriate -sort- order is by -date-. Also a simple, and perfectly adequate, starting point is that the very first value observed is the minimum and the maximum observed so far, so that is an appropriate initialisation: bysort category (date) : gen min = amount[1] by category : gen max = amount[1] On that second line, we could, for symmetry, have written bysort category (date): gen max = amount[1] but we know that the data are -sort-ed as we wish, because we just arranged that. A principle used here is that under the aegis of -by:- subscripts are interpreted _within_ groups. As each new value comes in, it is a new minimum if it is less than the previous minimum, and a new maximum if it is greater than the previous maximum. by category : replace min = min(amount, min[_n-1]) by category : replace max = max(amount, max[_n-1]) And that's it, in essence. What could go wrong, or what could complicate matters: 1. Missing values. Actually, this code is robust against missing values. That's explained in the write-up for -record()- above. In fact, I could have initialised to bysort category (date) : gen min = . by category : gen max = . or indeed to anything else. 2. You want to do this for some subset defined by -if- and -in-. You could also do that interactively, but it will be easier to use -record()-. 3. You want to do this for some expression, not just a single variable. Same comment. Summary ======= canned solutions: egen min = record(amount), by(category) order(date) min egen max = record(amount), by(category) order(date) from first principles: bysort category (date) : gen min = amount[1] by category : gen max = amount[1] by category : replace min = min(amount, min[_n-1]) by category : replace max = max(amount, max[_n-1]) John Hennen's solution ====================== John Hennen posted a solution too: sort date category capture drop order gen byte order = _n capture drop min capture drop max sort category date foreach X of new min max { by category : gen `X' = amount[1] } qui by category : replace min = amount if amount < min[_n-1] qui by category : replace min = min[_n-1] if min > min[_n-1] qui by category : replace max = amount if amount > max[_n-1] qui by category : replace max = max[_n-1] if max < max[_n-1] & _n >1 sort order list This is almost the same idea, but not quite. The code for the maximum will perpetuate any missing values: after the first missing value is observed, that's the record from now on. This is because missings count as higher than any non-missing. In practice, this might not be a problem, but watch out. Nick n.j.cox@durham.ac.uk * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: var gen based on history***From:*"Webb.Bayard" <Bayard.Webb@igt.com>

- Prev by Date:
**st: comparisons of sample data with published table data (immediate ANOVA)** - Next by Date:
**st: RE: How to drop the entire case from the panel?** - Previous by thread:
**st: var gen based on history** - Next by thread:
**st: QR decomposition** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |