Thanks, Joseph, Jeph and Maarten for your helpful suggestions. I
incorporated these ideas into the following program, which returns
scalars holding the observation numbers at which the variable attains
its minimum and maximum. I found the easiest way to get those was to
sort the original observation number ("ord" in the program below) along
with the variable of interest. Then, after the -bysort- commands, I
re-sorted on "ord" to get back the original order.
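program define argmax1
    // finds observations with `x' = max and `x' = min
    // usage: argmax1 x kmin kmax [if] [in]
    // note: missing values of `x' sort last, so exclude them with -if-
    version 9.2
    syntax anything [if] [in]
    tokenize `anything'
    args x kmin kmax
    tempvar ind ord rep N
    qui gen byte `ind'=0
    qui replace `ind'=1 `if' `in'
    // long rather than int: int overflows past 32,740 observations
    qui gen long `ord'=_n
    qui bysort `ind' (`x'): gen long `rep'=_n
    qui bysort `ind' (`x'): gen long `N'=_N
    summ `ord' if `ind'==1 & `rep'==1, meanonly
    scalar `kmin'=r(mean)
    summ `ord' if `ind'==1 & `rep'==`N', meanonly
    scalar `kmax'=r(mean)
    // restore the original order
    sort `ord'
end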
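
The sort-and-restore idea described above can also be sketched without
a program. This is only an illustration under assumptions of my own: it
uses the auto dataset shipped with Stata, restricts to foreign cars, and
the scalar names kmin and kmax are arbitrary.

```stata
sysuse auto, clear
tempvar ord touse
generate long `ord' = _n                              // original observation number
generate byte `touse' = foreign == 1 & !missing(price)
sort `touse' price `ord'                              // selected observations sort last, price ascending
quietly count if `touse'
local n = r(N)                                        // size of the selected block (assumed > 0)
scalar kmax = `ord'[_N]                               // last row of the block holds the maximum
scalar kmin = `ord'[_N - `n' + 1]                     // first row of the block holds the minimum
sort `ord'                                            // restore the original order
display "min at obs " kmin ", max at obs " kmax
```

With ties, this picks the first tied observation for the minimum and the
last tied observation for the maximum, because `ord' breaks ties in the
sort.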
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joseph
Coveney
Sent: Tuesday, July 25, 2006 2:05 AM
To: Statalist
Subject: Re: st: FW: argmax for -summaraize-
Alan H. Feiveson wrote:
Hello - Does anyone know an efficient way to identify the observation at
which a particular variable is minimum or maximum (subject to `if'
and/or `in') ?
Apparently -summarize- does not return this value. I see nothing in
-egen- nor does "findit argmax" produce anything. I can program this
myself by looping through the observations but that is not efficient. In
particular one cannot guarantee that anything like
summ x
local xmax=r(max)
if x == `xmax' {
...
will work because of rounding. I also wish to avoid -preserve-,
-collapse-, etc.
--------------------------------------------------------------------------------
Couldn't you just -generate- a 0/1 indicator variable? Then just use
the indicator in your Boolean expression: -if indicator_variable . . .-
Generating such a variable (1) allows for both -if- and -in-, (2) won't
be affected by missing values in the target variable, and (3) doesn't
appear to be liable to rounding errors regardless of whether the target
variable is single- or double-precision: one (and only one) maximum
observation is identified in each of 1000 200-observation datasets.
Joseph Coveney
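
As a small illustration of the rounding point: one well-known way an
equality test can fail in Stata is comparing a float variable against a
double-precision literal. The sketch below (the variable name x and the
three-observation dataset are made up for the example) contrasts that
with comparing against r(max), which -summarize- computes from the
stored values themselves.

```stata
clear
set obs 3
generate float x = _n / 10        // 0.1, 0.2, 0.3 stored in single precision
count if x == 0.3                 // the literal 0.3 is double precision
scalar n_literal = r(N)
count if x == float(0.3)          // round the literal to float first
scalar n_float = r(N)
summarize x, meanonly
count if x == r(max)              // r(max) is the stored value itself
scalar n_rmax = r(N)
display n_literal " " n_float " " n_rmax
```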
clear
set more off
set seed `=date("2006-07-25", "ymd")'
set matsize 10000
tempname A
tempvar a max
set obs 200
generate double `a' = . // double-precision
generate byte `max' = 0
forvalues i = 1/1000 {
    quietly replace `a' = uniform()
    summarize `a', meanonly
    quietly replace `max' = (`a' == r(max)) if (1==1) in 1/200
    summarize `max', meanonly
    matrix define `A' = (nullmat(`A') \ r(sum))
}
drop _all
svmat byte `A', names(col)
assert c1 == 1
*
clear
set obs 200
generate float `a' = . // single-precision
generate byte `max' = 0
forvalues i = 1/1000 {
    quietly replace `a' = uniform()
    summarize `a', meanonly
    quietly replace `max' = (`a' == r(max)) if (1==1) in 1/200
    summarize `max', meanonly
    matrix define `A' = (nullmat(`A') \ r(sum))
}
drop _all
svmat byte `A', names(col)
assert c1 == 1
exit
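
The indicator approach can be taken one step further to recover the
observation number itself. The following is a sketch only, using the
auto dataset and an arbitrary -if-/-in- restriction; the names ismax,
obsno and kfirst are invented for the example.

```stata
sysuse auto, clear
generate byte ismax = 0
summarize mpg if foreign == 1 in 1/70, meanonly
quietly replace ismax = (mpg == r(max)) if foreign == 1 in 1/70
generate long obsno = _n if ismax          // observation number where the maximum occurs
summarize obsno, meanonly
scalar kfirst = r(min)                     // first such observation if there are ties
list make mpg foreign in `=kfirst'
```

If ties matter, r(max) from the last -summarize- gives the last tied
observation instead, and r(N) gives the number of ties.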
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/