.-
help for ^dups^                                                 (STB-41: dm53)
.-

Detection and deletion of duplicate observations
------------------------------------------------

    ^dups^ [varlist] [ ^, drop^ ^e^xpand^(^varname^)^ ^k^ey^(^varlist2^)^ ^u^nique ^t^erse ]


Description
-----------

^dups^ provides information about unique and duplicate observations in the
dataset and, optionally, drops all duplicate observations.

^varlist^ is an optional variable list that determines which observations are
duplicates: observations must match exactly on all variables in the list to
be duplicates. If no ^varlist^ is given, all variables in the dataset are used
to determine duplicates.


Options
-------

^drop^ causes duplicate observations to be dropped from the dataset. ^drop^ must
    be spelled out completely.

    ^drop^ creates an expand variable (the default name is ^_expand^) to allow
    dropped data to be recreated. If ^_expand^ already exists, an error message
    is reported and no data are dropped. The expand variable will contain the
    number of duplicate copies of the observations in the original dataset.

    A subsequent ^expand^ command will completely resurrect the original data
    only if ^varlist^ was not specified in the ^dups^ command (or, equivalently,
    if ^varlist^ contains all variables in the dataset), or if the unspecified
    variables are constant within the subgroups formed by the specified
    variables. If a limited ^varlist^ was used, the data can be resurrected
    only partially: unique information from the variables not in ^varlist^
    cannot be recovered.

^expand(varname)^ specifies a ^varname^ to be used as the expand variable in
    place of the default name, ^_expand^. This option has no effect unless
    option ^drop^ is also specified. If the specified ^varname^ already exists,
    an error message is given and no data are dropped.

^key(varlist2)^ causes the values of the variables in ^varlist2^ to be added to
    the displayed output for each group. ^varlist2^ should be the same as, or
    a subset of, ^varlist^. If ^varlist2^ is given as ^*^, it is set to
    ^varlist^.

^unique^ causes the default display and option ^key()^ to list information for
    unique observations as well.

^terse^ limits the default display output. When it is specified, only the
    number of duplicate groups, total observations, number of observations in
    duplicates, and number of unique observations are shown. Without ^terse^,
    ^dups^ numbers the duplicate groups and gives the observation count in each
    group, and does the same for unique observations, if any, when ^unique^
    is specified. Specifying ^terse^ cancels both ^key()^ and ^unique^.


Authors
-------

    Thomas J. Steichen
    RJRT
    steicht@@rjrt.com

    Nicholas J. Cox
    University of Durham, UK
    n.j.cox@@durham.ac.uk


Examples
--------

    . ^dups^
    . ^dups, drop^
    . ^dups foreign, key(*) unique^
    . ^dups foreign, drop expand(ex) terse^


Also see
--------

    STB:  STB-41 dm53
On-line:  help for @expand@, @fillin@, and @chkdup@ (if installed)
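

Worked example (sketch)
-----------------------

The interaction between ^drop^ and a later ^expand^, described under Options,
may be easier to follow with a small session. The do-file fragment below is a
minimal sketch, not output from a real run: it assumes ^dups^ is installed, and
the variables ^x^, ^y^, and ^z^ and their values are made up for illustration.
The observation counts mentioned in the comments are what the description
above implies, not verified results.

```stata
* Case 1: no varlist, so duplicates are judged on all variables.
clear
input x y
1 10
1 10
2 20
2 20
2 20
3 30
end

dups, drop          // drops duplicates and creates _expand (default name)
expand _expand      // replicate each remaining observation _expand times
drop _expand
count               // per the description, the original 6 observations return

* Case 2: limited varlist, with z varying inside a group defined by x.
clear
input x z
1 1
1 2
2 5
end

dups x, drop expand(ex)   // ex is the user-chosen expand variable
expand ex
drop ex
list                // row count is restored, but the two x==1 rows now share
                    // one value of z; the dropped row's z is not recoverable
```

In the second case the data are only partially resurrected, because ^z^ is not
constant within the groups formed by ^x^.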