Finding significant gaps in univariate distributions          [STB-21: sed8]
----------------------------------------------------

        ^wgap^ varlist [^if^ exp] [^in^ range] ^, w^gap^(^prefix^) z(^prefix^)^

^wgap^ measures the significance of gaps or separations in univariate
distributions. The difference between adjacent observations is weighted to
highlight gaps in the middle of the distribution. An approximate z-score can
also be calculated for each weighted gap.

These z-scores can be used during exploratory data analysis to identify
unusual or suspicious separations in the data. Typically, z-scores greater
than 2.25 are considered candidates for further study, although this cutoff
can be raised if you are looking for multiple gaps.

^wgap^ is somewhat robust to heavy-tailed distributions, but it can be
confused by very light tails (for example, uniform distributions) and by
numerous ties.


Options
-------

At least one of the two options, ^wgap()^ or ^z()^, must be specified. Each
option specifies a prefix to use when naming the new variables that store
the results. For example, ^wgap price, wgap(pgap)^ stores the weighted gaps
in the new variable ^pgap^. Typically, the z-scores are of more interest than
the gaps.

If the varlist contains more than one variable, a number is appended to the
prefix. For example, ^wgap a b c, wgap(gap) z(z)^ stores the weighted gaps and
z-scores for variable ^a^ in ^gap1^ and ^z1^, those for variable ^b^ in ^gap2^
and ^z2^, and those for variable ^c^ in ^gap3^ and ^z3^.


References
----------

Wainer, H. and S. Schacht. 1978. Gapping. ^Psychometrika^ 43: 203-212.


Author
------

        Richard Goldstein
        Qualitas, Inc.
        richgold@@netcom.com


Also see
--------

    STB:  sed8 (STB-21)
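

Examples
--------

The following is a minimal sketch of a typical session, not output from an
actual run. It assumes that ^wgap^ is installed and that the data in memory
contain numeric variables named price, a, b, and c; the prefixes pgap and pz
are illustrative names only. It uses only the syntax described under Options,
together with the standard ^list^ and ^summarize^ commands.

```stata
* Store weighted gaps in pgap and approximate z-scores in pz
* (price, pgap, and pz are hypothetical names).
wgap price, wgap(pgap) z(pz)

* Show candidate gaps using the 2.25 cutoff described above.
* The "pz < ." condition drops missing z-scores, because Stata
* treats missing values as larger than any number.
list price pgap pz if pz > 2.25 & pz < .

* With several variables, a number is appended to each prefix:
* a -> gap1, z1; b -> gap2, z2; c -> gap3, z3.
wgap a b c, wgap(gap) z(z)
summarize z1 z2 z3
```

If you expect several gaps in one variable, you may want to replace 2.25 in
the ^list^ condition with a higher cutoff.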