Non-linear, robust smoother                                   (STB-8: sed7.1)
---------------------------

    ^nlsm^ compound_smoother[^,twice^] varname^, g^enerate^(^newvar^)^ [^nos^hift]

where compound_smoother = Sm[Sm...] and Sm is

        {^1^|^2^|^3^|^4^|^5^|^6^|^7^|^8^|^9^}[^R^]
        ^3^[^R^]^S^[^S^|^R^][^S^|^R^]...
        ^E^
        ^H^

For example

    ^3RSSH   3RSSH,twice   4253H   4253H,twice   43RSR2H,twice^

or lowercase

    ^3rssh   3rssh,twice   4253h   4253h,twice   43rsr2h,twice^


Description
-----------

^nlsm^ implements nonlinear data smoothers as originally described by
Tukey (1971).  varname may not contain missing values.


Options
-------

^generate(^newvar^)^ is not optional; it specifies the name of the new variable
    to be created.

^noshift^ prevents shifting the data one time period forward for each pair of
    even smoothers applied.  This option is useful only if subsequent
    smoothers are to be applied and, in that case, making the required shifts
    of the data is the user's responsibility.


Remarks
-------

A smoother separates a data sequence y[t], t = 1, 2, ..., N, into a smooth,
z[t] = Sm{y[t]}, and a rough, r[t] = y[t]-z[t].  A compound smoother applies
the smoothers sequentially; thus, if A and B are smoothers, the smoother AB is
defined as A{B{y[t]}}.


Running median smoothers of odd span
------------------------------------

The smoother ^3^ defines z[t] = median(y[t-1],y[t],y[t+1]).  The smoother ^5^
defines z[t] = median(y[t-2],y[t-1],y[t],y[t+1],y[t+2]), and so on.  The
smoother ^1^ defines z[t] = median(y[t]) and so does nothing.

In all cases, end-points are handled by using smoothers of shorter, odd span.
Thus, in the case of ^3^,

        z[1]   = y[1]
        z[2]   = median(y[1],y[2],y[3])
        .
        z[N-1] = median(y[N-2],y[N-1],y[N])
        z[N]   = y[N]

In the case of ^5^,

        z[1]   = y[1]
        z[2]   = median(y[1],y[2],y[3])
        z[3]   = median(y[1],y[2],y[3],y[4],y[5])
        z[4]   = median(y[2],y[3],y[4],y[5],y[6])
        .
        z[N-2] = median(y[N-4],y[N-3],y[N-2],y[N-1],y[N])
        z[N-1] = median(y[N-2],y[N-1],y[N])
        z[N]   = y[N]

and so on.

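The odd-span definitions above can be illustrated outside Stata.  The Python
sketch below is not part of ^nlsm^ and is not its implementation; the function
name running_median_odd is hypothetical, and the only rule assumed is the one
stated above: near each end, the window shrinks to the largest odd span that
still fits.

```python
from statistics import median


def running_median_odd(y, span):
    """Running median of odd span (hypothetical helper, not nlsm itself).

    Near the ends the window shrinks to the largest odd span that fits,
    so the first and last values are copied in unchanged.
    """
    if span < 1 or span % 2 == 0:
        raise ValueError("span must be a positive odd integer")
    n = len(y)
    half = span // 2
    z = []
    for t in range(n):  # t is a 0-based position
        # Largest half-width that keeps the window inside the series.
        h = min(half, t, n - 1 - t)
        z.append(median(y[t - h : t + h + 1]))
    return z


y = [1, 9, 2, 8, 3, 7, 4]
print(running_median_odd(y, 3))  # [1, 2, 8, 3, 7, 4, 4]
print(running_median_odd(y, 5))  # [1, 2, 3, 7, 4, 4, 4]
```

With span 5, the second value is a span-3 median and the third is the first
full span-5 median, matching the end-point pattern listed for ^5^.
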
Running median smoothers of even span
-------------------------------------

Define the median() function as returning the linearly interpolated value when
given an even number of arguments.  Thus, the smoother ^2^ defines
z[t+.5] = (y[t]+y[t+1])/2.  The smoother ^4^ defines z[t+.5] as the linearly
interpolated median of (y[t-1],y[t],y[t+1],y[t+2]), and so on.

In all cases, end-points are handled by using smoothers of shorter, even span.
Thus, in the case of ^4^,

        z[1.5]   = median(y[1],y[2]) = (y[1]+y[2])/2
        z[2.5]   = median(y[1],y[2],y[3],y[4])
        .
        z[N-2.5] = median(y[N-4],y[N-3],y[N-2],y[N-1])
        z[N-1.5] = median(y[N-3],y[N-2],y[N-1],y[N])
        z[N-.5]  = median(y[N-1],y[N])
        z[N+.5]  = y[N]            (almost irrelevant)

^nlsm^ keeps track of the number of even smoothers applied to the data; it is
recommended that such smoothers always be applied in pairs.  After all
smoothers have been applied, the data are shifted forward one position for
each pair of even smoothers.  Thus, the smoother ^4253^ or ^4523^ would result
in values for z[2] through z[N]; z[1] would be missing.  The physical shifting
of the data is not performed if ^noshift^ is specified.


The repeat operator
-------------------

^R^ indicates that a smoother is to be repeated until convergence, that is,
until repeated applications of the smoother produce the same series.  Thus,
^3^ applies the smoother of running medians of span 3.  ^33^ applies the
smoother twice.  ^3R^ produces the result of repeating ^3^ an infinite number
of times.  ^R^ should be used only with odd-span smoothers, since even-span
smoothers are not guaranteed to converge.

The smoother ^453R2^ applies a span-4 smoother, followed by a span-5 smoother,
followed by repeated applications of a span-3 smoother, followed by a span-2
smoother.

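The difference between ^3^, ^33^, and ^3R^ can be seen in a small Python
sketch.  It is an illustration of the repeat rule as described above, not the
code ^nlsm^ runs; the names smooth3 and repeat, and the max_iter safety limit,
are assumptions introduced for the example.

```python
from statistics import median


def smooth3(y):
    """Running median of span 3 with the end values copied in."""
    z = list(y)
    for t in range(1, len(y) - 1):
        z[t] = median(y[t - 1 : t + 2])
    return z


def repeat(smoother, y, max_iter=1000):
    """Apply `smoother` until the series stops changing (the R operator).

    max_iter is a hypothetical guard for smoothers that never settle.
    """
    current = list(y)
    for _ in range(max_iter):
        nxt = smoother(current)
        if nxt == current:
            return current
        current = nxt
    raise RuntimeError("smoother did not converge")


y = [4, 1, 3, 6, 2, 7, 5, 8]
once = smooth3(y)               # the smoother 3
twice = smooth3(once)           # the smoother 33
converged = repeat(smooth3, y)  # the smoother 3R

print(once)       # [4, 3, 3, 3, 6, 5, 7, 8]
print(twice)      # [4, 3, 3, 3, 5, 6, 7, 8]
print(converged)  # [4, 3, 3, 3, 5, 6, 7, 8]
```

For this series, ^33^ already reaches the fixed point, so ^3R^ gives the same
result; longer or rougher series may need more passes.
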
End-point rule
--------------

The end-point rule ^E^ modifies the values z[1] and z[N] according to the
formulas:

        z[1] = median( 3*z[2]-2*z[3],     z[1], z[2]   )
        z[N] = median( 3*z[N-1]-2*z[N-2], z[N], z[N-1] )

In each case the first argument is the straight line through the two nearest
interior smoothed values, extrapolated one position beyond the end of the
series.  When the end-point rule is not applied, end-points are typically
"copied-in", i.e., z[1]=y[1] and z[N]=y[N].


Splitting operator
------------------

The smoothers ^3^ and ^3R^ can produce flat-topped hills and valleys.  The
split operator is an attempt to eliminate such hills and valleys by splitting
the sequence, applying the end-point rule ^E^, rejoining the series, and then
resmoothing by ^3R^.

The ^S^ operator may be applied only after ^3^, ^3R^, or ^S^.

It is recommended that the ^S^ operator be repeated once (^SS^) or until no
further changes take place (^SR^).


Hanning smoother
----------------

^H^ is the Hanning linear smoother z[t] = (y[t-1]+2*y[t]+y[t+1])/4.
End-points are copied in, z[1]=y[1] and z[N]=y[N].  ^H^ should be applied only
after all nonlinear smoothers.


Twicing
-------

A smoother divides the data into a smooth and a rough:

        observed = smooth + rough

If the smoothing is successful, the rough should exhibit no pattern.  Twicing
refers to applying the smoother to the observed data, calculating the rough,
and then applying the smoother to the rough.  The resulting "smoothed rough"
is then added back to the smooth from the first step.


Examples
--------

    . ^nlsm 3 coalprdn, gen(smcp)^
    . ^nlsm 3r coalprdn, gen(smcp)^
    . ^nlsm 3rss coalprdn, gen(smcp)^
    . ^nlsm 3rssh3rssh3 coalprdn, gen(sm1)^
    . ^nlsm 3rssh,twice coalprdn, gen(sm1)^
    . ^nlsm 4253eh,twice gnp, gen(sgnp)^


References
----------

Tukey, J. W.  1977.  ^Exploratory Data Analysis^.  Reading, MA:
    Addison-Wesley Publishing Company.

Velleman, P. F.  1977.  Robust nonlinear data smoothers: Definitions and
    recommendations.  ^Proc. Natl. Acad. Sci. USA^ 74(2): 434-436.

Velleman, P. F.  1980.  Definition and comparison of robust nonlinear data
    smoothing algorithms.  ^JASA^ 75(371): 609-615.

Also see
--------

    STB:  sed7.1 (STB-8)