.-
help for ^cutv5^                 [STB-49: dm66; STB-50: dm66.1; STB-51: dm66.2]
.-

Recoding variables using grouped values (Stata 5 version)
---------------------------------------------------------

    ^egen^ newvar = ^cutv5(^varname^),^ ^br^eaks^(^x1,x2,...,xk^)^ [^ic^odes ^lab^el ^g^roup^(^k^)^]

Description
-----------

The ^egen^ function ^cutv5^ creates a new categorical variable coded with the
left-hand ends of the grouping intervals specified in ^breaks()^. It allows
short-cuts in specifying the breaks, allows (labelled) integer codes in place
of the left-hand ends of the intervals, and can produce groups of
approximately equal frequency.


Options
-------

^breaks( )^ supplies the breaks for the groups, in ascending order. The list of
    break points may be simply a list of numbers separated by commas, but can
    also include the syntax a[b]c, meaning from a to c in steps of size b.
    If no breaks are specified, the command expects the option ^group^.

^icodes^ requests that the codes 0, 1, 2, etc. be used in place of the
    left-hand ends of the intervals.

^label^ requests that the integer coded values of the grouped variable be
    labelled with the left-hand ends of the grouping intervals. Specifying
    this option automatically invokes ^icodes^.

^group( )^ specifies the number of equal frequency grouping intervals to be
    used in the absence of ^breaks^. Specifying this option automatically
    invokes ^icodes^. The command works by first calculating the appropriate
    percentiles using the command ^pctile^ and then using the percentiles as
    break points.


Example
-------

Using the variable ^length^ from the ^auto^ data, the commands

    ^egen lgrp=cutv5( length),breaks(140,180,200,220,240)^
    ^tab lgrp^

produce the output

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
            140 |         31       41.89       41.89
            180 |         16       21.62       63.51
            200 |         20       27.03       90.54
            220 |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

So will the command

    ^egen lgrp=cutv5( length),breaks(140,180[20]240)^

Values outside the range 140-240 are coded as missing. The command

    ^egen lgrp = cutv5(length), breaks(140,180[20]240) label^

will produce a variable coded 0, 1, 2, 3 but labelled 140-, 180-, 200-, 220-.

Thus ^tab lgrp^ produces

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           140- |         31       41.89       41.89
           180- |         16       21.62       63.51
           200- |         20       27.03       90.54
           220- |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

and ^tab lgrp, nolab^ produces

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              0 |         31       41.89       41.89
              1 |         16       21.62       63.51
              2 |         20       27.03       90.54
              3 |          7        9.46      100.00
    ------------+-----------------------------------
          Total |         74      100.00

The commands

    ^egen lgrp = cutv5(length), group(4) label^
    ^tab lgrp^

will produce

           lgrp |      Freq.     Percent        Cum.
    ------------+-----------------------------------
           142- |         17       22.97       22.97
           170- |         20       27.03       50.00
         192.5- |         18       24.32       74.32
           204- |         19       25.68      100.00
    ------------+-----------------------------------
          Total |         74      100.00


Authors
-------

    David Clayton, Biostatistical Research Unit, Cambridge,
    david.clayton@@mrc-bsu.cam.ac.uk

    Michael Hills (retired)
    Imperial College School of Medicine, London, UK
    mhills@@regress.demon.co.uk
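
Note on the coding rule
-----------------------

The left-hand-end coding used in the first example can be reproduced by hand,
which may help when checking a set of breaks. The following is only a sketch
and not the ^cutv5^ implementation. It assumes that ^cutv5^ is installed, that
the Stata release in use provides ^sysuse^ and ^foreach^ (neither is available
in Stata 5), and that each grouping interval is closed on the left and open on
the right. The variable name lgrp2 is hypothetical.

```stata
* Load the auto data shipped with Stata
sysuse auto, clear

* Result from cutv5, as in the first example
egen lgrp = cutv5(length), breaks(140,180[20]240)

* Hand-coded equivalent; lgrp2 is a hypothetical name
generate lgrp2 = .
foreach b of numlist 140 180(20)220 {
    * Breaks are visited in ascending order, so a later (larger) break
    * overwrites an earlier one and each car keeps the largest break
    * that does not exceed its length
    replace lgrp2 = `b' if length >= `b' & length < 240
}

* Assumption: intervals are [left, right), so lengths of 240 or more
* stay missing, as do lengths below 140
assert lgrp == lgrp2
tabulate lgrp2
```

If the assumptions hold, the ^assert^ passes silently. In the ^auto^ data no
car has a length below 140 or of 240 or more, so this check does not exercise
the treatment of values at the end points.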