Re: Re: Re: st: drop redundant value labels

From   n j cox
Subject   Re: Re: Re: st: drop redundant value labels
Date   Sun, 17 Feb 2008 16:56:20 +0000

My analysis resembles Sergiy's. If the trimmed down dataset were much smaller than the original, using -decode- on all the variables with labels followed by dropping all the label definitions and then an -encode- on all the -decode-d variables might be one way to go. Not especially attractive, but might be worth consideration.
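
As a rough illustration of that decode/drop/encode idea, here is a small sketch in Python rather than Stata, with made-up codes and labels. It assumes that the re-encoding step numbers the surviving label texts 1, 2, ... in sorted order, which is how -encode- behaves by default, so the original numeric codes are not preserved.

```python
# Hypothetical label definition: numeric code -> label text.
labels = {1: "clerk", 2: "miner", 3: "teacher", 4: "logger", 5: "baker"}

# Codes remaining in the trimmed-down dataset (one entry per observation).
codes = [2, 4, 4, 2]

# "decode": replace each code by its label text.
decoded = [labels[c] for c in codes]

# Drop the old label definition entirely.
labels = None

# "encode": number the distinct surviving texts 1, 2, ... in sorted order.
new_labels = {i: text for i, text in enumerate(sorted(set(decoded)), start=1)}
text_to_code = {text: i for i, text in new_labels.items()}
new_codes = [text_to_code[t] for t in decoded]

print(new_labels)  # {1: 'logger', 2: 'miner'}
print(new_codes)   # [2, 1, 1, 2]
```

Only the two labels still in use survive, but "miner" moves from code 2 to code 2 by coincidence while "logger" moves from 4 to 1, so this route only suits cases where the numeric codes themselves carry no meaning.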


Sergiy Radyakin

Unless there is some information about how observations were selected
into the final sample, brute force is the only way. It may be direct
(loop over each label value and check whether it occurs in the data)
or it could be more involved, but with the same thing going on behind
the scenes. One thing to consider, however, is whether more labels are
dropped than kept. In some cases it might be more efficient to loop
over the observations that are left than over all the labels,
especially if the observations are unique. Example: you have
observations, each representing an occupation, and each occupation has
a label; you want to keep only "dangerous" occupations (defined as you
like). There will likely be relatively few of them, so go brute force
by observations and keep the labels that they use.
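
The two brute-force directions can be sketched as follows. This is Python, not Stata, and the function names, the 1000 occupation codes and the "dangerous" subset are all invented for illustration.

```python
def used_by_labels(labels, codes):
    """Loop over every defined label and keep it if its code occurs in the data."""
    present = set(codes)
    return {code: text for code, text in labels.items() if code in present}


def used_by_observations(labels, codes):
    """Loop over the remaining observations and collect the labels they use."""
    kept = {}
    for code in codes:
        if code in labels and code not in kept:
            kept[code] = labels[code]
    return kept


# Hypothetical data: 1000 labelled occupations, only a few observations left.
labels = {code: f"occupation {code}" for code in range(1, 1001)}
dangerous = [17, 530, 17]

by_labels = used_by_labels(labels, dangerous)
by_obs = used_by_observations(labels, dangerous)
print(by_obs)  # {17: 'occupation 17', 530: 'occupation 530'}
```

Both give the same set of labels to keep. The first does work proportional to the number of labels, the second to the number of remaining observations, which is the cheaper route when few observations survive.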

You can also define your labels as a dataset with two fields: numeric
code and string label. After the selection in the data has occurred,
you can merge the two datasets to determine which labels must be kept.
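
A minimal sketch of that merge idea, again in Python with invented rows and field names ("id", "occ"): the label table is matched against the codes found in the selected data, and each label row lands in either a matched or an unmatched group, much as a merge indicator would report.

```python
# Hypothetical label dataset: one row per (numeric code, string label).
label_rows = [(1, "clerk"), (2, "miner"), (3, "teacher"), (4, "logger")]

# Hypothetical data after selection; "occ" holds the labelled code.
data_rows = [
    {"id": 101, "occ": 2},
    {"id": 102, "occ": 4},
    {"id": 103, "occ": 2},
]

# Match label rows against the codes that occur in the data.
codes_in_data = {row["occ"] for row in data_rows}
kept = [(code, text) for code, text in label_rows if code in codes_in_data]
unused = [(code, text) for code, text in label_rows if code not in codes_in_data]

print(kept)    # [(2, 'miner'), (4, 'logger')]
print(unused)  # [(1, 'clerk'), (3, 'teacher')]
```

The matched rows are the labels to keep; the unmatched rows are the redundant ones that can be dropped.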

But the overhead from having the labels should not be very large.
