Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Verify that all values of a variable are mapped after -label values-
Date: Fri, 21 Jun 2013 10:58:03 +0100

I support Robert Picard's advocacy of -decode- here. Here is a quick
hack at a reporting program:

*! 1.0.0 NJC 21 June 2013
program labellacking
    version 8.2
    syntax varlist [if] [in] [, reportnovaluelabel ]

    quietly {
        ds `varlist', has(type numeric)
        local varlist `r(varlist)'
        marksample touse, novarlist
        count if `touse'
        if r(N) == 0 error 2000
    }

    local length = 1
    foreach v of local varlist {
        local length = max(`length', length("`v'"))
    }
    local col = `length' + 2

    di
    foreach v of local varlist {
        if "`: value label `v''" == "" {
            if "`reportnovaluelabel'" != "" {
                di "`v'" _c
                di "{col `col'}(no value label)"
            }
        }
        else {
            di "`v'" _c
            tempvar work
            decode `v', gen(`work')
            qui levelsof `v' if `touse' & missing(`work'), local(levels)
            if "`levels'" == "" di "{col `col'}(none)"
            else di "{col `col'}`levels'"
            drop `work'
        }
    }
end

No help file, but this is how it works.

. sysuse auto, clear
(1978 Automobile Data)

. label define rep78 1 abysmal 2 adequate

. label val rep78 rep78

. labellacking price-foreign

rep78        3 4 5
foreign      (none)

. labellacking price-foreign, reportnovaluelabel

price        (no value label)
mpg          (no value label)
rep78        3 4 5
headroom     (no value label)
trunk        (no value label)
weight       (no value label)
length       (no value label)
turn         (no value label)
displacement (no value label)
gear_ratio   (no value label)
foreign      (none)

Nick
njcoxstata@gmail.com

On 21 June 2013 02:09, Robert Picard <picard@netbox.com> wrote:
> The -decode- solution is simple and quick, which is what you asked
> for. If your dataset is large and your labels are long, use the
> maxlength(#) option. Since you don't mind collapsing your original
> data, you can also use -decode- on the reduced set of values and the
> size of the data won't matter.
>
> * -------------- begin example ----------------------------
>
> clear
> set obs 50
> gen s = "this is a lame way to create long labels"
> replace s = s + s + s + string(_n)
>
> encode s, gen(n)
> drop s
>
> * make a large dataset with 1,000,000 observations
>
> expand 20000
>
> * spike the first observations with values that are not
> * mapped to a variable label
> replace n = _n + 1000 in 1/5
>
> * first method
>
> preserve
>
> decode n, gen(s) maxlen(4)
>
> tab n if mi(s)
>
> * second method, collapsing to unique values of n
>
> restore
> sort n
> by n: keep if _n == 1
> count
>
> decode n, gen(s)
>
> keep if mi(s)
> list, noobs
>
> * -------------- end example ------------------------------
>
>
>
> On Thu, Jun 20, 2013 at 5:38 PM, Toby Robertson
> <toby.robertson@hotmail.com> wrote:
>> In the absence of an answer I've come up with this solution to identify numeric values in the data file that are not present in the lookup file from numeric values to strings that I want to use as labels:
>>
>> use datafile
>> collapse (count)N=somevar, by(myvar)
>> merge myvar using lookupfile, nokeep keep() unique sort
>>
>> Which then permits checks like:
>>
>> tab _m
>> list myvar N if _merge==1
>> assert _merge==3
>>
>> etc. etc.
>>
>> In other words, do an empty left join of the lookups onto a table of the values in the data, and see which if any values are not in the lookups.
>>
>> Of the suggestions offered:
>>
>> - 'tab myvar, missing' and then eyeballing the data is OK for one-off interactive work but is not what I need in this context
>>
>> - 'decode myvar' defeats the whole purpose of not introducing long string variables into a very large dataset
>>
>> - 'labelbook mylabel, problem' seems to detect labels that are not used in the data, but not values in the data that are not present in the labels (and doesn't report what the values are anyway)
>>
>>
>>> From: toby.robertson@hotmail.com
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: Verify that all values of a variable are mapped after -label values-
>>> Date: Thu, 20 Jun 2013 19:48:57 +0000
>>>
>>> What is the easiest way to check whether, having applied a value label to a variable...
>>>
>>> label values myvar mylabel
>>>
>>> ...every value of myvar in the dataset is mapped to mylabel?
>>>
>>> I am using Stata 10, creating value labels from lookup files from numeric code variables to string descriptor variables using -labmask- followed by -label save-, and then applying them to the values of the numeric variables in very large datasets.
>>>
>>> The obvious solution might be to merge the string variable itself from the lookup file into the target file (after which I could check the integrity of the merge, use -labmask-, and drop the string variable again), but I want to avoid that because of file size and read time considerations.
>>>
>>> Thanks in advance to anyone who knows the answer offhand!
>>>
>>> Toby
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
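
A further possibility, not proposed in the thread, is to skip -decode- and look up each distinct value directly in the value label, so that no string variable is created even temporarily. The sketch below assumes a Stata version in which -levelsof- and the -strict- option of the -: label- extended macro function are available (with -strict-, nothing is returned when a value has no label). It also assumes that the variable has a value label attached and that no label text is empty. The macro names lbl, levels, text, and unmapped are illustrative only.

```stata
* Sketch: list the values of rep78 that have no entry in its value label,
* without decoding the variable. Same setup as the labellacking example.
sysuse auto, clear
label define rep78 1 abysmal 2 adequate
label values rep78 rep78

* name of the value label attached to rep78
local lbl : value label rep78

* distinct nonmissing values of rep78
quietly levelsof rep78, local(levels)

local unmapped
foreach l of local levels {
    * with -strict-, the result is empty when `l' has no label
    local text : label `lbl' `l', strict
    if `"`text'"' == "" local unmapped `unmapped' `l'
}

display as text "unmapped values of rep78: " as result "`unmapped'"
```

The loop runs once per distinct value rather than once per observation, so its cost depends on the number of levels, although -levelsof- itself still has to scan the data once.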

**References**:

- **st: Verify that all values of a variable are mapped after -label values-** *From:* Toby Robertson <toby.robertson@hotmail.com>
- **RE: st: Verify that all values of a variable are mapped after -label values-** *From:* Toby Robertson <toby.robertson@hotmail.com>
- **Re: st: Verify that all values of a variable are mapped after -label values-** *From:* Robert Picard <picard@netbox.com>
