Statalist



Re: st: Extended missing codes for string variables???


From   "Joseph Coveney" <[email protected]>
To   "Statalist" <[email protected]>
Subject   Re: st: Extended missing codes for string variables???
Date   Sat, 8 Nov 2008 01:44:01 +0900

Sergiy Radyakin wrote:

It has just occurred to me that string variables do not have extended
missing codes. A colleague of mine argues that this is perfectly fine,
because:
 1) one can use any text to stand for particular situations ("not
applicable", "not responded",...)
 2) for numerical values there are operations defined, which require
that they yield missing values if any argument is missing.

When I classify, say, firms by the first letter of their name,
"Not applicable" and "No response" end up as instances in section
"N", which is not what I want. Hence every time I deal with strings
like that, I have to check specifically for particular string values.
Since a different data entry operator inevitably chooses a different
coding, the programs become highly dependent on a particular dataset;
it is also quite tedious and annoying. One solution I see is to create
a masking variable that holds an agreed-upon code for each
observation, e.g. 0=not applicable; 1=valid observation;
2=applicable, but refused to answer;
3=applicable, but respondent doesn't know; etc.
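
A minimal sketch of such a masking variable, assuming the demonstration strings used in this thread. The names make_status and status, and the choice to file "No response" under code 2, are illustrative assumptions only:

```stata
clear
input str14 make
"MyFirm"
"YourFirm"
"Not applicable"
"No response"
end

// Hypothetical mask: 1 = valid unless the string is a known special value
generate byte make_status = 1
replace make_status = 0 if make == "Not applicable"
replace make_status = 2 if make == "No response"

label define status 0 "not applicable" 1 "valid observation" ///
    2 "applicable, but refused to answer"
label values make_status status

// Classify by first letter only where the mask marks a valid name
generate str1 make_group = substr(make, 1, 1) if make_status == 1
tabulate make_status
```

The special strings still have to be listed once per dataset, which is the dependence on a particular coding described above.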

I don't see this as a good solution, and I wonder whether there is
any technical possibility to instruct Stata that a particular string
value should be treated as a missing value in some operations. I see
it along these lines:
 char define make[extmiss_a] Not applicable
 char define make[extmiss_b] No response

And later
 gen make_group=substr(make,1,1)
will create empty values for those observations that had "Not
applicable" or "No response"

(however I still want to be able to distinguish between the two in
some cases, like -tabulate-)
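
Stata attaches no meaning to characteristics named this way, but the idea can be approximated by hand. A rough sketch, assuming the characteristic names extmiss_a and extmiss_b are purely a user convention and that the exclusion is written out explicitly in the -generate- command:

```stata
clear
input str14 make
"MyFirm"
"YourFirm"
"Not applicable"
"No response"
end

// Store the special strings as characteristics of the variable
// (the names extmiss_a and extmiss_b are an assumed convention)
char define make[extmiss_a] "Not applicable"
char define make[extmiss_b] "No response"

// Characteristics expand like local macros, so the stored text
// can be used to exclude those observations
generate str1 make_group = substr(make, 1, 1) ///
    if !inlist(make, "`make[extmiss_a]'", "`make[extmiss_b]'")

tabulate make_group, missing
tabulate make
```

The original variable is untouched, so -tabulate make- still distinguishes the two special values.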

What do you think about it? Are there extended missing string codes in
other statistical packages?

--------------------------------------------------------------------------------

I'm not aware of other statistical packages' having extended missing-value
codes for string variables.  The other two packages that have extended
missing-value codes for numerical variables, SAS and SPSS, also are able to
apply value labels to string variables.  But to my knowledge neither of them
has a string-variable analogue of their extended missing-value codes for
numerical variables.

I would approach the task along the lines of setting up alternative sets of
value labels after -encode-, something like what is illustrated below with a
dataset and set of desired missing value codes that are modeled after what
you show.  (One note about efficiency:  usually you'll have a lot of firms
and only a few missing-value codes, and so it would make sense to invert the
nesting of the loops from what I have hastily done below.)  If I were doing this
sort of thing routinely, I would probably take advantage of Stata's class
programming to make life easier.

Joseph Coveney

clear *
set more off


/* Create demonstration dataset */
set obs 4
generate str make = "MyFirm"
replace make = "YourFirm" in 2
replace make = "Not Applicable" in 3
replace make = "No Response" in L   // L = last observation


/* Create starting list of value labels */
encode make, generate(encoded_make) label(Makes)


/* Create list of desired missing-value labels
  and corresponding extended missing values
  to use (in alphabetical order) */
label define Missings .a "Not Applicable" .b "No Response"


/* Substitute extended missings for current
  value labels' values, and do same for encoded
  variable's integers */

// Create program for substitutions
program define MatchAndSubstitute
   version 10.1
   syntax varname, label_index(integer) ///
     extended_missing(string) ///
     missing_string(string)

   local value_labels : value label `varlist'
   if ("`: label `value_labels' `label_index''" == "`missing_string'") {
       label define `value_labels' `label_index' "", modify
       label define `value_labels' `extended_missing' ///
         "`missing_string'", modify
       quietly replace `varlist' = `extended_missing' ///
         if `varlist' == `label_index'
   }
   else {
       exit 0
   }
end

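// Traverse both value label lists, substituting
// where matches are found.  An integer without a label
// is returned as the integer itself, and that marks the
// end of the Makes list (this also copes with firm
// names that begin with a digit).
local label_index 1
local label_string : label Makes `label_index'
while ( "`label_string'" != "`label_index'" ) {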
   foreach letter in `c(alpha)' {
       local missing_string : label Missings .`letter'
       if ( "`missing_string'" != ".`letter'" ) {
           MatchAndSubstitute encoded_make, ///
             label_index(`label_index') ///
             extended_missing(.`letter') ///
             missing_string(`missing_string')
       }
       else {
           continue, break
       }
   }
   local ++label_index
   local label_string : label Makes `label_index'
}
label drop Missings

// Results
tabulate encoded_make
tabulate encoded_make, missing


/* Create initial value label list,
  preserving missing value labels */
label copy Makes InitialMakes
local --label_index
forvalues InitialMakes_index = `label_index'(-1)1 {
   local label_string : label InitialMakes `InitialMakes_index'
   // Skip integers whose label was moved to an extended missing code
   if ( "`label_string'" != "`InitialMakes_index'" ) {
       local label_initial = substr("`label_string'", 1, 1)
       label define InitialMakes `InitialMakes_index' ///
         "`label_initial'", modify
   }
}

// Results
label values encoded_make InitialMakes
tabulate encoded_make
tabulate encoded_make, missing

exit
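
The efficiency note above might be sketched as follows: loop over the few missing-value codes only, and look up the matching integer directly instead of traversing the whole list of firms. This is a hedged alternative to the -while- loop, not a tested replacement; it swaps the label-list traversal for -levelsof- on the original string variable, and it assumes the missing-value codes are defined contiguously from .a.

```stata
clear
input str14 make
"MyFirm"
"YourFirm"
"Not Applicable"
"No Response"
end
encode make, generate(encoded_make) label(Makes)
label define Missings .a "Not Applicable" .b "No Response"

foreach letter in `c(alpha)' {
    local missing_string : label Missings .`letter'
    // An undefined code comes back as the code itself: stop there
    if ("`missing_string'" == ".`letter'") continue, break

    // Find the integer that -encode- assigned to this string, if any
    quietly levelsof encoded_make if make == "`missing_string'", local(code)
    if ("`code'" != "") {
        label define Makes `code' "", modify
        label define Makes .`letter' "`missing_string'", modify
        quietly replace encoded_make = .`letter' if make == "`missing_string'"
    }
}
label drop Missings

tabulate encoded_make, missing
```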

