Stata The Stata listserver

st: RE: RE: mapping a value from 2 variables


From   Joly.Patrick@ic.gc.ca
To   statalist@hsphsun2.harvard.edu
Subject   st: RE: RE: mapping a value from 2 variables
Date   Fri, 26 Jul 2002 07:29:55 -0400

To recap this thread:  In response to my posting where I expressed my wish
to avoid explicit looping over observations to create a value label mapping
between a numeric variable and a string variable, Michael Blasnik suggested
the following approach:

Michael Blasnik wrote
[...]
>
> gen str1 cmd=""
> replace cmd="label define mylab "+string(nvar)+char(34)+svar+char(34)+",modify"
> outsheet cmd using cmd.do, nonames noquote
> drop cmd
>
> Then you can -run cmd-
>
> This approach just builds a string variable that has the label define
> commands and uses char(34) to insert quotes.  Of course, if you have
> repetitive values of nvar you should first collapse the dataset down
> to one obs for each nvar
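
A minimal sketch of that last point (reducing to one observation per nvar
before building the commands), reusing Michael's names nvar, svar and mylab.
The -preserve-/-restore- wrapper and the -replace- option on -outsheet- are
additions made here for illustration, not part of his post:

   preserve
   * keep one observation per distinct value of nvar
   bysort nvar: keep if _n == 1
   gen str1 cmd = ""
   replace cmd = "label define mylab " + string(nvar) +  /*
         */     char(34) + svar + char(34) + ",modify"
   outsheet cmd using cmd.do, nonames noquote replace
   restore
   * run after -restore- so the labels are defined in the restored data
   run cmd.do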

Nick Cox had previously suggested
> > > I don't know of anything quite like this, but
> > > for once a looping over observations would seem
> > > to solve the problem:
> > >
> > > local N = _N
> > > forval i = 1/`N' {
> > >     local val = naics[`i']
> > >     local label = labelnaics[`i']
> > >     label def naicslab `val' "`label'" , modify
> > > }
> > >
> > > Nick
> > > n.j.cox@durham.ac.uk
> >


Nick already commented on Michael's solution, stating that although the
latter avoids explicit looping over observations, it nevertheless requires
converting a numeric variable to a string, something which Stata must
accomplish one observation at a time.

Michael's line of attack is to concatenate each -label define # "..."-
command into a single string variable, which may then be sent to a text file
via -outsheet-.  It
works.  However, my first reaction to it was that, given (Intercooled)
Stata's 80-character limit for string variables, the line

   replace cmd="label define mylab "+string(nvar)+  /*
         */     char(34)+svar+char(34)+",modify"

will impose a significant constraint on the length of the value label
(contained in variable _svar_).
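
As a rough check of how tight that constraint is, the fixed pieces of the
command can be counted directly.  This sketch assumes the 80-character limit
mentioned above and the label name mylab from Michael's example:

   * characters left for the digits of nvar plus the label text
   display 80 - length("label define mylab ") - 2 - length(",modify")

This leaves 52 characters, from which the digits of nvar must still be
subtracted, and a longer label name would reduce it further.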

I then realised that by saving the data in tab-delimited format as Michael
did, we can avoid the concatenation altogether since Stata doesn't really
complain when it encounters a tab character in a .do file.  The only
drawback I see so far is that we lose four characters of the value label,
since we must wrap the string variable in compound double quotes.
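
The four characters are the opening and closing compound double quotes, two
characters each.  A small sketch of the arithmetic, again assuming the
80-character string limit:

   * room left for the label text once the compound quotes are added
   display 80 - length(char(96)+char(34)) - length(char(34)+char(39))

The result, 76, is the length to which maplab1 truncates the label with
-substr()-.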

To compare the relative efficiency of both solutions, I put together two
separate routines: maplab1 uses the -outsheet- approach, maplab2 the
-forvalues- loop.  I then compared their performance (see table below).

*! maplab1: define label using the mapping between two variables
program define maplab1
      syntax varlist(min=2 max=2), [ labname(str) ]
      tokenize `varlist'
      cap confirm numeric var `1'
      if _rc {
            di as err "`1' must be a numeric variable"
            exit 198
      }
      cap confirm str var `2'
      if _rc {
            di as err "`2' must be a string variable"
            exit 198
      }
      if "`labname'"=="" { local labname `1' }
      else {
            local wc : word count `labname'
            if `wc'!=1 {
                  di as err "labname() invalid"
                  exit 198
            }
      }
      tempfile labfile
      tempvar labdef labnam strvar mod
      qui {
            gen str8 `labdef' = "lab def "
            gen str1 `labnam' = ""
            replace  `labnam' = "`labname'"
            gen str1 `strvar' = ""
            replace  `strvar' = char(96)+char(34)+  /*
              */  substr(`2',1,76)+char(34)+char(39)
            gen str7  `mod'   = ",modify"
      }
      order `labdef' `labnam' `1' `strvar' `mod'
      outsheet `labdef' `labnam' `1' `strvar' `mod' /*
          */   using `labfile', nonames noquote
      run `labfile'
end


*! maplab2: define label using the mapping between two variables
program define maplab2
      syntax varlist(min=2 max=2), [ labname(str) ]
      tokenize `varlist'
      cap confirm numeric var `1'
      if _rc {
            di as err "`1' must be a numeric variable"
            exit 198
      }
      cap confirm str var `2'
      if _rc {
            di as err "`2' must be a string variable"
            exit 198
      }
      if "`labname'"=="" { local labname `1' }
      else {
            local wc : word count `labname'
            if `wc'!=1 {
                  di as err "labname() invalid"
                  exit 198
            }
      }
      * loop over observations, adding one value-label mapping per pass
      local N = _N
      forval i = 1/`N' {
            local val = `1'[`i']
            local label = `2'[`i']
            label def `labname' `val' "`label'" , modify
      }
end
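
A usage sketch, assuming the variable and label names from Nick's example
(numeric naics, string labelnaics, label naicslab).  The routine only defines
the label, so attaching it to the variable is a separate step:

   maplab2 naics labelnaics, labname(naicslab)
   * attach the new value label to the numeric variable and inspect it
   label values naics naicslab
   label list naicslab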

maplab2 turned out to be much faster!  Running the .do file in maplab1 is
what takes the time.  Here's a quick tabulation:

            |    seconds elapsed
obs ('000s) |  maplab1     maplab2
------------+----------------------
     1      |     .3         .2
     3      |    1.1        1.3
     5      |    4.7        2.7
     7      |   11.6        4.2
     9      |   21.8        5.6
    30      |  300.3       21.2

where we have a different value label for each observation.
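
For anyone wanting to repeat the comparison, here is one way it could be
timed.  This is only a sketch: it assumes a Stata recent enough to have the
-timer- command, and the names naics, labelnaics and naicslab are taken from
Nick's example rather than from the runs tabulated above.

   timer clear
   timer on 1
   maplab1 naics labelnaics, labname(naicslab)
   timer off 1
   * start the second run from a clean slate
   label drop naicslab
   timer on 2
   maplab2 naics labelnaics, labname(naicslab)
   timer off 2
   * timer 1 is maplab1, timer 2 is maplab2
   timer list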

I'll go with maplab2 then.  Thanks all.


Pat Joly
joly.patrick@ic.gc.ca
pat.joly@utoronto.ca
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


