The Stata listserver

st: RE: Re: RE: RE: mapping a value from 2 variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: RE: RE: mapping a value from 2 variables
Date   Thu, 25 Jul 2002 10:47:31 +0100

Patrick Joly responded to my suggestion:

> > n.j.cox@durham.ac.uk wrote
> > >
> > > I don't know of anything quite like this, but
> > > for once a looping over observations would seem
> > > to solve the problem:
> > >
> > > local N = _N
> > > forval i = 1/`N' {
> > >     local val = naics[`i']
> > >     local label = labelnaics[`i']
> > >     label def naicslab `val' "`label'" , modify
> > > }
> > >
> > > Nick
> > > n.j.cox@durham.ac.uk
> >
> > This will work, no doubt.  The reason I hadn't considered -forvalues- for
> > this purpose was that I wanted to avoid looping over observations.  Such
> > looping is not so bad with my current data which contains approx. 2000
> > observations but may be computationally intensive and slow if I try to
> > extend the procedure to situations where labels may take up to 65,536
> > different coding values -- the Stata limit for value labels.  I tested the
> > loop on a dataset of 30,000 observations and it took 2 minutes to complete,
> > which is not the end of the world for the use I'll make of it.
> >
> > But what escaped me in my own proposed solution below is that step 2 (where
> > I would use -file- to substitute a space-character for the first comma)
> > would itself require looping over observations (!).  I will probably go
> > with Nick's solution as I don't see anything else for now.
> >
> >

and Michael Blasnik then added

> In trying to automate value label creation from a numeric and string var,
> Joly.Patrick@ic.gc.ca wants to avoid an explicit loop over n for large
> datasets.  Here is one approach that I think works
>
> gen str1 cmd=""
> replace cmd="label define mylab "+string(nvar)+char(34)+svar+char(34)+" ,modify"
> outsheet cmd using cmd.do, nonames noquote
> drop cmd
>
> Then you can -run cmd-
>
> This approach just builds a string variable that has the label define
> commands and uses char(34) to insert quotes.  Of course, if you have
> repetitive values of nvar you should first collapse the dataset down to one
> obs for each nvar
>

This is a nice trick, but it still looks like a loop
over observations to me. In the code suggested
first

local N = _N
forval i = 1/`N' {
	local val = naics[`i']
	local label = labelnaics[`i']
	label def naicslab `val' "`label'" , modify
}

the problem for Stata includes the overhead of
interpreting and implementing (potentially) thousands of
-label- statements; and this is true for Michael's code too.
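
Michael's remark about repeated values applies to the loop as well:
if many observations share the same code, most of the -label-
statements are redundant. A minimal sketch of one way to issue each
statement only once per distinct value follows. It assumes that
labelnaics is constant within each value of naics and that
-egen, tag()- is available in the Stata in use; the toy data and the
variable name first are made up for illustration.

```stata
* Toy data, hypothetical values for illustration only
clear
input long naics str20 labelnaics
11 "Agriculture"
11 "Agriculture"
21 "Mining"
22 "Utilities"
22 "Utilities"
end

* Mark one observation per distinct value of naics
* (assumes labelnaics does not vary within a value of naics)
egen byte first = tag(naics)

local N = _N
forvalues i = 1/`N' {
    * Skip observations whose value has already been handled
    if first[`i'] {
        local val = naics[`i']
        local label = labelnaics[`i']
        label define naicslab `val' "`label'", modify
    }
}
drop first

* Attach the finished value label to the numeric variable
label values naics naicslab
```

The loop still visits every observation, so this only saves the cost
of the repeated -label- statements, not the cost of the loop itself.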

The trade-off between

(NJC)
managing a -forvalues- loop
putting individual values into -local-s

and

(MB)
-generate- string variable
-outsheet- a .do file
-run- a .do file

is hard for me to foresee, and it will vary somewhat
between platforms as I/O is entailed. Any advice from
Stata Corp? Anyone interested enough to experiment and report
on timings?
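
One way such an experiment might be set up is sketched below. It
assumes a Stata recent enough to have the -timer- command (it was not
part of Stata at the time of this posting); the generated data, the
label names lab_loop and lab_file, and the file name cmd.do are
placeholders. No timings are claimed here, since they will depend on
the machine and the Stata version.

```stata
* Made-up test data: 2000 distinct codes with string labels
clear
set obs 2000
gen long naics = _n
gen str20 labelnaics = "industry " + string(_n)

timer clear

* (NJC) -forvalues- loop, one -label define- per observation
timer on 1
local N = _N
forvalues i = 1/`N' {
    local val = naics[`i']
    local label = labelnaics[`i']
    label define lab_loop `val' "`label'", modify
}
timer off 1

* (MB) build the commands in a string variable, write them out, run them
timer on 2
gen str80 cmd = "label define lab_file " + string(naics) + " " ///
    + char(34) + labelnaics + char(34) + ", modify"
outsheet cmd using cmd.do, nonames noquote replace
drop cmd
run cmd.do
timer off 2

* Elapsed seconds for each approach are shown and left in r(t1), r(t2)
timer list
```

Changing the number in -set obs- shows how each approach scales
towards the 65,536 limit mentioned above.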

Nick
n.j.cox@durham.ac.uk




