The Stata listserver

RE: st: how to make xi dummies inherit labels


From   "Nick Winter" <nwinter@policystudies.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: how to make xi dummies inherit labels
Date   Thu, 3 Oct 2002 11:10:13 -0400

I remember a discussion a while back about -macro shift- not scaling
well. I don't know if that's what's going on here, but -desmat- does
use -macro shift-.

--Nick Winter


-----------------------------------------------------------
 Nicholas Winter, Ph.D.                     P 202.939.5343
 Policy Studies Associates                  F 202.939.5732
 1718 Connecticut Avenue, NW     nwinter@policystudies.com
 Washington, DC 20009-1148           www.policystudies.com
----------------------------------------------------------- 

> -----Original Message-----
> From: Roger Harbord [mailto:Roger.Harbord@bristol.ac.uk] 
> Sent: Thursday, October 03, 2002 11:00 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: how to make xi dummies inherit labels
> 
> 
> That syntax works now, thanks.
> I still seem to have this weird speed problem, though. The same thing
> happens when I use -desmat- as a command, again only the first time I
> run -desmat- on my dataset, even if I subsequently run it on a
> different variable or drop the _x_* variables it creates. I don't
> understand that, but then I haven't tried to understand what -desmat-
> is doing internally. I guess it must be storing something extra
> somewhere.
> 
> I checked, and if I -keep- only a few variables the problem goes away.
> It may be that the problem only occurs with the stupidly large number
> of variables (over 1000) in my dataset (I didn't create it myself, and
> I'm reluctant to spend any time on data management to cut it down).
> 
> This is in fact not the first time I've seen strange scaling behaviour
> in the time Stata takes to complete a command. I've been running some
> power simulations with 10000 simulated datasets, each containing 5-60
> records, and found that if I hold the whole lot in memory at once and
> do something like:
>
>     forvalues i = 1(1)10000 {
>         regress ... if simulation == `i'
>     }
>
> (obviously there is a bit more to it than that, to save the results),
> things go *very* slowly: it only seemed to manage about 3 regressions a
> second. Cutting the 10000 down to 1000 makes the command complete not
> 10 times faster, as you might expect, but 100 times faster! I ended up
> analysing chunks of the dataset at a time and also using -in- instead
> of -if-. Now my simulations take an hour or two instead of a day or
> two.
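A minimal Stata sketch of the -in- approach described above, offered under stated assumptions rather than as Roger's actual code. One plausible reason for the 100-fold (rather than 10-fold) change: with S simulated datasets of about m records each, memory holds roughly S*m observations, and every regress ... if simulation==`i' call has to test the -if- condition on all of them, so the loop performs about m*S^2 tests; cutting S by a factor of 10 cuts that work by about 100. Once the data are sorted by simulation, each dataset is a contiguous block, so -in- can hand -regress- just that block. The variable names (simulation, y, x), the generated stand-in data, and the results file simresults.dta are hypothetical; -postfile- is just one way of saving the results, and runiform()/rnormal() require a considerably newer Stata than the one used in this thread.

```stata
* Hypothetical stand-in data: 10 simulated datasets of 20 records each
clear
set seed 12345
set obs 200
generate simulation = ceil(_n/20)
generate x = runiform()
generate y = 1 + 2*x + rnormal()

* After sorting, each simulated dataset occupies a contiguous block;
* blocksize holds the number of records in that block
sort simulation
by simulation: generate long blocksize = _N

* Collect one row of results per simulated dataset
tempname h
postfile `h' sim b se using simresults, replace

* Walk through the blocks, passing each one to -regress- with -in-,
* so no -if- condition has to be tested across the whole dataset
local start = 1
while `start' <= _N {
    local stop = `start' + blocksize[`start'] - 1
    quietly regress y x in `start'/`stop'
    post `h' (simulation[`start']) (_b[x]) (_se[x])
    local start = `stop' + 1
}
postclose `h'

* Inspect the saved estimates
use simresults, clear
summarize b se
```

The block-pointer design relies only on the sort order, so no lookup of each simulation's first and last observation is needed inside the loop.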
> 
> I've been meaning to post something on that for a while, but I haven't
> got time to document the problem properly at the moment. I mention it
> just to illustrate that the problem may be more general than -desmat-
> and could lie deeper in the internal workings of Stata.
> 
> Maybe I really should drop all those variables I don't need and use
> -desmat-. It seems to do what I'm after (and a whole lot more). I'm sure
> dropping them would speed everything else up too, though the other
> commands I'm using at present take a few seconds rather than a couple
> of minutes.
> 
> Roger.
> 
> 
> 
> --On 03 October 2002 06:37 -0700 John Hendrickx <john_hendrickx@yahoo.com> wrote:
> 
> > Hello once again,
> >
> > I've forgotten my own command syntax; it should be:
> >
> > desmat: logistic siweekT2 age10yy2, desrep(exp)
> >
> > There's an example of this in the help file, although I suppose you
> > do have to know where to find it.
> >
> > As for the speed problems, I'm mystified. I just tried a dataset with
> > 20375 cases and 238 variables and that was no problem (although I did
> > have to increase matsize and memory). You might want to try -desmat-
> > as a command and see if that sheds some light on the problem:
> >
> > desmat age10yy2
> > logistic siweekT2 _x_*
> > desrep, exp
> > drop _x_*
> >
> > Of course, if you already have an alternative solution then there's
> > no need to waste any more time, but I'm curious about this speed
> > problem with desmat. Pretty strange.
> >
> > John Hendrickx
> >
> > --- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
> >> Hi John,
> >>
> >> I've just installed the latest version of desmat available on SSC,
> >> Distribution-Date: 20011111 (I had the STB-61: dm73.3 version
> >> before). However, an -exp- option still doesn't exist:
> >>
> >> . desmat: logistic siweekT2 age10yy2, exp
> >> exp invalid
> >> r(198);
> >>
> >> . which desmat
> >> c:\ado\stbplus\d\desmat.ado
> >> *! version 3.0, 30Mar2001, John_Hendrickx@yahoo.com
> >>
> >> And I'm not including any continuous covariates, only a single
> >> categorical one with 6 categories at present. -desmat- takes around
> >> 2 minutes even if I give it an outcome variable that doesn't exist,
> >> so that all it produces is an error message to that effect. (If given
> >> a non-existent covariate, it complains straight away, though.)
> >>
> >> I suppose I could drop all the variables corresponding to questions
> >> that we're not using (the data are results of a survey with a *long*
> >> questionnaire), but that would be some extra work to create and
> >> maintain a 'keep list' of the variables I'm actually interested in.
> >>
> >> Roger.
> >>
> >>
> >> --On 03 October 2002 04:33 -0700 John Hendrickx <john_hendrickx@yahoo.com> wrote:
> >>
> >> > Hi Roger,
> >> >
> >> > -desmat- should add a few seconds to your calculations, but two
> >> > minutes is way too much. One explanation might be that a continuous
> >> > variable wasn't specified as such, in which case -desmat- will create
> >> > dummies for all of its 100+ categories and estimation will take a
> >> > long time. Let me know if -desmat- really slows things down that
> >> > much on a large dataset; maybe it would be worthwhile to create a
> >> > lite version.
> >> >
> >> > As for exponentiated coefficients, use the -exp- option:
> >> >
> >> > desmat: logistic y x, exp
> >> >
> >> > will give the same results as
> >> >
> >> > xi: logistic y i.x
> >> >
> >> > -logistic- prints exponentiated coefficients (odds ratios) but saves
> >> > them as log-odds values.
> >> >
> >> > Good luck,
> >> > John Hendrickx
> >> >
> >> > --- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
> >> >> What I was really after in the end was something similar to the
> >> >> output of, e.g.,
> >> >>
> >> >> . xi: logistic y i.x
> >> >> . reformat, eform
> >> >>
> >> >> but with the coefficients labelled using the value labels assigned
> >> >> to x. -desmat- does achieve this, but I had a couple of different
> >> >> problems when I tried it:
> >> >>
> >> >> 1) It takes over 2 minutes to run the first univariable logistic
> >> >> regression with -desmat- on my data, whereas -xi- is seemingly
> >> >> instant. This may be connected to the fact that my dataset has 1100
> >> >> variables (and 2400 observations). It is much quicker subsequently,
> >> >> though, even when run on different variables.
> >> >>
> >> >> 2) I can't see how to get -desmat- to exponentiate the coefficients
> >> >> (to give odds ratios with logistic regression) when used as a
> >> >> command prefix:
> >> >>
> >> >> . desmat: logistic y i.x
> >> >>
> >> >> gives the same output as:
> >> >>
> >> >> . desmat: logit    y i.x
> >> >>
> >> >> and there's no -eform- option as there is with -outreg- and
> >> >> -reformat-.
> >> >>
> >> >> Also, I think -reformat- or -outreg- gives me more flexibility in
> >> >> deciding what I want in the output, so I don't need to do so much
> >> >> work on the output before I present it to my client, which is
> >> >> ultimately my aim.
> >> >>
> >> >> In conclusion, I'll probably use Nick's 'canned solution' for
> >> >> transferring value labels to the variable labels of dummies, in
> >> >> combination with -reformat- or -outreg-. But it would be nice if
> >> >> -xi- had an option telling it to inherit the labels in this way.
> >> >> Put that on the wish list for Stata 8...
> >> >>
> >> >>
> >> >> Roger.
> >> >> ----------------------------------------------------
> >> >> Roger Harbord     mailto:roger.harbord@bristol.ac.uk
> >> >> Department of Social Medicine, University of Bristol
> >> >>
> >> >>
> >> >>
> >> >> --On 03 October 2002 09:33 +0100 Nick Cox <n.j.cox@durham.ac.uk> wrote:
> >> >>
> >> >> > John Hendrickx
> >> >> >
> >> >> >> -desmat- will do this. Try -ssc describe desmat-
> >> >> >
> >> >> > I tried -desmat- after my posting. I couldn't
> >> >> > see that it did quite this.
> >> >> >
> >> >> > Example:
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> >        log:  C:\Stata7\desmat.log
> >> >> >   log type:  text
> >> >> >  opened on:   3 Oct 2002, 09:30:21
> >> >> >
> >> >> > . u auto
> >> >> > (1978 Automobile Data)
> >> >> >
> >> >> > . desmat : regress mpg foreign
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> >    regress
> >> >> > ---------------------------------------------------------------------
> >> >> > < snip >
> >> >> > ---------------------------------------------------------------------
> >> >> > nr  Effect              Coeff        s.e.
> >> >> > ---------------------------------------------------------------------
> >> >> >    foreign
> >> >> > 1    Foreign            4.946**      1.362
> >> >> > 2  _cons               19.827**      0.743
> >> >> > ---------------------------------------------------------------------
> >> >> > *  p < .05
> >> >> > ** p < .01
> >> >> >
> >> >> > . d _x_1
> >> >> >
> >> >> >               storage  display     value
> >> >> > variable name   type   format      label      variable label
> >> >> > ---------------------------------------------------------------------
> > === message truncated ===
> >
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/support/faqs/res/findit.html
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP