From: Nick Winter <nwinter@policystudies.com>
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: how to make xi dummies inherit labels
Date: Thu, 3 Oct 2002 11:10:13 -0400

I remember something a while back about the -macro shift- usage not
scaling well. I don't know if that's what's going on here, but -desmat-
does use -macro shift-.

--Nick Winter

-----------------------------------------------------------
Nicholas Winter, Ph.D.            P 202.939.5343
Policy Studies Associates         F 202.939.5732
1718 Connecticut Avenue, NW       nwinter@policystudies.com
Washington, DC 20009-1148         www.policystudies.com
-----------------------------------------------------------

> -----Original Message-----
> From: Roger Harbord [mailto:Roger.Harbord@bristol.ac.uk]
> Sent: Thursday, October 03, 2002 11:00 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: how to make xi dummies inherit labels
>
> That syntax works now, thanks.
>
> Still seem to have this weird speed problem though. Same thing happens
> using desmat as a command. Again only the first time I run desmat on my
> dataset - even if I subsequently run it on a different variable or drop
> the _x_* variables it creates. I don't understand that but then I
> haven't tried to understand what desmat is doing internally. I guess it
> must be storing something extra somewhere.
>
> Checked that if I -keep- only a few variables the problem goes away. It
> may be the problem only occurs with the stupidly large number of
> variables (over 1000) I have in my dataset (I didn't create it myself
> and I'm reluctant to spend any time on data management to cut it down).
>
> This is in fact not the first time I've experienced strange scaling
> behaviour in the time taken by stata to complete a command. I've been
> running some power simulations with 10000 simulations of a dataset
> containing 5-60 records, and found that if I hold the whole lot in
> memory at once and do something like:
>
> . forvalues i = 1(1)10000 { regress ... if simulation==`i' }
> (obviously a bit more to it than that to save the results)
>
> - things go *very* slowly - it only seemed to manage about 3
> regressions a second. Cutting the 10000 down to 1000 means the command
> completes not 10 times faster, as you might expect, but 100 times
> faster! I ended up analysing chunks of the dataset at a time and also
> using -in- instead of -if-. Now my simulations take an hour or two
> instead of a day or two.
>
> I've been meaning to post something on that for a while but I haven't
> got time to properly document the problem at the moment. Just to
> illustrate that the problem may be more general than -desmat- and could
> lie deeper in the internal workings of stata.
>
> Maybe I really should drop all those variables I don't need and use
> -desmat-. It seems to do what I'm after (and a whole lot more..) I'm
> sure it would speed everything else up too (though other commands I'm
> using at present take a few seconds rather than a couple of minutes).
>
> Roger.
>
>
> --On 03 October 2002 06:37 -0700 John Hendrickx
> <john_hendrickx@yahoo.com> wrote:
>
> > Hello once again,
> >
> > I've forgotten my own command syntax, it should be:
> >
> > desmat: logistic siweekT2 age10yy2, desrep(exp)
> >
> > There's an example on this in the help file although I suppose you do
> > have to know where to find it.
> >
> > As for the speed problems, I'm mystified. I just tried a dataset with
> > 20375 cases and 238 variables and that was no problem (although I did
> > have to increase matsize and memory). You might want to try desmat as
> > a command, see if that sheds some light on the problem:
> >
> > desmat age10yy2
> > logistic siweekT2 _x_*
> > desrep, exp
> > drop _x_*
> >
> > Of course, if you already have an alternative solution then there's
> > no need to waste any more time, but I'm curious about this speed
> > problem with desmat. Pretty strange.
> >
> > John Hendrickx
> >
> > --- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
> >> Hi John,
> >>
> >> I've just installed the latest version of desmat available on SSC -
> >> Distribution-Date: 20011111. (I had the STB-61: dm73.3 version
> >> before.) However an -exp- option still doesn't exist:
> >>
> >> . desmat: logistic siweekT2 age10yy2, exp
> >> exp invalid
> >> r(198);
> >>
> >> . which desmat
> >> c:\ado\stbplus\d\desmat.ado
> >> *! version 3.0, 30Mar2001, John_Hendrickx@yahoo.com
> >>
> >> And I'm not including any continuous covariates - only a single
> >> categorical one with 6 categories at present. -desmat- takes around
> >> 2 minutes even if I give an outcome variable that doesn't exist so
> >> that all it gives is an error message to that effect. (If given a
> >> non-existent covariate it complains straight away though.)
> >>
> >> I suppose I could drop all those variables corresponding to
> >> questions that we're not using (data is results of a survey with a
> >> *long* questionnaire) but that would be some extra work to create
> >> and maintain a 'keep list' of variables I'm actually interested in.
> >>
> >> Roger.
> >>
> >>
> >> --On 03 October 2002 04:33 -0700 John Hendrickx
> >> <john_hendrickx@yahoo.com> wrote:
> >>
> >> > Hi Roger,
> >> >
> >> > -desmat- should add a few seconds to your calculations but two
> >> > minutes is way too much. One explanation might be that a
> >> > continuous variable wasn't specified as such, then -desmat- will
> >> > create dummies for all 100+ categories and estimation will take a
> >> > long time. Let me know if -desmat- really slows things down that
> >> > much on a large dataset, maybe it would be worthwhile to create a
> >> > lite version.
> >> >
> >> > As for exponential coefficients, use the -exp- option,
> >> >
> >> > desmat: logistic y x, exp
> >> >
> >> > will give the same results as
> >> >
> >> > xi: logistic y i.x
> >> >
> >> > -logistic- prints exponential coefficients but saves them as
> >> > loglinear values.
> >> >
> >> > Good luck,
> >> > John Hendrickx
> >> >
> >> > --- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
> >> >> What I was really after in the end was similar to the output of
> >> >> e.g.
> >> >> . xi: logistic y i.x
> >> >> . reformat, eform
> >> >>
> >> >> - but with the coefficients labelled using the value labels
> >> >> assigned to x. -desmat- does achieve this, but I had a couple of
> >> >> different problems when I tried -desmat-:
> >> >>
> >> >> 1) It takes over 2 minutes to run the first univariable logistic
> >> >> regression with -desmat- on my data, when -xi- is seemingly
> >> >> instant. May be connected to the fact that my dataset has 1100
> >> >> variables (and 2400 observations). Much quicker subsequently
> >> >> though, even run on different variables.
> >> >>
> >> >> 2) I can't see how to get -desmat- to exponentiate the
> >> >> coefficients (to give odds ratios with logistic regression) when
> >> >> used as a command prefix:
> >> >>
> >> >> . desmat: logistic y i.x
> >> >>
> >> >> gives the same output as:
> >> >>
> >> >> . desmat: logit y i.x
> >> >>
> >> >> - and there's no -eform- option as there is with -outreg- and
> >> >> -reformat-.
> >> >>
> >> >> Also I think -reformat- or -outreg- give me more flexibility in
> >> >> deciding what I want in the output, so I don't need to do so much
> >> >> work on the output before I present it to my client, which is
> >> >> ultimately my aim.
> >> >>
> >> >> In conclusion I'll probably use Nick's 'canned solution' for
> >> >> transferring value labels to variable labels of dummies, in
> >> >> combination with -reformat- or -outreg-. But maybe it would be
> >> >> nice if there was an option for -xi- to tell it to inherit the
> >> >> labels in this way. Put that on the wish list for Stata 8...
> >> >>
> >> >>
> >> >> Roger.
> >> >> ----------------------------------------------------
> >> >> Roger Harbord mailto:roger.harbord@bristol.ac.uk
> >> >> Department of Social Medicine, University of Bristol
> >> >>
> >> >>
> >> >> --On 03 October 2002 09:33 +0100 Nick Cox <n.j.cox@durham.ac.uk>
> >> >> wrote:
> >> >>
> >> >> > John Hendrickx
> >> >> >
> >> >> >> -desmat- will do this. Try -ssc describe desmat-
> >> >> >
> >> >> > I tried -desmat- after my posting. I couldn't
> >> >> > see that it did quite this.
> >> >> >
> >> >> > Example:
> >> >> >
> >> >> > ------------------------------------------------------------
> >> >> >        log:  C:\Stata7\desmat.log
> >> >> >   log type:  text
> >> >> >  opened on:  3 Oct 2002, 09:30:21
> >> >> >
> >> >> > . u auto
> >> >> > (1978 Automobile Data)
> >> >> >
> >> >> > . desmat : regress mpg foreign
> >> >> >
> >> >> > ------------------------------------------------------------
> >> >> > regress
> >> >> > ------------------------------------------------------------
> >> >> > < snip >
> >> >> > ------------------------------------------------------------
> >> >> > nr  Effect                              Coeff       s.e.
> >> >> > ------------------------------------------------------------
> >> >> > foreign
> >> >> >  1    Foreign                          4.946**     1.362
> >> >> >  2  _cons                             19.827**     0.743
> >> >> > ------------------------------------------------------------
> >> >> > *  p < .05
> >> >> > ** p < .01
> >> >> >
> >> >> > . d _x_1
> >> >> >
> >> >> >               storage  display     value
> >> >> > variable name   type   format      label      variable label
> >> >> > ------------------------------------------------------------
> >
> > === message truncated ===

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
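
A note on the scaling Roger describes in the quoted message: with all simulations held in memory, each -regress ... if simulation==`i'- presumably has to test the -if- condition against every observation, so the total work grows roughly with (number of simulations) x (total observations), which is quadratic in the number of simulations. That would be consistent with a 10-fold cut giving a 100-fold speed-up. The following is only a minimal sketch of the -in- workaround he mentions, not his actual code. The variable names y, x and simulation, and the assumption that every simulation has the same number of records (local macro n), are hypothetical.

```stata
* Sketch only: one regression per simulation, addressed by -in- ranges.
* Assumes (hypothetically) variables y, x and simulation exist, and that
* every simulation contributes exactly `n' records.
sort simulation
local n = 20                      /* records per simulation (assumed) */
local nsim = _N / `n'             /* number of simulations in memory  */

forvalues i = 1/`nsim' {
    local first = (`i' - 1) * `n' + 1
    local last  = `i' * `n'
    * -in- restricts the command to a block of rows, so Stata does not
    * need to evaluate a condition on every observation each time.
    quietly regress y x in `first'/`last'
}
```

If the simulations differ in size, the first and last row of each block would have to be worked out beforehand (for example from a running count within -simulation-) instead of from a fixed n.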

