The Stata listserver

RE: st: how to make xi dummies inherit labels


From   Roger Harbord <Roger.Harbord@bristol.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: how to make xi dummies inherit labels
Date   Thu, 03 Oct 2002 16:00:09 +0100

That syntax works now, thanks.
I still seem to have this weird speed problem, though. The same thing happens when I use -desmat- as a command: again, only the first time I run -desmat- on my dataset, even if I subsequently run it on a different variable or drop the _x_* variables it creates. I don't understand that, but then I haven't tried to understand what -desmat- is doing internally. I guess it must be storing something extra somewhere.

I checked, and if I -keep- only a few variables the problem goes away. It may be that the problem only occurs with the stupidly large number of variables (over 1000) in my dataset (I didn't create it myself, and I'm reluctant to spend any time on data management to cut it down).

This is in fact not the first time I've experienced strange scaling behaviour in the time Stata takes to complete a command. I've been running some power simulations with 10000 simulated datasets of 5-60 records each, and found that if I hold the whole lot in memory at once and do something like:

. forvalues i = 1/10000 {
  2.     regress ... if simulation == `i'
  3. }
(obviously a bit more to it than that to save the results)

then things go *very* slowly: it only seemed to manage about 3 regressions a second. Cutting the 10000 down to 1000 makes the command complete not 10 times faster, as you might expect, but 100 times faster! I ended up analysing chunks of the dataset at a time and also using -in- instead of -if-. Now my simulations take an hour or two instead of a day or two.
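A minimal sketch of the -in- approach, assuming a stacked dataset with a replicate identifier `simulation` and hypothetical variables y and x (the toy data generated below only stand in for the real simulations). One plausible reading of the 10x/100x observation is that each -if- pass tests the condition on every observation in memory, so S replicates of about n rows each cost roughly S x (S*n) checks in total, which grows with the square of S. Sorting once and fitting each replicate over its own contiguous -in- range avoids rescanning the whole dataset on every pass.

```stata
* Toy stacked dataset (hypothetical): 200 replicates of 30 observations
clear
set seed 12345
set obs 6000
generate long simulation = ceil(_n/30)
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()

* Sort once so each replicate occupies a contiguous block of rows
sort simulation, stable
generate long obsno = _n
by simulation: generate long lastobs = obsno[_N]   // last row of each replicate

tempname memhold
postfile `memhold' sim b se using simresults, replace

* Walk through the blocks: -in- touches only rows start..stop,
* whereas -if- would evaluate the condition on all _N rows each time
local start = 1
while `start' <= _N {
    local stop = lastobs[`start']
    local sim  = simulation[`start']
    quietly regress y x in `start'/`stop'
    post `memhold' (`sim') (_b[x]) (_se[x])
    local start = `stop' + 1
}
postclose `memhold'

* One row of results per replicate
use simresults, clear
summarize b se
```

The -post- results file keeps memory use flat; the same loop structure should carry over to -logistic- or any other estimation command that accepts an -in- range.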

I've been meaning to post something on that for a while, but I haven't got time to document the problem properly at the moment. I mention it just to illustrate that the problem may be more general than -desmat- and could lie deeper in the internal workings of Stata.

Maybe I really should drop all those variables I don't need and use -desmat-. It seems to do what I'm after (and a whole lot more). I'm sure it would speed everything else up too (though the other commands I'm using at present take a few seconds rather than a couple of minutes).
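If dropping variables for good is the worry, -preserve- and -restore- let an analysis run on a trimmed copy while the full file stays intact. A minimal sketch, using Stata's shipped auto data and a hypothetical keep list in place of the survey variables; whether this actually cures the -desmat- slowdown is untested here.

```stata
sysuse auto, clear

* Hypothetical keep list: only the variables this analysis needs.
* Maintaining it as a single local macro keeps the extra work to one line.
local keeplist mpg foreign

preserve                 // snapshot the full dataset
keep `keeplist'          // work on a narrow copy
regress mpg foreign      // e.g. substitute the -desmat:- call here
restore                  // the full dataset comes back unchanged

describe, short
```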

Roger.



--On 03 October 2002 06:37 -0700 John Hendrickx <john_hendrickx@yahoo.com> wrote:


Hello once again,

I've forgotten my own command syntax; it should be:

desmat: logistic siweekT2 age10yy2, desrep(exp)

There's an example on this in the help file although I suppose you do
have to know where to find it.

As for the speed problems, I'm mystified. I just tried a dataset with
20375 cases and 238 variables and that was no problem (although I did
have to increase matsize and memory). You might want to try desmat as
a command, see if that sheds some light on the problem:

desmat age10yy2
logistic siweekT2 _x_*
desrep, exp
drop _x_*

Of course, if you already have an alternative solution then there's
no need to waste any more time, but I'm curious about this speed
problem with desmat. Pretty strange.

John Hendrickx

--- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
Hi John,

I've just installed the latest version of -desmat- available on SSC
(Distribution-Date: 20011111; I had the STB-61: dm73.3 version
before). However, an -exp- option still doesn't exist:

. desmat: logistic siweekT2 age10yy2, exp
exp invalid
r(198);

. which desmat
c:\ado\stbplus\d\desmat.ado
*! version 3.0, 30Mar2001, John_Hendrickx@yahoo.com

And I'm not including any continuous covariates, only a single
categorical one with 6 categories at present. -desmat- takes around
2 minutes even if I give an outcome variable that doesn't exist, so
that all it gives is an error message to that effect. (If given a
non-existent covariate, it complains straight away, though.)

I suppose I could drop all those variables corresponding to questions
that we're not using (the data are results of a survey with a *long*
questionnaire), but that would be some extra work to create and
maintain a 'keep list' of the variables I'm actually interested in.

Roger.


--On 03 October 2002 04:33 -0700 John Hendrickx <john_hendrickx@yahoo.com> wrote:

> Hi Roger,
>
> -desmat- should add a few seconds to your calculations, but two
> minutes is way too much. One explanation might be that a continuous
> variable wasn't specified as such; then -desmat- will create dummies
> for all 100+ categories and estimation will take a long time. Let me
> know if -desmat- really slows things down that much on a large
> dataset; maybe it would be worthwhile to create a lite version.
>
> As for exponential coefficients, use the -exp- option,
>
> desmat: logistic y x, exp
>
> will give the same results as
>
> xi: logistic y i.x
>
> -logistic- prints exponential coefficients but saves them as
> loglinear values.
>
> Good luck,
> John Hendrickx
>
> --- Roger Harbord <Roger.Harbord@bristol.ac.uk> wrote:
>> What I was really after in the end was similar to the output of
>> e.g.
>> . xi: logistic y i.x
>> . reformat, eform
>>
>> - but with the coefficients labelled using the value labels
>> assigned to x.
>> -desmat- does achieve this, but I had a couple of different
>> problems when I
>> tried -desmat-:
>>
>> 1) It takes over 2 minutes to run the first univariable logistic
>> regression with -desmat- on my data, when -xi- is seemingly instant.
>> This may be connected to the fact that my dataset has 1100 variables
>> (and 2400 observations). It is much quicker subsequently, though,
>> even when run on different variables.
>>
>> 2) I can't see how to get -desmat- to exponentiate the coefficients
>> (to give odds ratios with logistic regression) when used as a
>> command prefix:
>>
>> . desmat: logistic y i.x
>>
>> gives the same output as:
>>
>> . desmat: logit    y i.x
>>
>> Moreover, there's no -eform- option as there is with -outreg- and
>> -reformat-.
>>
>> Also, I think -reformat- and -outreg- give me more flexibility in
>> deciding what I want in the output, so I don't need to do so much
>> work on the output before I present it to my client, which is
>> ultimately my aim.
>>
>> In conclusion, I'll probably use Nick's 'canned solution' for
>> transferring value labels to variable labels of dummies, in
>> combination with -reformat- or -outreg-. But maybe it would be nice
>> if there were an option telling -xi- to inherit the labels in this
>> way. Put that on the wish list for Stata 8...
>>
>>
>> Roger.
>> ----------------------------------------------------
>> Roger Harbord     mailto:roger.harbord@bristol.ac.uk
>> Department of Social Medicine, University of Bristol
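Nick's canned solution itself isn't quoted in this thread, so the following is only an illustrative sketch of the same idea, not his code: after -xi-, copy each level's value label into the matching dummy's variable label, so that output tools which display variable labels can show "Foreign" rather than "foreign==1". It assumes the default -xi- naming _I<var>_<level>; -xi- may shorten the stub for long variable names, in which case the name parsing below would need adjusting.

```stata
sysuse auto, clear
* -foreign- carries value labels (0 = Domestic, 1 = Foreign)
local var foreign
xi i.`var'

* Copy each level's value label into its dummy's variable label.
* Assumes dummies are named _I<var>_<level> with an untruncated stub.
foreach v of varlist _I`var'_* {
    local lev = subinstr("`v'", "_I`var'_", "", 1)
    local lab : label (`var') `lev'
    label variable `v' "`lab'"
}

describe _I`var'_*
```

Since -xi- leaves its dummies in memory, the same loop can be run straight after a prefixed command such as `xi: logistic y i.x`, before calling -reformat- or -outreg-.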
>>
>>
>>
>> --On 03 October 2002 09:33 +0100 Nick Cox <n.j.cox@durham.ac.uk>
>> wrote:
>>
>> > John Hendrickx
>> >
>> >> -desmat- will do this. Try -ssc describe desmat-
>> >
>> > I tried -desmat- after my posting. I couldn't
>> > see that it did quite this.
>> >
>> > Example:
>> >
>> >
>> > -------------------------------------------------------------------------------
>> >        log:  C:\Stata7\desmat.log
>> >   log type:  text
>> >  opened on:   3 Oct 2002, 09:30:21
>> >
>> > . u auto
>> > (1978 Automobile Data)
>> >
>> > . desmat : regress mpg foreign
>> > -------------------------------------------------------------------------------
>> >    regress
>> > -------------------------------------------------------------------------------
>> > < snip >
>> > -------------------------------------------------------------------------------
>> > nr Effect                                          Coeff        s.e.
>> > -------------------------------------------------------------------------------
>> >    foreign
>> > 1    Foreign                                       4.946**      1.362
>> > 2  _cons                                          19.827**      0.743
>> > -------------------------------------------------------------------------------
>> > *  p < .05
>> > ** p < .01
>> >
>> > . d _x_1
>> >
>> >               storage  display     value
>> > variable name   type   format      label      variable label
>> > -------------------------------------------------------------------------------

=== message truncated ===


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP