
From: Roger Newson <roger.newson@kcl.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: How do you drop the variable -(e)- from the data?
Date: Wed, 05 Nov 2003 20:49:57 +0000

Hello All

Thanks to Bill Gould for explaining about the variable "(e)". This is very helpful.

The reason I wanted to drop "(e)" was that it was causing confusing output to come from my -xcollapse- and -xcontract- packages, downloadable from SSC or from my website. These are extended versions of -collapse- and -contract-, respectively, and create output data sets that can be listed and/or saved to disk and/or saved to memory (overwriting any existing data). If the user types, in Stata 8,

    sysuse auto, clear
    regress mpg weight
    xcontract foreign rep78, list(*, sepby(foreign))

then -xcontract- lists the frequencies of each combination of -foreign- and -rep78-, both as numbers and as a percentage of all cars, and also lists the variable "(e)", which could be confusing for naive users. It is helpful to know that this will be fixed, and that, in the meantime, "(e)" will not be saved to the output data set produced by the -saving()- option of -xcontract- or -xcollapse-.

Thanks also to Nick Cox for his comments, and for taking the time to do some further experiments with "(e)".

Best wishes

Roger

At 09:37 05/11/03 -0600, you wrote:

Roger Newson <roger.newson@kcl.ac.uk> noticed that if, after estimation, he types -list *-, in addition to all the expected variables, a variable named "(e)" also appears in the output. He writes,

> I am having a problem with the variable whose name is (e), which appears to
> be generated whenever an estimation command is executed, and which contains
> the results of the function -e(sample)-.

It is a bug that Roger ever saw the variable "(e)", so let me explain:

1. Roger is right: Variable "(e)" has to do with e(sample) and, in fact, is e(sample).

2. The existence of variable "(e)" was supposed to be completely hidden. Had we done that right, I would not now be writing this email.

3. There is no bug except that Roger saw the variable "(e)" (and found some other ways to access it).

So we will fix that bug but, until we do, it is not a bug that should bother anybody.

For those who are curious, here is what "(e)" is about:

T1. When you run an estimation command, Stata needs to store e(sample), the function that identifies which observations were used. That information is stored in the dataset in the secret variable named "(e)".

T2. The name "(e)" (note the parentheses) was chosen carefully to be an invalid name. It should not surprise you that, inside Stata, we have the ability to create variables named anything we want. We chose an invalid name so that it would never conflict with a valid name a user might want to create. In addition, an invalid name would be rejected by the parser, which makes it less likely that any user would ever discover the secret variable.

T3. When you -save- a dataset, variable "(e)" is *NOT* stored in the dataset. Stata knows to skip that variable. More correctly, variable "(e)" is not stored unless you specify -save-'s -all- option. As it says in the online help, "-all- is for use by programmers. If specified, e(sample) will be saved with the dataset. You could run a regression, -save mydata, all-, -use mydata-, and -predict yhat if e(sample)-."

T4. The variable "(e)" is dropped (1) whenever a new estimation command is run (in which case a new "(e)" is created), (2) whenever you type -discard- (which eliminates previous estimation results), and (3) whenever a -drop- command results in a dataset that contains only "(e)".
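For readers who want to see rules T1 to T4 working together, here is a minimal Python sketch that models them. It is not Stata's implementation. The class Dataset and its methods generate, estimate, e_sample, saved_columns, discard, and drop are hypothetical names. Missing values are represented by None. The sample rule (an observation is used only if no model variable is missing) is a simplification of what a command such as -regress- actually does. The name check is a simplified, ASCII-only version of Stata's naming rule and ignores reserved names.

```python
import re

# Hypothetical model of rules T1-T4; NOT Stata's source code.
HIDDEN = "(e)"

# Simplified (ASCII-only) naming rule: 1-32 characters, starting with a letter
# or underscore, then letters, digits, or underscores. Reserved names are not checked.
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")


def is_valid_name(name):
    return VALID_NAME.fullmatch(name) is not None


class Dataset:
    def __init__(self, columns):
        self.columns = dict(columns)

    def generate(self, name, values):
        # T2: user-supplied names go through the parser, so "(e)" is rejected.
        if not is_valid_name(name):
            raise ValueError(f"invalid variable name: {name}")
        self.columns[name] = list(values)

    def estimate(self, model_vars):
        # T1: record e(sample) in the hidden column (1 = observation used).
        # Simplification: an observation is used if no model variable is None.
        # T4 (1): this overwrites any "(e)" left by an earlier estimation.
        n = len(self.columns[model_vars[0]])
        self.columns[HIDDEN] = [
            int(all(self.columns[v][i] is not None for v in model_vars))
            for i in range(n)
        ]

    def e_sample(self):
        return self.columns[HIDDEN]

    def saved_columns(self, all_=False):
        # T3: -save- skips "(e)" unless the -all- option (all_ here) is given.
        return {k: v for k, v in self.columns.items() if all_ or k != HIDDEN}

    def discard(self):
        # T4 (2): -discard- removes estimation results, including "(e)".
        self.columns.pop(HIDDEN, None)

    def drop(self, names):
        for name in names:
            del self.columns[name]
        # T4 (3): if only "(e)" is left, it is dropped as well.
        if list(self.columns) == [HIDDEN]:
            del self.columns[HIDDEN]


ds = Dataset({"mpg": [22, 17, None], "weight": [2930, 3350, 2640]})
ds.estimate(["mpg", "weight"])
print(ds.e_sample())                         # [1, 1, 0]
print(sorted(ds.saved_columns()))            # ['mpg', 'weight']
print(sorted(ds.saved_columns(all_=True)))   # ['(e)', 'mpg', 'weight']
ds.drop(["mpg", "weight"])
print(ds.columns)                            # {}
```

Using an invalid name as the key is what lets the model keep the hidden column in the same table as user variables. No valid name can collide with it, and any user-facing path that validates names can never reach it.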

So what happened? Where did we go wrong? In fact, "(e)" has been in Stata for some time without anyone knowing, but when we added fancier pattern matching for varlists (so that you can type things like "*e*", something that used not to be allowed), we forgot to exclude "(e)", and that opened the door to Roger's discovery.
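The varlist slip can be illustrated with a small Python sketch. Here fnmatchcase from the standard library stands in for Stata's varlist matcher, which has further forms (such as variable ranges) that are not modeled. The function expand_varlist and its exclude_hidden flag are hypothetical names. The point is only that a wildcard such as "*" or "*e*" matches the string "(e)" unless the matcher is explicitly told to skip it.

```python
from fnmatch import fnmatchcase

# Hypothetical illustration of the varlist bug; fnmatchcase stands in for
# Stata's own (richer) varlist matcher.
HIDDEN = "(e)"


def expand_varlist(pattern, names, exclude_hidden=True):
    """Return the names matching a wildcard pattern, in dataset order.

    exclude_hidden=True is the intended behavior: "(e)" never matches.
    exclude_hidden=False reproduces the bug: "*" or "*e*" also pick up "(e)".
    """
    return [
        name
        for name in names
        if fnmatchcase(name, pattern) and not (exclude_hidden and name == HIDDEN)
    ]


names = ["make", "price", "mpg", "weight", HIDDEN]
print(expand_varlist("*e*", names, exclude_hidden=False))  # ['make', 'price', 'weight', '(e)']
print(expand_varlist("*e*", names))                        # ['make', 'price', 'weight']
```

This is the same effect Roger saw with -list *- and with -xcontract-'s list(*, ...): a pattern that is meant to mean "all user variables" also catches the hidden one unless it is filtered out.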

It was just as Nick Cox <n.j.cox@durham.ac.uk> suspected: "This raises the question of whether it's been there for ages, or it's only recently become visible as a result of some other change in Stata."

-- Bill
wgould@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom
Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: roger.newson@kcl.ac.uk
Website: http://www.kcl-phs.org.uk/rogernewson

Opinions expressed are those of the author, not the institution.

Follow-Ups: st: RE: Re: How do you drop the variable -(e)- from the data? From: "Nick Cox" <n.j.cox@durham.ac.uk>

References: Re: st: How do you drop the variable -(e)- from the data? From: wgould@stata.com (William Gould, Stata)


© Copyright 1996–2014 StataCorp LP