


RE: st: RE: Knowing how a variable was generated

From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: RE: Knowing how a variable was generated
Date   Mon, 1 Nov 2010 16:26:42 +0000

In reply to Rich Goldstein: 

-defv- (John R. Gleason, STB) was one of the older programs referenced in the help for -labgen- (SSC), referred to in one of my replies. John jumped to R without looking back around 2001, so no update is likely there. 

But fair's fair: -defv- explains itself as a wrapper for -generate- and -replace- and does not purport to work with -egen-. Still less will it, or any other program of this kind, work for variables generated as side-effects of many commands. 

In reply to Uli Kohler: 

Just to spell out for anyone confused: your approach using -notes- is the easiest way to define characteristics, as mentioned earlier in the thread. That is, "using -notes-" and "using characteristics" are particular and general versions of the same strategy.
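
A minimal sketch of that relationship, using the auto data from the example quoted below. The characteristic name "definition" is an arbitrary choice made for this illustration, not a Stata convention. The sketch relies on the documented behaviour that notes are stored as characteristics named note1, note2, ... with the count in note0.

```stata
sysuse auto, clear
generate x = weight - 1

* general version: a named characteristic attached to the variable
* ("definition" is a name made up for this sketch)
char x[definition] "generate x = weight - 1"

* particular version: a note, which Stata keeps as characteristics
* x[note1], x[note2], ... with the number of notes in x[note0]
notes x : generate x = weight - 1

* list every characteristic attached to x, notes included
char list x[]

* read them back with macro extended functions
local def : char x[definition]
local n1  : char x[note1]
display "`def'"
display "`n1'"
```

Run it as a do-file rather than line by line, so that the local macros survive until the -display- commands.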

[email protected] 

Richard Goldstein

note that there is a program for this, -defv-; use -findit-

however, this program does not work with -egen- and does not work with
-by- (and does not always work with Stata 11 either)

Ulrich Kohler

> In principle it is also possible to store this information as note: 
> . sysuse auto
> . gen x = weight - 1
> . note x: gen x = weight - 1
> . replace x = weight + 1
> . note x : replace x = weight + 1
> . note x
> Clearly it is possible to write programs (i.e. -gennote- and
> -replacenote-) that do this automatically. The question, however, arises
> why someone who is not willing to give away his do-files should use
> these programs when creating a data set ...
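
A rough sketch of what such a wrapper might look like. -gennote- here is hypothetical: the name comes from the message above and no such command is assumed to exist. The sketch assumes the new variable name is the first word typed, so an explicit storage type (e.g. "double x = ...") is not handled, and like -defv- it does nothing for -egen- or for variables created as side-effects of other commands.

```stata
* gennote: hypothetical wrapper around -generate- that also records
* the command as a note on the new variable
capture program drop gennote
program define gennote
    version 11
    * take the first word of what was typed as the new variable name
    * (assumption: no storage type precedes it)
    gettoken newvar : 0, parse(" =")
    * pass everything typed through to -generate- unchanged
    generate `0'
    * store the same text as a note on the new variable
    notes `newvar' : generate `0'
end

* usage
sysuse auto, clear
gennote x = weight - 1
notes x
```

A -replacenote- along the same lines would call -replace- instead of -generate-.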

Louis Boakye-Yiadom

That's correct. I'm looking at a situation where the do-file is not available. Indeed, often you may have to work with a dataset for which you played no role in the generation of the variables. Thanks.

Nick Cox 

>>> Indeed. But Louis' question, and my
>>> answers, presuppose that was not done. 

Michael McCulloch

>>> Wouldn't it be sufficient to simply record the work in a
>>> do-file that documents the command:
>>>     gen B = (A*C) + D, or
>>>     gen B = A*(C + D)?
>>> On Oct 31, 2010, at 9:46 AM, Nick Cox wrote:
>>>> There are programs that enable users to record definitions of
>>>> variables as they generate or replace them. See e.g. -labgen- from
>>>> SSC and especially its references.
>>>> More generally, if users employed variable labels or characteristics
>>>> to record the definition of variables -- then your problem is indeed
>>>> soluble.
>>>> I didn't imagine that's what you had in mind, as if you knew that
>>>> definitions were stored that way it's hard to see why your question
>>>> arises.
Louis Boakye-Yiadom
>>>  Nick, thanks for the reply. I was thinking that if it's
>>> possible for Stata to store information on the generation of
>>> the variable (at least in simple cases), it might be
>>> possible to have this feature in Stata.
Nick Cox
>>>>> In general, no. How could there be?
>>>>> However, in simple cases for Y calculated somehow from X,
>>>>> looking at graphs of Y vs X might give a clue.
Louis Boakye-Yiadom
>>>>> If some of the variables in a dataset were generated by a
>>>>> transformation or combination of some other variable(s) in
>>>>> the data, is it possible to know this without seeing the
>>>>> relevant log or do file? For example, consider a situation
>>>>> where the variables in the data include A, B, C, and D, and
>>>>> B was generated as follows:
>>>>> B = A*C + D
>>>>> Is there a command for determining how B was generated?
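
No command recovers the rule, as the replies say, but a guess can be checked. The sketch below builds toy data for the A, B, C, D of the question and then tests the conjecture B = A*C + D. The names Bguess and absdiff, the seed, and the 1e-6 tolerance are all choices made for this illustration. It can only confirm or refute a candidate formula, never discover one.

```stata
* toy data standing in for a dataset of unknown history
clear
set obs 100
set seed 2010
generate A = runiform()
generate C = runiform()
generate D = runiform()
generate B = A*C + D

* candidate definition, computed in double precision
generate double Bguess = A*C + D

* largest absolute discrepancy; near zero supports the guess
generate double absdiff = abs(B - Bguess)
summarize absdiff

* observations that disagree beyond float rounding error
count if reldif(B, Bguess) > 1e-6 & !missing(B, Bguess)

* graphical check: points on the 45-degree line mean the guess fits
scatter B Bguess
```

With a real dataset, drop the first block and start from the -generate double Bguess- line. A tolerance is used rather than exact equality because B is typically stored as float while the candidate is computed in double.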
