
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: ambiguity in -if- qualifier


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: ambiguity in -if- qualifier
Date   Sun, 23 Mar 2014 01:31:12 +0000

Yu,

I think I understand what you're asking, and perhaps I can explain it in a different way that might be helpful.

Think about the purpose of the -generate- command. According to the documentation, its purpose is to "create a new variable". If there is an -if- qualifier, the variable is filled in only for observations that satisfy the -if- condition. (Technically, it is created for all observations, but it is missing for every observation that does not satisfy the condition.) The fact that Stata has to do some calculations to put something into the new variable is irrelevant. From the standpoint of -generate-, its job is to create a variable and put values in it for every observation that meets the -if- condition, regardless of what it has to do to achieve that goal.
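
To make this concrete, here is a toy Python model of what -generate- with an -if- qualifier produces. It models the result, not how Stata implements it. The sketch makes three assumptions: the data are made up, observation numbers are 1-based like Stata's _n, and None stands in for a missing value.

```python
# Toy model of: generate newvar = <expr> if <cond>
# Missing values are represented by None (an assumption of this sketch).

def generate(n_obs, expr, cond):
    """Return a list of length n_obs; expr(i) is stored only where cond(i) holds.

    i is a 1-based observation number, mirroring Stata's _n.
    """
    newvar = [None] * n_obs          # the variable exists for every observation
    for i in range(1, n_obs + 1):
        if cond(i):
            newvar[i - 1] = expr(i)  # filled only where the condition is true
    return newvar

# Hypothetical data: three domestic cars followed by two foreign ones.
foreign = [0, 0, 0, 1, 1]
mpg = [22, 17, 24, 25, 30]

doubled = generate(len(mpg),
                   expr=lambda i: 2 * mpg[i - 1],
                   cond=lambda i: foreign[i - 1] == 1)
print(doubled)  # [None, None, None, 50, 60]
```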

I would also point out that you can't say that Stata evaluates the right-hand side of a -generate- statement on the entire dataset. -generate- is a built-in command, so I can't say for sure either, but I doubt that is what it does, as it would be very inefficient. As implied above, I suspect that Stata first identifies which observations it needs to fill and then assigns values only to those. If Stata needs to read values from outside the -if- condition to do that, so be it.
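
Joe's guess can be sketched the same way. In this hypothetical Python model, the qualifying observations are identified first. The right-hand side mpg[_n-1] is then evaluated only for those observations, but the expression is still free to read an observation outside the -if- subset. The counter only records how many times the expression runs; it illustrates the idea and makes no claim about Stata's internals. As in Stata, a subscript outside 1.._N yields missing (None here).

```python
# Toy model of Joe's guess: decide which observations qualify first, then
# evaluate the expression only for those. None represents a missing value.
# This is a sketch, not a description of Stata's internal code.

foreign = [0, 0, 0, 1, 1]      # hypothetical data
mpg = [22, 17, 24, 25, 30]
N = len(mpg)

def value(var, n):
    """var[n] with 1-based n; out-of-range subscripts give missing."""
    return var[n - 1] if 1 <= n <= N else None

evaluations = 0

def lag_mpg(n):
    """The right-hand side mpg[_n-1] for observation n."""
    global evaluations
    evaluations += 1
    return value(mpg, n - 1)   # may read an observation outside the -if- subset

selected = [n for n in range(1, N + 1) if value(foreign, n) == 1]
mpg2 = [None] * N
for n in selected:
    mpg2[n - 1] = lag_mpg(n)

print(mpg2)         # [None, None, None, 24, 25]
print(evaluations)  # 2
```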

Regards,
Joe Canner
Johns Hopkins University School of Medicine
________________________________________
From: [email protected] [[email protected]] on behalf of Nick Cox [[email protected]]
Sent: Saturday, March 22, 2014 9:09 PM
To: [email protected]
Subject: Re: st: ambiguity in -if- qualifier

Comments below.

Nick
[email protected]


On 23 March 2014 00:44, Yu Chen, PhD <[email protected]> wrote:
> Hi, Nick,
> Let me clarify. For any assignment to a new variable, there are two
> steps. Step 1, the expression should be evaluated; and Step 2, the
> result of the evaluation is assigned to the new variable. My question
> is, what is the sample used in each step?
> For -generate-, Step 1 uses the full sample. In other words, all
> observations, regardless of whether they meet the -if- condition, can be
> used. But in Step 2, -generate- uses the subsample that meets the -if-
> condition.

I don't think this word treatment helps understanding. In your
-generate- example two things are happening simultaneously:

A. Stata is being instructed to put previous values of -mpg- in a new variable.

B. Stata is being instructed to do that only if -foreign- is 1.

You are surmising that A is done in a Step 1, which is followed by B
in a Step 2. But it makes just as much sense to imagine that Stata
works out that the variable should receive non-missing values only
when -foreign- is 1 and then works out what they should be. Either
way, the result is the same.
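
The claim that the order of A and B does not matter can be checked with another toy Python model. It uses hypothetical data, with None standing for missing, and is not Stata code. Evaluating everywhere and then masking gives the same variable as selecting first and evaluating only there. The result Yu expected would come from a different instruction: renumber the subsample, then lag within it, so that the first foreign car has no predecessor. The -if- qualifier does not renumber observations, so _n still refers to position in the whole dataset.

```python
# Compare two orderings of "compute mpg[_n-1]" and "keep it only if foreign==1".
# Hypothetical data; None stands for a missing value.

foreign = [0, 0, 0, 1, 1]
mpg = [22, 17, 24, 25, 30]
N = len(mpg)

def at(var, n):
    """var[n] for 1-based n; missing when n is out of range."""
    return var[n - 1] if 1 <= n <= len(var) else None

# Ordering 1: evaluate for every observation, then blank out those failing the condition.
full = [at(mpg, n - 1) for n in range(1, N + 1)]
order1 = [v if foreign[n - 1] == 1 else None for n, v in enumerate(full, start=1)]

# Ordering 2: pick qualifying observations, then evaluate only those.
order2 = [at(mpg, n - 1) if foreign[n - 1] == 1 else None for n in range(1, N + 1)]

# A genuinely different operation: keep only the subsample, renumber it,
# then lag within it. The first foreign car now has no predecessor.
sub = [m for m, f in zip(mpg, foreign) if f == 1]
sub_lag = [at(sub, n - 1) for n in range(1, len(sub) + 1)]

print(order1 == order2)  # True
print(order1)            # [None, None, None, 24, 25]
print(sub_lag)           # [None, 25]
```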

> However, there may exist such commands that use a subsample in Step 1.
> In other words, before the command does anything, the sample is
> reduced according to the -if- condition, so all other activities that
> the command is going to do are on this reduced sample. It seems to me
> that most commands work this way. But I found that -generate- is an
> exception. It does not restrict the sample until the last step.
> I think this is a little confusing. At least, there is no consistency
> in when to restrict the sample.
> Thank you.

Sorry, but I don't catch your meaning here at all. You've presumably
withdrawn your claim about -egen-, so you seem to be offering
speculation, but no examples that anyone else can discuss.

> On Sat, Mar 22, 2014 at 6:45 PM, Nick Cox <[email protected]> wrote:
>> I don't think the one precise example here is puzzling in any sense.
>> Previous values of -mpg- are put in a new variable if and only if
>> -foreign- is 1. This is calculated observation by observation.
>>
>> You allude to different behaviour with -egen-. But the help for -egen- explains:
>>
>> "Explicit subscripting (using _N and _n), which is commonly used with
>>     generate, should not be used with egen; see subscripting."
>>
>> That may illuminate your puzzlement.
>>
>> Nick
>> [email protected]
>>
>>
>> On 22 March 2014 21:26, Yu Chen, PhD <[email protected]> wrote:
>>> I think there is some ambiguity in the meaning and usage of the -if-
>>> qualifier. Generally, the command is performed on a subset that meets
>>> the -if- condition. However, a command may perform many tasks, and the
>>> subset for each task is not clear sometimes. For example, for the
>>> -generate- command, it seems to calculate the result of the expression
>>> on the full sample first, and then that result is assigned to a
>>> subsample that meets the -if- condition. However, for the -egen-
>>> command, the calculation is performed on a subset that meets the -if-
>>> condition, not the full sample, and then that result is assigned to
>>> the new variable on that subsample.
>>>
>>> For example, see the code below.
>>>
>>> sysuse auto
>>> gen mpg2=mpg[_n-1] if foreign==1
>>>
>>> Notice that observation number 53 has a value of 24 for mpg2. This
>>> indicates that the task of taking a lagged value is performed on the
>>> full sample first. Otherwise, this value should be missing. But -egen-
>>> works differently.
>>>
>>> There may exist other cases that have similar ambiguities. I would
>>> suggest that Stata have a clear rule to address this issue. If the
>>> rule is already out there, please tell me.
>>> Thank you very much.
>>>
>>> Yu Chen
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>
>
>
> --
> Yu Chen, Ph.D.
> Assistant Professor of Accounting
> A. R. Sanchez, Jr. School of Business, WHTC 218D
> Texas A&M International University
> 5201 University Boulevard
> Laredo, Texas 78041-1900
> USA
> 956-326-2513 (office)
> 956-326-2479 (fax)


© Copyright 1996–2018 StataCorp LLC