Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Yu Chen, PhD" <profyuchen@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: ambiguity in -if- qualifier
Date: Sat, 22 Mar 2014 19:44:57 -0500
Hi, Nick,

Let me clarify. For any assignment to a new variable, there are two steps. In Step 1, the expression is evaluated; in Step 2, the result of the evaluation is assigned to the new variable. My question is, what sample is used in each step?

For -generate-, Step 1 uses the full sample. In other words, all observations, regardless of whether they meet the -if- condition, can be used. But in Step 2, -generate- uses the subsample that meets the -if- condition.

However, there may be commands that use a subsample in Step 1. In other words, before the command does anything, the sample is reduced according to the -if- condition, so everything else the command does is done on this reduced sample. It seems to me that most commands work this way, but I found that -generate- is an exception: it does not restrict the sample until the last step. I think this is a little confusing. At least, there is no consistency in when the sample is restricted.

Thank you.

On Sat, Mar 22, 2014 at 6:45 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I don't think the one precise example here is puzzling in any sense.
> Previous values of -mpg- are put in a new variable if and only if
> -foreign- is 1. This is calculated observation by observation.
>
> You allude to different behaviour with -egen-. But the help for -egen- explains
>
> "Explicit subscripting (using _N and _n), which is commonly used with
> generate, should not be used with egen; see subscripting."
>
> That may illuminate your puzzlement.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 22 March 2014 21:26, Yu Chen, PhD <profyuchen@gmail.com> wrote:
>> I think there is some ambiguity in the meaning and usage of the -if-
>> qualifier. Generally, the command is performed on a subset that meets
>> the -if- condition. However, a command may perform many tasks, and the
>> subset for each task is not clear sometimes.
>> For example, the -generate- command seems to calculate the result of
>> the expression on the full sample first and then assign that result to
>> the subsample that meets the -if- condition. However, for the -egen-
>> command, the calculation is performed on the subset that meets the -if-
>> condition, not the full sample, and then that result is assigned to
>> the new variable on that subsample.
>>
>> For example, see the code below.
>>
>> sysuse auto
>> gen mpg2=mpg[_n-1] if foreign==1
>>
>> Notice that observation number 53 has a value of 24 for mpg2. This
>> indicates that the task of taking a lagged value is performed on the
>> full sample first. Otherwise, this value would be missing. But -egen-
>> works differently.
>>
>> There may be other cases with similar ambiguities. I would
>> suggest that Stata have a clear rule to address this issue. If the
>> rule is already out there, please tell me.
>> Thank you very much.
>>
>> Yu Chen
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/

--
Yu Chen, Ph.D.
Assistant Professor of Accounting
A. R. Sanchez, Jr. School of Business, WHTC 218D
Texas A&M International University
5201 University Boulevard
Laredo, Texas 78041-1900 USA
956-326-2513 (office)
956-326-2479 (fax)
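The behavior discussed in this thread can be checked directly. The following is a minimal sketch, not an official statement of Stata's rules. It assumes the auto.dta shipped with Stata, which is sorted by -foreign- with the 52 domestic cars first, so observation 53 is the first foreign car. The variable names -obsno-, -mpg3- and -mean_f- are hypothetical. The sketch does three things:

- It reproduces the -generate- result from the original post.
- It shows one way, using a -by:- prefix, to take the lag within the foreign cars only.
- It shows -egen- computing a statistic over the -if- subsample.

```stata
sysuse auto, clear

* Hypothetical helper variable: remember each car's original position.
generate long obsno = _n

* -generate- evaluates mpg[_n-1] with _n counting over the whole dataset;
* the -if- qualifier only decides which observations receive the result.
* So the first foreign car (observation 53) picks up the mpg of the
* last domestic car.
generate mpg2 = mpg[_n-1] if foreign == 1

* Under -by:-, _n and _N are counted within each by-group, so the first
* foreign car has no predecessor and mpg[_n-1] is missing there.
* Sorting on obsno within foreign keeps the cars in their original order.
bysort foreign (obsno): generate mpg3 = mpg[_n-1] if foreign == 1

* -egen- functions such as mean() compute over the -if- subsample only
* and fill in the result for those observations.
egen mean_f = mean(mpg) if foreign == 1

list make foreign mpg mpg2 mpg3 mean_f in 51/55
```

Read this way, -if- restricts which observations a command acts on, but an explicit subscript such as [_n-1] is resolved against the observation order of the dataset (or of the by-group under -by:-), not against the -if- subsample. That is consistent with the -egen- help text Nick quotes.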