
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: ambiguity in -if- qualifier


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: ambiguity in -if- qualifier
Date   Tue, 25 Mar 2014 09:21:36 +0000

I think this example highlights the core of Yu Chen's concern. Reversing
Yu's style, I present a plausible example in a facetious manner.

Question. Professor Nobelordie is teaching an advanced econometrics
class, "Testing for heteros{c|k}edasticity under a full moon". He
presents students with a dataset for 1900-2012 but for reasons
compelling to economists tells them to use only data from 1970 on to
build an autoregressive model predicting something of interest.

Students Strict and Weak attempt this problem. Student Strict starts out by

keep if year >= 1970

and then fits her model. Student Weak omits this step but carefully puts

if year >= 1970

on all his statements. They get different results. Explain why, and
apportion blame between

(a) Professor Nobelordie

(b) Student Strict

(c) Student Weak

(d) Stata.

Answer. Student Strict is reasoning "only use data from 1970 on", but
following the -keep-, L1. values are not available for 1970 because
1969 is not in the dataset any more, L2. values are not available for
1970 or 1971 for the same reason, and so on and so forth. Student Weak
can use more data: if his highest lag is of order p, he gets p more
observations than Student Strict.
Provided they keep and show their code, the discrepancy can be
unearthed and explained.
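The discrepancy can be reproduced in a few lines. This is a sketch on simulated data: the series, the seed and the names -y- and -year- are assumptions, not the professor's dataset. It compares the number of observations each student's regression actually uses.

```stata
* Hypothetical annual series for 1900-2012 (simulated data, illustration only)
clear
set obs 113
generate int year = 1899 + _n
tsset year
set seed 2014
generate double y = rnormal()

* Student Weak: -if- selects 1970 on, but L.y for 1970 is read from 1969
regress y L.y if year >= 1970
scalar N_weak = e(N)

* Student Strict: pre-1970 years are dropped first, so L.y is missing for 1970
preserve
keep if year >= 1970
regress y L.y
scalar N_strict = e(N)
restore

display "Weak: " N_weak "   Strict: " N_strict
```

In this setup Weak's regression should use 43 observations (1970-2012) and Strict's 42 (1971-2012). With two lags, `L(1/2).y`, the counts would be 43 and 41.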

Professor Nobelordie is guilty of a vague instruction, unless the
point of the question was for students to discover the ambiguity the
hard way.

Stata is blameless. It just sits there, trying very hard to do what
it's told. -if- pushes one way (use only 1970 on), time series
operators push another way (reach back before 1970).
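If Student Strict's reading is the intended one, the same estimation sample can be obtained without discarding data. The approach is to require that the lagged observation also lie inside the sample. This is a minimal sketch on hypothetical simulated data, and -insample- is an illustrative variable name.

```stata
* Hypothetical annual series for 1900-2012 (simulated data, illustration only)
clear
set obs 113
generate int year = 1899 + _n
tsset year
set seed 2014
generate double y = rnormal()

* Mark the intended sample once
generate byte insample = year >= 1970

* Weak reading: -if- selects 1970 on, but L.y for 1970 comes from 1969
regress y L.y if insample
scalar N_weak = e(N)

* Strict reading without -keep-: the lagged observation must be in the sample too
regress y L.y if insample & L.insample == 1
scalar N_strict = e(N)

display "Weak: " N_weak "   Strict: " N_strict
```

For a model with `L(1/2).y`, the condition would also need `& L2.insample == 1`. Writing `== 1` rather than relying on `L.insample` alone is deliberate. Stata treats a missing value as true in a logical expression, and `L.insample` is missing for the first year.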

Nick
[email protected]


On 25 March 2014 00:46, Nick Cox <[email protected]> wrote:
> What the -mvsumm- help calls the "weak" interpretation will always be
> followed unless you intervene afterwards to -replace- values that use
> information outside the -if- restriction (or, equivalently, reduce the
> dataset to the observations selected by -if-).
>
> That's much of the point of those comments! The rest of the point is
> just to underline that that is what Stata does.
>
>
> Nick
> [email protected]
>
>
> On 24 March 2014 23:01, Yu Chen, PhD <[email protected]> wrote:
>> Hi, Nick,
>> Thank you very much for the explanation. You mentioned in the Remarks
>> of -mvsumm- (SSC) that there are possibly two interpretations: a weak
>> interpretation and a strong interpretation. You chose to use the weak
>> interpretation in developing -mvsumm-.
>> Do you know whether this weak interpretation is consistently followed
>> by Stata in developing its official commands? If some official
>> commands employ the weak interpretation, but others employ the strong
>> interpretation, that will be a potential trap for those unaware of the
>> distinction.
>> Thank you.
>>
>> Yu
>>
>>
>>
>> On Mon, Mar 24, 2014 at 12:06 PM, Nick Cox <[email protected]> wrote:
>>> The reason for your puzzlement is becoming much clearer, so thanks for
>>> providing an example that can be discussed.
>>>
>>> Note, however, that your initial word description -- in your first
>>> paragraph -- does not fully match your code example, as your code
>>> example bites for a quite specific reason, which only the code makes
>>> clear.
>>>
>>> Naturally, Stata can calculate the previous value of a time series if
>>> the previous observation is present in the dataset, but not otherwise.
>>> (Similar remarks apply to the effects of any time series operator or
>>> subscripting where such imply reaching outside the observations
>>> selected by -if-.)
>>>
>>> Said differently, -if- selects observations to be used, but neither
>>> the -if- qualifier nor any other part of the syntax is thereby
>>> prohibited from invoking information in the other part of the dataset
>>> whenever -if- selects a strict subset.
>>>
>>> But the problem here is not that Stata is being ambiguous, or
>>> inconsistent, or incorrect, but that users need to ask for what they
>>> want and want what they ask for.
>>>
>>> In your example, which we can all agree to be frivolous, you in effect
>>> carry out a regression on part of a panel and **part of what you
>>> calculate depends on values outside the data used**. That's at best
>>> dubious and at worst meaningless, but either way the decision to do
>>> that is yours, not Stata's.
>>>
>>> Otherwise put, it's your code that says "use lagged values for part of
>>> the data" and Stata does what it is told to the best of its ability.
>>> It's a robot and you are its instructor, in this example at least.
>>>
>>> I agree with you that people need to think about cases like this.
>>> Indeed, if you look at the help file for -mvsumm- (SSC) you will see
>>> "Remarks" written (by me, as it happens) on this very point in 2005.
>>>
>>> There are many other examples. Here is another.
>>>
>>> sysuse auto , clear
>>>
>>> gen mpg2 = mpg/_N if foreign
>>>
>>> keep if foreign
>>> gen mpg3 = mpg/_N
>>>
>>> -mpg2- and -mpg3- are quite different, as _N is the number of
>>> observations in the current dataset.
>>>
>>> The only clear rule needed here is to ask for exactly what you want.
>>>
>>> Nick
>>> [email protected]
>>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
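The -auto- example in the quoted message above can be checked directly. The sketch below also adds one way to divide by the size of the selected group without dropping data, using -by:-. That step is an addition, not part of the original example, and -mpg4- is a hypothetical name.

```stata
sysuse auto, clear

* -if- selects the foreign cars, but _N is still the size of the whole dataset
generate mpg2 = mpg/_N if foreign

* Under -by:-, _N is the size of the current group
bysort foreign: generate mpg4 = mpg/_N if foreign

* -keep- shrinks the dataset itself, so _N now counts only foreign cars
keep if foreign
generate mpg3 = mpg/_N

* mpg3 and mpg4 agree; mpg2 differs from both
list make mpg mpg2 mpg3 mpg4 in 1/5
```

The -auto- dataset shipped with Stata has 74 cars, 22 of them foreign. So -mpg2- divides by 74, while -mpg3- and -mpg4- divide by 22.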


© Copyright 1996–2018 StataCorp LLC