



Re: Interpolation [was: Re: st: From: Sadia Khalid ...]


From: David Hoaglin <[email protected]>
To: [email protected]
Subject: Re: Interpolation [was: Re: st: From: Sadia Khalid ...]
Date: Thu, 9 Jan 2014 14:25:13 -0500

In response to Sadia's question about my attitude toward
interpolation, I agree with Nick's discussion.

Whether interpolation is on a firmer basis than extrapolation depends
on the process underlying the data and on our understanding of that
process.  Whenever the data involve missing observations, it is
important to investigate the process(es) that caused those
observations to be missing.  It may be necessary to model the
missing-data process and also the process underlying the non-missing
data.  As I mentioned earlier, issues of missing data have a large
literature.  A number of questions remain open, and no single approach
is appropriate in all situations.  Creating data (imputation) involves
assumptions, some of which are easier to check than others.

I could see interpolation (done in more than one way) as a form of
sensitivity analysis.

David Hoaglin

On Thu, Jan 9, 2014 at 1:22 PM, Nick Cox <[email protected]> wrote:
> [I changed the thread title. Sadia: Giving a sensible title to your
> posts is one of several things you should please note.]
>
> I have some comments on _interpolation_ (usual word). Arguably,
> interpolation (wide sense) means predicting values from neighbouring
> values, while interpolation (narrow sense) means doing that _within
> the range of the data_ and extrapolation means doing that beyond the
> range of the data. This range could be in time, space or with
> reference to any other coordinates.
>
> Interpolation has a centuries-old history but has been over-shadowed
> in recent years within statistical science by much more elaborate
> techniques of imputation, even when interpolation offers a simpler,
> but different, solution to the same problem. (In general,
> interpolation and imputation are not identical problems, however.)
>
> This can be seen in specific terms: official Stata offers only linear
> interpolation through -ipolate-, although other techniques exist and
> some are available in Stata as user-written commands (e.g. -cipolate-,
> -csipolate-, -pchipolate-, -nnipolate- from SSC).
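>
> A minimal sketch, with -y- and -x- as placeholder variable names and
> the new variables named just for illustration (the SSC command must
> be installed before first use):
>
> ipolate y x, generate(y_lin)             // linear, within the range of x only
> ipolate y x, generate(y_lin_ex) epolate  // the same, plus extrapolation
> ssc install csipolate
> csipolate y x, generate(y_spline)        // cubic spline interpolation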
>
> Statistical people are often sceptical about or even hostile to
> interpolation, perhaps on the following grounds.
>
> 1. As David Hoaglin emphasised, it is easy (for naive users) to forget
> that you aren't really producing new and valid data, or even replacing
> old and invalid data, except with guesses. So, it is caution all the
> way up, including not fooling yourself about how much to believe
> (e.g.) model goodness of fit or significance levels obtained from the
> data, including interpolations.
>
> 2. Interpolation inevitably understates variability to the extent that
> the real but unknown series will usually be rougher than the
> interpolated series. Statistical people in various fields are often
> trained to regard large variance of unknown magnitude as what you have
> to accept but large bias of unknown magnitude as the work of the
> devil.
>
> 3. Interpolation has poor or undefined statistical properties insofar
> as the simplest techniques offer no way of assessing associated error
> and/or are based on naive or poorly defined ideas of generating
> processes. (Conversely, there are exceptions at more advanced levels,
> e.g. within spatial statistics.)
>
> 4. Through some tribal or traditional division of labour,
> interpolation is often taught (very briefly or briskly) under some
> heading such as numerical analysis or mathematical methods for
> scientists or engineers. It is often regarded as too trivial or
> elementary to be worth much attention even there, and is often omitted
> from statistical teaching. Here's a challenge: identify a book or
> course on statistical or data analysis that includes serious coverage
> of interpolation. There are some, but not I think many.
>
> Nevertheless, interpolation remains central to what we do with data.
> It is perhaps worth emphasising that much graphical interpretation
> depends on mental interpolation, for example.
>
> Here are some things that could be done, but according to my patchy
> reading are often not done:
>
> A. Keep things graphical. A graph of data and interpolation is
> essential to keep track of whether results are plausible or
> trustworthy.
>
> B. Test an interpolation method by assessing its ability to reproduce
> _known_ data (a sketch follows this list).
>
> C. Use two or more interpolation methods to see how far they (dis)agree.
>
> D. Be especially cautious about extrapolation.
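>
> For B and C (and A too), a sketch continuing with the placeholder
> names above: hide some _known_ values, interpolate the thinned series
> by two methods, and compare with the truth.
>
> generate y_test = y
> replace y_test = . if mod(_n, 10) == 0            // pretend every 10th value is missing
> ipolate y_test x, generate(y_test_lin)
> csipolate y_test x, generate(y_test_spl)
> generate err_lin = y_test_lin - y if missing(y_test) & !missing(y)
> generate err_spl = y_test_spl - y if missing(y_test) & !missing(y)
> summarize err_lin err_spl                         // B: how well are known values reproduced?
> scatter y x || line y_test_lin y_test_spl x, sort // A and C: keep it graphical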
>
> Finally, most interpolation methods are easy to understand and
> implement. Thus they are quick to apply. Even when they don't work
> well, you have acquired some suitable caution about your data.
>
> Nick
> [email protected]
>
>
> On 9 January 2014 16:04,  <[email protected]> wrote:
>> Thanks, @David Hoaglin, for your valuable comments.
>>
>> Actually, I am not filling in (extrapolating) all the missing values.
>> I am only extrapolating where there are five or fewer missing values.
>>
>> If I do not extrapolate the data, the number of observations will be
>> smaller.
>>
>> What are your comments on intrapolation? Are people comfortable with
>> intrapolation (filling in the in-between missing values)?

