Re: st: Splines

From   Nick Cox
Subject   Re: st: Splines
Date   Thu, 21 Feb 2013 09:19:36 +0000

Thanks for filling out the details. I've not read that paper. But in
any case I don't know what you mean by "dealing with temporal
dependence". Dependence in time series can mean anything from

dependence in error structure, which is regarded as a nuisance or
complication in regression-type models,

to

dependence treated as the main feature by some kind of time series
modelling, such as binary time series modelling or Markov chains.

It seems, however, that what you have in mind is something else, roughly
in between those extremes.

It seems that this is most likely to be carried forward by people
familiar with the literature now identified. Alternatively, if this is
a widely used method, there should be guides somewhere on how to do it
in Stata.
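
Editorial sketch, not part of the original reply: the advice quoted
below is to build splines with -mkspline- rather than -lowess-. One way
that might look is shown here, assuming simulated stand-in data (the
numbers are invented purely so the commands run), the poster's variable
names Y, X, duration and id, and an arbitrary choice of 4 knots. The
stub name dursp is hypothetical.

```stata
* Simulated stand-in data; Y, X, duration and id mirror the poster's names
clear
set seed 12345
set obs 1000
generate id = ceil(_n/20)
generate duration = floor(15*runiform())
generate X = rnormal()
generate Y = runiform() < invlogit(-1 + 0.5*X - 0.15*duration)

* Restricted cubic spline basis for duration; 4 knots gives dursp1-dursp3
mkspline dursp = duration, cubic nknots(4)

* Logit with the spline terms and standard errors clustered on id
logit Y X dursp*, vce(cluster id)
```

Note that the spline here is built from duration alone, not from the
response. If duration is 0 in every event year and positive otherwise,
as the original question describes, any function of duration may still
separate events from non-events, so this sketch does not by itself
remove a "predicts data perfectly" message.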


On Thu, Feb 21, 2013 at 2:16 AM, Marc Peters wrote:
> Dear Nick,
> Thank you for your prompt answer. I am very sorry for being imprecise.
> The reference I am talking about is Beck, Nathaniel; Jonathan N. Katz
> and Richard Tucker. 1998. "Taking Time Seriously: Time-Series
> Cross-Section Analysis with a Binary Dependent Variable." American
> Journal of Political Science, 42(4) 1260-1288.
> BTSCS is the word they use for Time-Series Cross-Section Analysis with
> a Binary Dependent Variable. In their article they replicate a study
> of militarized conflict, where a country dyad does or does not have a
> conflict in a given year. As a conflict can persist for a number of
> consecutive years, the data structure is quite similar to mine. Your
> point about lowess is well taken, but if I understand you correctly
> you would not recommend using splines for any analyses with repeated
> events? Would you recommend another strategy for dealing with temporal
> dependence? As I have understood it, a lagged dependent variable is
> insufficient.
> Once again, thank you for your help

On Wed, Feb 20, 2013 at 7:28 PM, Nick Cox wrote:

>> You were asked to read the FAQ before posting. That explains that you
>> are asked not to give minimal name (date) references. Also, BTSCS
>> looks to me like jargon from your field. It is difficult not to use
>> jargon on a list like this, but unexplained jargon nevertheless cuts
>> down the number of people who might both read and reply to your posts.
>> In terms of your question, running -lowess- and calling the smooth a
>> spline does not make it a spline. There are many classes of spline,
>> but I doubt that there's any definition that generous.
>> The most common kinds of splines are linear and cubic. -mkspline-
>> creates either kind. My best advice is to read the manual entry on
>> -mkspline- and run through the examples in the help.
>> I can't easily follow what you are trying to do otherwise. If you are
>> saying that your response (dependent variable, in your terms) flips
>> between states of 0 and states of 1, it sounds quite unsuitable for
>> splines. But you seem to be trying to model it as a function of
>> duration, not time; sorry, but you lost me on that.
>> My bottom line is that -lowess- is _not_ a spline method.

On Thu, Feb 21, 2013 at 1:08 AM, Marc Peters wrote:

>>> I have never used splines before and have a rather silly question. I
>>> am running a BTSCS model and have read up on my Beck, Katz and Tucker
>>> (1998) and understood that I should use either temporal dummies or
>>> splines to adjust for temporal dependence.
>>> The data is structured as duration data, with events coded as 1 and
>>> non-events as 0. The dependent variable is measured at discrete
>>> intervals (years) and an event can go on for several years (it often
>>> does).
>>> From the data I have created a variable (duration) counting the number
>>> of years since the last event. The variable is coded as 0 as long as
>>> the event is ongoing.
>>> From this variable I create lowess splines using
>>> lowess Y duration, gen (spline)
>>> and then:
>>> logit Y X spline, cluster(id)
>>> I have understood that this is what you are supposed to do, but since
>>> the spline is defined on the dependent variable the spline variable
>>> always takes on a high value when duration=0 (i.e. there is an event).
>>> Consequently, when running the model I receive the following
>>> message:
>>> spline > .4679623 predicts data perfectly
>>> I would be very grateful if anyone could help me with what it is I am
>>> doing wrong. In the end, I should probably use cubic splines but first
>>> I want to understand the simple principle.