
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: dropping vars from analysis under conditions

From   Steve Samuels
Subject   Re: st: dropping vars from analysis under conditions
Date   Tue, 17 Apr 2012 23:44:24 -0400

Thanks, Richard. You (and Paul) are correct. The only reason to identify the individual is to use replication-based standard errors. Otherwise, the standard errors are based not on iid observations but on conditional likelihoods. I don't know how Katya's events were recorded, but if the measurements were grouped, I still think the -cloglog- approach is preferable.

My comment about time-dependent covariates was unwarranted, as I see that Katya's model has a variable with a _tvc (time-varying covariate?) suffix. Katya was just trying to give us the information she thought we needed to answer her original question. I customarily like to step back and look at an entire analysis. I went too far here, and I apologize.


On Apr 17, 2012, at 11:16 PM, Richard Williams wrote:

At 06:12 PM 4/17/2012, Steve Samuels wrote:
> I think Maarten is correct. Katya is trying for a discrete duration
> analysis, by adding the time intervals "interval2 interval3 interval4
> interval5 interval6 interval7". The logistic model operates
> interval-by-interval. Her event indicator is zero for all intervals
> except those in which an event occurred. Although the number of
> observations is expanded, the number of events would not be; so the
> effective amount of information in the data would be unchanged.
> However I don't like Katya's analysis.  There's a lot I don't
> understand, because she did not describe her data well or show us the
> actual command.
> Among the issues:
> 1) she doesn't include a cluster() option, so that standard errors
>    are probably incorrect;
> 2) the parameters of the logistic model are not invariant to the
>    choice of intervals;
> 3) the standard model would be a discrete hazard or cumulative
>    log-log model;
> 4) if she has survey data, she is ignoring completely the sample
>    design;
> 5) a discrete hazard model without time-dependent covariates over a
>    long number of intervals is of doubtful use to me.
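
The person-period expansion described in the quoted message can be sketched as follows. This is only a minimal illustration in Python, not Katya's data or command: the subjects, the field names (id, last_interval, had_event), and the helper expand_person_periods are all hypothetical. It shows that expanding to one row per subject per interval at risk increases the number of observations but leaves the number of events unchanged.

```python
def expand_person_periods(subjects):
    """Return one row per subject per interval at risk (hypothetical layout)."""
    rows = []
    for s in subjects:
        for t in range(1, s["last_interval"] + 1):
            rows.append({
                "id": s["id"],
                "interval": t,
                # The event indicator is 1 only in the subject's last observed
                # interval, and only if that subject was not censored.
                "event": int(s["had_event"] and t == s["last_interval"]),
            })
    return rows


# Made-up example data: three subjects followed over at most 7 intervals.
subjects = [
    {"id": 1, "last_interval": 3, "had_event": True},
    {"id": 2, "last_interval": 7, "had_event": False},  # right-censored
    {"id": 3, "last_interval": 1, "had_event": True},
]

rows = expand_person_periods(subjects)
n_events_before = sum(s["had_event"] for s in subjects)
n_events_after = sum(r["event"] for r in rows)

print(len(subjects), "subjects ->", len(rows), "person-period rows")
print("events before:", n_events_before, "after:", n_events_after)
```

A binary-outcome model with interval indicators (logit or complementary log-log) would then be fit to rows of this shape; the sketch stops before any model fitting.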

Paul Allison has written a couple of pieces about discrete-time methods for the analysis of event histories; see, e.g., his 1984 green Sage book, "Event History Analysis." I believe he shows that the standard errors are correct and that you don't need clustering. Being able to conveniently incorporate time-varying covariates is a big advantage of the approach. It also handles right-censoring well. I'm not sure about some of your other concerns, but I am guessing you could use the svy: prefix. My own example discussing this is at

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu

*   For searches and help try:

