


Re: st: # of Obs. in -stcox- result

From   Steven Samuels
Subject   Re: st: # of Obs. in -stcox- result
Date   Wed, 12 Oct 2011 17:24:13 -0400


The Statalist FAQ, which you were asked to read when you joined the list, states:

"Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly! If you can, reproduce the error with one of Stata's provided datasets or a simple concocted dataset that you include in your posting."

Here, what you typed would include the -stset-, -stdes-, -stcox-, and -reg- commands. What Stata typed would be the results of those commands. Once we see these, perhaps we can give more specific answers.


On Oct 12, 2011, at 3:11 AM, Maarten Buis wrote:

On Wed, Oct 12, 2011 at 5:04 AM, Muyang Zhang wrote:
> This is the regular case in all kinds of analysis. My problem is that
> the number of observations reported by -stset- is smaller than that
> using a linear model with the same set of covariates.

With survival analysis, the number of rows in your dataset does not
have to be the same as the number of observations. To be precise, the
same person/firm/cow/whatever can appear multiple times. You tell
Stata which rows together form one observation with the -id()- option.
The logic is that survival analysis allows the value on explanatory
variables to change over time, but in order to allow that you need to
have a data structure in which you can store those changing values. The
multiple rows represent the same observation at different time points,
thus enabling one to record those changing values. Linear regression
on the other hand assumes that every row in your dataset is one
observation. Moreover, to make things more complex, if any covariate
on any row belonging to one observation is missing, the entire
observation, i.e. all rows belonging to that observation, will be
ignored by survival analysis commands, while commands like linear
regression will only ignore the rows with missing values. Another
reason for the difference in the number of observations could be that
survival analysis will ignore negative times or times of zero, while
linear regression has no problem with those. So you really cannot use
linear regression in this context to find out where the dropped
observations in survival analysis come from.
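
A minimal sketch of how one might compare these counts, using one of
Stata's provided datasets as the FAQ suggests. It assumes the stan3
example data (Stanford heart transplant data, with several rows per
patient) and its variables id, t1, died, age, posttran, surgery, and
year; the choice of dataset and covariates is only for illustration
and is not taken from the original question.

```stata
* Load a multiple-record-per-subject example dataset (assumed: stan3)
webuse stan3, clear

* Declare the survival-time structure; id() tells Stata which rows
* belong to the same subject
stset t1, failure(died) id(id)

* Rows that -stset- itself excludes (for example, nonpositive times)
count if _st == 0

* Compare the number of subjects with the number of records
stdescribe

* The Cox model header reports subjects and records separately
stcox age posttran surgery year

* Flag and count the rows that actually entered the Cox estimation
generate byte in_cox = e(sample)
count if in_cox

* Linear regression treats every row as one observation
regress t1 age posttran surgery year
count if e(sample)
```

Comparing the two -count- results with the -stdescribe- output shows
whether a gap comes from rows versus subjects, from rows excluded by
-stset-, or from missing covariate values.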

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen