Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: # of Obs. in -stcox- result

| | |
|---|---|
| From | Steven Samuels |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: # of Obs. in -stcox- result |
| Date | Wed, 12 Oct 2011 17:24:13 -0400 |

```Muyang--

The Statalist FAQ, which you were asked to read when you joined the list, states:

"Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly! If you can, reproduce the error with one of Stata's provided datasets or a simple concocted dataset that you include in your posting."

Here, what you typed would include the -stset-, -stdes-, -stcox-, and -reg- commands. What Stata typed would be the results of those commands. Once we see these, perhaps we can give more specific answers.

Steve

On Oct 12, 2011, at 3:11 AM, Maarten Buis wrote:

On Wed, Oct 12, 2011 at 5:04 AM, Muyang Zhang wrote:
> This is the regular case in all kinds of analysis. My problem is that
> the number of observations reported by -stset- is smaller than that
> using a linear model with the same set of covariates.

With survival analysis the number of rows in your dataset does not
have to be the same as the number of observations. To be precise the
same person/firm/cow/whatever can appear multiple times. You tell
Stata which rows together form one observation with the -id()- option.
The logic is that survival analysis allows the value on explanatory
variables to change over time, but in order to allow that you need to
have a data structure in which you can store those changing values. The
multiple rows represent the same observation at different time points,
thus enabling one to record those changing values. Linear regression
on the other hand assumes that every row in your dataset is one
observation. Moreover, to make things more complex, if any covariate
on any row belonging to one observation is missing, the entire
observation, i.e. all rows belonging to that observation, will be
ignored by survival analysis commands, while commands like linear
regression will only ignore the rows with missing values. Another
reason for the difference in the number of observations could be that
survival analysis will ignore negative times or times of zero, while
linear regression has no problem with those. So you really cannot use
linear regression in this context to find out where the dropped
observations in survival analysis come from.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
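The row-versus-subject distinction discussed in the thread can be reproduced with one of Stata's provided datasets, as the FAQ suggests. The following is only a sketch: the choice of `stan3` (the Stanford heart transplant data, which has multiple records per patient) and the covariate list are assumptions made for illustration and are not taken from the original poster's analysis. It compares the counts reported by -stcox- and -regress- and then looks for rows that either command might exclude.

```stata
* Illustrative sketch; dataset and covariates are assumptions, not the poster's.
webuse stan3, clear

* Declare the survival structure: rows sharing the same id form one subject.
stset t1, failure(died) id(id)

* Summarize records per subject, entry/exit times, and failures.
stdescribe

* Cox model: the header reports both the number of subjects and the number of records.
stcox age posttran surgery year

* Linear regression treats every row as a separate observation.
regress t1 age posttran surgery year

* Count subjects (one tagged row per id) versus rows in the data.
egen byte firstrow = tag(id)
count if firstrow
count

* Rows with a missing value on any covariate.
count if missing(age, posttran, surgery, year)

* Rows that -stset- excluded (for example, nonpositive or inconsistent times).
count if _st == 0
```

Posting the commands together with the exact output, as Steve requests, would let readers see which of these counts accounts for the discrepancy.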