
Re: st: using svyset with survival analysis

From   Sharon Minnick <>
To   statalist <" statalist">
Subject   Re: st: using svyset with survival analysis
Date   Mon, 30 Jul 2007 10:54:36 -0700


Many thanks to Jeff Pitblado for his quick reply to my original query, and apologies for taking so long to get back; I have been traveling. In my original message I explained that I get the error "no observations; stset and subpop() option identify disjoint subsets of the data" when I try to use svy: stcox in Stata 10. I have simplified my model to try to get svy to work with stcox, but I still get the same "no observations" error. The checks that Jeff suggested indicate that there are no missing values, so I am at a loss as to what the problem is. My simplified svyset command now includes only one level: sampling of individuals, stratified by a zone variable that is constant within each subject and nonmissing for all subjects (as confirmed by stvary and by manual checks). I get the same error with every svyset specification I have tried:

. svyset id, strata(fixedzone) vce(linearized) singleunit(certainty)

pweight: <none>
VCE: linearized
Single unit: certainty
Strata 1: fixedzone
SU 1: id
FPC 1: <zero>

The stset command looks like:

. stset stdate_b, id(id) failure(anysc) time0(stdate_a) exit(anysc==1 time d(30jun2001)) origin(time stdate_a)

Here is the result from stdes:
. stdes
failure _d: anysc
analysis time _t: (stdate_b-origin)
origin: time stdate_a
exit on or before: anysc==1 time d(30jun2001)
id: id

                               |-------------- per subject --------------|
Category               total        mean         min     median        max
--------------------------------------------------------------------------
no. of subjects         1650
no. of records          4692    2.843636           1          3         10

(first) entry time                     0           0          0          0
(final) exit time               569.5158          86        611        860

subjects with gap          3
time on gap if gap       649    216.3333         205        220        224
time at risk          939052    569.1224          86        611        860

failures                 143    .0866667           0          0          1
--------------------------------------------------------------------------

(The gaps resulted from dropping records in which subjects had moved away from their
original locations; 3 of them moved back after some time. The first subjects were enrolled in 1999, and for a specific reason I need to end this period of analysis at June 30, 2001.)

The result from "svydes agegrp1" indicates a minimum of 111 units per stratum, a minimum of 1 observation per unit, 0 units omitted, and 0 observations with missing data.

As far as I can tell, I don't have any missing data that would cause the error
(I have even temporarily changed the singleunit() option to certainty to avoid missing data, although I don't think I have any), but here is what happens:

. svy: stcox agegrp1

no observations;
stset and subpop() option identify disjoint subsets of the data

Thanks in advance for any further help!

Sharon Minnick
UC Davis

Jeff Pitblado, StataCorp LP wrote:

Sharon Minnick <> is working with survival data from a
complex survey, and is having trouble with -svy: stcox-:

In Stata 10, I have been attempting to analyze my survival data with Cox regression while accounting for the sampling design, and I always get this error:

the stset ID variable is not nested within the final stage sampling unit

Originally I thought this might be because some of our subjects moved during the study and so changed strata and cluster from where they were originally sampled. But I just tried creating a temporary variable that holds the stratum constant for each subject and dropping the cluster variables, and I still get the same error.

So my svyset code is:

svyset _n, strata(staticzone) vce(linearized) singleunit(missing)

and the cox regression code is:

svy: stcox agegrp male

If I change the svyset code to use my subject ID variable instead of _n, then I get this error:

no observations;
stset and subpop() option identify disjoint subsets of the data

which I don't understand since I am not using a subpop option.

The -svy- prefix, when used with -stcox- or -streg-, requires that subjects
with multiple records be contained within the final stage clusters. Thus
subjects are not allowed to belong to more than one cluster.
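A quick way to check this requirement is to sort each subject's records by the design variable and compare the first and last values. This is a minimal sketch, assuming the variable names from Sharon's follow-up post ('id' as the -stset- ID, 'fixedzone' as the stratum); the indicator name 'moved' is hypothetical, and for a multistage design the same check would be run on the final-stage unit variable instead.

```stata
* Sketch, assuming Sharon's variable names: id (stset ID), fixedzone (stratum).
* 'moved' is a hypothetical temporary indicator; it is dropped at the end.
* Note: bysort re-sorts the data in memory.
* Within each subject, sort records by stratum; if the first and last
* values differ (missing sorts last), the subject spans more than one stratum.
bysort id (fixedzone): generate byte moved = fixedzone[1] != fixedzone[_N]
* A count of 0 means every subject lies within a single stratum.
count if moved
list id fixedzone if moved, sepby(id) noobs
drop moved
```

If the count is nonzero, the listed subjects are the ones that would break the nesting that -svy: stcox- requires.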

Although Sharon did not show us the -stset- command she used, it appears that
it looked something like

. stset time, id(subject) failure(failed) ...

where 'time', 'subject', and 'failed' represent the names of Stata variables
Sharon used to -stset- her data.

Given the above -stset-, Sharon should use the 'subject' variable instead of
'_n' to identify the final-stage units. If Sharon has data from a
single-stage survey design, this means that the -svyset- command should look
like
. svyset subject, strata(staticzone) vce(linearized) singleunit(missing)

Using '_n' implies that the records were sampled in the first stage,
which cannot be true given the above -stset-.

After changing her -svyset- to use the 'subject' variable, Sharon was
presented with the following error message:

no observations;
stset and subpop() option identify disjoint subsets of the data

This indicates that -svy- was left with 'no observations' after removing
observations with missing values and checking for subpopulation membership.

In addition to the -subpop()- option, -svy- uses the following options of
-stset- to identify the subpopulation:


Without Sharon's dataset, we can't say definitively what is going on.
However, comparing the results from the following two commands

. stset
. svydes agegrp male

should indicate how many observations are being dropped because of missing
values. The 'no observations' error message will result if the subpopulation
specification identifies only observations that contain missing values in the
variables of interest (that is: time, subject, failed, staticzone, agegrp, and male).
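The same comparison can be made directly by counting the records that survive each filter. This is a minimal sketch, assuming the variable names from Sharon's follow-up post ('id', 'fixedzone', 'agegrp1') and relying on the _st and _d variables that -stset- creates; if the first -count- below returns 0, that would reproduce the 'no observations' condition.

```stata
* Sketch, assuming Sharon's variable names (id, fixedzone, agegrp1).
* stset creates _st: 1 = record used by stset, 0 = record excluded.
tabulate _st
* Records used by stset that also have nonmissing design and covariate values.
count if _st == 1 & !missing(id, fixedzone, agegrp1)
* The same restriction among failures (_d == 1), as a sanity check.
count if _st == 1 & _d == 1 & !missing(id, fixedzone, agegrp1)
```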

* For searches and help try:


© Copyright 1996–2017 StataCorp LLC