Re: st: Gene-incidence question/simulation

From   Austin Nichols <>
Subject   Re: st: Gene-incidence question/simulation
Date   Sun, 22 Mar 2009 12:02:10 -0400

moleps islon <> :
Just to be clear: B causes Z and B causes A, but you don't observe B,
right? Let's ignore the survival model you are no doubt estimating,
and suppose you have gotten an estimate of P(Z|A)=.05 with a SE near
zero (a confidence interval of width zero).  Now you want to estimate
P(Z|B) and P(A|B), and you think P(Z|B) is near .65 and
P(Z|~B)=6/100000 (I assume "background incidence" is the probability
of Z given not B here; that may reflect my "background ignorance").
You will need much more information to make any progress!

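Let p=P(B) in the population, y=P(Z|B), x=P(A|B), and w=P(A|~B). Note
that ~B means "not B" or B==0. Then, if Z depends on A only through B,
Bayes' rule gives

  P(Z|A) = [y*x*p + P(Z|~B)*w*(1-p)] / [x*p + w*(1-p)]

(the P(Z|~B) term is tiny, so P(Z|A) is roughly y*x*p/[x*p + w*(1-p)]),
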
so even if you assume P(Z|A)=.05 and y=.65, you have 3 unknowns and 1
equation; even if you know p, you have two unknowns w and x, so the
best you can hope for is to express P(A|B) as a linear function of
P(A|~B).  For example, if p=.5 and y=.65 and P(Z|A)=.05 then w is 12
times as big as x (i.e. if Z is so rare in a sample of A, when B so
likely causes Z, it must be because A is much more likely when not B
than when B).  If p is 8% then w and x are roughly the same.  I
suggest you draw out a couple of trees with probabilities and check my
math.
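
As a quick check on those two ratios, here is a minimal Python sketch. It assumes Z depends on A only through B, so that P(Z|A) = [y*x*p + P(Z|~B)*w*(1-p)] / [x*p + w*(1-p)], and solves for w/x. The function name and the optional background argument are illustrative, not from the original message.

```python
def ratio_w_over_x(p, y, pza, background=0.0):
    """Return w/x = P(A|~B)/P(A|B) implied by P(Z|A) = pza.

    Rearranges pza = (y*x*p + background*w*(1-p)) / (x*p + w*(1-p)).
    background is P(Z|~B); 0.0 drops that (very small) term.
    """
    return (y - pza) * p / ((pza - background) * (1.0 - p))


# The two cases discussed above: p = .5 and p = .08, with y = .65, P(Z|A) = .05
for p in (0.5, 0.08):
    print("p=%.2f  w/x=%.3f" % (p, ratio_w_over_x(p, y=0.65, pza=0.05)))
```

With p=.5 the ratio comes out at 12, and with p=.08 at about 1.04. Passing background=6/100000 moves the first figure only slightly, to about 12.01.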

If you want to estimate y and x, you are out of luck.  If you know w
and p with certainty, you can express y as a function of x and the
estimate of P(Z|A), so if you have estimates of P(Z|A) in memory, you
can use -lincom- to get estimates of y conditional on x, but how
plausible is it you would know w with certainty when you are trying to
estimate x and y?

I suppose you could use known p, estimates of P(Z|A) in memory, and
-lincom-, to get estimates of y conditional on x and w, then present a
table of point estimates and confidence intervals for various values
of x and w.  Or get estimates of x conditional on y and w, or what
have you.  But you still have to assume you know p with certainty, or
the dimension of that table gets out of control...
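
A rough Python sketch of such a table follows (in Stata, -lincom- after your estimation command would do the same job). It assumes P(Z|A) = y*x*p/[x*p + w*(1-p)] with the background term ignored, so y is just the estimate of P(Z|A) times a known constant and the confidence limits rescale by that same constant. The simple binomial SE for 11/217, the value p=.08, and the grids for x and w are all placeholders for illustration, not recommendations.

```python
import math


def y_given(pza, x, w, p):
    """y = P(Z|B) implied by P(Z|A) = pza for assumed x, w, p.

    Uses P(Z|A) = y*x*p / (x*p + w*(1-p)), i.e. background P(Z|~B) ignored.
    """
    return pza * (x * p + w * (1.0 - p)) / (x * p)


def y_table(pza_hat, se, p, xs, ws, z=1.96):
    """Rows of (x, w, y_hat, lower, upper) for each assumed (x, w) pair."""
    rows = []
    for x in xs:
        for w in ws:
            scale = y_given(1.0, x, w, p)  # y is pza times this constant
            rows.append((x, w,
                         pza_hat * scale,
                         (pza_hat - z * se) * scale,
                         (pza_hat + z * se) * scale))
    return rows


# 11 cases of Z among 217 cases of A, as in the original post
n_cases, n_events = 217, 11
pza_hat = n_events / n_cases
se = math.sqrt(pza_hat * (1.0 - pza_hat) / n_cases)  # assumption: binomial SE

p = 0.08  # hypothetical "known" P(B)
for row in y_table(pza_hat, se, p, xs=(0.05, 0.10), ws=(0.05, 0.10)):
    print("x=%.2f w=%.2f  y=%.3f  [%.3f, %.3f]" % row)
```

Some (x, w) pairs will imply y above 1, which just says that combination is inconsistent with the estimate of P(Z|A).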

I have been assuming that P(Z|A) is what you are estimating, but you
really have a competing risk model, I am guessing, modeling the hazard
of getting Z before death or censoring by some other process. So you
need to redefine Z to be not "gets condition Z" but  "gets condition Z
in my observation period" to use any of the above, which is probably
unpalatable.  Plus, I don't know if I've translated your description
into probabilities correctly--the jargon of genetics is unfamiliar to
me (and many other list members--you should translate to the common
language of statistics).

On Sun, Mar 22, 2009 at 10:37 AM, moleps islon <> wrote:
> Dear statalisters,
> I'm studying a tumor A that has a probability (x) of being linked to
> a genetic mutation (B) that also predisposes (penetrance approx 65% (y)
> by 70 years) to condition Z. Now I've got 217 cases of A that resulted
> in 11 cases of Z over 8534 years of follow-up (among the 217
> cases). I need to determine the number of patients with B given that
> there is also a background incidence of 6/100000 for Z. We know that
> x<<y. Besides running a simulation, is there a more analytical way of
> estimating x and y given my data?
> Best wishes,
> Moleps