# Re: st: Gene-incidence question/simulation

| From | To | Subject | Date |
| --- | --- | --- | --- |
| moleps islon | statalist@hsphsun2.harvard.edu | Re: st: Gene-incidence question/simulation | Mon, 23 Mar 2009 11:15:41 +0100 |

```Thanks for the statistical input; I truly appreciate it. However,
what I've done instead to get an estimate is to run a
simulation in which I select g random patients in my sample, "give
them" the mutation, and then do the usual calculations. From the
regression line I can then get a (probably statistically dubious)
estimate of the cancer incidence among mutation carriers in
the sample, given that the people in my sample without the mutation have
an incidence of 6/100000 of developing cancer.

The simulation runs like this, in case anyone is curious (the
observation time is simply generated at random here)...

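* Define the program first so that -simulate- can call it
capture program drop ins
program define ins, rclass
    // remove variables left over from the previous replication
    capture drop l
    capture drop mutation
    local b = runiform()*100        // cutoff: between 0 and 99 patients get the mutation
    gen double l = runiform()
    sort l                          // random order, so the first rows are a random subset
    gen byte mutation = _n < `b'
    count if mutation == 1
    return scalar v = r(N)          // number of patients "given" the mutation
    stptime if mutation == 1, per(100000)
    return scalar y = r(rate)       // rate among mutation carriers
    stptime if mutation == 0, per(100000)
    return scalar o = r(rate)       // rate among non-carriers
end

* Build the sample: 217 patients, 11 of them with cancer
set obs 217
gen id = _n
gen time = 54*runiform()
gen p = runiform()
sort p
gen cancer = id < 12
stset time, id(id) failure(cancer==1)

simulate ratemutpos=r(y) ratemutneg=r(o) mutation=r(v), reps(100000): ins
scatter ratemutpos ratemutneg || lfitci ratemutpos ratemutneg
tab mutation if ratemutneg < 7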

On Sun, Mar 22, 2009 at 5:02 PM, Austin Nichols <austinnichols@gmail.com> wrote:
> moleps islon <moleps2@gmail.com> :
> Just to be clear: B causes Z and B causes A, but you don't observe B,
> right? Let's ignore the survival model you are no doubt estimating,
> and suppose you have gotten an estimate of P(Z|A)=.05 with a SE near
> zero (a confidence interval of width zero).  Now you want to estimate
> P(Z|B) and P(A|B), and you think P(Z|B) is near .65 and
> P(Z|~B)=6/100000 (I assume "background incidence" is the probability
> of Z given not B here; that may reflect my "background ignorance").
>
> Let p=P(B) in the population, y=P(Z|B), x=P(A|B), and w=P(A|~B). Note
> that ~B means "not B" or B==0. Then
>
> P(Z|A)=P(Z|B)P(B|A)+P(Z|~B)P(~B|A)=[ypx+.00006(1-p)w]/[(1-p)w+px]
>
> so even if you assume P(Z|A)=.05 and y=.65, you have 3 unknowns and 1
> equation; even if you know p, you have two unknowns w and x, so the
> best you can hope for is to express P(A|B) as a linear function of
> P(A|~B).  For example, if p=.5 and y=.65 and P(Z|A)=.05 then w is 12
> times as big as x (i.e. if Z is so rare in a sample of A, when B so
> likely causes Z, it must be because A is much more likely when not B
> than when B).  If p is 8% then w and x are roughly the same.  I
> suggest you draw out a couple of trees with probabilities and check my
> math.
>
> If you want to estimate y and x, you are out of luck.  If you know w
> and p with certainty, you can express y as a function of x and the
> estimate of P(Z|A), so if you have estimates of P(Z|A) in memory, you
> can use -lincom- to get estimates of y conditional on x, but how
> plausible is it you would know w with certainty when you are trying to
> estimate x and y?
>
> I suppose you could use known p, estimates of P(Z|A) in memory, and
> -lincom-, to get estimates of y conditional on x and w, then present a
> table of point estimates and confidence intervals for various values
> of x and w.  Or get estimates of x conditional on y and w, or what
> have you.  But you still have to assume you know p with certainty, or
> the dimension of that table gets out of control...
>
> I have been assuming that P(Z|A) is what you are estimating, but you
> really have a competing risk model, I am guessing, modeling the hazard
> of getting Z before death or censoring by some other process. So you
> need to redefine Z to be not "gets condition Z" but  "gets condition Z
> in my observation period" to use any of the above, which is probably
> unpalatable.  Plus, I don't know if I've translated your description
> into probabilities correctly--the jargon of genetics is unfamiliar to
> me (and many other list members--you should translate to the common
> language of statistics).
>
> On Sun, Mar 22, 2009 at 10:37 AM, moleps islon <moleps2@gmail.com> wrote:
>> Dear statalisters,
>> I'm studying a tumor A that has a probability (x) of being linked to
>> a genetic mutation (B) that also predisposes (penetrance approx 65% (y)
>> by 70 years) to condition Z. Now I've got 217 cases of A that resulted
>> in 11 cases of Z over 8534 years of follow-up (among the 217
>> cases). I need to determine the number of patients with B given that
>> there is also a background incidence of 6/100000 for Z. We know that
>> x<<y. Besides running a simulation, is there a more analytical way of
>> estimating x and y given my data?
>>
>> Best wishes,
>> Moleps
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
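
Austin's reply invites readers to check his math, so here is a worked version of that check. It is only a sketch: it takes his identity for P(Z|A) exactly as written in the thread and solves it for the ratio w/x. The shorthand q = P(Z|A) and b = P(Z|~B) = 6/100000 is introduced here for convenience and does not appear in the original messages.

```latex
\begin{align*}
q &= \frac{y\,p\,x + b\,(1-p)\,w}{(1-p)\,w + p\,x}
  && \text{with } q = P(Z \mid A),\; b = P(Z \mid \lnot B) \\
q\,\bigl[(1-p)\,w + p\,x\bigr] &= y\,p\,x + b\,(1-p)\,w \\
(1-p)\,w\,(q - b) &= p\,x\,(y - q) \\
\frac{w}{x} &= \frac{p\,(y - q)}{(1-p)\,(q - b)}
\end{align*}

% Plugging in y = 0.65, q = 0.05, b = 0.00006:
\begin{align*}
p = 0.5:  \quad \frac{w}{x} &= \frac{0.5 \times 0.60}{0.5 \times 0.04994} \approx 12.0 \\
p = 0.08: \quad \frac{w}{x} &= \frac{0.08 \times 0.60}{0.92 \times 0.04994} \approx 1.04
\end{align*}
```

Both values agree with the figures quoted in the reply: w is about 12 times x when p = .5, and w and x are roughly equal when p is 8%. The ratio is all that this single equation pins down, which is why x and y cannot be estimated separately without further assumptions.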