Re: st: Gene-incidence question/simulation

From: moleps islon
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Gene-incidence question/simulation
Date: Mon, 23 Mar 2009 13:32:02 +0100

```
Sorry about the mistake Martin pointed out; it should read:

simulate ratemutpos=y ratemutneg=o mutation=v, reps(100000):ins

Thanks for the input, Neil. For my purposes I believe I can disregard the
mode of inheritance (i.e. recessive, dominant, etc.) and just
treat it as a risk factor: either you have it or you don't. I'm not
modeling the inheritance, and we don't have any related individuals in our
sample. We know our 217 patients have 11 cancers. What we don't know
is how many have the risk factor, but we know the general incidence of
cancer in people without the risk factor to be 6/100000. By randomly
assigning the risk factor to a random number of patients in the sample, I
can compare the different random allocations of the risk factor in the sample.
Next, by keeping only the random samples that produce an incidence of
6/100000 in the risk-factor-negative group, I can use those samples to
infer the incidence of the risk factor, because each such sample has produced a
result within the constraints of the problem.

Regards
Moleps

On Mon, Mar 23, 2009 at 11:27 AM, Neil Shephard <nshephard@gmail.com> wrote:
> On Mon, Mar 23, 2009 at 10:15 AM, moleps islon <moleps2@gmail.com> wrote:
>> Thanks for the statistical input. I truly appreciate this. However,
>> what I've done instead, in order to get an estimate, is to run a
>> simulation whereby I select g random patients in my sample and "give
>> them" the mutation and then do the usual calculations.
>
> Sorry, but I see no benefit to this at all.  How are you estimating
> the proportion of your sample to '"give them" the mutation'?
>
> The frequency of the allele will be pivotal to calculating the
> penetrance, and from your code all you've done is pick a random sample
> of g patients from the total N.  This is highly unlikely to reflect
> the true frequency of the polymorphism in the population, and all
> you'll have is a range of estimates based on varying genotype
> frequencies under a dominant model (see comment below).
>
> Further, your code doesn't seem to explicitly account for any
> genetic model other than a dominant one, and you may wish to consider
> recessive, additive and multiplicative (after all, you presumably have
> no idea about the mode of inheritance).
>
> What organism are you looking at and what marker are you considering?
> If it's humans and a SNP, see if you can find the RS# on HapMap, where
> there will (hopefully) be an estimate of the allele frequency from
> their standard populations.
>
> I don't think you can draw any conclusions from the results that you
> are obtaining here.  You really need to genotype your samples for the
> mutation to estimate the allele frequency, determine the frequency of
> each genotype in your data set and then you can start deriving the
> penetrance and joint probability of developing the two phenotypes.
>
> Regards
>
> Neil
> --
> "The combination of some data and an aching desire for an answer does
> not ensure that a reasonable answer can be extracted from a given body
> of data." ~ John Tukey (1986), "Sunset salvo". The American
> Statistician 40(1).
>
> Email - nshephard@gmail.com
> Website - http://slack.ser.man.ac.uk/
> Photos - http://www.flickr.com/photos/slackline/
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
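The procedure described in the post (randomly give the risk factor to some patients, then keep only the draws whose risk-factor-negative group matches the background rate) is a form of rejection sampling. The do-file below is a minimal sketch of that idea under stated assumptions; it is not the `ins` program used above, which was not posted. The program name `ins_sketch`, the uniform draw of the number of carriers between 1 and 216, the seed, and the 10,000 replications are illustrative assumptions. Only the 217 patients, the 11 cancers, and the 6/100000 background rate come from the post.

```stata
* Hypothetical sketch of the rejection-sampling idea; NOT the original "ins".
* Assumptions: number of carriers g drawn uniformly from 1..216,
* seed and number of replications chosen arbitrarily, rates per 100,000.
clear all
set seed 20090323

program define ins_sketch, rclass
    quietly {
        drop _all
        set obs 217
        generate byte cancer = _n <= 11          // 11 cancers among 217 patients
        local g = floor(runiform()*216) + 1      // random number of carriers, 1..216
        generate double u = runiform()
        sort u                                   // random order of patients
        generate byte riskpos = _n <= `g'        // first g after the random sort are carriers
        count if riskpos & cancer
        local casespos = r(N)
        count if !riskpos & cancer
        local casesneg = r(N)
    }
    return scalar ratepos = 1e5 * `casespos' / `g'
    return scalar rateneg = 1e5 * `casesneg' / (217 - `g')
    return scalar g = `g'
end

simulate ratemutpos=r(ratepos) ratemutneg=r(rateneg) mutation=r(g), ///
    reps(10000) nodots: ins_sketch

* Keep only draws consistent with a background rate of 6/100,000
* in the risk-factor-negative group.
keep if ratemutneg <= 6
summarize mutation ratemutpos
```

Run it as a do-file, since the `///` continuation does not work interactively. The acceptance rule cannot be an exact match: at 6/100000, a negative group of about 200 patients is expected to contain about 0.012 cancers, while a single cancer in a negative group of at most 216 patients already gives a rate of about 463/100000 or more. In practice, keeping draws with `ratemutneg <= 6` keeps only the draws in which all 11 cancers fall in the risk-factor-positive group.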