Hi Neil,

I'm trying to do an allele-wise analysis; the genotype analysis was done as you said, with the genotype recoded. My data are not quite a matched case-control design: there is not a case for each control, with the outcome alternating, like this:

ID x outcome
1 1 1
1 2 0
2 1 1
2 1 0
3 1 1
3 2 0

In my dataset, the outcome stays the same for both rows of an ID (as they belong to the same person). Each person simply has two alleles, which differ between people and are what I am testing, so:

ID SNP outcome
1 1 1
1 2 1
2 1 0
2 1 0
3 1 0
3 2 0

I get slightly different 95% confidence intervals when I run ordinary logistic regression and when I add the cluster(ID) option, so the two methods do differ. Do you know why this is, and which is the better model to use? Thanks
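A minimal sketch of why the intervals differ, using the six toy rows from the second table above. It is pure Python, not Stata's code, and is only meant to mirror roughly what `logistic outcome SNP` does with and without `, cluster(ID)`. The coefficient comes from the same maximum-likelihood fit either way; only the variance changes. The ordinary SE uses the inverse information matrix, while the clustered SE uses a sandwich estimator built from score contributions summed within each ID. The G/(G-1) factor is a common finite-sample adjustment and is an assumption here; with only three IDs the numbers are purely illustrative.

```python
import math
from collections import defaultdict

# Toy data from the second table: (ID, SNP, outcome), two rows per person.
ROWS = [(1, 1, 1), (1, 2, 1), (2, 1, 0), (2, 1, 0), (3, 1, 0), (3, 2, 0)]


def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def score_info(rows, beta):
    """Total score, information matrix, and score summed within each ID."""
    score = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    by_id = defaultdict(lambda: [0.0, 0.0])
    for pid, snp, y in rows:
        x = (1.0, float(snp))  # intercept and SNP code
        p = 1.0 / (1.0 + math.exp(-(beta[0] * x[0] + beta[1] * x[1])))
        for i in range(2):
            score[i] += (y - p) * x[i]
            by_id[pid][i] += (y - p) * x[i]
            for j in range(2):
                info[i][j] += p * (1.0 - p) * x[i] * x[j]
    return score, info, by_id


def fit_logit(rows, iters=50, tol=1e-10):
    """Maximum-likelihood fit of logit P(outcome=1) = b0 + b1*SNP (Newton-Raphson)."""
    beta = [0.0, 0.0]
    for _ in range(iters):
        score, info, _ = score_info(rows, beta)
        inv = inv2(info)
        step = [inv[i][0] * score[0] + inv[i][1] * score[1] for i in range(2)]
        beta = [beta[i] + step[i] for i in range(2)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def standard_errors(rows, beta):
    """Ordinary (model-based) SE and cluster(ID) sandwich SE for the SNP term."""
    _, info, by_id = score_info(rows, beta)
    bread = inv2(info)
    meat = [[sum(s[i] * s[j] for s in by_id.values()) for j in range(2)]
            for i in range(2)]
    g = len(by_id)  # number of clusters (people)
    robust = matmul2(matmul2(bread, meat), bread)
    naive_se = math.sqrt(bread[1][1])
    cluster_se = math.sqrt(robust[1][1] * g / (g - 1))
    return naive_se, cluster_se


beta = fit_logit(ROWS)
naive_se, cluster_se = standard_errors(ROWS, beta)
for label, se in (("ordinary", naive_se), ("cluster(ID)", cluster_se)):
    lo, hi = beta[1] - 1.96 * se, beta[1] + 1.96 * se
    print(f"{label:12s} b_SNP = {beta[1]:.3f}  SE = {se:.3f}  "
          f"95% CI ({lo:.3f}, {hi:.3f})")
```

Both lines print the same SNP coefficient (log 3 for these rows) with different standard errors. The clustered version stops treating a person's two alleles as independent observations, which is why its interval differs.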


The logistic cluster option would be appropriate if your cases and
controls have been matched on an additional trait such as age or sex
(in which case you would add -, cluster(age)- to the above command).
More on this is given in Clayton & Hills (1993) "Statistical Models
in Epidemiology", OUP (I don't have it to hand, but I think it's
chapter 29).
Further reflection on what I wrote earlier (and reading the -man-
pages) indicates that my above suggestion as to the use of the
-cluster()- option is completely wrong.

If you do have matched cases and controls then you would need an
additional column indicating which group each individual belongs to.
Thus in your example data, if person 1 is matched to person 2, there
should be an additional column that marks them both as group 1. (If
persons 3 and 4 are matched then they would form group 2, and so on.)

You then specify the variable that indicates which group each pair
belongs to in the -cluster()- option.

Apologies for any confusion I've caused,

