



st: Script searching data in a kinship matrix


From   Ginevra Biino <biino@igm.cnr.it>
To   statalist@hsphsun2.harvard.edu
Subject   st: Script searching data in a kinship matrix
Date   Wed, 14 Nov 2012 16:50:12 +0100

Hello everybody,
In a case-control study I have already sampled cases, stratifying by sex (0, 1) and age (<62 y, >=62 y). I need to sample a group of controls with the same characteristics (which I can easily do with the sample command) plus one more: the level of relatedness. Controls should therefore be matched to cases on sex (2 strata), age (2 strata), and relatedness (below a certain level). In particular, I need controls to be as little related to cases as possible; for example, each control should have a kinship coefficient of less than 0.0156 (i.e. 1/64, as for second cousins) with its matched case. The example data set below contains the sampled cases (20 cases, where disease==1: 5 subjects per stratum) and 200 possible controls (disease==0) that I have already sampled, stratifying by age and sex (50 subjects per stratum).
id disease age sex
109530 0 65 M
109398 0 65 M
109494 0 65 M
110077 0 71 M
109601 0 66 M
109585 0 66 M
114262 0 73 M
109311 0 63 M
109355 0 64 M
110756 0 78 M
111090 0 81 M
110806 0 78 M
110222 0 72 M
110955 0 80 M
110310 0 73 M
109829 0 68 M
109434 0 64 M
110006 0 70 M
109286 0 63 M
109298 0 63 M
109721 0 67 M
110234 0 72 M
110143 0 71 M
133061 0 78 M
110021 0 69 M
110296 0 73 M
109719 0 67 M
110198 0 72 M
110115 0 71 M
110092 0 70 M
109296 0 63 M
109540 0 66 M
109791 0 68 M
109227 0 62 M
109807 0 68 M
109934 0 69 M
125715 0 73 M
109577 0 66 M
110677 0 76 M
111792 0 89 M
110414 0 74 M
109505 0 65 M
111257 0 82 M
109651 0 66 M
109552 0 66 M
109356 0 64 M
110641 0 76 M
109866 0 69 M
110749 0 78 M
110923 0 79 M
109316 0 63 F
105263 0 70 F
110843 0 78 F
109878 0 68 F
110941 0 79 F
111008 0 79 F
109403 0 64 F
110083 0 70 F
109778 0 68 F
109783 0 68 F
109325 0 63 F
109726 0 67 F
109958 0 69 F
110049 0 70 F
110736 0 77 F
114290 0 74 F
110791 0 78 F
111315 0 83 F
109431 0 64 F
114096 0 75 F
109784 0 68 F
110656 0 77 F
114678 0 74 F
32255 0 88 F
109253 0 63 F
133094 0 62 F
111251 0 82 F
109851 0 68 F
109221 0 62 F
109271 0 63 F
110264 0 72 F
109615 0 66 F
110557 0 75 F
110082 0 71 F
110278 0 72 F
110925 0 79 F
110347 0 73 F
109636 0 67 F
110271 0 72 F
109635 0 66 F
109621 0 66 F
110496 0 75 F
109295 0 63 F
110781 0 78 F
109281 0 62 F
110289 0 73 F
111491 0 85 F
109753 0 67 F
109181 0 62 F
110353 0 73 F
104532 0 36 M
105965 0 51 M
105866 0 50 M
105916 0 51 M
106358 0 56 M
103664 0 23 M
105618 0 48 M
109082 0 61 M
104572 0 36 M
105897 0 50 M
105090 0 41 M
108918 0 59 M
104758 0 38 M
103330 0 19 M
104390 0 34 M
109086 0 61 M
105198 0 43 M
104781 0 39 M
109128 0 61 M
105002 0 40 M
108946 0 60 M
133131 0 51 M
106058 0 52 M
115009 0 48 M
104740 0 38 M
132995 0 37 M
103309 0 18 M
103943 0 28 M
105747 0 49 M
103850 0 26 M
104824 0 38 M
104516 0 35 M
106423 0 56 M
105266 0 44 M
105117 0 41 M
104803 0 39 M
105642 0 48 M
108940 0 59 M
104982 0 41 M
105235 0 43 M
104839 0 39 M
104207 0 32 M
105097 0 42 M
104948 0 40 M
104218 0 31 M
104604 0 36 M
105565 0 47 M
104134 0 41 M
105059 0 41 M
104784 0 39 M
105131 0 42 F
115126 0 33 F
105417 0 45 F
103831 0 26 F
133091 0 35 F
106238 0 54 F
104724 0 38 F
105511 0 46 F
105438 0 46 F
103422 0 29 F
105142 0 42 F
103388 0 18 F
104384 0 34 F
105203 0 42 F
105023 0 41 F
105076 0 41 F
105691 0 48 F
104844 0 39 F
104674 0 37 F
104345 0 34 F
104736 0 39 F
103892 0 27 F
103909 0 27 F
1537 0 33 F
104562 0 36 F
108828 0 24 F
103814 0 26 F
105558 0 47 F
103556 0 23 F
109108 0 61 F
105179 0 43 F
104230 0 32 F
133036 0 45 F
104419 0 36 F
105475 0 46 F
103931 0 28 F
113829 0 48 F
133026 0 32 F
104542 0 35 F
104221 0 31 F
104510 0 36 F
105803 0 50 F
106489 0 57 F
105671 0 48 F
137828 0 24 F
104021 0 29 F
106195 0 54 F
133105 0 43 F
105425 0 45 F
104524 0 35 F
106258 1 55 M
105270 1 44 M
106661 1 59 M
104363 1 33 M
108982 1 60 M
106046 1 52 F
103359 1 18 F
105939 1 51 F
104152 1 31 F
105351 1 44 F
110363 1 73 M
114307 1 77 M
110790 1 78 M
109486 1 65 M
109317 1 64 M
114643 1 74 F
114057 1 75 F
109895 1 69 F
110775 1 77 F
110178 1 71 F

What I do not know is how to solve the relatedness problem. I have already computed the kinship coefficient matrix of the extended pedigree to which the cases and controls in the example data belong. I do not provide it here because it is a 190x190 matrix. As a quick example, such a kinship matrix for 5 subjects (IDs 51, 59, 119, 156 and 178) looks like this:
        51     59    119    156    178
 51  0.500  0.000  0.000  0.000  0.000
 59  0.000  0.500  0.250  0.000  0.250
119  0.000  0.250  0.500  0.000  0.250
156  0.000  0.000  0.000  0.500  0.000
178  0.000  0.250  0.250  0.000  0.500
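
To make the lookup concrete, here is a minimal sketch in Python (not Stata, and only an illustration of the logic) that reads the 5-subject matrix above and lists the pairs whose kinship falls below the cutoff. The names parse_kinship and kinship are hypothetical helpers invented for this sketch, and the cutoff 0.0156 is the one quoted earlier.

```python
# Toy kinship matrix copied from the 5-subject example above.
MATRIX_TEXT = """\
51 59 119 156 178
51 0.500 0.000 0.000 0.000 0.000
59 0.000 0.500 0.250 0.000 0.250
119 0.000 0.250 0.500 0.000 0.250
156 0.000 0.000 0.000 0.500 0.000
178 0.000 0.250 0.250 0.000 0.500
"""

THRESHOLD = 0.0156  # cutoff quoted in the post (about 1/64)


def parse_kinship(text):
    """Return {(row_id, col_id): coefficient} from a whitespace-separated matrix."""
    lines = [ln.split() for ln in text.strip().splitlines()]
    col_ids = [int(tok) for tok in lines[0]]  # header row holds the column IDs
    kin = {}
    for row in lines[1:]:
        row_id = int(row[0])  # first token of each row is the row ID
        for col_id, value in zip(col_ids, row[1:]):
            kin[(row_id, col_id)] = float(value)
    return kin


def kinship(kin, a, b):
    """Kinship of a and b, or None when the pair is not in the matrix."""
    return kin.get((a, b))


kin = parse_kinship(MATRIX_TEXT)
ids = sorted({a for a, _ in kin})

# Pairs of distinct subjects whose kinship is below the cutoff.
unrelated = [(a, b) for a in ids for b in ids
             if a < b and kin[(a, b)] < THRESHOLD]
print(unrelated)
```

Returning None for an ID that is absent from the matrix keeps "not in the matrix" distinct from "related", which matters for the missing-value case described below.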


In conclusion, I need a script that searches such a kinship matrix for controls satisfying the relatedness condition and adds this information to my data set.
The information could be reported in many ways (whichever is simplest to obtain).
For example, the script could add to the original data set as many columns as the maximum number of controls satisfying the relatedness condition for their matched cases. In each case row (rows for which disease==1), the new columns should hold the ID of a matched control satisfying the condition (case-control kinship < 0.0156), a zero if the condition is not satisfied, or a missing value if the subject's ID is not in the kinship matrix. Alternatively, the script could add to the original data set as many rows as the maximum number of controls satisfying the relatedness condition for their matched cases. Each such new row (one per matched case) would report the control's ID in the ID column, missing values in the disease, age and sex columns, and the ID of the matched case in a new column.
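
A rough sketch of the row-based layout described above, written in Python rather than Stata purely to pin down the logic. The subject records are invented for illustration (they reuse the IDs of the 5-subject matrix plus a hypothetical ID 999 that is absent from it), and kinship, stratum and eligible_pairs are hypothetical names, not existing commands.

```python
THRESHOLD = 0.0156  # kinship cutoff from the post
AGE_CUT = 62        # age strata: < 62 versus >= 62

# Hypothetical subjects (id, disease, age, sex), invented for illustration only.
subjects = [
    (51, 1, 70, "M"),   # case
    (59, 1, 45, "F"),   # case
    (119, 0, 50, "F"),  # control, kinship 0.25 with case 59
    (156, 0, 40, "F"),  # control, unrelated to case 59
    (178, 0, 66, "M"),  # control, unrelated to case 51
    (999, 0, 68, "M"),  # control absent from the kinship matrix
]

# The 5-subject example matrix, stored compactly: the IDs it covers and
# its nonzero off-diagonal entries (every other off-diagonal entry is 0).
MATRIX_IDS = {51, 59, 119, 156, 178}
NONZERO = {
    frozenset((59, 119)): 0.25,
    frozenset((59, 178)): 0.25,
    frozenset((119, 178)): 0.25,
}


def kinship(a, b):
    """Kinship of a and b, or None if either ID is not in the matrix."""
    if a not in MATRIX_IDS or b not in MATRIX_IDS:
        return None
    if a == b:
        return 0.5  # diagonal of the example matrix
    return NONZERO.get(frozenset((a, b)), 0.0)


def stratum(age, sex):
    """Matching stratum: sex plus the two-level age group."""
    return (sex, age >= AGE_CUT)


def eligible_pairs(records):
    """Rows (case_id, control_id, kinship) for same-stratum pairs that are
    below the cutoff; kinship is None when the pair cannot be checked."""
    cases = [r for r in records if r[1] == 1]
    controls = [r for r in records if r[1] == 0]
    rows = []
    for case_id, _, case_age, case_sex in cases:
        for ctrl_id, _, ctrl_age, ctrl_sex in controls:
            if stratum(case_age, case_sex) != stratum(ctrl_age, ctrl_sex):
                continue  # different sex or age group
            k = kinship(case_id, ctrl_id)
            if k is None or k < THRESHOLD:
                rows.append((case_id, ctrl_id, k))
    return rows


rows = eligible_pairs(subjects)
for row in rows:
    print(row)
```

Each output row pairs a case with one candidate control from the same stratum; pairs at or above the cutoff are dropped, and pairs that cannot be looked up are kept with a missing kinship so they can be reviewed separately.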
Any alternative solution is welcome!
Can anybody help me?
Ginevra


Ginevra Biino, PhD
Institute of Molecular Genetics, CNR
Via Abbiategrasso, 207
27100 Pavia, Italy
Tel +39 382 546363
Fax +39 382 422286
http://www.igm.cnr.it/