# st: How to reproduce a very specific text book example of logistic regression using Stata

 From tiago.pereira@incor.usp.br To statalist@hsphsun2.harvard.edu Subject st: How to reproduce a very specific text book example of logistic regression using Stata Date Wed, 7 Nov 2007 18:17:57 -0200 (BRT)

```Dear all,

The following question is not entirely related to Stata, but I will be
really grateful if you could suggest any tip to answer it using Stata.

I am trying to figure out which analysis was done in a text book of
application of logistic regression in genetic studies. The aim is to know
which model was applied to be able to learn it.

According to the book, the model was fit by logistic regression using the
GLIM program. The GOF of each model was assessed by scaled deviance  and
AIC.

I could only reproduce succesfully only the final result, since it is a
simple logistic regression model:

. xi: logistic status i.exposure [freq=count]

It reproduces exactly what is presented in the book:

Logistic regression              Number of obs   =       2112
LR chi2(2)      =       6.77

Prob > chi2     =  0.0338 Log likelihood = -1460.1621

Pseudo R2       =    0.0023

----------------------------------------------------------------------
status | Odds Ratio   Std. Err.   z  P>|z|   [95% Conf. Interval]

-------------+--------------------------------------------------------
_Iexposure_1 |   1.099477   .1666565     0.63   0.532   .8168884  1.4798
_Iexposure_2 |   1.347291   .2037921     1.97   0.049   1.001632  1.8122
----------------------------------------------------------------------

The objective of such analysis is to know which coding (genetic model)
fits better the data (coding cases_1\constrols_1 cases_2\constrols_2 and
cases_3\constrols_3 as 1, 0.5, and 0 or 1,0, and 1).

According to the book, the best model will be that that uses the coding 1,
0.5, and 0 (or 2,1 and 0), which produces the following results:

Scaled deviance  df  AIC
Constant                52.209           32  -11.791
Cons+coding 1,0.5,0     45.731           31  -16.269

The data is shown below.

Thank you for any help.

All the best,

Tiago

*----------- begin data -------------
input cases_1 cases_2 cases_3 controls_1 controls_2 controls_3 strata
74 44 23 51 75 13 1
26 23 4 23 30 8 2
59 42 10 49 42 9 3
34 36 6 27 22 4 4
37 41 7 28 35 15 5
35 33 8 37 40 9 6
33 23 10 38 46 13 7
48 35 8 48 35 7 8
54 45 8 50 40 8 9
62 66 9 59 57 9 10
57 69 7 33 56 20 11
end
rename cases_1 a0
rename cases_2 b0
rename cases_3 c0
rename controls_1 a1
rename controls_2 b1
rename controls_3 c1
reshape long a b c, i(strata) j(status)
gen id = _n
rename a count2
rename b count1
rename c count0
reshape long count, i(id) j(exposure)
replace status= cond(status==1,0,1)

*----------- end data  --------------

--
Tiago V. Pereira
Heart Institute - InCor
Laboratory of Genetics and Molecular Cardiology,
Department of Biochemistry and Molecular Biology,
Federal University of São Paulo.
Av. Dr. Eneas de Carvalho Aguiar, 44;
Cerqueira César - CEP 05403-000
Sao Paulo, SP Brazil.
Tel./fax: +55 11 3069 5068
email:  tiago.pereira@incor.usp.br

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```