Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: Subject: st: RE: RE: risk ratio

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	RE: Subject: st: RE: RE: risk ratio
Date	Tue, 23 Mar 2010 17:07:10 -0000

Clearly, so to speak, I have no problem over seeking clarity. 

My main point, although small, was that -rnormal()- is briefer than
-invnorm(uniform())- and -invlogit()- is in at least one sense more
transparent than the alternative. I don't think either substitution
worsens clarity. 

Nick 
[email protected] 

[email protected]

It is definitely possible to use shortcut code like Nick showed. I tend 
to prefer, however, for pedagogical purposes,
to specify separately the linear predictor, inverse link function, and 
pseudo-random number generator for a given distribution. I believe that 
it makes the logic of synthetic model creation more clear. When the 
models become more complex it helps not to make mistakes. And for 
binomial models where there is no special inverse link function; eg 
loglog and cloglog, you must specify it separately.  The code I give in 
the article, and in response to the query on Statalist, is constructed 
to be as clear as possible. Please keep in mind though that it is the 
case that for several of the synthetic models discussed, the code can 
be compacted in the manner Nick suggests. This is particularly the case 
for embedding the linear predictor within the appropriate pseudo-random 
number generator.

Nick Cox 

=====================================
As a footnote to this, note a few equivalences:

invlogit(x) <=> 1 / (1 + exp(-x))

rnormal() <=> invnorm(runiform())

An alternative to

gen x1 = invnorm(runiform())
gen xb = 2 + 0.75*x1
gen exb = 1/(1+exp(-xb))
gen by = rbinomial(1, exb)

is thus

gen x1 = rnormal()
gen by = rbinomial(1, invlogit(2 + 0.75*x1))

Nick
[email protected]

Joseph Hilbe

I have an article coming out in the next Stata Journal that details
how to create synthetic models for a wide  variety
of discrete response regression models. For your problem though, I
think that the best approach is to create a synthetic
binary logistic model with a single predictor - as you specified. Then
model the otherwise logistic data as
Poisson with a robust variance estimator. And the coefficient must be
exponentiated. It can be interpreted as
a relative risk ratio.

Below is code to create a simple binary logistic model. Then model as
mentioned above. You asked for a
continuous pseudo-random variate, so I generated it from a normal
distribution. I normally like to use pseudo-random
uniform variates rather normal variates when creating these types of
models, but it usually makes little difference.
Recall that without a seed the model results will differ each time
run. If you want the same results, pick a
seed. I used my birthday.

I hope that this is what you were looking for.

Joseph Hilbe

clear
set obs 50000
set seed 1230
gen x1 = invnorm(runiform())
gen xb = 2 + 0.75*x1
gen exb = 1/(1+exp(-xb))
gen by = rbinomial(1, exb)
glm by x1, nolog fam(bin 1)
glm by x1, nolog fam(poi) eform robust


. glm by x1, nolog fam(bin 1)
Generalized linear models                          No. of obs      =
50000
Optimization     : ML                              Residual df     =
49998
                                                   Scale parameter =
1
Deviance         =  37672.75548                    (1/df) Deviance =
.7534852
Pearson          =  49970.46961                    (1/df) Pearson  =
.9994494
Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]
                                                   AIC             =
.7535351
Log likelihood   = -18836.37774                    BIC             =
- -503294.5
- 
------------------------------------------------------------------------
- ------
             |                 OIM
          by |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
Interval]
- 
-------------+----------------------------------------------------------
- ------
          x1 |   .7534291   .0143134    52.64   0.000     .7253754
.7814828
       _cons |   1.993125   .0149177   133.61   0.000     1.963887
2.022363
- 
------------------------------------------------------------------------
- ------


. glm by x1, nolog fam(poi) eform robust
Generalized linear models                          No. of obs      =
50000
Optimization     : ML                              Residual df     =
49998
                                                   Scale parameter =
1
Deviance         =  12673.60491                    (1/df) Deviance =
.2534822
Pearson          =   7059.65518                    (1/df) Pearson  =
.1411988
Variance function: V(u) = u                        [Poisson]
Link function    : g(u) = ln(u)                    [Log]
                                                   AIC             =
1.970592
Log pseudolikelihood = -49262.80246                BIC             =
- -528293.7
- 
------------------------------------------------------------------------
- ------
             |               Robust
          by |        IRR   Std. Err.      z    P>|z|     [95% Conf.
Interval]
- 
-------------+----------------------------------------------------------
- ------
          x1 |   1.104476   .0021613    50.78   0.000     1.100248
1.10872
- 
------------------------------------------------------------------------
- ------
.

Tomas Lind wrote:
Does anyone know how to generate fake data for a dichotomous outcome (0,
1)
that is dependent on a continuous exposure variable in an
epidemiological
relative risk context. I know how to use the logit transformation but in
that case exposure is proportional to log(ods) and not to risk.


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- Subject: st: RE: RE: risk ratio
  - From: [email protected]

Prev by Date: Re: st: passing extra information to function evaluator program nl
Next by Date: Re: st: Adjusting file associations so do-files open in editor?
Previous by thread: Subject: st: RE: RE: risk ratio
Next by thread: st: code fragmennt in mlogit.ado
Index(es):
- Date
- Thread