Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: Subject: st: RE: RE: risk ratio |

Date |
Tue, 23 Mar 2010 17:07:10 -0000 |

Clearly, so to speak, I have no problem over seeking clarity. My main point, although small, was that -rnormal()- is briefer than -invnorm(uniform())- and -invlogit()- is in at least one sense more transparent than the alternative. I don't think either substitution worsens clarity. Nick n.j.cox@durham.ac.uk jhilbe@aol.com It is definitely possible to use shortcut code like Nick showed. I tend to prefer, however, for pedagogical purposes, to specify separately the linear predictor, inverse link function, and pseudo-random number generator for a given distribution. I believe that it makes the logic of synthetic model creation more clear. When the models become more complex it helps not to make mistakes. And for binomial models where there is no special inverse link function; eg loglog and cloglog, you must specify it separately. The code I give in the article, and in response to the query on Statalist, is constructed to be as clear as possible. Please keep in mind though that it is the case that for several of the synthetic models discussed, the code can be compacted in the manner Nick suggests. This is particularly the case for embedding the linear predictor within the appropriate pseudo-random number generator. Nick Cox ===================================== As a footnote to this, note a few equivalences: invlogit(x) <=> 1 / (1 + exp(-x)) rnormal() <=> invnorm(runiform()) An alternative to gen x1 = invnorm(runiform()) gen xb = 2 + 0.75*x1 gen exb = 1/(1+exp(-xb)) gen by = rbinomial(1, exb) is thus gen x1 = rnormal() gen by = rbinomial(1, invlogit(2 + 0.75*x1)) Nick n.j.cox@durham.ac.uk Joseph Hilbe I have an article coming out in the next Stata Journal that details how to create synthetic models for a wide variety of discrete response regression models. For your problem though, I think that the best approach is to create a synthetic binary logistic model with a single predictor - as you specified. Then model the otherwise logistic data as Poisson with a robust variance estimator. And the coefficient must be exponentiated. It can be interpreted as a relative risk ratio. Below is code to create a simple binary logistic model. Then model as mentioned above. You asked for a continuous pseudo-random variate, so I generated it from a normal distribution. I normally like to use pseudo-random uniform variates rather normal variates when creating these types of models, but it usually makes little difference. Recall that without a seed the model results will differ each time run. If you want the same results, pick a seed. I used my birthday. I hope that this is what you were looking for. Joseph Hilbe clear set obs 50000 set seed 1230 gen x1 = invnorm(runiform()) gen xb = 2 + 0.75*x1 gen exb = 1/(1+exp(-xb)) gen by = rbinomial(1, exb) glm by x1, nolog fam(bin 1) glm by x1, nolog fam(poi) eform robust . glm by x1, nolog fam(bin 1) Generalized linear models No. of obs = 50000 Optimization : ML Residual df = 49998 Scale parameter = 1 Deviance = 37672.75548 (1/df) Deviance = .7534852 Pearson = 49970.46961 (1/df) Pearson = .9994494 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(u/(1-u)) [Logit] AIC = .7535351 Log likelihood = -18836.37774 BIC = - -503294.5 - ------------------------------------------------------------------------ - ------ | OIM by | Coef. Std. Err. z P>|z| [95% Conf. Interval] - -------------+---------------------------------------------------------- - ------ x1 | .7534291 .0143134 52.64 0.000 .7253754 .7814828 _cons | 1.993125 .0149177 133.61 0.000 1.963887 2.022363 - ------------------------------------------------------------------------ - ------ . glm by x1, nolog fam(poi) eform robust Generalized linear models No. of obs = 50000 Optimization : ML Residual df = 49998 Scale parameter = 1 Deviance = 12673.60491 (1/df) Deviance = .2534822 Pearson = 7059.65518 (1/df) Pearson = .1411988 Variance function: V(u) = u [Poisson] Link function : g(u) = ln(u) [Log] AIC = 1.970592 Log pseudolikelihood = -49262.80246 BIC = - -528293.7 - ------------------------------------------------------------------------ - ------ | Robust by | IRR Std. Err. z P>|z| [95% Conf. Interval] - -------------+---------------------------------------------------------- - ------ x1 | 1.104476 .0021613 50.78 0.000 1.100248 1.10872 - ------------------------------------------------------------------------ - ------ . Tomas Lind wrote: Does anyone know how to generate fake data for a dichotomous outcome (0, 1) that is dependent on a continuous exposure variable in an epidemiological relative risk context. I know how to use the logit transformation but in that case exposure is proportional to log(ods) and not to risk. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**Subject: st: RE: RE: risk ratio***From:*jhilbe@aol.com

- Prev by Date:
**Re: st: passing extra information to function evaluator program nl** - Next by Date:
**Re: st: Adjusting file associations so do-files open in editor?** - Previous by thread:
**Subject: st: RE: RE: risk ratio** - Next by thread:
**st: code fragmennt in mlogit.ado** - Index(es):