Stata software for generalized linear measurement error models
Three commands available since Stata 8 fit generalized linear models when
one or more covariates are measured with error.
It is well known that such errors usually lead to attenuation of estimated
effects. The new commands adjust for that attenuation, producing corrected
coefficient estimates along with appropriate standard errors and test
statistics. These commands allow adjustments to be
made in the generalized linear model framework using the following methods:
- Instrumental variables
- Regression calibration
- Simulation/extrapolation (SIMEX)
The software described here provides the first implementation of regression
calibration and of SIMEX in a general-purpose statistical package.
Regression calibration was suggested as a general approach by Carroll and
Stefanski (1990) and Gleser (1990). SIMEX was proposed by Cook and
Stefanski (1995) and further developed by Carroll, Küchenhoff, Lombard,
and Stefanski (1996) and Stefanski and Cook (1995).
The software provided is written by R. J. Carroll, J. Hardin, and H.
Schmiediche. The work described here was partly funded by the National
Institutes of Health, National Center for Research Resources, Grant Number
5R44RR12435-03. The SIMEX method is computationally intensive, so this
implementation was written with speed in mind.
The discussion below is presented under the headings
- Downloading and installing the new commands
- Background and introduction
- Obtaining more information
- About the authors
- References
1. Downloading and installing the new commands
The new commands are named qvf, rcal, simex, and
simexplot.
To load them, type the following in Stata:
. net from http://www.stata.com/merror
. net install merror
Or you can download merror.zip and, after unzipping the file to a folder
on your hard drive (e.g., C:/data/merror/), type
. net from C:/data/merror/
. net install merror
Once the commands are installed, you can view their help files by typing
. whelp qvf
. whelp rcal
. whelp simex
. whelp simexplot
These commands are implemented using Stata’s plug-in features, which
allow code written in C to be added to Stata. This means the new commands
are fast.
Because the new features are written as binary code, modules for different
platforms (e.g., Windows and Unix) cannot be interchanged. Nevertheless,
installation is completely automatic. When you type net install
merror, Stata will install the appropriate modules for your computer.
Just as with ado-files installed over the web, should you wish to uninstall
these materials, you can type ado uninstall merror.
The measurement-error analysis software is available for the following
platforms:
- Windows (XP, 2000, NT, ME, 98)
- Mac
- Linux x86 and x86-64
- IBM RS/6000 AIX
- Digital Unix
- HP-UX
- Sun Solaris and Sun Solaris 64-bit
If you attempt to install on a platform that is not in the list above,
typing net install merror will produce the error "file
http://www.stata.com/merror/qvfmex.plugin not found; could not copy
http://www.stata.com/merror/qvfmex.plugin".
2. Background and introduction
The generalized linear model framework is a rich collection of models that
allows fitting of
- linear regression models
- logistic and probit regression models
- Poisson and negative binomial regression models
and many others. Say you wish to fit such a model and include the variable
X:
F(outcome) = b0 + b1*X + b2*Z2 + b3*Z3 + ...
You, however, do not have X. Let's assume that instead you have W, an
error-prone version of X. Simply substituting W for X will result in
estimates of b1 being biased toward 0 and estimates of b2, b3, ... also
being biased, although their bias may be toward 0 or away from it.
Dealing with measurement error correctly means fitting an equation such as
the one above in a way that yields unbiased coefficients and correct
standard errors.
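The bias toward 0 described above can be seen in a small simulation. The sketch below is written in Python, not Stata, and does not use the new commands. It assumes a simple linear model with a single covariate and classical additive error, and the names ols_slope and simulate_attenuation are made up for this illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_attenuation(n=20000, b0=1.0, b1=2.0, s2=1.0, seed=12345):
    """Return (slope using the true X, slope using the error-prone W)."""
    rng = random.Random(seed)
    # True covariate with V(X) = 1 (an assumption of this sketch).
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Error-prone version: W = X + u, E(u) = 0, V(u) = s2.
    w = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]
    # The outcome depends on X, not on W.
    y = [b0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return ols_slope(x, y), ols_slope(w, y)


slope_x, slope_w = simulate_attenuation()
print("slope using X:", round(slope_x, 3))
print("slope using W:", round(slope_w, 3))
```

With V(X) = 1 and s2 = 1, the slope obtained from W should land near b1 * V(X) / (V(X) + s2), that is, about half of the true b1 = 2, while the slope obtained from X should be close to 2.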
The software provided here can do that when
- You have one variable W that measures X with error and a value for s2, the variance of that error:
W = X + u, E(u) = 0, V(u) = s2
- You have two or more replicates W1, W2, ..., each of which measures X with error, and optionally you also have a value for s2, their common error variance:
W1 = X + u1, W2 = X + u2, ..., E(ui) = 0, V(ui) = s2
- You have a set of exogenous variables Z correlated with X from which you can derive an instrument T:
W = a1*Z + e, T = â1*Z
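For the replicate case in the list above, the idea behind regression calibration can be sketched in a few lines. This is a Python illustration under strong simplifying assumptions (a linear model, one error-prone covariate, exactly two replicates, no Z variables). It is not the algorithm used by the rcal command, and every function name in it is hypothetical.

```python
import random


def mean(v):
    return sum(v) / len(v)


def variance(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration_slope(w1, w2, y):
    """Return (naive slope, calibrated slope, estimated s2) from two replicates."""
    wbar = [(a + b) / 2.0 for a, b in zip(w1, w2)]
    # W1 - W2 = u1 - u2, so V(W1 - W2) = 2 * s2.
    s2_hat = variance([a - b for a, b in zip(w1, w2)]) / 2.0
    # The mean of two replicates carries error variance s2 / 2.
    var_x_hat = variance(wbar) - s2_hat / 2.0
    reliability = var_x_hat / variance(wbar)
    # Replace the unobserved X by its best linear predictor given the replicate mean.
    mu = mean(wbar)
    x_cal = [mu + reliability * (wb - mu) for wb in wbar]
    return ols_slope(wbar, y), ols_slope(x_cal, y), s2_hat


def simulate_replicates(n=20000, b1=2.0, s2=1.0, seed=2003):
    rng = random.Random(seed)
    sd = s2 ** 0.5
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w1 = [xi + rng.gauss(0.0, sd) for xi in x]
    w2 = [xi + rng.gauss(0.0, sd) for xi in x]
    y = [1.0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return w1, w2, y


w1, w2, y = simulate_replicates()
naive, calibrated, s2_hat = regression_calibration_slope(w1, w2, y)
print("naive slope:", round(naive, 3))
print("calibrated slope:", round(calibrated, 3))
print("estimated s2:", round(s2_hat, 3))
```

In this setup the naive slope from the replicate mean should fall well below the true value of 2, while the calibrated slope should be close to it. The error variance is estimated from the replicates themselves, which is why a value for s2 is optional in that case.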
Using the SIMEX method, for instance, not only can you obtain unbiased
estimates and correct standard errors, you can obtain a graph that shows how
the amount of measurement error affects the estimated coefficients:
[Figure: SIMEX plot of the estimated coefficients against the amount of measurement error]
The above graph shows estimated coefficients (b1, b2, b3, b4, b5) for
yi = b1*x1i + b2*x2i + b3*x3i + b4*x4i + b5 + ui
where x3 and x4 are measured with error by w3 and w4. The graph
illustrates the extrapolated point estimates for all covariates in the
fitted model. With multiple covariates, naive coefficient estimates may be
biased in either direction, as illustrated.
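The simulation and extrapolation steps of SIMEX can be sketched as follows. This Python illustration assumes a linear model with one error-prone covariate, a known error variance s2, a grid of added-error multipliers from 0 to 2, and a quadratic extrapolant. The names simex_slope and fit_quadratic are hypothetical, and the simex command may differ in all of these choices.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fit_quadratic(xs, ys):
    """Least-squares coefficients [c0, c1, c2] of y = c0 + c1*x + c2*x**2."""
    # Normal equations, stored as an augmented 3 x 4 matrix.
    m = [[sum(x ** (j + k) for x in xs) for k in range(3)]
         + [sum(y * x ** j for x, y in zip(xs, ys))] for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, 3):
            f = m[i][col] / m[col][col]
            for k in range(col, 4):
                m[i][k] -= f * m[col][k]
    # Back substitution.
    c = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        c[i] = (m[i][3] - sum(m[i][k] * c[k] for k in range(i + 1, 3))) / m[i][i]
    return c


def simex_slope(w, y, s2, lambdas=(0.5, 1.0, 1.5, 2.0), reps=20, seed=7):
    """Return (naive slope, SIMEX-extrapolated slope)."""
    rng = random.Random(seed)
    grid = [0.0]
    means = [ols_slope(w, y)]  # lambda = 0 is the naive fit
    for lam in lambdas:
        sd = (lam * s2) ** 0.5
        total = 0.0
        for _ in range(reps):
            # Simulation step: total error variance becomes (1 + lam) * s2.
            w_star = [wi + rng.gauss(0.0, sd) for wi in w]
            total += ols_slope(w_star, y)
        grid.append(lam)
        means.append(total / reps)
    # Extrapolation step: evaluate the fitted curve at lambda = -1 (no error).
    c0, c1, c2 = fit_quadratic(grid, means)
    return means[0], c0 - c1 + c2


def simulate(n=5000, b1=2.0, s2=0.25, seed=42):
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    w = [xi + rng.gauss(0.0, s2 ** 0.5) for xi in x]
    y = [1.0 + b1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return w, y


w, y = simulate()
naive, corrected = simex_slope(w, y, s2=0.25)
print("naive slope:", round(naive, 3))
print("SIMEX slope:", round(corrected, 3))
```

The pairs in grid and means are the points a SIMEX plot would display. Because the quadratic is only an approximation to the true bias curve, the extrapolated slope should move most of the way from the naive value toward the true value of 2 without matching it exactly.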
3. Obtaining more information
At the North American Users Group meeting held
March 18–19, 2003, in Boston, Massachusetts, Raymond Carroll,
James Hardin, and Henrik Schmiediche presented a one-day workshop on
measurement error and the use of the new software. The slides from that
presentation are available in two formats:
- 162 slides, one per page, in PDF format
- 162 slides, four per page, in PDF format (suitable for printing)
Stata Journal Volume 3, Number 4 is dedicated to measurement-error issues
and the use of the software:
- Measurement error, GLMs, and notational conventions, by James Hardin and Raymond Carroll
- Variance estimation for the instrumental variables approach to measurement error in generalized linear models, by James Hardin and Raymond Carroll
- Instrumental variables, bootstrapping, and generalized linear models, by James Hardin, Henrik Schmiediche, and Raymond Carroll
- The regression calibration method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll
- The simulation extrapolation method for fitting generalized linear models with additive measurement error, by James Hardin, Henrik Schmiediche, and Raymond Carroll
- Maximum likelihood estimation of generalized linear models with covariate measurement error, by Sophia Rabe-Hesketh, Anders Skrondal, and Andrew Pickles
We also recommend the book
Measurement Error in Nonlinear Models by R. J. Carroll,
D. Ruppert, and L. A. Stefanski, published by Chapman & Hall, 1995.
4. About the authors
Raymond Carroll is a Distinguished Professor, a Professor of Statistics, and
a Professor of Nutrition and Toxicology at Texas A&M University. He is
also Director of Biostatistics Research at the Center for Environmental and
Rural Health (NIEHS) and Director of the Training Program in Biology,
Bioinformatics, and Nutrition for the National Cancer Institute, both at
Texas A&M University.
Dr. Carroll is the author of three books and over 200 professional papers,
including papers on measurement error modeling, regression variance
functions and transformations, nutrition, toxicology, and bioinformatics.
Dr. Carroll received his Ph.D. in Statistics from Purdue University in 1974.
James Hardin is a Lecturer and Assistant Research Scientist at Texas A&M
University and previously was a Senior Statistician at StataCorp, where he
developed Stata's cross-sectional time-series capabilities. He is also the
author of Stata's current GLM command. He is the author of two books and
ten refereed papers, and he has recently been working with Henrik
Schmiediche developing the Stata software for fitting measurement-error
models. Dr. Hardin received his Ph.D. in Statistics from Texas A&M
University in 1992.
Henrik Schmiediche is a Senior Lecturer and Senior Systems Analyst at the
Department of Statistics of Texas A&M University. He holds a B.S.
degree in Computer Science and a Ph.D. in Statistics. He has enjoyed
programming since his high school days, when RAM was scarce and CPUs
were slow. Over the last decade he has had several occasions to work on
implementing and coding aspects of estimating measurement error models. The
culmination of this effort is the software in Stata, written in
collaboration with Dr. Hardin.
5. References
- Carroll, R. J., D. Ruppert, and L. A. Stefanski. 1995. Measurement Error in Nonlinear Models. London: Chapman & Hall/CRC.
- Carroll, R. J., H. Küchenhoff, F. Lombard, and L. A. Stefanski. 1996. Asymptotics for the SIMEX estimator in structural measurement error models. Journal of the American Statistical Association, vol. 91, no. 433, pp. 242–250.
- Carroll, R. J. and L. A. Stefanski. 1990. Approximate quasilikelihood estimation in models with surrogate predictors. Journal of the American Statistical Association, vol. 85, pp. 652–663.
- Cook, J. and L. A. Stefanski. 1995. A simulation extrapolation method for parametric measurement error models. Journal of the American Statistical Association, vol. 89, pp. 1314–1328.
- Gleser, L. J. 1990. Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. In Statistical Analysis of Measurement Error Models and Applications, P. J. Brown and W. A. Fuller, eds. Providence: American Mathematical Society.
- Stefanski, L. A. and J. Cook. 1995. Simulation extrapolation: The measurement error jackknife. Journal of the American Statistical Association, vol. 90, no. 432, pp. 1247–1256.
The project described above was supported by Grant Number R44 RR12435 from
the National Institutes of Health, National Center for Research Resources.
Its contents are solely the responsibility of the authors and do not
necessarily represent the official views of the National Center for Research
Resources.