help tetrachoric dialog: tetrachoric
-------------------------------------------------------------------------------
Title
[R] tetrachoric -- Tetrachoric correlations for binary variables
Syntax
tetrachoric varlist [if] [in] [weight] [, options]
options            description
-------------------------------------------------------------------------
Main
  stats(statlist)  list of statistics; select up to 4 statistics; default
                     is stats(rho)
  edwards          use the noniterative Edwards and Edwards estimator;
                     default is the maximum likelihood estimator
  print(#)         significance level for displaying coefficients
  star(#)          significance level for displaying with a star
  bonferroni       use Bonferroni-adjusted significance level
  sidak            use Sidak-adjusted significance level
  pw               calculate all the pairwise correlation coefficients by
                     using all available data (pairwise deletion)
  zeroadjust       adjust frequencies when one cell has a zero count
  matrix           display output in matrix form
  notable          suppress display of correlations
  posdef           modify correlation matrix to be positive semidefinite
-------------------------------------------------------------------------
statlist           description
-------------------------------------------------------------------------
  rho              tetrachoric correlation coefficient
  se               standard error of rho
  obs              number of observations
  p                exact two-sided significance level
-------------------------------------------------------------------------
by is allowed; see [D] by.
fweights are allowed; see weight.
Menu
Statistics > Summaries, tables, and tests > Summary and descriptive
statistics > Tetrachoric correlations
Description
tetrachoric computes estimates of the tetrachoric correlation
coefficients of the binary variables in varlist. Each of these variables
should take only the values 0, 1, or missing.
Tetrachoric correlations assume a latent bivariate normal distribution
(X1, X2) for each pair of variables (v1, v2), with a threshold model for
the manifest variables (vi = 1 if and only if Xi > 0). The means and
variances of the latent variables are not identified, but the
correlation, r, of X1 and X2 can be estimated from the joint distribution
of v1 and v2 and is called the tetrachoric correlation coefficient.
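
The threshold model can be illustrated with a small simulation. This is a
sketch under assumed values, not output from the manual: the latent
correlation of 0.6, the seed, and the sample size are arbitrary, and the
names X1, X2, v1, and v2 are chosen only to match the notation above.

```stata
clear
set seed 12345
set obs 5000
* latent bivariate normal pair with an assumed correlation of 0.6
matrix C = (1, 0.6 \ 0.6, 1)
drawnorm X1 X2, corr(C)
* manifest binary variables: vi = 1 if and only if Xi > 0
generate byte v1 = X1 > 0
generate byte v2 = X2 > 0
* Pearson correlation of the binary variables, for comparison
correlate v1 v2
* tetrachoric estimate of the latent correlation
tetrachoric v1 v2
```

In a sample of this size, the tetrachoric estimate would be expected to lie
near the assumed 0.6, whereas the Pearson correlation of v1 and v2 is
typically attenuated toward zero.
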
tetrachoric computes pairwise estimates of the tetrachoric correlations
by the (iterative) maximum likelihood estimator obtained from bivariate
probit without explanatory variables (see [R] biprobit). The Edwards and
Edwards (1984) noniterative estimator is used as the initial value.
The pairwise correlation matrix is returned as r(Rho) and can be used to
perform a factor analysis or a principal component analysis of binary
variables by using the factormat or pcamat commands; see [MV] factor and
[MV] pca.
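
A minimal sketch of passing r(Rho) to factormat and pcamat follows. The
matrix name R, the local macro N, and the choice of two factors or
components are illustrative assumptions. The sketch also assumes that
r(N) holds the common sample size under the default casewise deletion.

```stata
webuse familyvalues, clear
tetrachoric RS056-RS063
* save the sample size and the correlation matrix before r() is overwritten
local N = r(N)
matrix R = r(Rho)
* factor analysis and principal component analysis of the stored matrix
factormat R, n(`N') factors(2)
pcamat R, n(`N') components(2)
```

If the estimated matrix is not positive semidefinite, adding the posdef
option to the tetrachoric call may be needed before the matrix is analyzed.
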
Options
    +------+
----+ Main +-------------------------------------------------------------
stats(statlist) specifies the statistics to be displayed in the matrix of
output. stats(rho) is the default. Up to four statistics may be
specified. stats(rho se p obs) would display the tetrachoric
correlation coefficient, its standard error, the significance level,
and the number of observations. If varlist contains only two
variables, all statistics are shown in tabular form; in that case,
stats(), print(), and star() have no effect unless the matrix option
is also specified.
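
As a hedged illustration of how stats() interacts with matrix, the
following sketch uses variables from the familyvalues dataset that appears
in the Examples section. The particular statistics requested are arbitrary.

```stata
webuse familyvalues, clear
* three variables: matrix display is implied, so stats() takes effect
tetrachoric RS074 RS075 RS076, stats(rho se p obs)
* two variables: add matrix so that stats() is honored
tetrachoric RS074 RS075, stats(rho se) matrix
```
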
edwards specifies that the noniterative Edwards and Edwards estimator be
used. The default is the maximum likelihood estimator. If you
analyze many binary variables, you may want to use the fast
noniterative estimator proposed by Edwards and Edwards (1984).
However, if you have skewed variables, the approximation does not
perform well.
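
The noniterative estimator can be sketched by hand for one pair of
variables. The formula used here, (a^(pi/4) - 1)/(a^(pi/4) + 1) with a the
odds ratio of the 2 x 2 table, is stated as an assumption about the
Edwards and Edwards (1984) approximation, not as a description of how
tetrachoric is implemented. The names F and alpha are illustrative.

```stata
webuse familyvalues, clear
* 2 x 2 table of cell frequencies, stored in matrix F
tabulate RS074 RS075, matcell(F)
* odds ratio of the table
scalar alpha = (F[1,1]*F[2,2]) / (F[1,2]*F[2,1])
* assumed form of the Edwards and Edwards approximation
display "by hand:     " (alpha^(_pi/4) - 1) / (alpha^(_pi/4) + 1)
* estimate reported by tetrachoric with the edwards option
tetrachoric RS074 RS075, edwards
display "tetrachoric: " r(rho)
```

If the assumed formula matches the implementation, the two displayed values
should agree. The hand calculation is undefined when an off-diagonal cell
is zero.
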
print(#) specifies the maximum significance level of correlation
coefficients to be printed. Correlation coefficients with larger
significance levels are left blank in the matrix. Typing tetrachoric
..., print(.10) would list only those correlation coefficients whose
significance level is 0.10 or less.
star(#) specifies the maximum significance level of correlation
coefficients to be marked with a star. Typing tetrachoric ...,
star(.05) would "star" all correlation coefficients whose significance
level is 0.05 or less.
bonferroni makes the Bonferroni adjustment to calculated significance
levels. This option affects printed significance levels and the
print() and star() options. Thus tetrachoric ..., print(.05)
bonferroni prints coefficients with Bonferroni-adjusted significance
levels of 0.05 or less.
sidak makes the Sidak adjustment to calculated significance levels. This
option affects printed significance levels and the print() and star()
options. Thus tetrachoric ..., print(.05) sidak prints coefficients
with Sidak-adjusted significance levels of 0.05 or less.
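
The following sketch combines print() and star() with an adjustment. The
thresholds are arbitrary. The last lines show the textbook Bonferroni and
Sidak adjustments for a hypothetical unadjusted significance level of
0.004. They assume that RS056-RS063 spans eight variables (28 pairs) and
that tetrachoric applies the adjustments in this standard form.

```stata
webuse familyvalues, clear
* blank out coefficients with p > 0.10 and star those with p <= 0.05
tetrachoric RS056-RS063, stats(rho p) print(.10) star(.05)
* same display, based on Sidak-adjusted significance levels
tetrachoric RS056-RS063, stats(rho p) print(.10) star(.05) sidak
* textbook adjustments for a hypothetical p = 0.004 and k = 28 pairs
local k = 8*7/2
display "Bonferroni: " min(1, `k'*0.004)
display "Sidak:      " min(1, 1 - (1 - 0.004)^`k')
```
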
pw specifies that the tetrachoric correlations be calculated by using all
available data (pairwise deletion). By default, tetrachoric uses
casewise deletion, where an observation is ignored if any of the
variables in varlist is missing for that observation.
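
A brief sketch comparing the two deletion rules, assuming r(N) and r(Nobs)
are returned as listed under Saved results:

```stata
webuse familyvalues, clear
* casewise deletion (default): one common sample for every pair
tetrachoric RS056-RS063, notable
display r(N)
* pairwise deletion: each pair uses all observations available for it
tetrachoric RS056-RS063, pw notable
matrix list r(Nobs)
```
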
zeroadjust specifies that when one of the cells has a zero count, a
frequency adjustment be applied in such a way as to increase the zero
to one-half and maintain row and column totals.
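
The adjustment described above can be worked through on a hypothetical
2 x 2 table. The frequencies 0, 12, 9, and 30 and the names F, A, v1, v2,
and freq are invented for illustration. Raising the zero cell to one-half
while keeping the margins fixed means subtracting one-half from the two
adjacent cells and adding one-half to the opposite cell.

```stata
* hypothetical table with a zero in the (0,0) cell
matrix F = (0, 12 \ 9, 30)
* adjusted table: row totals (12, 39) and column totals (9, 42) are unchanged
matrix A = F + (0.5, -0.5 \ -0.5, 0.5)
matrix list A

* the same table as frequency-weighted data; the empty cell has no record
clear
input byte v1 byte v2 int freq
0 1 12
1 0 9
1 1 30
end
tetrachoric v1 v2 [fweight=freq], zeroadjust
```
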
matrix forces tetrachoric to display the statistics as a matrix, even if
varlist contains only two variables. matrix is implied if more than
two variables are specified.
notable suppresses the display of the correlations.
posdef modifies the correlation matrix so that it is positive
semidefinite, i.e., a proper correlation matrix. The modified result
is the correlation matrix associated with the least-squares
approximation of the tetrachoric correlation matrix by a
positive-semidefinite matrix. If the correlation matrix is modified,
the standard errors and significance levels are not displayed and are
not returned in r().
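
As a sketch of checking the result of posdef, the eigenvalues of the
returned matrix can be inspected. The 20-observation subsample follows the
Examples section. The matrix names R, V, and lambda are illustrative.

```stata
webuse familyvalues, clear
tetrachoric RS056-RS063 in 1/20, posdef
* copy the matrix, then show the number of negative eigenvalues reported
matrix R = r(Rho)
display r(nneg)
* eigenvalues of the returned matrix
matrix symeigen V lambda = R
matrix list lambda
```

After the modification, the listed eigenvalues should be nonnegative, up
to numerical precision.
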
Examples
Setup
    . webuse familyvalues
Pearson correlations
    . correlate RS074 RS075 RS076
Correlations produced by tetrachoric
    . tetrachoric RS074 RS075 RS076
Pearson correlations
    . correlate RS056-RS063
Correlations produced by tetrachoric
    . tetrachoric RS056-RS063
Adjust correlation matrix, if need be, to be positive semidefinite
    . tetrachoric RS056-RS063 in 1/20, posdef
Saved results
tetrachoric saves the following in r():
Scalars
  r(rho)       tetrachoric correlation coefficient between variables 1
                 and 2
  r(N)         number of observations
  r(nneg)      number of negative eigenvalues (posdef only)
  r(se_rho)    standard error of r(rho)
  r(p)         exact two-sided significance level
Macros
  r(method)    estimator used
Matrices
  r(Rho)       tetrachoric correlation matrix
  r(Se_Rho)    standard errors of r(Rho)
  r(corr)      synonym for r(Rho)
  r(Nobs)      number of observations used in computing correlation
  r(P)         exact two-sided significance level matrix
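
A short sketch of inspecting the saved results after estimation. Which
results are present may depend on the number of variables and the options
used.

```stata
webuse familyvalues, clear
tetrachoric RS074 RS075 RS076, notable
* list everything currently stored in r()
return list
* display individual results
matrix list r(Rho)
matrix list r(Se_Rho)
display "`r(method)'"
```
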
Reference
Edwards, J. H., and A. W. F. Edwards. 1984. Approximating the
tetrachoric correlation coefficient. Biometrics 40: 563.
Also see
Manual: [R] tetrachoric
Help: [R] biprobit, [R] correlate, [MV] factor, [R] spearman (ktau),
[MV] pca, [R] tabulate twoway