Statistics FAQ
Last updated: 14 October 2011
Stata release: 12
Questions are listed below in the following categories:
- Probability distributions
- Tests and CIs
- General questions
- Linear regression with simple error structures
- ANOVA and ANCOVA
- Binary outcome qualitative dependent variable models
- Conditional logistic regression
- Multiple outcome qualitative dependent variable models
- Simple count dependent variable models
- Models with endogenous sample selection
- Models with time-series data
- Panel-data models
  - 12.1 General questions
  - 12.2 Linear regression with panel data
  - 12.3 Censored linear regression with panel data
  - 12.4 Generalized linear model with panel data
- Survival-time (failure-time) models
- Survey-data analysis
- Robust variance estimation
- Marginal effects after estimation
- Epidemiological tables
- Cluster/factor analysis
- Instrumental variables and simultaneous equations systems
- Meta-analysis
- Multiple imputation
- FAQs concerning Stata 11 (previous release)
- FAQs concerning releases before Stata 11
1. Probability distributions
How do I get the Euler–Mascheroni
constant gamma = 0.57721 ... in Stata?
How do I calculate values of the beta function?
What is the delta method and how is it used to estimate the standard
error of a transformed parameter?
How are the chi-squared and F distributions related?
2. Tests and CIs
Is there a way to estimate a nonlinear combination with nlcom,
when the error “expression too long” is displayed?
The results from estimation commands display only two-sided tests for the
coefficients. How can I perform a one-sided test?
How do I bootstrap a vector of results?
Can you explain Chow tests?
How can I use Stata to calculate power by simulation?
How large should the bootstrapped samples be relative to the total
number of cases in the dataset?
How can you specify a term other than residual error as the
denominator in a single degree-of-freedom F test after ANOVA?
What are some of the small sample adjustments to the sandwich
estimate of variance?
Why does test sometimes produce chi-squared and other times F
statistics?
How can I do a Chow test with the robust variance estimates, that is,
after estimating with regress, vce(robust)?
How can I compute the Chow test statistic?
Should the p-value given with a paired t-test always
be lower than the one given by signrank?
Does Stata provide a test for trend?
3. General questions
Why do I get different results when running an ml procedure on
Stata/SE and Stata/MP?
Why do I see different p-values, etc., when I change the
base level for a factor in my regression?
How do I calculate row medians?
How can I get an R-squared
value when a Stata command does not supply one?
How can I calculate percentile ranks?
How can I calculate plotting positions?
How do I estimate a nonlinear model using ml?
Why does bootstrap give a warning
message for non-eclass commands?
How do you fit a model when the
dependent variable is a proportion?
How can I take random samples from an existing dataset?
How can I get the variance–covariance
matrix or coefficient vector?
What are some of the problems with stepwise regression?
Why doesn't summarize accept pweights?
What does summarize calculate when you use aweights?
Why do estimation commands sometimes omit variables?
How do I keep all levels of my categorical variable in my model?
How do I specify a cell means model?
4. Linear regression with simple error structures
How do I fit a linear regression with interval
(inequality) constraints in Stata?
How do I fit a regression with interval constraints in Stata?
How can I pool data (and perform Chow tests) in linear regression
without constraining the residual variances to be equal?
How can I form various tests comparing the different levels of a categorical
variable after anova or regress?
For two-stage least-squares (2SLS/IV/ivregress) estimates,
why is the R-squared statistic not printed in some cases?
Why is the model sum of squares sometimes negative?
Why are the R-squared and model sum of squares sometimes negative?
What is the effect of specifying aweights with regress?
Why is the pseudo-R2 for tobit negative or greater
than one?
5. ANOVA and ANCOVA
Why does the
p-value for a term in my ANOVA not agree with the
p-value for the coefficient for that term in the
corresponding regression?
How can you specify a term other than residual error as the
denominator in a single degree-of-freedom F test after
ANOVA?
Why do I get an error message when I try to run a repeated measures
ANOVA?
How can I form various tests comparing the different levels of a categorical
variable after anova or regress?
How does the anova command handle collinearity?
6. Binary outcome qualitative dependent variable models
How do I fit a bivariate probit model with partial observability and
only one dependent variable?
How do I obtain the standard error of the predicted probability with
logistic regression analysis?
How are the standard errors and confidence
intervals computed for odds ratios (ORs) by logistic?
How do I obtain confidence intervals for predicted
probabilities after logistic regression?
How can I get confidence intervals
for predicted probabilities after probit?
How can I do logistic regression or
multinomial logistic regression with grouped data?
Why do I get the message "outcome does not vary" when I perform a
logistic or logit regression?
What is the difference between odds and odds ratio?
7. Conditional logistic regression
Why is there no intercept in the clogit model?
In clogit, why can't I use
covariates that are constant within panel?
Why does clogit sometimes report a coefficient but missing value for
the standard error, confidence interval, etc.?
8. Multiple outcome qualitative dependent variable models
Is it possible to include a constant term (intercept) in an ordered
probit model in Stata? What is the relationship between ordered
probit and probit?
How are the standard errors and confidence intervals computed for relative
risk ratios (RRRs) by mlogit?
How can I convert Stata's parameterization of ordered probit and
logistic models to one in which a constant is estimated? Why is there
no constant term reported in ologit and oprobit?
How can I do logistic regression or multinomial logistic
regression with grouped data?
9. Simple count dependent variable models
How do you specify the variance function in nbreg to coincide
with Cameron and Trivedi's (Regression analysis of count data, page 62) NB1
and NB2 variance functions?
What is the difference between the models fit using nbreg,
dispersion(mean) and nbreg, dispersion(constant)?
My raw data contain evidence of both overdispersion and "excess zeros".
Is a zero-inflated negative binomial model the only count-data model
that can account for both the overdispersion and the "excess zeros"?
10. Models with endogenous sample selection
How do I impose the restriction that rho is zero using the
heckman command with full ml?
What is the difference between “endogeneity” and
“sample selection bias”?
Why are observations that are
noninformative about the dependent variable, but are known to be selected,
excluded by heckman from the estimation sample?
How are estimates of rho
outside the bounds [-1,1] handled in the two-step Heckman
estimator? (Technical FAQ)
Why are there so many formulas for the inverse of Mills' ratio?
What if I have censoring from above/below in my Heckman selection model?
11. Models with time-series data
Where can I find a description of the various time-series operators?
12. Panel-data models
12.1 General questions
How do I obtain bootstrapped standard errors with panel data?
How can I generate a variable relating
panel data to a reference panel?
How should I interpret changing quadchk results?
What is the difference between
random-effects and population-averaged estimators?
Why don't the decomposed variances in xtsum add up?
12.2 Linear regression with panel data
Why does xtgls not report an R2 statistic?
How do I test for panel-level heteroskedasticity
and autocorrelation?
What is the between estimator?
How does xtgls differ from
regression clustered with robust standard errors?
Why does xtreg with the mle option produce different results
from xtreg with only the re option?
How can there be an intercept in the fixed-effects
model estimated by xtreg, fe?
What role does the time variable play in xtgls?
Why isn't the calculation of R2 the
same for areg and xtreg, fe?
12.3 Censored linear regression with panel data
Why do I obtain different results when executing xttobit on
the same data in different sessions?
12.4 Generalized linear model with panel data
Why does xtgee sometimes report that convergence was not
achieved?
How can I calculate the pseudo
R2 for xtprobit?
What are the divisors used in xtgee? (Technical FAQ)
Can Stata estimate a Rasch model?
How does Stata's implementation
of GEE differ from other implementations?
13. Survival-time (failure-time) models
What is the relationship between baseline
hazard and baseline hazard contribution?
How can I obtain the standard error of the regression with streg?
How are the standard errors and confidence
intervals computed for hazard ratios (HRs) by
stcox and streg?
How do I convert my spell-type data into a survival dataset?
How do I stset my spell-type data?
How do I analyze multiple failure-time data using Stata?
Why does stsum sometimes report
missing values for the percentiles of survival time?
Why can't a subject die at time 0?
Why can't a subject enter and die at the same time in the Cox model?
What is the difference between sts list and ltable?
14. Survey-data analysis
How is the number of observations computed for
subpopulation estimation?
How do I obtain percentiles for survey data?
If we change the order of cluster sampling and stratification
when sampling the population, would the svyset command be
different?
How can I estimate correlations and
their level of significance with survey data?
Is there a way in Stata to do stepwise
regression with svy: logit or any of the svy commands?
Do the svy commands handle zero
weights differently than non-svy commands do?
Are the estimates produced by probit and logit with the
vce(cluster clustvar) option true maximum likelihood
estimates?
Is there a difference between the estimates produced by svy:
probit, with the PSU variable specified in the svyset command, and
probit, vce(cluster clustvar) (and, similarly, between
svy: logit, with the PSU variable specified in svyset, and
logit, vce(cluster clustvar))?
Why doesn't summarize accept
pweights? What does summarize calculate when you use aweights?
15. Robust variance estimation
Which references should I cite when using the vce(cluster
clustvar) option to obtain Stata's cluster-correlated robust
estimate of variance?
What are some of the small sample
adjustments to the sandwich estimate of variance?
How can I do a Chow test with the robust variance estimates, that is,
after estimating with regress, vce(robust)?
How can the standard errors with the vce(cluster
clustvar) option be smaller than those
without the vce(cluster clustvar) option?
What are the advantages of using the robust variance estimator over
the standard maximum-likelihood variance estimator in logistic regression?
How do the ML estimation commands (e.g., logit and probit) compute the
model chi-squared test when they estimate robust standard errors on
clustered data?
Are the estimates produced by probit and logit with the
vce(cluster clustvar) option true maximum likelihood
estimates?
Is there a difference between the estimates produced by svy:
probit, with the PSU variable specified in the svyset command, and
probit, vce(cluster clustvar) (and, similarly, between
svy: logit, with the PSU variable specified in svyset, and
logit, vce(cluster clustvar))?
Why should I not do a likelihood-ratio test after an ML estimation
(e.g., logit, probit) with clustering or pweights?
16. Marginal effects after estimation
When I use the eyex option of margins, what is it actually computing and how does it relate to the coefficients of the loglinear model?
I am using margins after an estimation that has time-series
operators in the independent variable list. How does margins
calculate the means of the independent variables?
I am using a probit model, and margins says that my marginal
effect is greater than 1. Can that be correct?
17. Epidemiological tables
Why does Fisher’s exact test disagree with the confidence interval for
the odds ratio?
Can I do n:1 matching with the mcc command?
18. Cluster/factor analysis
Why do I sometimes get negative eigenvalues when using the pf and
ipf options of factor?
Why does the cumulative proportion of variance
sometimes exceed 1 when using the pf and
ipf options of factor?
19. Instrumental variables and simultaneous equations systems
How do I estimate recursive systems using a subset of available
instruments?
Must I use all of my exogenous variables as instruments when estimating
instrumental variables regression?
20. Meta-analysis
What meta-analysis features are available in Stata?
21. Multiple imputation
How can I combine results other than coefficients in e(b) with
multiply imputed data?
How can I account for clustering when creating imputations with mi
impute?
What is the relation between the official multiple-imputation command,
mi, and the user-written ice and mim commands?
22. FAQs concerning Stata 11 (previous release)
What are the divisors used in xtgee? (Technical FAQ)
How can I form various tests comparing the different levels of a categorical
variable after anova or regress?
23. FAQs concerning releases before Stata 11
Why do Stata’s xtgee standard errors differ from those reported
by SAS’s PROC GENMOD?
I am using a model with interactions. How can I obtain marginal
effects and their standard errors?
I need to run mfx more than once on my dataset, and it's taking a
long time. What can I do to make it run as fast as possible?
Can I use mfx on survey data with unweighted means?
I am using mfx after an estimation that has an offset. How
does mfx take that into account?
Running mfx on my dataset takes a long time, and I am worried it may
have crashed. How can I tell if it is still running?
I am only interested in obtaining a few of the marginal effects for a
few independent variables. How can I do that?
When I run mfx, I am getting the warning message "warning:
predict() expression unsuitable for standard-error calculation; option nose
imposed". What does that mean?
When I run mfx, I am getting the error message "predict() option
unsuitable for marginal effects". What does that mean?
When I run mfx, I am getting the warning message "warning:
derivative missing; try rescaling variable mpg". What does that
mean?
What is the difference between the linear and nonlinear
methods that mfx uses?
How do I calculate least-squares means in Stata?
What does “completely determined” mean in my logistic
regression output?
How can I produce adjusted means after ANOVA?
Why does stcox sometimes produce missing standard errors?
What are the differences between predict and adjust?
How can I obtain the correlation matrix as a Stata matrix?
Why does my mlogit take so long to converge?
How can I get robust standard errors for tobit?
Why do Stata and SAS differ in the results that they report for the
stratified generalized Wilcoxon test for time-to-event data?
Is there any difference between using tsset and iis and
tis before xt commands?
How do I estimate a nonlinear model using ml?
Why do I get an "unbalanced data" error message
when I run nlogit?
How do you test the equality of
regression coefficients that are generated from
two different regressions, estimated on two different samples?
How can I obtain the correlation between the factors after an oblique
rotation?
What do I do when one of the survey estimators returns an error
message, "stratum with only one PSU detected"?
Is it possible to analyze survey data with two or more levels of
clustering with the svy commands?
How can I calculate moving averages for panel data?
Does Stata support any multiple comparison
tests following two-way ANOVA?
How do I get the correct variance–covariance matrix from the
bs routine?
How can I estimate stepwise Cox models?
How can I estimate a fixed-effects
regression with instrumental variables?
How do I interpret the Vuong statistic of a test between a negative binomial
and a zero-inflated negative binomial model for count data?
Why were the timings in the American Statistician (August 1997)
review of the svy commands so slow?
How do I estimate a Cox model with a continuously time-varying
parameter?
What are completely determined panels?
What is the difference between biprobit/heckprob and the STB commands?
Where are the Wald tests for zinb that appear in the manual?
Why do Stata's cc and cci commands
report different confidence intervals than Epi Info?
How can I get one-tailed probabilities for the Student's t distribution?
How can I simulate random multivariate normal observations from a
given correlation matrix?
Why does Weibull with entry and exit times
produce different results from Weibull with duration?
How does Stata's xtgee handle singletons with exchangeable correlation?
I am running clogit and get the message "Note: multiple positive
outcomes within groups encountered." Is this something I should worry about
or is this a normal message?
Can Stata's ml routine converge and
produce answers that look good even when it shouldn't?
Why don't the old huber results match the new robust versions?
How can I get predicted probabilities for different x values after
probit?
How can I get predicted probabilities after
svylogit, svyprobt, svymlog, svyolog, or svyoprob?
Why does the goodness-of-fit chi-squared test reported by poisson
change when the counts and exposures are grouped differently?
What is the pseudo R2 in the weibull output?
How can I get the Mills' ratios for my heckman model?
How do I test endogeneity?
How do I perform a Durbin–Wu–Hausman test?