FAQ: Statistics

Last updated: 29 February 2008
Stata release: 10
Questions are listed below in the following categories:
- Probability distributions
- Tests and CIs
- General questions
- Linear regression with simple error structures
- ANOVA and ANCOVA
- Generalized linear model
- Binary outcome qualitative dependent variable models
- Conditional logistic regression
- Multiple outcome qualitative dependent variable models
- Simple count dependent variable models
- Linear regression with heteroskedastic errors
- Linear regression with systems of equations (correlated errors)
- Models with endogenous sample selection
- Models with time-series data
- Panel-data models
  - 15.1 General questions
  - 15.2 Linear regression with panel data
  - 15.3 Censored linear regression with panel data
  - 15.4 Generalized linear model with panel data
  - 15.5 Qualitative dependent variable models with panel data
  - 15.6 Count dependent variable models with panel data
  - 15.7 Random-coefficients models with panel data
- Survival-time (failure-time) models
- Survey-data analysis
- Robust variance estimation
- Marginal effects after estimation
- Multivariate analysis
- Pharmacokinetic data
- Epidemiological tables
- Cluster/factor analysis
- Instrumental variables and simultaneous equations systems
- Meta-analysis
- FAQs concerning Stata 9 (previous release)
- FAQs concerning releases before Stata 9
1. Probability distributions
- How do I get the Euler–Mascheroni constant gamma = 0.57721 ... in Stata?
- How do I calculate values of the beta function?
- What is the delta method and how is it used to estimate the standard error of a transformed parameter?
- How are the chi-squared and F distributions related?
2. Tests and CIs
- Is there a way to estimate a nonlinear combination with nlcom when the error "expression too long" is displayed?
- The results from estimation commands display only two-sided tests for the coefficients. How can I perform a one-sided test?
- How do I bootstrap a vector of results?
- Can you explain Chow tests?
- How can I use Stata to calculate power by simulation?
- How large should the bootstrapped samples be relative to the total number of cases in the dataset?
- How can you specify a term other than residual error as the denominator in a single-degree-of-freedom F test after ANOVA?
- What are some of the small-sample adjustments to the sandwich estimate of variance?
- How can I form various tests comparing the different levels of a categorical variable after anova or regress?
- Why does test sometimes produce chi-squared and other times F statistics?
- How can I do a Chow test with the robust variance estimates, that is, after estimating with regress, vce(robust)?
- How can I compute the Chow test statistic?
- Is my boss correct in saying that the p-value given by a paired ttest should always be lower than the one given by signrank?
- Does Stata provide a test for trend?
3. General questions
- How do I calculate row medians?
- How can I get an R-squared value when a Stata command does not supply one?
- What are the differences between predict and adjust?
- How can I calculate percentile ranks? How can I calculate plotting positions?
- How do I estimate a nonlinear model using ml?
- Why does bootstrap now give a warning message for non-eclass commands?
- How does one estimate a model when the dependent variable is a proportion?
- How can I take random samples from an existing dataset?
- How can I get the variance–covariance matrix or coefficient vector?
- What are some of the problems with stepwise regression?
- Why doesn't summarize accept pweights? What does summarize calculate when you use aweights?
- How can I obtain the correlation matrix as a Stata matrix?
- Why do estimation commands sometimes drop variables?
4. Linear regression with simple error structures
- How do I fit a linear regression with interval constraints in Stata?
- How do I calculate least-squares means in Stata?
- How can I pool data (and perform Chow tests) in linear regression without constraining the residual variances to be equal?
- How can I form various tests comparing the different levels of a categorical variable after anova or regress?
- For two-stage least-squares (2SLS/IV/ivreg) estimates,
  - why is the R-squared statistic not printed in some cases?
  - why is the model sum of squares sometimes negative?
  - why are the R-squared and model sum of squares sometimes negative?
- What is the effect of specifying aweights with regress?
- Why is the pseudo-R2 for tobit negative or greater than one?
5. ANOVA and ANCOVA
- How can you specify a term other than residual error as the denominator in a single-degree-of-freedom F test after ANOVA?
- How can I produce adjusted means after ANOVA?
- Why do I get an error message when I try to run a repeated-measures ANOVA?
- How can I form various tests comparing the different levels of a categorical variable after anova or regress?
- How does the anova command handle collinearity?
6. Generalized linear model
There are no FAQs on this subject at this time.
7. Binary outcome qualitative dependent variable models
- How do I fit a bivariate probit model with partial observability and only one dependent variable?
- How do I obtain the standard error of the predicted probability with logistic regression analysis?
- How are the standard errors and confidence intervals computed for odds ratios (ORs) by logistic?
- How do I obtain confidence intervals for predicted probabilities after logistic regression?
- What does "completely determined" mean in my logistic regression output?
- How can I get confidence intervals for predicted probabilities after probit?
- How can I do logistic regression or multinomial logistic regression with grouped data?
- Why do I get the message "outcome does not vary" when I perform a logistic or logit regression?
- What is the difference between odds and odds ratio?
8. Conditional logistic regression
- Why is there no intercept in the clogit model?
- In clogit, why can't I use covariates that are constant within panel?
- Why does clogit sometimes report a coefficient but a missing value for the standard error, confidence interval, etc.?
9. Multiple outcome qualitative dependent variable models
- Is it possible to include a constant term (intercept) in an ordered probit model in Stata? What is the relationship between ordered probit and probit?
- How are the standard errors and confidence intervals computed for relative risk ratios (RRRs) by mlogit?
- How can I convert Stata's parameterization of ordered probit and logistic models to one in which a constant is estimated? Why is there no constant term reported in ologit and oprobit?
- How can I do logistic regression or multinomial logistic regression with grouped data?
- Why does my mlogit take so long to converge?
10. Simple count dependent variable models
- How do you specify the variance function in nbreg to coincide with Cameron and Trivedi's (Regression Analysis of Count Data, page 62) NB1 and NB2 variance functions?
- What is the difference between the models fit using nbreg, dispersion(mean) and nbreg, dispersion(constant)?
- My raw data contain evidence of both overdispersion and "excess zeros". Is a zero-inflated negative binomial model the only count-data model that can account for both the overdispersion and the "excess zeros"?
11. Linear regression with heteroskedastic errors
There are no FAQs on this subject at this time.
12. Linear regression with systems of equations (correlated errors)
There are no FAQs on this subject at this time.
13. Models with endogenous sample selection
- How do I impose the restriction that rho is zero using the heckman command with full ml?
- What is the difference between 'endogeneity' and 'sample selection bias'?
- Why are observations that are noninformative about the dependent variable, but are known to be selected, excluded by heckman from the estimation sample?
- How are estimates of rho outside the bounds [-1,1] handled in the two-step Heckman estimator? (Technical FAQ)
- Why are there so many formulas for the inverse of Mills' ratio?
- What if I have censoring from above/below in my Heckman selection model?
14. Models with time-series data
- Where can I find a description of the various time-series operators?
15. Panel-data models
15.1 General questions
- How do I obtain bootstrapped standard errors with panel data?
- How can I generate a variable relating panel data to a reference panel?
- How should I interpret changing quadchk results?
- What is the difference between random-effects and population-averaged estimators?
- Why don't the decomposed variances in xtsum add up?
15.2 Linear regression with panel data
- Why does xtgls not report an R2 statistic?
- How do I test for panel-level heteroskedasticity and autocorrelation?
- What is the between estimator?
- How does xtgls differ from regression clustered with robust standard errors?
- Why does xtreg with the mle option produce different results from xtreg with only the re option?
- How can there be an intercept in the fixed-effects model estimated by xtreg, fe?
- What role does the time variable play in xtgls?
- Is there more information on xtreg?
- Why isn't the calculation of R2 the same for areg and xtreg, fe?
15.3 Censored linear regression with panel data
- Why do I obtain different results when executing xttobit on the same data in different sessions?
15.4 Generalized linear model with panel data
- Why does xtgee sometimes report that convergence was not achieved?
- How can I calculate the pseudo R2 for xtprobit?
- What are the divisors used in xtgee? (Technical FAQ)
- Why do Stata's xtgee standard errors differ from those reported by SAS's PROC GENMOD?
- Can Stata estimate a Rasch model?
- Why must weights be constant within panel for xtgee?
- How does Stata's implementation of GEE differ from other implementations?
15.5 Qualitative dependent variable models with panel data
There are no FAQs on this subject at this time.
15.6 Count dependent variable models with panel data
There are no FAQs on this subject at this time.
15.7 Random-coefficients models with panel data
There are no FAQs on this subject at this time.
16. Survival-time (failure-time) models
- What is the relationship between baseline hazard and baseline hazard contribution?
- How can I obtain the standard error of the regression with streg?
- How are the standard errors and confidence intervals computed for hazard ratios (HRs) by stcox and streg?
- How do I convert my spell-type data into a survival dataset? How do I stset my spell-type data?
- How do I analyze multiple failure-time data using Stata?
- Why does stcox sometimes produce missing standard errors?
- Why does stsum sometimes report missing values for the percentiles of survival time?
- Why can't a subject die at time 0?
- Why can't a subject enter and die at the same time in the Cox model?
- What is the difference between sts list and ltable?
17. Survey-data analysis
- How do I obtain percentiles for survey data?
- If we change the order of cluster sampling and stratification when sampling the population, would the svyset command be different?
- How can I estimate correlations and their level of significance with survey data?
- Is there a way in Stata to do stepwise regression with svy: logit or any of the svy commands?
- Do the svy commands handle zero weights differently than non-svy commands do?
- Are the estimates produced by probit and logit with the vce(cluster clustvar) option true maximum likelihood estimates?
- Is there a difference between the estimates produced by svy: probit, with the psu variable specified in the svyset command, and by probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the psu variable specified in svyset, and logit, vce(cluster clustvar))?
- Why doesn't summarize accept pweights? What does summarize calculate when you use aweights?
18. Robust variance estimation
- Which references should I cite when using the vce(cluster clustvar) option to obtain Stata's cluster-correlated robust estimate of variance?
- What are some of the small-sample adjustments to the sandwich estimate of variance?
- How can I do a Chow test with the robust variance estimates, that is, after estimating with regress, vce(robust)?
- How can the standard errors with the vce(cluster clustvar) option be smaller than those without the vce(cluster clustvar) option?
- What are the advantages of using the robust variance estimator over the standard maximum-likelihood variance estimator in logistic regression?
- How do the ML estimation commands (e.g., logit and probit) compute the model chi-squared test when they estimate robust standard errors on clustered data?
- Are the estimates produced by probit and logit with the vce(cluster clustvar) option true maximum likelihood estimates?
- Is there a difference between the estimates produced by svy: probit, with the psu variable specified in the svyset command, and by probit, vce(cluster clustvar) (and, similarly, between svy: logit, with the psu variable specified in svyset, and logit, vce(cluster clustvar))?
- Why should I not do a likelihood-ratio test after an ML estimation (e.g., logit, probit) with clustering or pweights?
- How can I get robust standard errors for tobit?
19. Marginal effects after estimation
- What is the difference between the linear and nonlinear methods that mfx uses?
- When I run mfx, I am getting the warning message "warning: derivative missing; try rescaling variable mpg". What does that mean?
- When I run mfx, I am getting the error message "predict() option unsuitable for marginal effects". What does that mean?
- When I run mfx, I am getting the warning message "warning: predict() expression unsuitable for standard-error calculation; option nose imposed". What does that mean?
- I am interested in obtaining marginal effects for only a few independent variables. How can I do that?
- Running mfx on my dataset takes a long time, and I am worried it may have crashed. How can I tell if it is still running?
- I am using mfx after an estimation that has an offset. How does mfx take that into account?
- When I use the eyex option of mfx, what is it actually computing, and how does it relate to the coefficients of the loglinear model?
- I am using mfx after an estimation that has time-series operators in the independent-variable list. How does mfx calculate the means of the independent variables?
- Can I use mfx on survey data with unweighted means?
- I need to run mfx more than once on my dataset, and it's taking a long time. What can I do to make it run as fast as possible?
- I am using a probit model, and mfx says that my marginal effect is greater than 1. Can that be correct?
- I am using a model with interactions. How can I obtain marginal effects and their standard errors?
20. Multivariate analysis
There are no FAQs on this subject at this time.
21. Pharmacokinetic data
There are no FAQs on this subject at this time.
22. Epidemiological tables
- Can I do n:1 matching with the mcc command?
23. Cluster/factor analysis
- Why do I sometimes get negative eigenvalues when using the pf and ipf options of factor?
- Why does the cumulative proportion of variance sometimes exceed 1 when using the pf and ipf options of factor?
24. Instrumental variables and simultaneous equations systems
- How do I estimate recursive systems using a subset of available instruments?
- Must I use all of my exogenous variables as instruments when estimating instrumental variables regression?
25. Meta-analysis
- What meta-analysis features are available in Stata?
26. FAQs concerning Stata 9 (previous release)
- Why do Stata and SAS differ in the results that they report for the stratified generalized Wilcoxon test for time-to-event data?
- Is there any difference between using tsset and using iis and tis before xt commands?
- How can I get robust standard errors for tobit?
27. FAQs concerning releases before Stata 9
- How do I estimate a nonlinear model using ml?
- Why do I get an "unbalanced data" error message when I run nlogit?
- How do you test the equality of regression coefficients that are generated from two different regressions, estimated on two different samples?
- How can I obtain the correlation between the factors after an oblique rotation?
- What do I do when one of the survey estimators returns an error message, "stratum with only one PSU detected"?
- Is it possible to analyze survey data with two or more levels of clustering with the svy commands?
- How can I calculate moving averages for panel data?
- Does Stata support any multiple comparison tests following two-way ANOVA?
- How do I get the correct variance–covariance matrix from the bs routine?
- How can I estimate stepwise Cox models?
- How can I estimate a fixed-effects regression with instrumental variables?
- How do I interpret the Vuong statistic of a test between a negative binomial and a zero-inflated negative binomial model for count data?
- Why were the timings in the American Statistician (August 1997) review of the svy commands so slow?
- How do I estimate a Cox model with a continuously time-varying parameter?
- What are completely determined panels?
- What is the difference between biprobit/heckprob and the STB commands?
- Where are the Wald tests for zinb that appear in the manual?
- Why do Stata's cc and cci commands report different confidence intervals than Epi Info?
- How can I get one-tailed probabilities for the Student's t distribution?
- How can I simulate random multivariate normal observations from a given correlation matrix?
- Why does Weibull with entry and exit times produce different results from Weibull with duration?
- How does Stata's xtgee handle singletons with exchangeable correlation?
- I am running clogit and get the message "Note: multiple positive outcomes within groups encountered." Is this something I should worry about or is this a normal message?
- Can Stata's ml routine converge and produce answers that look good even when it shouldn't?
- Why don't the old huber results match the new robust versions?
- How can I get predicted probabilities for different x values after probit?
- How can I get predicted probabilities after svylogit, svyprobt, svymlog, svyolog, or svyoprob?
- Why does the goodness-of-fit chi-squared test reported by poisson change when the counts and exposures are grouped differently?
- What is the pseudo R2 in the weibull output?
- How can I get the Mills' ratios for my heckman model?
- How do I test endogeneity? How do I perform a Durbin–Wu–Hausman test?