List of tables
List of figures
1 Stata basics
1.1 Interactive use
1.2 Documentation
1.2.1 Stata manuals
1.2.2 Additional Stata resources
1.2.3 The help command
1.2.4 The search, findit, and hsearch commands
1.3 Command syntax and operators
1.3.1 Basic command syntax
1.3.2 Example: The summarize command
1.3.3 Example: The regress command
1.3.4 Abbreviations, case sensitivity, and wildcards
1.3.5 Arithmetic, relational, and logical operators
1.3.6 Error messages
1.4 Do-files and log files
1.4.1 Writing a do-file
1.4.2 Running do-files
1.4.3 Log files
1.4.4 A three-step process
1.4.5 Comments and long lines
1.4.6 Different implementations of Stata
1.5 Scalars and matrices
1.5.1 Scalars
1.5.2 Matrices
1.6 Using results from Stata commands
1.6.1 Using results from the r-class command summarize
1.6.2 Using results from the e-class command regress
1.7 Global and local macros
1.7.1 Global macros
1.7.2 Local macros
1.7.3 Scalar or macro?
1.8 Looping commands
1.8.1 The foreach loop
1.8.2 The forvalues loop
1.8.3 The while loop
1.8.4 The continue command
1.9 Some useful commands
1.10 Template do-file
1.11 User-written commands
1.12 Stata resources
1.13 Exercises
2 Data management and graphics
2.1 Introduction
2.2 Types of data
2.2.1 Text or ASCII data
2.2.2 Internal numeric data
2.2.3 String data
2.2.4 Formats for displaying numeric data
2.3 Inputting data
2.3.1 General principles
2.3.2 Inputting data already in Stata format
2.3.3 Inputting data from the keyboard
2.3.4 Inputting nontext data
2.3.5 Inputting text data from a spreadsheet
2.3.6 Inputting text data in free format
2.3.7 Inputting text data in fixed format
2.3.8 Dictionary files
2.3.9 Common pitfalls
2.4 Data management
2.4.1 PSID example
2.4.2 Naming and labeling variables
2.4.3 Viewing data
2.4.4 Using original documentation
2.4.5 Missing values
2.4.6 Imputing missing data
2.4.7 Transforming data (generate, replace, egen, recode)
The generate and replace commands
The egen command
The recode command
The by prefix
Indicator variables
Set of indicator variables
Interactions
Demeaning
2.4.8 Saving data
2.4.9 Selecting the sample
2.5 Manipulating datasets
2.5.1 Ordering observations and variables
2.5.2 Preserving and restoring a dataset
2.5.3 Wide and long forms for a dataset
2.5.4 Merging datasets
2.5.5 Appending datasets
2.6 Graphical display of data
2.6.1 Stata graph commands
Example graph commands
Saving and exporting graphs
Learning how to use graph commands
2.6.2 Box-and-whisker plot
2.6.3 Histogram
2.6.4 Kernel density plot
2.6.5 Twoway scatterplots and fitted lines
2.6.6 Lowess, kernel, local linear, and nearest-neighbor regression
2.6.7 Multiple scatterplots
2.7 Stata resources
2.8 Exercises
3 Linear regression basics
3.1 Introduction
3.2 Data and data summary
3.2.1 Data description
3.2.2 Variable description
3.2.3 Summary statistics
3.2.4 More-detailed summary statistics
3.2.5 Tables for data
3.2.6 Statistical tests
3.2.7 Data plots
3.3 Regression in levels and logs
3.3.1 Basic regression theory
3.3.2 OLS regression and matrix algebra
3.3.3 Properties of the OLS estimator
3.3.4 Heteroskedasticity-robust standard errors
3.3.5 Cluster–robust standard errors
3.3.6 Regression in logs
3.4 Basic regression analysis
3.4.1 Correlations
3.4.2 The regress command
3.4.3 Hypothesis tests
3.4.4 Tables of output from several regressions
3.4.5 Even better tables of regression output
3.4.6 Factor variables for categorical variables and interactions
3.5 Specification analysis
3.5.1 Specification tests and model diagnostics
3.5.2 Residual diagnostic plots
3.5.3 Influential observations
3.5.4 Specification tests
Test of omitted variables
Test of the Box–Cox model
Test of the functional form of the conditional mean
Heteroskedasticity test
Omnibus test
3.5.5 Tests have power in more than one direction
3.6 Prediction
3.6.1 In-sample prediction
3.6.2 MEs and elasticities
3.6.3 Prediction in logs: The retransformation problem
3.6.4 Prediction exercise
3.7 Sampling weights
3.7.1 Weights
3.7.2 Weighted mean
3.7.3 Weighted regression
3.7.4 Weighted prediction and MEs
3.8 OLS using Mata
3.9 Stata resources
3.10 Exercises
4 Simulation
4.1 Introduction
4.2 Pseudorandom-number generators: Introduction
4.2.1 Uniform random-number generation
4.2.2 Draws from normal
4.2.3 Draws from t, chi-squared, F, gamma, and beta
4.2.4 Draws from binomial, Poisson, and negative binomial
Independent (but not identically distributed) draws from binomial
Independent (but not identically distributed) draws from Poisson
Histograms and density plots
4.3 Distribution of the sample mean
4.3.1 Stata program
4.3.2 The simulate command
4.3.3 Central limit theorem simulation
4.3.4 The postfile command
4.3.5 Alternative central limit theorem simulation
4.4 Pseudorandom-number generators: Further details
4.4.1 Inverse-probability transformation
4.4.2 Direct transformation
4.4.3 Other methods
4.4.4 Draws from truncated normal
4.4.5 Draws from multivariate normal
Direct draws from multivariate normal
Transformation using Cholesky decomposition
4.4.6 Draws using Markov chain Monte Carlo method
4.5 Computing integrals
4.5.1 Quadrature
4.5.2 Monte Carlo integration
4.5.3 Monte Carlo integration using different S
4.6 Simulation for regression: Introduction
4.6.1 Simulation example: OLS with χ^{2} errors
4.6.2 Interpreting simulation output
Unbiasedness of estimator
Standard errors
t statistic
Test size
Number of simulations
4.6.3 Variations
Different sample size and number of simulations
Test power
Different error distributions
4.6.4 Estimator inconsistency
4.6.5 Simulation with endogenous regressors
4.7 Stata resources
4.8 Exercises
5 GLS regression
5.1 Introduction
5.2 GLS and FGLS regression
5.2.1 GLS for heteroskedastic errors
5.2.2 GLS and FGLS
5.2.3 Weighted least squares and robust standard errors
5.2.4 Leading examples
5.3 Modeling heteroskedastic data
5.3.1 Simulated dataset
5.3.2 OLS estimation
5.3.3 Detecting heteroskedasticity
5.3.4 FGLS estimation
5.3.5 WLS estimation
5.4 System of linear regressions
5.4.1 SUR model
5.4.2 The sureg command
5.4.3 Application to two categories of expenditures
5.4.4 Robust standard errors
5.4.5 Testing cross-equation constraints
5.4.6 Imposing cross-equation constraints
5.5 Survey data: Weighting, clustering, and stratification
5.5.1 Survey design
5.5.2 Survey mean estimation
5.5.3 Survey linear regression
5.6 Stata resources
5.7 Exercises
6 Linear instrumental-variables regression
6.1 Introduction
6.2 IV estimation
6.2.1 Basic IV theory
6.2.2 Model setup
6.2.3 IV estimators: IV, 2SLS, and GMM
6.2.4 Instrument validity and relevance
6.2.5 Robust standard-error estimates
6.3 IV example
6.3.1 The ivregress command
6.3.2 Medical expenditures with one endogenous regressor
6.3.3 Available instruments
6.3.4 IV estimation of an exactly identified model
6.3.5 IV estimation of an overidentified model
6.3.6 Testing for regressor endogeneity
6.3.7 Tests of overidentifying restrictions
6.3.8 IV estimation with a binary endogenous regressor
6.4 Weak instruments
6.4.1 Finite-sample properties of IV estimators
6.4.2 Weak instruments
Diagnostics for weak instruments
Formal tests for weak instruments
6.4.3 The estat firststage command
6.4.4 Just-identified model
6.4.5 Overidentified model
6.4.6 More than one endogenous regressor
6.4.7 Sensitivity to choice of instruments
6.5 Better inference with weak instruments
6.5.1 Conditional tests and confidence intervals
6.5.2 LIML estimator
6.5.3 Jackknife IV estimator
6.5.4 Comparison of 2SLS, LIML, JIVE, and GMM
6.6 3SLS systems estimation
6.7 Stata resources
6.8 Exercises
7 Quantile regression
7.1 Introduction
7.2 QR
7.2.1 Conditional quantiles
7.2.2 Computation of QR estimates and standard errors
7.2.3 The qreg, bsqreg, and sqreg commands
7.3 QR for medical expenditures data
7.3.1 Data summary
7.3.2 QR estimates
7.3.3 Interpretation of conditional quantile coefficients
7.3.4 Retransformation
7.3.5 Comparison of estimates at different quantiles
7.3.6 Heteroskedasticity test
7.3.7 Hypothesis tests
7.3.8 Graphical display of coefficients over quantiles
7.4 QR for generated heteroskedastic data
7.4.1 Simulated dataset
7.4.2 QR estimates
7.5 QR for count data
7.5.1 Quantile count regression
7.5.2 The qcount command
7.5.3 Summary of doctor visits data
7.5.4 Results from QCR
7.6 Stata resources
7.7 Exercises
8 Linear panel-data models: Basics
8.1 Introduction
8.2 Panel-data methods overview
8.2.1 Some basic considerations
8.2.2 Some basic panel models
Individual-effects model
Fixed-effects model
Random-effects model
Pooled model or population-averaged model
Two-way–effects model
Mixed linear models
8.2.3 Cluster–robust inference
8.2.4 The xtreg command
8.2.5 Stata linear panel-data commands
8.3 Panel-data summary
8.3.1 Data description and summary statistics
8.3.2 Panel-data organization
8.3.3 Panel-data description
8.3.4 Within and between variation
8.3.5 Time-series plots for each individual
8.3.6 Overall scatterplot
8.3.7 Within scatterplot
8.3.8 Pooled OLS regression with cluster–robust standard errors
8.3.9 Time-series autocorrelations for panel data
8.3.10 Error correlation in the RE model
8.4 Pooled or population-averaged estimators
8.4.1 Pooled OLS estimator
8.4.2 Pooled FGLS estimator or population-averaged estimator
8.4.3 The xtreg, pa command
8.4.4 Application of the xtreg, pa command
8.5 Within estimator
8.5.1 Within estimator
8.5.2 The xtreg, fe command
8.5.3 Application of the xtreg, fe command
8.5.4 Least-squares dummy-variables regression
8.6 Between estimator
8.6.1 Between estimator
8.6.2 Application of the xtreg, be command
8.7 RE estimator
8.7.1 RE estimator
8.7.2 The xtreg, re command
8.7.3 Application of the xtreg, re command
8.8 Comparison of estimators
8.8.1 Estimates of variance components
8.8.2 Within and between R-squared
8.8.3 Estimator comparison
8.8.4 Fixed effects versus random effects
8.8.5 Hausman test for fixed effects
The hausman command
Robust Hausman test
8.8.6 Prediction
8.9 First-difference estimator
8.9.1 First-difference estimator
8.9.2 Strict and weak exogeneity
8.10 Long panels
8.10.1 Long-panel dataset
8.10.2 Pooled OLS and PFGLS
8.10.3 The xtpcse and xtgls commands
8.10.4 Application of the xtgls, xtpcse, and xtscc commands
8.10.5 Separate regressions
8.10.6 FE and RE models
8.10.7 Unit roots and cointegration
8.11 Panel-data management
8.11.1 Wide-form data
8.11.2 Convert wide form to long form
8.11.3 Convert long form to wide form
8.11.4 An alternative to wide-form data
8.12 Stata resources
8.13 Exercises
9 Linear panel-data models: Extensions
9.1 Introduction
9.2 Panel IV estimation
9.2.1 Panel IV
9.2.2 The xtivreg command
9.2.3 Application of the xtivreg command
9.2.4 Panel IV extensions
9.3 Hausman–Taylor estimator
9.3.1 Hausman–Taylor estimator
9.3.2 The xthtaylor command
9.3.3 Application of the xthtaylor command
9.4 Arellano–Bond estimator
9.4.1 Dynamic model
9.4.2 IV estimation in the FD model
9.4.3 The xtabond command
9.4.4 Arellano–Bond estimator: Pure time series
9.4.5 Arellano–Bond estimator: Additional regressors
9.4.6 Specification tests
9.4.7 The xtdpdsys command
9.4.8 The xtdpd command
9.5 Mixed linear models
9.5.1 Mixed linear model
9.5.2 The xtmixed command
9.5.3 Random-intercept model
9.5.4 Cluster–robust standard errors
9.5.5 Random-slopes model
9.5.6 Random-coefficients model
9.5.7 Two-way random-effects model
9.6 Clustered data
9.6.1 Clustered dataset
9.6.2 Clustered data using nonpanel commands
9.6.3 Clustered data using panel commands
9.6.4 Hierarchical linear models
9.7 Stata resources
9.8 Exercises
10 Nonlinear regression methods
10.1 Introduction
10.2 Nonlinear example: Doctor visits
10.2.1 Data description
10.2.2 Poisson model description
10.3 Nonlinear regression methods
10.3.1 MLE
10.3.2 The poisson command
10.3.3 Postestimation commands
10.3.4 NLS
10.3.5 The nl command
10.3.6 GLM
10.3.7 The glm command
10.3.8 The gmm command
10.3.9 Other estimators
10.4 Different estimates of the VCE
10.4.1 General framework
10.4.2 The vce() option
10.4.3 Application of the vce() option
10.4.4 Default estimate of the VCE
10.4.5 Robust estimate of the VCE
10.4.6 Cluster–robust estimate of the VCE
10.4.7 Heteroskedasticity- and autocorrelation-consistent estimate of the VCE
10.4.8 Bootstrap standard errors
10.4.9 Statistical inference
10.5 Prediction
10.5.1 The predict and predictnl commands
10.5.2 Application of predict and predictnl
10.5.3 Out-of-sample prediction
10.5.4 Prediction at a specified value of one of the regressors
10.5.5 Prediction at a specified value of all the regressors
10.5.6 Prediction of other quantities
10.5.7 The margins command for prediction
10.6 Marginal effects
10.6.1 Calculus and finite-difference methods
10.6.2 MEs estimates AME, MEM, and MER
10.6.3 Elasticities and semielasticities
10.6.4 Simple interpretations of coefficients in single-index models
10.6.5 The margins command for marginal effects
10.6.6 MEM: Marginal effect at mean
Comparison of calculus and finite-difference methods
10.6.7 MER: Marginal effect at representative value
10.6.8 AME: Average marginal effect
10.6.9 Elasticities and semielasticities
10.6.10 AME computed manually
10.6.11 Polynomial regressors
10.6.12 Interacted regressors
10.6.13 Complex interactions and nonlinearities
10.7 Model diagnostics
10.7.1 Goodness-of-fit measures
10.7.2 Information criteria for model comparison
10.7.3 Residuals
10.7.4 Model-specification tests
10.8 Stata resources
10.9 Exercises
11 Nonlinear optimization methods
11.1 Introduction
11.2 Newton–Raphson method
11.2.1 NR method
11.2.2 NR method for Poisson
11.2.3 Poisson NR example using Mata
Core Mata code for Poisson NR iterations
Complete Stata and Mata code for Poisson NR iterations
11.3 Gradient methods
11.3.1 Maximization options
11.3.2 Gradient methods
11.3.3 Messages during iterations
11.3.4 Stopping criteria
11.3.5 Multiple maximums
11.3.6 Numerical derivatives
11.4 The ml command: lf method
11.4.1 The ml command
11.4.2 The lf method
11.4.3 Poisson example: Single-index model
11.4.4 Negative binomial example: Two-index model
11.4.5 NLS example: Nonlikelihood model
11.5 Checking the program
11.5.1 Program debugging using ml check and ml trace
11.5.2 Getting the program to run
11.5.3 Checking the data
11.5.4 Multicollinearity and near collinearity
11.5.5 Multiple optimums
11.5.6 Checking parameter estimation
11.5.7 Checking standard-error estimation
11.6 The ml command: d0, d1, d2, lf0, lf1, and lf2 methods
11.6.1 Evaluator functions
11.6.2 The d0 method
11.6.3 The d1 method
11.6.4 The lf1 method with the robust estimate of the VCE
11.6.5 The d2 and lf2 methods
11.7 The Mata optimize() function
11.7.1 Type d and gf evaluators
11.7.2 Optimize functions
11.7.3 Poisson example
Evaluator program for Poisson MLE
The optimize() function for Poisson MLE
11.8 Generalized method of moments
11.8.1 Definition
11.8.2 Nonlinear IV example
11.8.3 GMM using the Mata optimize() function
11.9 Stata resources
11.10 Exercises
12 Testing methods
12.1 Introduction
12.2 Critical values and p-values
12.2.1 Standard normal compared with Student's t
12.2.2 Chi-squared compared with F
12.2.3 Plotting densities
12.2.4 Computing p-values and critical values
12.2.5 Which distributions does Stata use?
12.3 Wald tests and confidence intervals
12.3.1 Wald test of linear hypotheses
12.3.2 The test command
Test single coefficient
Test several hypotheses
Test of overall significance
Test calculated from retrieved coefficients and VCE
12.3.3 One-sided Wald tests
12.3.4 Wald test of nonlinear hypotheses (delta method)
12.3.5 The testnl command
12.3.6 Wald confidence intervals
12.3.7 The lincom command
12.3.8 The nlcom command (delta method)
12.3.9 Asymmetric confidence intervals
12.4 Likelihood-ratio tests
12.4.1 Likelihood-ratio tests
12.4.2 The lrtest command
12.4.3 Direct computation of LR tests
12.5 Lagrange multiplier test (or score test)
12.5.1 LM tests
12.5.2 The estat command
12.5.3 LM test by auxiliary regression
12.6 Test size and power
12.6.1 Simulation DGP: OLS with chi-squared errors
12.6.2 Test size
12.6.3 Test power
12.6.4 Asymptotic test power
12.7 Specification tests
12.7.1 Moment-based tests
12.7.2 Information matrix test
12.7.3 Chi-squared goodness-of-fit test
12.7.4 Overidentifying restrictions test
12.7.5 Hausman test
12.7.6 Other tests
12.8 Stata resources
12.9 Exercises
13 Bootstrap methods
13.1 Introduction
13.2 Bootstrap methods
13.2.1 Bootstrap estimate of standard error
13.2.2 Bootstrap methods
13.2.3 Asymptotic refinement
13.2.4 Use the bootstrap with caution
13.3 Bootstrap pairs using the vce(bootstrap) option
13.3.1 Bootstrap-pairs method to estimate VCE
13.3.2 The vce(bootstrap) option
13.3.3 Bootstrap standard-errors example
13.3.4 How many bootstraps?
13.3.5 Clustered bootstraps
13.3.6 Bootstrap confidence intervals
13.3.7 The postestimation estat bootstrap command
13.3.8 Bootstrap confidence-intervals example
13.3.9 Bootstrap estimate of bias
13.4 Bootstrap pairs using the bootstrap command
13.4.1 The bootstrap command
13.4.2 Bootstrap parameter estimate from a Stata estimation command
13.4.3 Bootstrap standard error from a Stata estimation command
13.4.4 Bootstrap standard error from a user-written estimation command
13.4.5 Bootstrap two-step estimator
13.4.6 Bootstrap Hausman test
13.4.7 Bootstrap standard error of the coefficient of variation
13.5 Bootstraps with asymptotic refinement
13.5.1 Percentile-t method
13.5.2 Percentile-t Wald test
13.5.3 Percentile-t Wald confidence interval
13.6 Bootstrap pairs using bsample and simulate
13.6.1 The bsample command
13.6.2 The bsample command with simulate
13.6.3 Bootstrap Monte Carlo exercise
13.7 Alternative resampling schemes
13.7.1 Bootstrap pairs
13.7.2 Parametric bootstrap
13.7.3 Residual bootstrap
13.7.4 Wild bootstrap
13.7.5 Subsampling
13.8 The jackknife
13.8.1 Jackknife method
13.8.2 The vce(jackknife) option and the jackknife command
13.9 Stata resources
13.10 Exercises
14 Binary outcome models
14.1 Introduction
14.2 Some parametric models
14.2.1 Basic model
14.2.2 Logit, probit, linear probability, and clog-log models
14.3 Estimation
14.3.1 Latent-variable interpretation and identification
14.3.2 ML estimation
14.3.3 The logit and probit commands
14.3.4 Robust estimate of the VCE
14.3.5 OLS estimation of LPM
14.4 Example
14.4.1 Data description
14.4.2 Logit regression
14.4.3 Comparison of binary models and parameter estimates
14.5 Hypothesis and specification tests
14.5.1 Wald tests
14.5.2 Likelihood-ratio tests
14.5.3 Additional model-specification tests
Lagrange multiplier test of generalized logit
Heteroskedastic probit regression
14.5.4 Model comparison
14.6 Goodness of fit and prediction
14.6.1 Pseudo-R^{2} measure
14.6.2 Comparing predicted probabilities with sample frequencies
14.6.3 Comparing predicted outcomes with actual outcomes
14.6.4 The predict command for fitted probabilities
14.6.5 The prvalue command for fitted probabilities
14.7 Marginal effects
14.7.1 Marginal effect at a representative value (MER)
14.7.2 Marginal effect at the mean (MEM)
14.7.3 Average marginal effect (AME)
14.7.4 The prchange command
14.8 Endogenous regressors
14.8.1 Example
14.8.2 Model assumptions
14.8.3 Structural-model approach
The ivprobit command
Maximum likelihood estimates
Two-step sequential estimates
14.8.4 IVs approach
14.9 Grouped data
14.9.1 Estimation with aggregate data
14.9.2 Grouped-data application
14.10 Stata resources
14.11 Exercises
15 Multinomial models
15.1 Introduction
15.2 Multinomial models overview
15.2.1 Probabilities and MEs
15.2.2 Maximum likelihood estimation
15.2.3 Case-specific and alternative-specific regressors
15.2.4 Additive random-utility model
15.2.5 Stata multinomial model commands
15.3 Multinomial example: Choice of fishing mode
15.3.1 Data description
15.3.2 Case-specific regressors
15.3.3 Alternative-specific regressors
15.4 Multinomial logit model
15.4.1 The mlogit command
15.4.2 Application of the mlogit command
15.4.3 Coefficient interpretation
15.4.4 Predicted probabilities
15.4.5 MEs
15.5 Conditional logit model
15.5.1 Creating long-form data from wide-form data
15.5.2 The asclogit command
15.5.3 The clogit command
15.5.4 Application of the asclogit command
15.5.5 Relationship to multinomial logit model
15.5.6 Coefficient interpretation
15.5.7 Predicted probabilities
15.5.8 MEs
15.6 Nested logit model
15.6.1 Relaxing the independence of irrelevant alternatives assumption
15.6.2 NL model
15.6.3 The nlogit command
15.6.4 Model estimates
15.6.5 Predicted probabilities
15.6.6 MEs
15.6.7 Comparison of logit models
15.7 Multinomial probit model
15.7.1 MNP
15.7.2 The mprobit command
15.7.3 Maximum simulated likelihood
15.7.4 The asmprobit command
15.7.5 Application of the asmprobit command
15.7.6 Predicted probabilities and MEs
15.8 Random-parameters logit
15.8.1 Random-parameters logit
15.8.2 The mixlogit command
15.8.3 Data preparation for mixlogit
15.8.4 Application of the mixlogit command
15.9 Ordered outcome models
15.9.1 Data summary
15.9.2 Ordered outcomes
15.9.3 Application of the ologit command
15.9.4 Predicted probabilities
15.9.5 MEs
15.9.6 Other ordered models
15.10 Multivariate outcomes
15.10.1 Bivariate probit
15.10.2 Nonlinear SUR
15.11 Stata resources
15.12 Exercises
16 Tobit and selection models
16.1 Introduction
16.2 Tobit model
16.2.1 Regression with censored data
16.2.2 Tobit model setup
16.2.3 Unknown censoring point
16.2.4 Tobit estimation
16.2.5 ML estimation in Stata
16.3 Tobit model example
16.3.1 Data summary
16.3.2 Tobit analysis
16.3.3 Prediction after tobit
16.3.4 Marginal effects
Left-truncated, left-censored, and right-truncated examples
Left-censored case computed directly
Marginal impact on probabilities
16.3.5 The ivtobit command
16.3.6 Additional commands for censored regression
16.4 Tobit for lognormal data
16.4.1 Data example
16.4.2 Setting the censoring point for data in logs
16.4.3 Results
16.4.4 Two-limit tobit
16.4.5 Model diagnostics
16.4.6 Tests of normality and homoskedasticity
Generalized residuals and scores
Test of normality
Test of homoskedasticity
16.4.7 Next step?
16.5 Two-part model in logs
16.5.1 Model structure
16.5.2 Part 1 specification
16.5.3 Part 2 of the two-part model
16.6 Selection model
16.6.1 Model structure and assumptions
16.6.2 ML estimation of the sample-selection model
16.6.3 Estimation without exclusion restrictions
16.6.4 Two-step estimation
16.6.5 Estimation with exclusion restrictions
16.7 Prediction from models with outcome in logs
16.7.1 Predictions from tobit
16.7.2 Predictions from two-part model
16.7.3 Predictions from selection model
16.8 Stata resources
16.9 Exercises
17 Count-data models
17.1 Introduction
17.2 Features of count data
17.2.1 Generated Poisson data
17.2.2 Overdispersion and negative binomial data
17.2.3 Modeling strategies
17.2.4 Estimation methods
17.3 Empirical example 1
17.3.1 Data summary
17.3.2 Poisson model
Poisson model results
Robust estimate of VCE for Poisson MLE
Test of overdispersion
Coefficient interpretation and marginal effects
17.3.3 NB2 model
NB2 model results
Fitted probabilities for Poisson and NB2 models
The countfit command
The prvalue command
Discussion
Generalized NB model
17.3.4 Nonlinear least-squares estimation
17.3.5 Hurdle model
Variants of the hurdle model
Application of the hurdle model
17.3.6 Finite-mixture models
FMM specification
Simulated FMM sample with comparisons
ML estimation of the FMM
The fmm command
Application: Poisson finite-mixture model
Interpretation
Comparing marginal effects
Application: NB finite-mixture model
Model selection
Cautionary note
17.4 Empirical example 2
17.4.1 Zero-inflated data
17.4.2 Models for zero-inflated data
17.4.3 Results for the NB2 model
The prcounts command
17.4.4 Results for ZINB
17.4.5 Model comparison
The countfit command
Model comparison using countfit
17.5 Models with endogenous regressors
17.5.1 Structural-model approach
Model and assumptions
Two-step estimation
Application
17.5.2 Nonlinear IV method
17.6 Stata resources
17.7 Exercises
18 Nonlinear panel models
18.1 Introduction
18.2 Nonlinear panel-data overview
18.2.1 Some basic nonlinear panel models
FE models
RE models
Pooled models or population-averaged models
Comparison of models
18.2.2 Dynamic models
18.2.3 Stata nonlinear panel commands
18.3 Nonlinear panel-data example
18.3.1 Data description and summary statistics
18.3.2 Panel-data organization
18.3.3 Within and between variation
18.3.4 FE or RE model for these data?
18.4 Binary outcome models
18.4.1 Panel summary of the dependent variable
18.4.2 Pooled logit estimator
18.4.3 The xtlogit command
18.4.4 The xtgee command
18.4.5 PA logit estimator
18.4.6 RE logit estimator
18.4.7 FE logit estimator
18.4.8 Panel logit estimator comparison
18.4.9 Prediction and marginal effects
18.4.10 Mixed-effects logit estimator
18.5 Tobit model
18.5.1 Panel summary of the dependent variable
18.5.2 RE tobit model
18.5.3 Generalized tobit models
18.5.4 Parametric nonlinear panel models
18.6 Count-data models
18.6.1 The xtpoisson command
18.6.2 Panel summary of the dependent variable
18.6.3 Pooled Poisson estimator
18.6.4 PA Poisson estimator
18.6.5 RE Poisson estimators
18.6.6 FE Poisson estimator
18.6.7 Panel Poisson estimators comparison
18.6.8 Negative binomial estimators
18.7 Stata resources
18.8 Exercises
A Programming in Stata
A.1 Stata matrix commands
A.1.1 Stata matrix overview
A.1.2 Stata matrix input and output
Matrix input by hand
Matrix input from Stata estimation results
A.1.3 Stata matrix subscripts and combining matrices
A.1.4 Matrix operators
A.1.5 Matrix functions
A.1.6 Matrix accumulation commands
A.1.7 OLS using Stata matrix commands
A.2 Programs
A.2.1 Simple programs (no arguments or access to results)
A.2.2 Modifying a program
A.2.3 Programs with positional arguments
A.2.4 Temporary variables
A.2.5 Programs with named positional arguments
A.2.6 Storing and retrieving program results
A.2.7 Programs with arguments using standard Stata syntax
A.2.8 Ado-files
A.3 Program debugging
A.3.1 Some simple tips
A.3.2 Error messages and return code
A.3.3 Trace
B Mata
B.1 How to run Mata
B.1.1 Mata commands in Mata
B.1.2 Mata commands in Stata
B.1.3 Stata commands in Mata
B.1.4 Interactive versus batch use
B.1.5 Mata help
B.2 Mata matrix commands
B.2.1 Mata matrix input
Matrix input by hand
Identity matrices, unit vectors, and matrices of constants
Matrix input from Stata data
Matrix input from Stata matrix
Stata interface functions
B.2.2 Mata matrix operators
Element-by-element operators
B.2.3 Mata functions
Scalar and matrix functions
Matrix inversion
B.2.4 Mata cross products
B.2.5 Mata matrix subscripts and combining matrices
B.2.6 Transferring Mata data and matrices to Stata
Creating Stata matrices from Mata matrices
Creating Stata data from a Mata vector
B.3 Programming in Mata
B.3.1 Declarations
B.3.2 Mata program
B.3.3 Mata program with results output to Stata
B.3.4 Stata program that calls a Mata program
B.3.5 Using Mata in ado-files
Glossary of abbreviations
References