Preface
New to the Second Edition
Guiding Principles Underlying Our Approach
Overview of Content Coverage and Intended Audience
Acknowledgments
1 INTRODUCTION
The Role of Statistical Software in Data Analysis
Statistics: Descriptive and Inferential
Variables and Constants
The Measurement of Variables
Nominal Level
Ordinal Level
Interval Level
Ratio Level
Choosing a Scale of Measurement
Discrete and Continuous Variables
Setting a Context with Real Data
Exercises
2 EXAMINING UNIVARIATE DISTRIBUTIONS
Counting the Occurrence of Data Values
When Variables are Measured at the Nominal Level
Frequency and Percent Distribution Tables
Bar Charts
Pie Charts
When Variables are Measured at the Ordinal, Interval, or Ratio Level
Frequency and Percent Distribution Tables
Stem-and-Leaf Displays
Histograms
Line Graphs
Describing the Shape of a Distribution
Accumulating Data
Cumulative Percent Distributions
Ogive Curves
Percentile Ranks
Percentiles
Five-Number Summaries and Boxplots
Modifying the Appearance of Graphs
Summary of Graphical Selection
Summary of Stata Commands
Exercises
3 MEASURES OF LOCATION, SPREAD, AND SKEWNESS
Characterizing the Location of a Distribution
The Mode
The Median
The Arithmetic Mean
Interpreting the Mean of a Dichotomous Variable
The Weighted Mean
Comparing the Mode, Median, and Mean
Characterizing the Spread of a Distribution
The Range and Interquartile Range
The Variance
The Standard Deviation
Characterizing the Skewness of a Distribution
Selecting Measures of Location and Spread
Applying What We Have Learned
Summary of Stata Commands
Helpful Hints When Using Stata
Online Resources
The Stata Command
Stata Tips
Exercises
4 RE-EXPRESSING VARIABLES
Linear and Nonlinear Transformations
Linear Transformations: Addition, Subtraction, Multiplication, and Division
The Effect on the Shape of a Distribution
The Effect on Summary Statistics of a Distribution
Common Linear Transformations
Standard Scores
z-Scores
Using z-Scores to Detect Outliers
Using z-Scores to Compare Scores in Different Distributions
Relating z-Scores to Percentile Ranks
Nonlinear Transformations: Square Roots and Logarithms
Nonlinear Transformations: Ranking Variables
Other Transformations: Recoding and Combining Variables
Recoding Variables
Combining Variables
Data Management Fundamentals: The Do-File
Summary of Stata Commands
Exercises
5 EXPLORING RELATIONSHIPS BETWEEN TWO VARIABLES
When Both Variables are at Least Interval-Leveled
Scatterplots
The Pearson Product-Moment Correlation Coefficient
Interpreting the Pearson Correlation Coefficient
Judging the Strength of the Linear Relationship
The Correlation Scale Itself Is Ordinal
Correlation Does Not Imply Causation
The Effect of Linear Transformations
Restriction of Range
The Shape of the Underlying Distributions
The Reliability of the Data
When at Least One Variable Is Ordinal and the Other Is at Least Ordinal: The Spearman Rank Correlation Coefficient
When at Least One Variable Is Dichotomous: Other Special Cases of the Pearson Correlation Coefficient
The Point Biserial Correlation Coefficient: The Case of One at Least Interval and One Dichotomous Variable
The Phi Coefficient: The Case of Two Dichotomous Variables
Other Visual Displays of Bivariate Relationships
Selection of Appropriate Statistic or Graph to Summarize a Relationship
Summary of Stata Commands
Exercises
6 SIMPLE LINEAR REGRESSION
The “Best-Fitting” Linear Equation
The Accuracy of Prediction Using the Linear Regression Model
The Standardized Regression Equation
R As a Measure of the Overall Fit of the Linear Regression Model
Simple Linear Regression When the Independent Variable Is Dichotomous
Using r and R As Measures of Effect Size
Emphasizing the Importance of the Scatterplot
Summary of Stata Commands
Exercises
7 PROBABILITY FUNDAMENTALS
The Discrete Case
The Complement Rule of Probability
The Additive Rules of Probability
First Additive Rule of Probability
Second Additive Rule of Probability
The Multiplicative Rule of Probability
The Relationship between Independence and Mutual Exclusivity
Conditional Probability
The Law of Total Probability
Bayes' Theorem
The Law of Large Numbers
Exercises
8 THEORETICAL PROBABILITY MODELS
The Binomial Probability Model and Distribution
The Applicability of the Binomial Probability Model
The Normal Probability Model and Distribution
Using the Normal Distribution to Approximate the Binomial Distribution
Summary of Stata Commands
Exercises
9 THE ROLE OF SAMPLING IN INFERENTIAL STATISTICS
Samples and Populations
Random Samples
Obtaining a Simple Random Sample
Sampling with and without Replacement
Sampling Distributions
Describing the Sampling Distribution of Means Empirically
Describing the Sampling Distribution of Means Theoretically
Central Limit Theorem
Estimators and Bias
Summary of Stata Commands
Exercises
10 INFERENCES INVOLVING THE MEAN OF A SINGLE POPULATION WHEN σ IS KNOWN
Estimating the Population Mean, μ, When the Population Standard Deviation, σ, Is Known
Interval Estimation
Relating the Length of a Confidence Interval, the Level of Confidence, and the Sample Size
Hypothesis Testing
The Relationship between Hypothesis Testing and Interval Estimation
Effect Size
Type II Error and the Concept of Power
Increasing the Level of Significance, α
Increasing the Effect Size, δ
Decreasing the Standard Error of the Mean, σx̄
Closing Remarks
Summary of Stata Commands
Exercises
11 INFERENCES INVOLVING THE MEAN WHEN σ IS NOT KNOWN: ONE- AND TWO-SAMPLE DESIGNS
Single Sample Designs When the Parameter of Interest Is the Mean and σ Is Not Known
The t-Distribution
Degrees of Freedom for the One-Sample t-Test
Violating the Assumption of a Normally Distributed Parent Population in the One-Sample t-Test
Confidence Intervals for the One-Sample t-Test
Hypothesis Tests: The One-Sample t-Test
Effect Size for the One-Sample t-Test
Two-Sample Designs When the Parameter of Interest Is μ, and σ Is Not Known
Independent (or Unrelated) and Dependent (or Related) Samples
Independent Samples t-Test and Confidence Interval
The Assumptions of the Independent Samples t-Test
Effect Size for the Independent Samples t-Test
Paired Samples t-Test and Confidence Interval
The Assumptions of the Paired Samples t-Test
Effect Size for the Paired Samples t-Test
The Bootstrap
Conducting Power Analyses for t-Tests on Means
Summary
Summary of Stata Commands
Exercises
12 RESEARCH DESIGN: INTRODUCTION AND OVERVIEW
Questions and Their Link to Descriptive, Relational, and Causal Research Studies
The Need for a Good Measure of Our Construct: Weight
The Descriptive Study
From Descriptive to Relational Studies
From Relational to Causal Studies
The Gold Standard of Causal Studies: The True Experiment and Random Assignment
Comparing Two Kidney Stone Treatments Using a Non-Randomized Controlled Study
Including Blocking in a Research Design
Underscoring the Importance of Having a True Control Group Using Randomization
Analytic Methods for Bolstering Claims of Causality from Observational Data
Quasi-Experimental Designs
Threats to the Internal Validity of a Quasi-Experimental Design
Threats to the External Validity of a Quasi-Experimental Design
Threats to the Validity of a Study: Some Clarifications and Caveats
Threats to the Validity of a Study: Some Examples
Exercises
13 ONE-WAY ANALYSIS OF VARIANCE
The Disadvantage of Multiple t-Tests
The One-Way Analysis of Variance
A Graphical Illustration of the Role of Variance in Tests on Means
ANOVA As an Extension of the Independent Samples t-Test
Developing an Index of Separation for the Analysis of Variance
Carrying Out the ANOVA Computation
The Between Group Variance (MSB)
The Within Group Variance (MSW)
The Assumptions of the One-Way ANOVA
Testing the Equality of Population Means: The F-Ratio
How to Read the Tables and Use Stata Functions for the F-Distribution
ANOVA Summary Table
Measuring the Effect Size
Post-Hoc Multiple Comparison Tests
The Bonferroni Adjustment: Testing Planned Comparisons
The Bonferroni Tests on Multiple Measures
Conducting Power Analyses for One-Way ANOVA
Summary of Stata Commands
Exercises
14 TWO-WAY ANALYSIS OF VARIANCE
The Two-Factor Design
The Concept of Interaction
The Hypotheses That are Tested by a Two-Way Analysis of Variance
Assumptions of the Two-Way Analysis of Variance
Balanced versus Unbalanced Factorial Designs
Partitioning the Total Sum of Squares
Using the F-Ratio to Test the Effects in Two-Way ANOVA
Carrying Out the Two-Way ANOVA Computation by Hand
Decomposing Score Deviations about the Grand Mean
Modeling Each Score as a Sum of Component Parts
Explaining the Interaction As a Joint (or Multiplicative) Effect
Measuring Effect Size
Fixed versus Random Factors
Post-hoc Multiple Comparison Tests
Simple Effects and Pairwise Comparisons
Summary of Steps to Be Taken in a Two-Way ANOVA Procedure
Conducting Power Analyses for Two-Way ANOVA
Summary of Stata Commands
Exercises
15 CORRELATION AND SIMPLE REGRESSION AS INFERENTIAL TECHNIQUES
The Bivariate Normal Distribution
Testing whether the Population Pearson Product-Moment Correlation Equals Zero
Using a Confidence Interval to Estimate the Size of the Population Correlation Coefficient, ρ
Revisiting Simple Linear Regression for Prediction
Estimating the Population Standard Error of Prediction, σY|X
Testing the b-Weight for Statistical Significance
Explaining Simple Regression Using an Analysis of Variance Framework
Measuring the Fit of the Overall Regression Equation: Using R and R²
Relating R² to σ²Y|X
Testing R² for Statistical Significance
Estimating the True Population R²: The Adjusted R²
Exploring the Goodness of Fit of the Regression Equation: Using Regression Diagnostics
Residual Plots: Evaluating the Assumptions Underlying Regression
Detecting Influential Observations: Discrepancy and Leverage
Using Stata to Obtain Leverage
Using Stata to Obtain Discrepancy
Using Stata to Obtain Influence
Using Diagnostics to Evaluate the Ice Cream Sales Example
Using the Prediction Model to Predict Ice Cream Sales
Simple Regression When the Predictor is Dichotomous
Conducting Power Analyses for Correlation and Simple Regression
Summary of Stata Commands
Exercises
16 AN INTRODUCTION TO MULTIPLE REGRESSION
The Basic Equation with Two Predictors
Equations for b, β, and RY.12 When the Predictors Are Not Correlated
Equations for b, β, and RY.12 When the Predictors Are Correlated
Summarizing and Expanding on Some Important Principles of Multiple Regression
Testing the b-Weights for Statistical Significance
Assessing the Relative Importance of the Independent Variables in the Equation
Measuring the Drop in R² Directly: An Alternative to the Squared Semipartial Correlation
Evaluating the Statistical Significance of the Change in R²
The b-Weight As a Partial Slope in Multiple Regression
Multiple Regression When One of the Two Independent Variables is Dichotomous
Controlling Variables Statistically: A Closer Look
A Hypothetical Example
Conducting Power Analyses for Multiple Regression
Summary of Stata Commands
Exercises
17 TWO-WAY INTERACTIONS IN MULTIPLE REGRESSION
Testing the Statistical Significance of an Interaction Using Stata
Comparing the Y-Hat Values from the Additive and Interaction Models
Centering First-Order Effects if the Equation Has an Interaction
Probing the Nature of a Two-Way Interaction
Interaction When One of the Independent Variables Is Dichotomous and the Other Is Continuous
Methods Useful for Model Selection
Conducting a Power Analysis to Detect an Interaction
Summary of Stata Commands
Exercises
18 NONPARAMETRIC METHODS
Parametric versus Nonparametric Methods
Nonparametric Methods When the Dependent Variable Is at the Nominal Level
The Chi-Square Distribution (χ²)
The Chi-Square Goodness-of-Fit Test
The Chi-Square Test of Independence
Assumptions of the Chi-Square Test of Independence
Fisher’s Exact Test
Calculating Fisher’s Exact Test by Hand Using the Hypergeometric Distribution
Nonparametric Methods When the Dependent Variable Is Ordinal-Leveled
Wilcoxon Sign Test
The Mann–Whitney U Test or Wilcoxon's Rank-Sum Test
The Kruskal–Wallis Analysis of Variance
Summary of Stata Commands
Exercises
19 COMMUNICATING YOUR STATA RESULTS VIA EXCEL
Setting the Working Directory
Reproducing a Table of Univariate Summary Statistics in Excel
Using estpost and esttab
Using putexcel
Reproducing a Correlation Matrix As a Table in Excel
Using estpost and esttab
Using putexcel
Reproducing Regression Output As a Table in Excel
Using outreg2 to obtain a table of model statistics in Excel
Using eststo and esttab to obtain a table of model statistics in Excel
Using putexcel to reproduce a table of regression coefficients in Excel
Reproducing a Graph in Excel (Using putexcel)
Conclusion
Summary of Stata Commands
Exercises
Appendix A Data Set Descriptions
Appendix B Stata .Do-files and Data Sets in Stata Format
Appendix C Statistical Tables
Appendix D Solutions
References
Index