Part I **Foundations of Business Data Analysis**

**1. Introduction to Data Analysis and Decision Making**

1.1 Introduction: Hierarchy Between Data, Information, and Knowledge

1.2 Overview of the Book

1.3 Final Remarks

**2. Types of Variables and Measurement and Accuracy Scales**

2.1 Introduction

2.2 Types of Variables

2.2.1 Nonmetric or Qualitative Variables

2.2.2 Metric or Quantitative Variables

2.3 Types of Variables x Scales of Measurement

2.3.1 Nonmetric Variables—Nominal Scale

2.3.2 Nonmetric Variables—Ordinal Scale

2.3.3 Quantitative Variable—Interval Scale

2.3.4 Quantitative Variable—Ratio Scale

2.4 Types of Variables x Number of Categories and Scales of Accuracy

2.4.1 Dichotomous or Binary Variable (Dummy)

2.4.2 Polychotomous Variable

2.4.3 Discrete Quantitative Variable

2.4.4 Continuous Quantitative Variable

2.5 Final Remarks

2.6 Exercises

Part II **Descriptive Statistics**

**3. Univariate Descriptive Statistics**

3.1 Introduction

3.2 Frequency Distribution Table

3.2.1 Frequency Distribution Table for Qualitative Variables

3.2.2 Frequency Distribution Table for Discrete Data

3.2.3 Frequency Distribution Table for Continuous Data Grouped into Classes

3.3 Graphical Representation of the Results

3.3.1 Graphical Representation for Qualitative Variables

3.3.2 Graphical Representation for Quantitative Variables

3.4 The Most Common Summary-Measures in Univariate Descriptive Statistics

3.4.1 Measures of Position or Location

3.4.2 Measures of Dispersion or Variability

3.4.3 Measures of Shape

3.5 A Practical Example in Excel

3.6 A Practical Example on SPSS

3.6.1 Frequencies Option

3.6.2 Descriptives Option

3.6.3 Explore Option

3.7 A Practical Example on Stata

3.7.1 Univariate Frequency Distribution Tables on Stata

3.7.2 Summary of Univariate Descriptive Statistics on Stata

3.7.3 Calculating Percentiles on Stata

3.7.4 Charts on Stata: Histograms, Stem-and-Leaf, and Boxplots

3.8 Final Remarks

3.9 Exercises

**4. Bivariate Descriptive Statistics**

4.1 Introduction

4.2 Association Between Two Qualitative Variables

4.2.1 Joint Frequency Distribution Tables

4.2.2 Measures of Association

4.3 Correlation Between Two Quantitative Variables

4.3.1 Joint Frequency Distribution Tables

4.3.2 Graphical Representation Through a Scatter Plot

4.3.3 Measures of Correlation

4.4 Final Remarks

4.5 Exercises

Part III **Probabilistic Statistics**

**5. Introduction to Probability**

5.1 Introduction

5.2 Terminology and Concepts

5.2.1 Random Experiment

5.2.2 Sample Space

5.2.3 Events

5.2.4 Unions, Intersections, and Complements

5.2.5 Independent Events

5.2.6 Mutually Exclusive Events

5.3 Definition of Probability

5.4 Basic Probability Rules

5.4.1 Probability Variation Field

5.4.2 Probability of the Sample Space

5.4.3 Probability of an Empty Set

5.4.4 Probability Addition Rule

5.4.5 Probability of a Complementary Event

5.4.6 Probability Multiplication Rule for Independent Events

5.5 Conditional Probability

5.5.1 Probability Multiplication Rule

5.6 Bayes' Theorem

5.7 Combinatorial Analysis

5.7.1 Arrangements

5.7.2 Combinations

5.7.3 Permutations

5.8 Final Remarks

5.9 Exercises

**6. Random Variables and Probability Distributions**

6.1 Introduction

6.2 Random Variables

6.2.1 Discrete Random Variable

6.2.2 Continuous Random Variable

6.3 Probability Distributions for Discrete Random Variables

6.3.1 Discrete Uniform Distribution

6.3.2 Bernoulli Distribution

6.3.3 Binomial Distribution

6.3.4 Geometric Distribution

6.3.5 Negative Binomial Distribution

6.3.6 Hypergeometric Distribution

6.3.7 Poisson Distribution

6.4 Probability Distributions for Continuous Random Variables

6.4.1 Uniform Distribution

6.4.2 Normal Distribution

6.4.3 Exponential Distribution

6.4.4 Gamma Distribution

6.4.5 Chi-Square Distribution

6.4.6 Student's *t* Distribution

6.4.7 Snedecor's *F* Distribution

6.5 Final Remarks

6.6 Exercises

Part IV **Statistical Inference**

**7. Sampling**

7.1 Introduction

7.2 Probability or Random Sampling

7.2.1 Simple Random Sampling

7.2.2 Systematic Sampling

7.2.3 Stratified Sampling

7.2.4 Cluster Sampling

7.3 Nonprobability or Nonrandom Sampling

7.3.1 Convenience Sampling

7.3.2 Judgmental or Purposive Sampling

7.3.3 Quota Sampling

7.3.4 Geometric Propagation or Snowball Sampling

7.4 Sample Size

7.4.1 Size of a Simple Random Sample

7.4.2 Size of the Systematic Sample

7.4.3 Size of the Stratified Sample

7.4.4 Size of a Cluster Sample

7.5 Final Remarks

7.6 Exercises

**8. Estimation**

8.1 Introduction

8.2 Point and Interval Estimation

8.2.1 Point Estimation

8.2.2 Interval Estimation

8.3 Point Estimation Methods

8.3.1 Method of Moments

8.3.2 Ordinary Least Squares

8.3.3 Maximum Likelihood Estimation

8.4 Interval Estimation or Confidence Intervals

8.4.1 Confidence Interval for the Population Mean (μ)

8.4.2 Confidence Interval for Proportions

8.4.3 Confidence Interval for the Population Variance

8.5 Final Remarks

8.6 Exercises

**9. Hypotheses Tests**

9.1 Introduction

9.2 Parametric Tests

9.3 Univariate Tests for Normality

9.3.1 Kolmogorov-Smirnov Test

9.3.2 Shapiro-Wilk Test

9.3.3 Shapiro-Francia Test

9.3.4 Solving Tests for Normality by Using SPSS Software

9.3.5 Solving Tests for Normality by Using Stata

9.4 Tests for the Homogeneity of Variances

9.4.1 Bartlett's χ^{2} Test

9.4.2 Cochran's *C* Test

9.4.3 Hartley's *F*_{max} Test

9.4.4 Levene's *F*-Test

9.4.5 Solving Levene's Test by Using SPSS Software

9.4.6 Solving Levene's Test by Using the Stata Software

9.5 Hypotheses Tests Regarding a Population Mean (μ) From One Random Sample

9.5.1 *Z* Test When the Population Standard Deviation (σ) Is Known and the Distribution Is Normal

9.5.2 Student's *t*-Test When the Population Standard Deviation (σ) Is Not Known

9.5.3 Solving Student's *t*-Test for a Single Sample by Using SPSS Software

9.5.4 Solving Student's *t*-Test for a Single Sample by Using Stata Software

9.6 Student's *t*-Test to Compare Two Population Means From Two Independent Random Samples

Case 1: σ^{2}_{1}≠σ^{2}_{2}

Case 2: σ^{2}_{1}=σ^{2}_{2}

9.6.1 Solving Student's *t*-Test From Two Independent Samples by Using SPSS Software

9.6.2 Solving Student's *t*-Test From Two Independent Samples by Using Stata Software

9.7 Student's *t*-Test to Compare Two Population Means From Two Paired Random Samples

9.7.1 Solving Student's *t*-Test From Two Paired Samples by Using SPSS Software

9.7.2 Solving Student's *t*-Test From Two Paired Samples by Using Stata

9.8 ANOVA to Compare the Means of More Than Two Populations

9.8.1 One-Way ANOVA

9.8.2 Factorial ANOVA

9.9 Final Remarks

9.10 Exercises

**10. Nonparametric Tests**

10.1 Introduction

10.2 Tests for One Sample

10.2.1 Binomial Test

10.2.2 Chi-Square Test (χ^{2}) for One Sample

10.2.3 Sign Test for One Sample

10.3 Tests for Two Paired Samples

10.3.1 McNemar Test

10.3.2 Sign Test for Two Paired Samples

10.3.3 Wilcoxon Test

10.4 Tests for Two Independent Samples

10.4.1 Chi-Square Test (χ^{2}) for Two Independent Samples

10.4.2 Mann-Whitney *U* Test

10.5 Tests for *k* Paired Samples

10.5.1 Cochran's *Q* Test

10.5.2 Friedman's Test

10.6 Tests for *k* Independent Samples

10.6.1 The χ^{2} Test for *k* Independent Samples

10.6.2 Kruskal-Wallis Test

10.7 Final Remarks

10.8 Exercises

Part V **Multivariate Exploratory Data Analysis**

**11. Cluster Analysis**

11.1 Introduction

11.2 Cluster Analysis

11.2.1 Defining Distance or Similarity Measures in Cluster Analysis

11.2.2 Agglomeration Schedules in Cluster Analysis

11.3 Cluster Analysis With Hierarchical and Nonhierarchical Agglomeration Schedules in SPSS

11.3.1 Elaborating Hierarchical Agglomeration Schedules in SPSS

11.3.2 Elaborating Nonhierarchical *K*-Means Agglomeration Schedules in SPSS

11.4 Cluster Analysis With Hierarchical and Nonhierarchical Agglomeration Schedules in Stata

11.4.1 Elaborating Hierarchical Agglomeration Schedules in Stata

11.4.2 Elaborating Nonhierarchical *K*-Means Agglomeration Schedules in Stata

11.5 Final Remarks

11.6 Exercises

Appendix

A.1 Detecting Multivariate Outliers

**12. Principal Component Factor Analysis**

12.1 Introduction

12.2 Principal Component Factor Analysis

12.2.1 Pearson's Linear Correlation and the Concept of Factor

12.2.2 Overall Adequacy of the Factor Analysis: Kaiser-Meyer-Olkin Statistic and Bartlett's Test of Sphericity

12.2.3 Defining the Principal Component Factors: Determining the Eigenvalues and Eigenvectors of Correlation Matrix ρ and Calculating the Factor Scores

12.2.4 Factor Loadings and Communalities

12.2.5 Factor Rotation

12.2.6 A Practical Example of the Principal Component Factor Analysis

12.3 Principal Component Factor Analysis in SPSS

12.4 Principal Component Factor Analysis in Stata

12.5 Final Remarks

12.6 Exercises

Appendix: Cronbach's Alpha

A.1 Brief Presentation

A.2 Determining Cronbach's Alpha Algebraically

A.3 Determining Cronbach's Alpha in SPSS

A.4 Determining Cronbach's Alpha in Stata

Part VI **Generalized Linear Models**

**13. Simple and Multiple Regression Models**

13.1 Introduction

13.2 Linear Regression Models

13.2.1 Estimation of the Linear Regression Model by Ordinary Least Squares

13.2.2 Explanatory Power of the Regression Model: Coefficient of Determination *R*^{2}

13.2.3 General Statistical Significance of the Regression Model and Each of Its Parameters

13.2.4 Construction of the Confidence Intervals of the Model Parameters and Elaboration of Predictions

13.2.5 Estimation of Multiple Linear Regression Models

13.2.6 Dummy Variables in Regression Models

13.3 Presuppositions of Regression Models Estimated by OLS

13.3.1 Normality of Residuals

13.3.2 The Multicollinearity Problem

13.3.3 The Problem of Heteroskedasticity

13.3.4 The Autocorrelation of Residuals Problem

13.3.5 Detection of Specification Problems: *Linktest* and *RESET Test*

13.4 Nonlinear Regression Models

13.4.1 The Box-Cox Transformation: The General Regression Model

13.5 Estimation of Regression Models in Stata

13.6 Estimation of Regression Models in SPSS

13.7 Final Remarks

13.8 Exercises

Appendix: Quantile Regression Models

A.1 A Brief Introduction

A.2 Example: Quantile Regression Model in Stata

**14. Binary and Multinomial Logistic Regression Models**

14.1 Introduction

14.2 The Binary Logistic Regression Model

14.2.1 Estimation of the Binary Logistic Regression Model by Maximum Likelihood

14.2.2 General Statistical Significance of the Binary Logistic Regression Model and Each of Its Parameters

14.2.3 Construction of the Confidence Intervals of the Parameters for the Binary Logistic Regression Model

14.2.4 Cutoff, Sensitivity Analysis, Overall Model Efficiency, Sensitivity, and Specificity

14.3 The Multinomial Logistic Regression Model

14.3.1 Estimation of the Multinomial Logistic Regression Model by Maximum Likelihood

14.3.2 General Statistical Significance of the Multinomial Logistic Regression Model and Each of Its Parameters

14.3.3 Construction of the Confidence Intervals of the Parameters for the Multinomial Logistic Regression Model

14.4 Estimation of Binary and Multinomial Logistic Regression Models in Stata

14.4.1 Binary Logistic Regression in Stata

14.4.2 Multinomial Logistic Regression in Stata

14.5 Estimation of Binary and Multinomial Logistic Regression Models in SPSS

14.5.1 Binary Logistic Regression in SPSS

14.5.2 Multinomial Logistic Regression in SPSS

14.6 Final Remarks

14.7 Exercises

Appendix: Probit Regression Models

A.1 A Brief Introduction

A.2 Example: Probit Regression Model in Stata

**15. Regression Models for Count Data: Poisson and Negative Binomial**

15.1 Introduction

15.2 The Poisson Regression Model

15.2.1 Estimation of the Poisson Regression Model by Maximum Likelihood

15.2.2 General Statistical Significance of the Poisson Regression Model and Each of Its Parameters

15.2.3 Construction of the Confidence Intervals of the Parameters for the Poisson Regression Model

15.2.4 Test to Verify Overdispersion in Poisson Regression Models

15.3 The Negative Binomial Regression Model

15.3.1 Estimation of the Negative Binomial Regression Model by Maximum Likelihood

15.3.2 General Statistical Significance of the Negative Binomial Regression Model and Each of Its Parameters

15.3.3 Construction of the Confidence Intervals of the Parameters for the Negative Binomial Regression Model

15.4 Estimating Regression Models for Count Data in Stata

15.4.1 Poisson Regression Model in Stata

15.4.2 Negative Binomial Regression Model in Stata

15.5 Regression Model Estimation for Count Data in SPSS

15.5.1 Poisson Regression Model in SPSS

15.5.2 Negative Binomial Regression Model in SPSS

15.6 Final Remarks

15.7 Exercises

Appendix: Zero-Inflated Regression Models

A.1 Brief Introduction

A.2 Example: Zero-Inflated Poisson Regression Model in Stata

A.3 Example: Zero-Inflated Negative Binomial Regression Model in Stata

Part VII **Optimization Models and Simulation**

**16. Introduction to Optimization Models: General Formulations and Business Modeling**

16.1 Introduction to Optimization Models

16.2 Introduction to Linear Programming Models

16.3 Mathematical Formulation of a General Linear Programming Model

16.4 Linear Programming Model in the Standard and Canonical Forms

16.4.1 Linear Programming Model in the Standard Form

16.4.2 Linear Programming Model in the Canonical Form

16.4.3 Transformations Into the Standard or Canonical Form

16.5 Assumptions of the Linear Programming Model

16.5.1 Proportionality

16.5.2 Additivity

16.5.3 Divisibility and Non-negativity

16.5.4 Certainty

16.6 Modeling Business Problems Using Linear Programming

16.6.1 Production Mix Problem

16.6.2 Blending or Mixing Problem

16.6.3 Diet Problem

16.6.4 Capital Budget Problems

16.6.5 Portfolio Selection Problem

16.6.6 Production and Inventory Problem

16.6.7 Aggregated Planning Problem

16.7 Final Remarks

16.8 Exercises

**17. Solution of Linear Programming Problems**

17.1 Introduction

17.2 Graphical Solution of a Linear Programming Problem

17.2.1 Linear Programming Maximization Problem With a Single Optimal Solution

17.2.2 Linear Programming Minimization Problem With a Single Optimal Solution

17.2.3 Special Cases

17.3 Analytical Solution of a Linear Programming Problem in Which *m < n*

17.4 The Simplex Method

17.4.1 Logic of the Simplex Method

17.4.2 Analytical Solution of the Simplex Method for Maximization Problems

17.4.3 Tabular Form of the Simplex Method for Maximization Problems

17.4.4 The Simplex Method for Minimization Problems

17.4.5 Special Cases of the Simplex Method

17.5 Solution by Using a Computer

17.5.1 Solver in Excel

17.5.2 Solution of the Examples Found in Section 16.6 of Chapter 16 Using Solver in Excel

17.5.3 Solver Error Messages for Unlimited and Infeasible Solutions

17.5.4 Result Analysis by Using the Solver Answer and Limits Reports

17.6 Sensitivity Analysis

17.6.1 Alteration in One of the Objective Function Coefficients (Graphical Solution)

17.6.2 Alteration in One of the Constants on the Right-Hand Side of the Constraint and Concept of Shadow Price (Graphical Solution)

17.6.3 Reduced Cost

17.6.4 Sensitivity Analysis With Solver in Excel

17.7 Exercises

**18. Network Programming**

18.1 Introduction

18.2 Terminology of Graphs and Networks

18.3 Classic Transportation Problem

18.3.1 Mathematical Formulation of the Classic Transportation Problem

18.3.2 Balancing the Transportation Problem When the Total Supply Capacity Is Not Equal to the Total Demand Consumed

18.3.3 Solution of the Classic Transportation Problem

18.4 Transhipment Problem

18.4.1 Mathematical Formulation of the Transhipment Problem

18.4.2 Solution of the Transhipment Problem Using Excel Solver

18.5 Job Assignment Problem

18.5.1 Mathematical Formulation of the Job Assignment Problem

18.5.2 Solution of the Job Assignment Problem Using Excel Solver

18.6 Shortest Path Problem

18.6.1 Mathematical Formulation of the Shortest Path Problem

18.6.2 Solution of the Shortest Path Problem Using Excel Solver

18.7 Maximum Flow Problem

18.7.1 Mathematical Formulation of the Maximum Flow Problem

18.7.2 Solution of the Maximum Flow Problem Using Excel Solver

18.8 Exercises

**19. Integer Programming**

19.1 Introduction

19.2 Mathematical Formulation of a General Model for Integer Programming and/or Binary and Linear Relaxation

19.3 The Knapsack Problem

19.3.1 Modeling of the Knapsack Problem

19.3.2 Solution of the Knapsack Problem Using Excel Solver

19.4 The Capital Budgeting Problem as a Model of Binary Programming

19.4.1 Solution of the Capital Budgeting Problem as a Model of Binary Programming Using Excel Solver

19.5 The Traveling Salesman Problem

19.5.1 Modeling of the Traveling Salesman Problem

19.5.2 Solution of the Traveling Salesman Problem Using Excel Solver

19.6 The Facility Location Problem

19.6.1 Modeling of the Facility Location Problem

19.6.2 Solution of the Facility Location Problem Using Excel Solver

19.7 The Staff Scheduling Problem

19.7.1 Solution of the Staff Scheduling Problem Using Excel Solver

19.8 Exercises

**20. Simulation and Risk Analysis**

20.1 Introduction to Simulation

20.2 The Monte Carlo Method

20.3 Monte Carlo Simulation in Excel

20.3.1 Generation of Random Numbers and Probability Distributions in Excel

20.3.2 Practical Examples

20.4 Final Remarks

20.5 Exercises

Part VIII **Other Topics**

**21. Design and Analysis of Experiments**

21.1 Introduction

21.2 Steps in the Design of Experiments

21.3 The Four Principles of Experimental Design

21.4 Types of Experimental Design

21.4.1 Completely Randomized Design (CRD)

21.4.2 Randomized Block Design (RBD)

21.4.3 Factorial Design (FD)

21.5 One-Way Analysis of Variance

21.6 Factorial ANOVA

21.7 Final Remarks

21.8 Exercises

**22. Statistical Process Control**

22.1 Introduction

22.2 Estimating the Process Mean and Variability

22.3 Control Charts for Variables

22.3.1 Control Charts for **X̅** and *R*

22.3.2 Control Charts for **X̅** and *S*

22.4 Control Charts for Attributes

22.4.1 *P* Chart (Defective Fraction)

22.4.2 *np* Chart (Number of Defective Products)

22.4.3 *C* Chart (Total Number of Defects per Unit)

22.4.4 *U* Chart (Average Number of Defects per Unit)

22.5 Process Capability

22.5.1 *C*_{p} Index

22.5.2 *C*_{pk} Index

22.5.3 *C*_{pm} and *C*_{pmk} Index

22.6 Final Remarks

22.7 Exercises

**23. Data Mining and Multilevel Modeling**

23.1 Introduction to Data Mining

23.2 Multilevel Modeling

23.3 Nested Data Structures

23.4 Hierarchical Linear Models

23.4.1 Two-Level Hierarchical Linear Models With Clustered Data (HLM2)

23.4.2 Three-Level Hierarchical Linear Models With Repeated Measures (HLM3)

23.5 Estimation of Hierarchical Linear Models in Stata

23.5.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in Stata

23.5.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in Stata

23.6 Estimation of Hierarchical Linear Models in SPSS

23.6.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in SPSS

23.6.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in SPSS

23.7 Final Remarks

23.8 Exercises

Appendix

A.1 Hierarchical Nonlinear Models

Answers

Appendices

References

Index