Stata Bookstore: Data Science for Business and Decision Making

Data Science for Business and Decision Making

Authors:	Luiz Paulo Fávero and Patrícia Belfiore
Publisher:	Academic Press
Copyright:	2019
ISBN-13:	978-0-12-811216-8
Pages:	1,227; paperback

Comment from the Stata technical group

Data Science for Business and Decision Making, by Luiz Paulo Fávero and Patrícia Belfiore, is an introductory text ideal for students and researchers. It covers key concepts of data science and demonstrates how to perform analyses in Stata, Excel, and SPSS.

This book covers a range of statistical concepts, from descriptive statistics to multilevel models. The authors begin by describing the different types of variables, which is essential for discussing the appropriate descriptive statistics and regression models for each type. Then, they cover probabilistic statistics, statistical inference, and advanced topics, including exploratory data analysis and generalized linear models. Noting the importance of optimization models in the decision-making process, the authors dedicate individual chapters to linear, network, and integer programming models. The remaining chapters discuss quality control, data mining, and multilevel models.

This book is both a conceptual and an applied guide to data science. Conceptual problems with worked solutions appear throughout each chapter, so readers can apply the concepts as they learn them. The software exercises give readers hands-on practice with Stata, Excel, or SPSS, and the datasets for each exercise are available online.

Table of contents

Part I Foundations of Business Data Analysis

1. Introduction to Data Analysis and Decision Making

1.1 Introduction: Hierarchy Between Data, Information, and Knowledge
1.2 Overview of the Book
1.3 Final Remarks

2. Types of Variables and Measurement and Accuracy Scales

2.1 Introduction
2.2 Types of Variables

2.2.1 Nonmetric or Qualitative Variables
2.2.2 Metric or Quantitative Variables

2.3 Types of Variables x Scales of Measurement

2.3.1 Nonmetric Variables—Nominal Scale
2.3.2 Nonmetric Variables—Ordinal Scale
2.3.3 Quantitative Variable—Interval Scale
2.3.4 Quantitative Variable—Ratio Scale

2.4 Types of Variables x Number of Categories and Scales of Accuracy

2.4.1 Dichotomous or Binary Variable (Dummy)
2.4.2 Polychotomous Variable
2.4.3 Discrete Quantitative Variable
2.4.4 Continuous Quantitative Variable

2.5 Final Remarks
2.6 Exercises

Part II Descriptive Statistics

3. Univariate Descriptive Statistics

3.1 Introduction
3.2 Frequency Distribution Table

3.2.1 Frequency Distribution Table for Qualitative Variables
3.2.2 Frequency Distribution Table for Discrete Data
3.2.3 Frequency Distribution Table for Continuous Data Grouped into Classes

3.3 Graphical Representation of the Results

3.3.1 Graphical Representation for Qualitative Variables
3.3.2 Graphical Representation for Quantitative Variables

3.4 The Most Common Summary-Measures in Univariate Descriptive Statistics

3.4.1 Measures of Position or Location
3.4.2 Measures of Dispersion or Variability
3.4.3 Measures of Shape

3.5 A Practical Example in Excel
3.6 A Practical Example on SPSS

3.6.1 Frequencies Option
3.6.2 Descriptives Option
3.6.3 Explore Option

3.7 A Practical Example on Stata

3.7.1 Univariate Frequency Distribution Tables on Stata
3.7.2 Summary of Univariate Descriptive Statistics on Stata
3.7.3 Calculating Percentiles on Stata
3.7.4 Charts on Stata: Histograms, Stem-and-Leaf, and Boxplots

3.8 Final Remarks
3.9 Exercises

4. Bivariate Descriptive Statistics

4.1 Introduction
4.2 Association Between Two Qualitative Variables

4.2.1 Joint Frequency Distribution Tables
4.2.2 Measures of Association

4.3 Correlation Between Two Quantitative Variables

4.3.1 Joint Frequency Distribution Tables
4.3.2 Graphical Representation Through a Scatter Plot
4.3.3 Measures of Correlation

4.4 Final Remarks
4.5 Exercises

Part III Probabilistic Statistics

5. Introduction to Probability

5.1 Introduction
5.2 Terminology and Concepts

5.2.1 Random Experiment
5.2.2 Sample Space
5.2.3 Events
5.2.4 Unions, Intersections, and Complements
5.2.5 Independent Events
5.2.6 Mutually Exclusive Events

5.3 Definition of Probability
5.4 Basic Probability Rules

5.4.1 Probability Variation Field
5.4.2 Probability of the Sample Space
5.4.3 Probability of an Empty Set
5.4.4 Probability Addition Rule
5.4.5 Probability of a Complementary Event
5.4.6 Probability Multiplication Rule for Independent Events

5.5 Conditional Probability

5.5.1 Probability Multiplication Rule

5.6 Bayes' Theorem
5.7 Combinatorial Analysis

5.7.1 Arrangements
5.7.2 Combinations
5.7.3 Permutations

5.8 Final Remarks
5.9 Exercises

6. Random Variables and Probability Distributions

6.1 Introduction
6.2 Random Variables

6.2.1 Discrete Random Variable
6.2.2 Continuous Random Variable

6.3 Probability Distributions for Discrete Random Variables

6.3.1 Discrete Uniform Distribution
6.3.2 Bernoulli Distribution
6.3.3 Binomial Distribution
6.3.4 Geometric Distribution
6.3.5 Negative Binomial Distribution
6.3.6 Hypergeometric Distribution
6.3.7 Poisson Distribution

6.4 Probability Distributions for Continuous Random Variables

6.4.1 Uniform Distribution
6.4.2 Normal Distribution
6.4.3 Exponential Distribution
6.4.4 Gamma Distribution
6.4.5 Chi-Square Distribution
6.4.6 Student's t Distribution
6.4.7 Snedecor's F Distribution

6.5 Final Remarks
6.6 Exercises

Part IV Statistical Inference

7. Sampling

7.1 Introduction
7.2 Probability or Random Sampling

7.2.1 Simple Random Sampling
7.2.2 Systematic Sampling
7.2.3 Stratified Sampling
7.2.4 Cluster Sampling

7.3 Nonprobability or Nonrandom Sampling

7.3.1 Convenience Sampling
7.3.2 Judgmental or Purposive Sampling
7.3.3 Quota Sampling
7.3.4 Geometric Propagation or Snowball Sampling

7.4 Sample Size

7.4.1 Size of a Simple Random Sample
7.4.2 Size of the Systematic Sample
7.4.3 Size of the Stratified Sample
7.4.4 Size of a Cluster Sample

7.5 Final Remarks
7.6 Exercises

8. Estimation

8.1 Introduction
8.2 Point and Interval Estimation

8.2.1 Point Estimation
8.2.2 Interval Estimation

8.3 Point Estimation Methods

8.3.1 Method of Moments
8.3.2 Ordinary Least Squares
8.3.3 Maximum Likelihood Estimation

8.4 Interval Estimation or Confidence Intervals

8.4.1 Confidence Interval for the Population Mean (μ)
8.4.2 Confidence Interval for Proportions
8.4.3 Confidence Interval for the Population Variance

8.5 Final Remarks
8.6 Exercises

9. Hypotheses Tests

9.1 Introduction
9.2 Parametric Tests
9.3 Univariate Tests for Normality

9.3.1 Kolmogorov-Smirnov Test
9.3.2 Shapiro-Wilk Test
9.3.3 Shapiro-Francia Test
9.3.4 Solving Tests for Normality by Using SPSS Software
9.3.5 Solving Tests for Normality by Using Stata

9.4 Tests for the Homogeneity of Variances

9.4.1 Bartlett's χ² Test
9.4.2 Cochran's C Test
9.4.3 Hartley's F_max Test
9.4.4 Levene's F-Test
9.4.5 Solving Levene's Test by Using SPSS Software
9.4.6 Solving Levene's Test by Using the Stata Software

9.5 Hypotheses Tests Regarding a Population Mean (μ) From One Random Sample

9.5.1 Z Test When the Population Standard Deviation (σ) Is Known and the Distribution Is Normal
9.5.2 Student's t-Test When the Population Standard Deviation (σ) Is Not Known
9.5.3 Solving Student's t-Test for a Single Sample by Using SPSS Software
9.5.4 Solving Student's t-Test for a Single Sample by Using Stata Software

9.6 Student's t-Test to Compare Two Population Means From Two Independent Random Samples

Case 1: σ²₁≠σ²₂
Case 2: σ²₁=σ²₂
9.6.1 Solving Student's t-Test From Two Independent Samples by Using SPSS Software
9.6.2 Solving Student's t-Test From Two Independent Samples by Using Stata Software

9.7 Student's t-Test to Compare Two Population Means From Two Paired Random Samples

9.7.1 Solving Student's t-Test From Two Paired Samples by Using SPSS Software
9.7.2 Solving Student's t-Test From Two Paired Samples by Using Stata

9.8 ANOVA to Compare the Means of More Than Two Populations

9.8.1 One-Way ANOVA
9.8.2 Factorial ANOVA

9.9 Final Remarks
9.10 Exercises

10. Nonparametric Tests

10.1 Introduction
10.2 Tests for One Sample

10.2.1 Binomial Tests
10.2.2 Chi-Square Test (χ²) for One Sample
10.2.3 Sign Test for One Sample

10.3 Tests for Two Paired Samples

10.3.1 McNemar Test
10.3.2 Sign Test for Two Paired Samples
10.3.3 Wilcoxon Test

10.4 Tests for Two Independent Samples

10.4.1 Chi-Square Test (χ²) for Two Independent Samples
10.4.2 Mann-Whitney U Test

10.5 Tests for k Paired Samples

10.5.1 Cochran's Q Tests
10.5.2 Friedman's Test

10.6 Tests for k Independent Samples

10.6.1 The χ² Test for k Independent Samples
10.6.2 Kruskal-Wallis Test

10.7 Final Remarks
10.8 Exercises

Part V Multivariate Exploratory Data Analysis

11. Cluster Analysis

11.1 Introduction
11.2 Cluster Analysis

11.2.1 Defining Distance or Similarity Measures in Cluster Analysis
11.2.2 Agglomeration Schedules in Cluster Analysis

11.3 Cluster Analysis with Hierarchical and Nonhierarchical Agglomeration Schedules in SPSS

11.3.1 Elaborating Hierarchical Agglomeration Schedules in SPSS
11.3.2 Elaborating Nonhierarchical K-Means Agglomeration Schedules in SPSS

11.4 Cluster Analysis With Hierarchical and Nonhierarchical Agglomeration Schedules in Stata

11.4.1 Elaborating Hierarchical Agglomeration Schedules in Stata
11.4.2 Elaborating Nonhierarchical K-Means Agglomeration Schedules in Stata

11.5 Final Remarks
11.6 Exercises
Appendix

A.1 Detecting Multivariate Outliers

12. Principal Component Factor Analysis

12.1 Introduction
12.2 Principal Component Factor Analysis

12.2.1 Pearson's Linear Correlation and the Concept of Factor
12.2.2 Overall Adequacy of the Factor Analysis: Kaiser-Meyer-Olkin Statistic and Bartlett's Test of Sphericity
12.2.3 Defining the Principal Component Factors: Determining the Eigenvalues and Eigenvectors of Correlation Matrix ρ and Calculating the Factor Scores
12.2.4 Factor Loadings and Communalities
12.2.5 Factor Rotation
12.2.6 A Practical Example of the Principal Component Factor Analysis

12.3 Principal Component Factor Analysis in SPSS
12.4 Principal Component Factor Analysis in Stata
12.5 Final Remarks
12.6 Exercises
Appendix: Cronbach's Alpha

A.1 Brief Presentation
A.2 Determining Cronbach's Alpha Algebraically
A.3 Determining Cronbach's Alpha in SPSS
A.4 Determining Cronbach's Alpha in Stata

Part VI Generalized Linear Models

13. Simple and Multiple Regression Models

13.1 Introduction
13.2 Linear Regression Models

13.2.1 Estimation of the Linear Regression Model by Ordinary Least Squares
13.2.2 Explanatory Power of the Regression Model: Coefficient of Determination R²
13.2.3 General Statistical Significance of the Regression Model and Each of Its Parameters
13.2.4 Construction of the Confidence Intervals of the Model Parameters and Elaboration of Predictions
13.2.5 Estimation of Multiple Linear Regression Models
13.2.6 Dummy Variables in Regression Models

13.3 Presuppositions of Regression Models Estimated by OLS

13.3.1 Normality of Residuals
13.3.2 The Multicollinearity Problem
13.3.3 The Problem of Heteroskedasticity
13.3.4 The Autocorrelation of Residuals Problem
13.3.5 Detection of Specification Problems: Linktest and RESET Test

13.4 Nonlinear Regression Models

13.4.1 The Box-Cox Transformation: The General Regression Model

13.5 Estimation of Regression Models in Stata
13.6 Estimation of Regression Models in SPSS
13.7 Final Remarks
13.8 Exercises
Appendix: Quantile Regression Models

A.1 A Brief Introduction
A.2 Example: Quantile Regression Model in Stata

14. Binary and Multinomial Logistic Regression Models

14.1 Introduction
14.2 The Binary Logistic Regression Model

14.2.1 Estimation of the Binary Logistic Regression Model by Maximum Likelihood
14.2.2 General Statistical Significance of the Binary Logistic Regression Model and Each of Its Parameters
14.2.3 Construction of the Confidence Intervals of the Parameters for the Binary Logistic Regression Model
14.2.4 Cutoff, Sensitivity Analysis, Overall Model Efficiency, Sensitivity, and Specificity

14.3 The Multinomial Logistic Regression Model

14.3.1 Estimation of the Multinomial Logistic Regression Model by Maximum Likelihood
14.3.2 General Statistical Significance of the Multinomial Logistic Regression Model and Each of Its Parameters
14.3.3 Construction of the Confidence Intervals of the Parameters for the Multinomial Logistic Regression Model

14.4 Estimation of Binary and Multinomial Logistic Regression Models in Stata

14.4.1 Binary Logistic Regression in Stata
14.4.2 Multinomial Logistic Regression in Stata

14.5 Estimation of Binary and Multinomial Logistic Regression Models in SPSS

14.5.1 Binary Logistic Regression in SPSS
14.5.2 Multinomial Logistic Regression in SPSS

14.6 Final Remarks
14.7 Exercises
Appendix: Probit Regression Models

A.1 A Brief Introduction
A.2 Example: Probit Regression Model in Stata

15. Regression Models for Count Data: Poisson and Negative Binomial

15.1 Introduction
15.2 The Poisson Regression Model

15.2.1 Estimation of the Poisson Regression Model by Maximum Likelihood
15.2.2 General Statistical Significance of the Poisson Regression Model and Each of Its Parameters
15.2.3 Construction of the Confidence Intervals of the Parameters for the Poisson Regression Model
15.2.4 Test to Verify Overdispersion in Poisson Regression Models

15.3 The Negative Binomial Regression Model

15.3.1 Estimation of the Negative Binomial Regression Model by Maximum Likelihood
15.3.2 General Statistical Significance of the Negative Binomial Regression Model and Each of Its Parameters
15.3.3 Construction of the Confidence Intervals of the Parameters for the Negative Binomial Regression Model

15.4 Estimating Regression Models for Count Data in Stata

15.4.1 Poisson Regression Model in Stata
15.4.2 Negative Binomial Regression Model in Stata

15.5 Regression Model Estimation for Count Data in SPSS

15.5.1 Poisson Regression Model in SPSS
15.5.2 Negative Binomial Regression Model in SPSS

15.6 Final Remarks
15.7 Exercises
Appendix: Zero-Inflated Regression Models

A.1 Brief Introduction
A.2 Example: Zero-Inflated Poisson Regression Model in Stata
A.3 Example: Zero-Inflated Negative Binomial Regression Model in Stata

Part VII Optimization Models and Simulation

16. Introduction to Optimization Models: General Formulations and Business Modeling

16.1 Introduction to Optimization Models
16.2 Introduction to Linear Programming Models
16.3 Mathematical Formulation of a General Linear Programming Model
16.4 Linear Programming Model in the Standard and Canonical Forms

16.4.1 Linear Programming Model in the Standard Form
16.4.2 Linear Programming Model in the Canonical Form
16.4.3 Transformations Into the Standard or Canonical Form

16.5 Assumptions of the Linear Programming Model

16.5.1 Proportionality
16.5.2 Additivity
16.5.3 Divisibility and Non-negativity
16.5.4 Certainty

16.6 Modeling Business Problems Using Linear Programming

16.6.1 Production Mix Problem
16.6.2 Blending or Mixing Problem
16.6.3 Diet Problem
16.6.4 Capital Budget Problems
16.6.5 Portfolio Selection Problem
16.6.6 Production and Inventory Problem
16.6.7 Aggregated Planning Problem

16.7 Final Remarks
16.8 Exercises

17. Solution of Linear Programming Problems

17.1 Introduction
17.2 Graphical Solution of a Linear Programming Problem

17.2.1 Linear Programming Maximization Problem with a Single Optimal Solution
17.2.2 Linear Programming Minimization Problem With a Single Optimal Solution
17.2.3 Special Cases

17.3 Analytical Solution of a Linear Programming Problem in Which m < n
17.4 The Simplex Method

17.4.1 Logic of the Simplex Method
17.4.2 Analytical Solution of the Simplex Method for Maximization Problems
17.4.3 Tabular Form of the Simplex Method for Maximization Problems
17.4.4 The Simplex Method for Minimization Problems
17.4.5 Special Cases of the Simplex Method

17.5 Solution by Using a Computer

17.5.1 Solver in Excel
17.5.2 Solution of the Examples Found in Section 16.6 of Chapter 16 Using Solver in Excel
17.5.3 Solver Error Messages for Unlimited and Infeasible Solutions
17.5.4 Result Analysis by Using the Solver Answer and Limits Reports

17.6 Sensitivity Analysis

17.6.1 Alteration in One of the Objective Function Coefficients (Graphical Solution)
17.6.2 Alteration in One of the Constants on the Right-Hand Side of the Constraint and Concept of Shadow Price (Graphical Solution)
17.6.3 Reduced Cost
17.6.4 Sensitivity Analysis With Solver in Excel

17.7 Exercises

18. Network Programming

18.1 Introduction
18.2 Terminology of Graphs and Networks
18.3 Classic Transportation Problem

18.3.1 Mathematical Formulation of the Classic Transportation Problem
18.3.2 Balancing the Transportation Problem When the Total Supply Capacity Is Not Equal to the Total Demand Consumed
18.3.3 Solution of the Classic Transportation Problem

18.4 Transhipment Problem

18.4.1 Mathematical Formulation of the Transhipment Problem
18.4.2 Solution of the Transhipment Problem Using Excel Solver

18.5 Job Assignment Problem

18.5.1 Mathematical Formulation of the Job Assignment Problem
18.5.2 Solution of the Job Assignment Problem Using Excel Solver

18.6 Shortest Path Problem

18.6.1 Mathematical Formulation of the Shortest Path Problem
18.6.2 Solution of the Shortest Path Problem Using Excel Solver

18.7 Maximum Flow Problem

18.7.1 Mathematical Formulation of the Maximum Flow Problem
18.7.2 Solution of the Maximum Flow Problem Using Excel Solver

18.8 Exercises

19. Integer Programming

19.1 Introduction
19.2 Mathematical Formulation of a General Model for Integer Programming and/or Binary and Linear Relaxation
19.3 The Knapsack Problem

19.3.1 Modeling of the Knapsack Problem
19.3.2 Solution of the Knapsack Problem Using Excel Solver

19.4 The Capital Budgeting Problem as a Model of Binary Programming

19.4.1 Solution of the Capital Budgeting Problem as a Model of Binary Programming Using Excel Solver

19.5 The Traveling Salesman Problem

19.5.1 Modeling of the Traveling Salesman Problem
19.5.2 Solution of the Traveling Salesman Problem Using Excel Solver

19.6 The Facility Location Problem

19.6.1 Modeling of the Facility Location Problem
19.6.2 Solution of the Facility Location Problem Using Excel Solver

19.7 The Staff Scheduling Problem

19.7.1 Solution of the Staff Scheduling Problem Using Excel Solver

19.8 Exercises

20. Simulation and Risk Analysis

20.1 Introduction to Simulation
20.2 The Monte Carlo Method
20.3 Monte Carlo Simulation in Excel

20.3.1 Generation of Random Numbers and Probability Distributions in Excel
20.3.2 Practical Examples

20.4 Final Remarks
20.5 Exercises

Part VIII Other Topics

21. Design and Analysis of Experiments

21.1 Introduction
21.2 Steps in the Design of Experiments
21.3 The Four Principles of Experimental Design
21.4 Types of Experimental Design

21.4.1 Completely Randomized Design (CRD)
21.4.2 Randomized Block Design (RBD)
21.4.3 Factorial Design (FD)

21.5 One-Way Analysis of Variance
21.6 Factorial ANOVA
21.7 Final Remarks
21.8 Exercises

22. Statistical Process Control

22.1 Introduction
22.2 Estimating the Process Mean and Variability
22.3 Control Charts for Variables

22.3.1 Control Charts for X̅ and R
22.3.2 Control Charts for X̅ and S

22.4 Control Charts for Attributes

22.4.1 P Chart (Defective Fraction)
22.4.2 np Chart (Number of Defective Products)
22.4.3 C Chart (Total Number of Defects per Unit)
22.4.4 U Chart (Average Number of Defects per Unit)

22.5 Process Capability

22.5.1 C_p Index
22.5.2 C_pk Index
22.5.3 C_pm and C_pmk Index

22.6 Final Remarks
22.7 Exercises

23. Data Mining and Multilevel Modeling

23.1 Introduction to Data Mining
23.2 Multilevel Modeling
23.3 Nested Data Structures
23.4 Hierarchical Linear Models

23.4.1 Two-Level Hierarchical Linear Models With Clustered Data (HLM2)
23.4.2 Three-Level Hierarchical Linear Models With Repeated Measures (HLM3)

23.5 Estimation of Hierarchical Linear Models in Stata

23.5.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in Stata
23.5.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in Stata

23.6 Estimation of Hierarchical Linear Models in SPSS

23.6.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in SPSS
23.6.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in SPSS

23.7 Final Remarks
23.8 Exercises
Appendix

A.1 Hierarchical Nonlinear Models

Answers
Appendices
References
Index
