Data Science for Business and Decision Making 



Comment from the Stata technical group

Data Science for Business and Decision Making, by Luiz Paulo Fávero and Patrícia Belfiore, is an introductory text ideal for students and researchers. It covers key concepts of data science and demonstrates how to perform analyses in Stata, Excel, and SPSS.

This book covers a range of statistical concepts, from descriptive statistics to multilevel models. The authors begin by describing the different types of variables, which is essential for discussing the appropriate descriptive statistics and regression models for each type. Then, they cover probabilistic statistics, statistical inference, and advanced topics, including exploratory data analysis and generalized linear models. Noting the importance of optimization models in the decision-making process, the authors dedicate individual chapters to linear, network, and integer programming models. The remaining chapters discuss quality control, data mining, and multilevel models.

This book is both a conceptual and applied guide to data science. There are conceptual problems with worked solutions throughout each chapter. These allow readers to continuously apply the concepts they learn. Additionally, the software exercises provide readers hands-on practice using Stata, Excel, or SPSS. Datasets for each exercise are made available online.
Table of contents

Part I Foundations of Business Data Analysis
1. Introduction to Data Analysis and Decision Making
1.1 Introduction: Hierarchy Between Data, Information, and Knowledge
1.2 Overview of the Book
1.3 Final Remarks
2. Types of Variables and Measurement and Accuracy Scales
2.1 Introduction
2.2 Types of Variables
2.2.1 Nonmetric or Qualitative Variables
2.2.2 Metric or Quantitative Variables
2.3 Types of Variables x Scales of Measurement
2.3.1 Nonmetric Variables—Nominal Scale
2.3.2 Nonmetric Variables—Ordinal Scale
2.3.3 Quantitative Variable—Interval Scale
2.3.4 Quantitative Variable—Ratio Scale
2.4 Types of Variables x Number of Categories and Scales of Accuracy
2.4.1 Dichotomous or Binary Variable (Dummy)
2.4.2 Polychotomous Variable
2.4.3 Discrete Quantitative Variable
2.4.4 Continuous Quantitative Variable
2.5 Final Remarks
2.6 Exercises
Part II Descriptive Statistics
3. Univariate Descriptive Statistics
3.1 Introduction
3.2 Frequency Distribution Table
3.2.1 Frequency Distribution Table for Qualitative Variables
3.2.2 Frequency Distribution Table for Discrete Data
3.2.3 Frequency Distribution Table for Continuous Data Grouped into Classes
3.3 Graphical Representation of the Results
3.3.1 Graphical Representation for Qualitative Variables
3.3.2 Graphical Representation for Quantitative Variables
3.4 The Most Common Summary Measures in Univariate Descriptive Statistics
3.4.1 Measures of Position or Location
3.4.2 Measures of Dispersion or Variability
3.4.3 Measures of Shape
3.5 A Practical Example in Excel
3.6 A Practical Example on SPSS
3.6.1 Frequencies Option
3.6.2 Descriptives Option
3.6.3 Explore Option
3.7 A Practical Example on Stata
3.7.1 Univariate Frequency Distribution Tables on Stata
3.7.2 Summary of Univariate Descriptive Statistics on Stata
3.7.3 Calculating Percentiles on Stata
3.7.4 Charts on Stata: Histograms, Stem-and-Leaf, and Boxplots
3.8 Final Remarks
3.9 Exercises
4. Bivariate Descriptive Statistics
4.1 Introduction
4.2 Association Between Two Qualitative Variables
4.2.1 Joint Frequency Distribution Tables
4.2.2 Measures of Association
4.3 Correlation Between Two Quantitative Variables
4.3.1 Joint Frequency Distribution Tables
4.3.2 Graphical Representation Through a Scatter Plot
4.3.3 Measures of Correlation
4.4 Final Remarks
4.5 Exercises
Part III Probabilistic Statistics
5. Introduction to Probability
5.1 Introduction
5.2 Terminology and Concepts
5.2.1 Random Experiment
5.2.2 Sample Space
5.2.3 Events
5.2.4 Unions, Intersections, and Complements
5.2.5 Independent Events
5.2.6 Mutually Exclusive Events
5.3 Definition of Probability
5.4 Basic Probability Rules
5.4.1 Probability Variation Field
5.4.2 Probability of the Sample Space
5.4.3 Probability of an Empty Set
5.4.4 Probability Addition Rule
5.4.5 Probability of a Complementary Event
5.4.6 Probability Multiplication Rule for Independent Events
5.5 Conditional Probability
5.5.1 Probability Multiplication Rule
5.6 Bayes' Theorem
5.7 Combinatorial Analysis
5.7.1 Arrangements
5.7.2 Combinations
5.7.3 Permutations
5.8 Final Remarks
5.9 Exercises
6. Random Variables and Probability Distributions
6.1 Introduction
6.2 Random Variables
6.2.1 Discrete Random Variable
6.2.2 Continuous Random Variable
6.3 Probability Distributions for Discrete Random Variables
6.3.1 Discrete Uniform Distribution
6.3.2 Bernoulli Distribution
6.3.3 Binomial Distribution
6.3.4 Geometric Distribution
6.3.5 Negative Binomial Distribution
6.3.6 Hypergeometric Distribution
6.3.7 Poisson Distribution
6.4 Probability Distributions for Continuous Random Variables
6.4.1 Uniform Distribution
6.4.2 Normal Distribution
6.4.3 Exponential Distribution
6.4.4 Gamma Distribution
6.4.5 Chi-Square Distribution
6.4.6 Student's t Distribution
6.4.7 Snedecor's F Distribution
6.5 Final Remarks
6.6 Exercises
Part IV Statistical Inference
7. Sampling
7.1 Introduction
7.2 Probability or Random Sampling
7.2.1 Simple Random Sampling
7.2.2 Systematic Sampling
7.2.3 Stratified Sampling
7.2.4 Cluster Sampling
7.3 Nonprobability or Nonrandom Sampling
7.3.1 Convenience Sampling
7.3.2 Judgmental or Purposive Sampling
7.3.3 Quota Sampling
7.3.4 Geometric Propagation or Snowball Sampling
7.4 Sample Size
7.4.1 Size of a Simple Random Sample
7.4.2 Size of the Systematic Sample
7.4.3 Size of the Stratified Sample
7.4.4 Size of a Cluster Sample
7.5 Final Remarks
7.6 Exercises
8. Estimation
8.1 Introduction
8.2 Point and Interval Estimation
8.2.1 Point Estimation
8.2.2 Interval Estimation
8.3 Point Estimation Methods
8.3.1 Method of Moments
8.3.2 Ordinary Least Squares
8.3.3 Maximum Likelihood Estimation
8.4 Interval Estimation or Confidence Intervals
8.4.1 Confidence Interval for the Population Mean (μ)
8.4.2 Confidence Interval for Proportions
8.4.3 Confidence Interval for the Population Variance
8.5 Final Remarks
8.6 Exercises
9. Hypotheses Tests
9.1 Introduction
9.2 Parametric Tests
9.3 Univariate Tests for Normality
9.3.1 Kolmogorov-Smirnov Test
9.3.2 Shapiro-Wilk Test
9.3.3 Shapiro-Francia Test
9.3.4 Solving Tests for Normality by Using SPSS Software
9.3.5 Solving Tests for Normality by Using Stata
9.4 Tests for the Homogeneity of Variances
9.4.1 Bartlett's χ² Test
9.4.2 Cochran's C Test
9.4.3 Hartley's F_max Test
9.4.4 Levene's F-Test
9.4.5 Solving Levene's Test by Using SPSS Software
9.4.6 Solving Levene's Test by Using the Stata Software
9.5 Hypotheses Tests Regarding a Population Mean (μ) From One Random Sample
9.5.1 Z Test When the Population Standard Deviation (σ) Is Known and the Distribution Is Normal
9.5.2 Student's t-Test When the Population Standard Deviation (σ) Is Not Known
9.5.3 Solving Student's t-Test for a Single Sample by Using SPSS Software
9.5.4 Solving Student's t-Test for a Single Sample by Using Stata Software
9.6 Student's t-Test to Compare Two Population Means From Two Independent Random Samples
Case 1: σ₁² ≠ σ₂²
Case 2: σ₁² = σ₂²
9.6.1 Solving Student's t-Test From Two Independent Samples by Using SPSS Software
9.6.2 Solving Student's t-Test From Two Independent Samples by Using Stata Software
9.7 Student's t-Test to Compare Two Population Means From Two Paired Random Samples
9.7.1 Solving Student's t-Test From Two Paired Samples by Using SPSS Software
9.7.2 Solving Student's t-Test From Two Paired Samples by Using Stata
9.8 ANOVA to Compare the Means of More Than Two Populations
9.8.1 One-Way ANOVA
9.8.2 Factorial ANOVA
9.9 Final Remarks
9.10 Exercises
10. Nonparametric Tests
10.1 Introduction
10.2 Tests for One Sample
10.2.1 Binomial Tests
10.2.2 Chi-Square Test (χ²) for One Sample
10.2.3 Sign Test for One Sample
10.3 Tests for Two Paired Samples
10.3.1 McNemar Test
10.3.2 Sign Test for Two Paired Samples
10.3.3 Wilcoxon Test
10.4 Tests for Two Independent Samples
10.4.1 Chi-Square Test (χ²) for Two Independent Samples
10.4.2 Mann-Whitney U Test
10.5 Tests for k Paired Samples
10.5.1 Cochran's Q Tests
10.5.2 Friedman's Test
10.6 Tests for k Independent Samples
10.6.1 The χ² Test for k Independent Samples
10.6.2 Kruskal-Wallis Test
10.7 Final Remarks
10.8 Exercises
Part V Multivariate Exploratory Data Analysis
11. Cluster Analysis
11.1 Introduction
11.2 Cluster Analysis
11.2.1 Defining Distance or Similarity Measures in Cluster Analysis
11.2.2 Agglomeration Schedules in Cluster Analysis
11.3 Cluster Analysis with Hierarchical and Nonhierarchical Agglomeration Schedules in SPSS
11.3.1 Elaborating Hierarchical Agglomeration Schedules in SPSS
11.3.2 Elaborating Nonhierarchical K-Means Agglomeration Schedules in SPSS
11.4 Cluster Analysis With Hierarchical and Nonhierarchical Agglomeration Schedules in Stata
11.4.1 Elaborating Hierarchical Agglomeration Schedules in Stata
11.4.2 Elaborating Nonhierarchical K-Means Agglomeration Schedules in Stata
11.5 Final Remarks
11.6 Exercises
Appendix
A.1 Detecting Multivariate Outliers
12. Principal Component Factor Analysis
12.1 Introduction
12.2 Principal Component Factor Analysis
12.2.1 Pearson's Linear Correlation and the Concept of Factor
12.2.2 Overall Adequacy of the Factor Analysis: Kaiser-Meyer-Olkin Statistic and Bartlett's Test of Sphericity
12.2.3 Defining the Principal Component Factors: Determining the Eigenvalues and Eigenvectors of Correlation Matrix ρ and Calculating the Factor Scores
12.2.4 Factor Loadings and Communalities
12.2.5 Factor Rotation
12.2.6 A Practical Example of the Principal Component Factor Analysis
12.3 Principal Component Factor Analysis in SPSS
12.4 Principal Component Factor Analysis in Stata
12.5 Final Remarks
12.6 Exercises
Appendix: Cronbach's Alpha
A.1 Brief Presentation
A.2 Determining Cronbach's Alpha Algebraically
A.3 Determining Cronbach's Alpha in SPSS
A.4 Determining Cronbach's Alpha in Stata
Part VI Generalized Linear Models
13. Simple and Multiple Regression Models
13.1 Introduction
13.2 Linear Regression Models
13.2.1 Estimation of the Linear Regression Model by Ordinary Least Squares
13.2.2 Explanatory Power of the Regression Model: Coefficient of Determination R²
13.2.3 General Statistical Significance of the Regression Model and Each of Its Parameters
13.2.4 Construction of the Confidence Intervals of the Model Parameters and Elaboration of Predictions
13.2.5 Estimation of Multiple Linear Regression Models
13.2.6 Dummy Variables in Regression Models
13.3 Presuppositions of Regression Models Estimated by OLS
13.3.1 Normality of Residuals
13.3.2 The Multicollinearity Problem
13.3.3 The Problem of Heteroskedasticity
13.3.4 The Autocorrelation of Residuals Problem
13.3.5 Detection of Specification Problems: Linktest and RESET Test
13.4 Nonlinear Regression Models
13.4.1 The Box-Cox Transformation: The General Regression Model
13.5 Estimation of Regression Models in Stata
13.6 Estimation of Regression Models in SPSS
13.7 Final Remarks
13.8 Exercises
Appendix: Quantile Regression Models
A.1 A Brief Introduction
A.2 Example: Quantile Regression Model in Stata
14. Binary and Multinomial Logistic Regression Models
14.1 Introduction
14.2 The Binary Logistic Regression Model
14.2.1 Estimation of the Binary Logistic Regression Model by Maximum Likelihood
14.2.2 General Statistical Significance of the Binary Logistic Regression Model and Each of Its Parameters
14.2.3 Construction of the Confidence Intervals of the Parameters for the Binary Logistic Regression Model
14.2.4 Cutoff, Sensitivity Analysis, Overall Model Efficiency, Sensitivity, and Specificity
14.3 The Multinomial Logistic Regression Model
14.3.1 Estimation of the Multinomial Logistic Regression Model by Maximum Likelihood
14.3.2 General Statistical Significance of the Multinomial Logistic Regression Model and Each of Its Parameters
14.3.3 Construction of the Confidence Intervals of the Parameters for the Multinomial Logistic Regression Model
14.4 Estimation of Binary and Multinomial Logistic Regression Models in Stata
14.4.1 Binary Logistic Regression in Stata
14.4.2 Multinomial Logistic Regression in Stata
14.5 Estimation of Binary and Multinomial Logistic Regression Models in SPSS
14.5.1 Binary Logistic Regression in SPSS
14.5.2 Multinomial Logistic Regression in SPSS
14.6 Final Remarks
14.7 Exercises
Appendix: Probit Regression Models
A.1 A Brief Introduction
A.2 Example: Probit Regression Model in Stata
15. Regression Models for Count Data: Poisson and Negative Binomial
15.1 Introduction
15.2 The Poisson Regression Model
15.2.1 Estimation of the Poisson Regression Model by Maximum Likelihood
15.2.2 General Statistical Significance of the Poisson Regression Model and Each of Its Parameters
15.2.3 Construction of the Confidence Intervals of the Parameters for the Poisson Regression Model
15.2.4 Test to Verify Overdispersion in Poisson Regression Models
15.3 The Negative Binomial Regression Model
15.3.1 Estimation of the Negative Binomial Regression Model by Maximum Likelihood
15.3.2 General Statistical Significance of the Negative Binomial Regression Model and Each of Its Parameters
15.3.3 Construction of the Confidence Intervals of the Parameters for the Negative Binomial Regression Model
15.4 Estimating Regression Models for Count Data in Stata
15.4.1 Poisson Regression Model in Stata
15.4.2 Negative Binomial Regression Model in Stata
15.5 Regression Model Estimation for Count Data in SPSS
15.5.1 Poisson Regression Model in SPSS
15.5.2 Negative Binomial Regression Model in SPSS
15.6 Final Remarks
15.7 Exercises
Appendix: Zero-Inflated Regression Models
A.1 Brief Introduction
A.2 Example: Zero-Inflated Poisson Regression Model in Stata
A.3 Example: Zero-Inflated Negative Binomial Regression Model in Stata
Part VII Optimization Models and Simulation
16. Introduction to Optimization Models: General Formulations and Business Modeling
16.1 Introduction to Optimization Models
16.2 Introduction to Linear Programming Models
16.3 Mathematical Formulation of a General Linear Programming Model
16.4 Linear Programming Model in the Standard and Canonical Forms
16.4.1 Linear Programming Model in the Standard Form
16.4.2 Linear Programming Model in the Canonical Form
16.4.3 Transformations Into the Standard or Canonical Form
16.5 Assumptions of the Linear Programming Model
16.5.1 Proportionality
16.5.2 Additivity
16.5.3 Divisibility and Nonnegativity
16.5.4 Certainty
16.6 Modeling Business Problems Using Linear Programming
16.6.1 Production Mix Problem
16.6.2 Blending or Mixing Problem
16.6.3 Diet Problem
16.6.4 Capital Budget Problems
16.6.5 Portfolio Selection Problem
16.6.6 Production and Inventory Problem
16.6.7 Aggregated Planning Problem
16.7 Final Remarks
16.8 Exercises
17. Solution of Linear Programming Problems
17.1 Introduction
17.2 Graphical Solution of a Linear Programming Problem
17.2.1 Linear Programming Maximization Problem with a Single Optimal Solution
17.2.2 Linear Programming Minimization Problem With a Single Optimal Solution
17.2.3 Special Cases
17.3 Analytical Solution of a Linear Programming Problem in Which m < n
17.4 The Simplex Method
17.4.1 Logic of the Simplex Method
17.4.2 Analytical Solution of the Simplex Method for Maximization Problems
17.4.3 Tabular Form of the Simplex Method for Maximization Problems
17.4.4 The Simplex Method for Minimization Problems
Special Cases of the Simplex Method
17.5 Solution by Using a Computer
17.5.1 Solver in Excel
17.5.2 Solution of the Examples found in Section 16.6 of Chapter 16 using Solver in Excel
17.5.3 Solver Error Messages for Unlimited and Infeasible Solutions
17.5.4 Result Analysis by Using the Solver Answer and Limits Reports
17.6 Sensitivity Analysis
17.6.1 Alteration in one of the Objective Function Coefficients (Graphical Solution)
17.6.2 Alteration in One of the Constants on the Right-Hand Side of the Constraint and Concept of Shadow Price (Graphical Solution)
17.6.3 Reduced Cost
17.6.4 Sensitivity Analysis With Solver in Excel
17.7 Exercises
18. Network Programming
18.1 Introduction
18.2 Terminology of Graphs and Networks
18.3 Classic Transportation Problem
18.3.1 Mathematical Formulation of the Classic Transportation Problem
18.3.2 Balancing the Transportation Problem When the Total Supply Capacity Is Not Equal to the Total Demand Consumed
18.3.3 Solution of the Classic Transportation Problem
18.4 Transhipment Problem
18.4.1 Mathematical Formulation of the Transhipment Problem
18.4.2 Solution of the Transhipment Problem Using Excel Solver
18.5 Job Assignment Problem
18.5.1 Mathematical Formulation of the Job Assignment Problem
18.5.2 Solution of the Job Assignment Problem Using Excel Solver
18.6 Shortest Path Problem
18.6.1 Mathematical Formulation of the Shortest Path Problem
18.6.2 Solution of the Shortest Path Problem Using Excel Solver
18.7 Maximum Flow Problem
18.7.1 Mathematical Formulation of the Maximum Flow Problem
18.7.2 Solution of the Maximum Flow Problem Using Excel Solver
18.8 Exercises
19. Integer Programming
19.1 Introduction
19.2 Mathematical Formulation of a General Model for Integer Programming and/or Binary and Linear Relaxation
19.3 The Knapsack Problem
19.3.1 Modeling of the Knapsack Problem
19.3.2 Solution of the Knapsack Problem Using Excel Solver
19.4 The Capital Budgeting Problem as a Model of Binary Programming
19.4.1 Solution of the Capital Budgeting Problem as a Model of Binary Programming Using Excel Solver
19.5 The Traveling Salesman Problem
19.5.1 Modeling of the Traveling Salesman Problem
19.5.2 Solution of the Traveling Salesman Problem Using Excel Solver
19.6 The Facility Location Problem
19.6.1 Modeling of the Facility Location Problem
19.6.2 Solution of the Facility Location Problem Using Excel Solver
19.7 The Staff Scheduling Problem
19.7.1 Solution of the Staff Scheduling Problem Using Excel Solver
19.8 Exercises
20. Simulation and Risk Analysis
20.1 Introduction to Simulation
20.2 The Monte Carlo Method
20.3 Monte Carlo Simulation in Excel
20.3.1 Generation of Random Numbers and Probability Distributions in Excel
20.3.2 Practical Examples
20.4 Final Remarks
20.5 Exercises
Part VIII Other Topics
21. Design and Analysis of Experiments
21.1 Introduction
21.2 Steps in the Design of Experiments
21.3 The Four Principles of Experimental Design
21.4 Types of Experimental Design
21.4.1 Completely Randomized Design (CRD)
21.4.2 Randomized Block Design (RBD)
21.4.3 Factorial Design (FD)
21.5 One-Way Analysis of Variance
21.6 Factorial ANOVA
21.7 Final Remarks
21.8 Exercises
22. Statistical Process Control
22.1 Introduction
22.2 Estimating the Process Mean and Variability
22.3 Control Charts for Variables
22.3.1 Control Charts for X̅ and R
22.3.2 Control Charts for X̅ and S
22.4 Control Charts for Attributes
22.4.1 P Chart (Defective Fraction)
22.4.2 np Chart (Number of Defective Products)
22.4.3 C Chart (Total Number of Defects per Unit)
22.4.4 U Chart (Average Number of Defects per Unit)
22.5 Process Capability
22.5.1 C_p Index
22.5.2 C_pk Index
22.5.3 C_pm and C_pmk Index
22.6 Final Remarks
22.7 Exercises
23. Data Mining and Multilevel Modeling
23.1 Introduction to Data Mining
23.2 Multilevel Modeling
23.3 Nested Data Structures
23.4 Hierarchical Linear Models
23.4.1 Two-Level Hierarchical Linear Models With Clustered Data (HLM2)
23.4.2 Three-Level Hierarchical Linear Models With Repeated Measures (HLM3)
23.5 Estimation of Hierarchical Linear Models in Stata
23.5.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in Stata
23.5.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in Stata
23.6 Estimation of Hierarchical Linear Models in SPSS
23.6.1 Estimation of a Two-Level Hierarchical Linear Model With Clustered Data in SPSS
23.6.2 Estimation of a Three-Level Hierarchical Linear Model With Repeated Measures in SPSS
23.7 Final Remarks
23.8 Exercises
Appendix
A.1 Hierarchical Nonlinear Models
Answers
Appendices
References
Index
© Copyright 1996–2023 StataCorp LLC. All rights reserved.