Using Stata to run Discrete Choice Experiments.

Experimental design, discrete choice tools and interfaces to Bayesian statistical software - An illustration.

Introduction

Discrete Choice Experiments, a.k.a. choice-based conjoint

  • Statistical experimental design (within a web survey)

  • Decision theory - Discrete choice demand theory

  • Econometrics - Discrete choice models - MNL, Mixed logit, …, in a panel data setting

Revealed vs Stated preferences

Revealed Preferences

  • Observational data - results from consumer behavior in the marketplace

  • Prices result from market equilibrium - price is typically endogenous

  • Explanatory variables have little variability in the market.

  • Explanatory variables are collinear in the market.

  • Abundant data (in IT systems).

  • Complex statistical models

Stated Preferences

  • Stated preferences - survey data

  • Price variation is constructed by analyst - price is exogenous by construction.

  • Can calculate demand for new products or products with new features

  • Products for which there is no market.

  • Relatively inexpensive data

  • Relatively simple models (hopefully)

  • Relatively complex survey design

Example - Demand for Masters Degree

8 attributes, each with 2 or 3 levels.

1 attribute (fees) with 6 levels.

----------------------------------------------------------------------------
Attribute                   | Levels
----------------------------+-----------------------------------------------
Ranking                     | Financial Times top 20; Financial Times top 100; Not ranked
Generic/specific            | Generic (Economics/Management); MSc Finance; MSc Marketing
Duration                    | 1 year equivalent; 2 years equivalent
Full/Part-time              | Full-time; Part-time evenings; Part-time weekends
International accreditation | Yes; No
Internship                  | Yes; No
Merit scholarship           | Available; Not available
Job prospects               | 100 % employed upon completion; 80 % employed after 3 months; < 80 % employed after 6 months
Tuition fee (EUR)           | 5 000; 9 000; 13 000; 17 000; 21 000; 25 000
----------------------------------------------------------------------------

A total of \(3\times 3\times 2\times 3\times 2\times 2\times 2\times 3\times 6=7776\) distinct possibilities

If we combine them into groups of 3 we have a total of 78,333,933,600 choice sets

There were only 24 questions !!!
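
The two counts above can be checked in a few lines of Python. This is only a sanity check: the level counts are read off the attribute table, and choice sets are assumed to be unordered groups of 3 distinct profiles.

```python
import math

# Levels per attribute, in the order of the attribute table:
# ranking, generic/specific, duration, full/part-time, accreditation,
# internship, merit scholarship, job prospects, tuition fee.
levels = [3, 3, 2, 3, 2, 2, 2, 3, 6]

# Full factorial: every combination of attribute levels.
n_profiles = math.prod(levels)

# Unordered choice sets of 3 distinct profiles (assumption: order is irrelevant).
n_choice_sets = math.comb(n_profiles, 3)

print(n_profiles)
print(n_choice_sets)
```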

[Screenshots (two slides)]

Experimental design

  • Many options

  • Here - done by hand !!! (using Kuhfeld’s / Sloane’s orthogonal array libraries)

  • In Stata

    • Random design
    • User-written commands for fractional factorial designs
    • User-written commands for DCE design
  • Other specific software for DCE - e.g. NGene

  • Other generic software for statistical experiments
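
As an illustration of the "random design" option only (it is not the design used here, which was built by hand from orthogonal arrays), a minimal Python sketch could draw choice sets at random from the full factorial. The function name `random_design` and its defaults (24 choice sets of 3 alternatives, taken from the example above) are assumptions made for this sketch; a random design does not guarantee orthogonality or level balance.

```python
import itertools
import random

# Levels per attribute, as in the Masters degree example.
levels = [3, 3, 2, 3, 2, 2, 2, 3, 6]

def random_design(levels, n_sets=24, n_alts=3, seed=123):
    """Hypothetical helper: draw n_sets choice sets, each made of
    n_alts distinct profiles sampled from the full factorial."""
    rng = random.Random(seed)
    # Each profile is a tuple of level indices, one per attribute.
    full_factorial = list(itertools.product(*[range(k) for k in levels]))
    return [rng.sample(full_factorial, n_alts) for _ in range(n_sets)]

design = random_design(levels)
for choice_set in design[:2]:
    print(choice_set)
```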

Demand Theory

Consumer \(n\) has utility for product \(i\) given by \(U_{ni}\)

\[U_{ni} =U(x_i,p_i,v_n)\]

where \(x_i\) are product characteristics, \(p_i\) is the price, \(v_n\) are parameters that characterize consumer preferences.

Traditional specification

\[U_{ni} = - p_i\alpha_n +x_i\beta_n +\varepsilon_{ni}\]

Here

\[v_n=(\alpha_n,\beta_n,\varepsilon_{ni})\]

In the simplest case \(\alpha_n=\alpha\), \(\beta_n=\beta\) and \(\varepsilon_{ni}\) has a type I (Gumbel) extreme value distribution.

The consumer chooses the product \(i\) with maximum utility. Under these assumptions, the probability of choosing product \(i\) is:

\[s_i=\frac{\exp(V_i)}{\sum_k \exp(V_k)}\]

where

\[V_i=- p_i\alpha +x_i\beta\]
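
The share formula can be evaluated directly. The sketch below uses made-up prices, characteristics and coefficients (none of them come from the survey) and a hypothetical helper `mnl_shares`; it only shows how \(V_i\) maps into choice probabilities.

```python
import numpy as np

def mnl_shares(p, x, alpha, beta):
    """Logit choice probabilities s_i = exp(V_i) / sum_k exp(V_k),
    with V_i = -p_i * alpha + x_i @ beta."""
    V = -p * alpha + x @ beta
    V = V - V.max()            # subtract the max for numerical stability
    expV = np.exp(V)
    return expV / expV.sum()

# Illustrative values only (assumptions, not estimates).
p = np.array([5.0, 13.0, 21.0])      # prices of three alternatives
x = np.array([[1.0, 0.0],            # two product characteristics
              [0.0, 1.0],
              [0.0, 0.0]])
alpha = 0.04
beta = np.array([0.5, -0.3])

s = mnl_shares(p, x, alpha, beta)
print(s, s.sum())
```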

Stata commands


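* Outside-option indicator and attribute-level dummies
global xvars "out_opt datt1_top100 datt1_nrank datt2_fin datt2_mark datt3_2y datt4_ptev datt4_ptwe datt5_noia datt6_noint datt7_noms datt8_emp80 datt8_empl80"

* Conditional logit (MNL), grouped by gid
clogit choice_m $xvars fees, group(gid)
est sto mnl_2018_2023

* Willingness to pay: build a single nlcom call with one ratio
* -_b[v]/_b[fees] per variable in $xvars
local cmd "nlcom"
foreach v in $xvars {
    local cmd "`cmd' (`v':-_b[`v']/_b[fees])"
}
`cmd', post
est sto wtp_2018_2023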

Results

62 students in total.

----------------------------------------
    Variable |     mnl           wtp  
-------------+--------------------------
choice_m     |
     out_opt |  -2.751***    -72.834***           
datt1_top100 |  -0.342***     -9.063**            
 datt1_nrank |  -1.072***    -28.385***           
   datt2_fin |  -0.709***    -18.763***           
  datt2_mark |  -1.238***    -32.761***           
    datt3_2y |  -0.433***    -11.451***           
  datt4_ptev |   0.092         2.441              
  datt4_ptwe |   0.081         2.136              
  datt5_noia |  -0.301***     -7.978***           
 datt6_noint |  -0.320***     -8.483***           
  datt7_noms |  -0.112        -2.954              
 datt8_emp80 |  -0.873***    -23.100***           
datt8_empl80 |  -1.237***    -32.754***           
        fees |  -0.038***               
-------------+--------------------------
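
The wtp column is the ratio -_b[v]/_b[fees] computed by nlcom. As a rough consistency check, using only the rounded numbers printed in the table (a few rows are copied below), the fees coefficient implied by each row can be backed out in Python. The small gap with respect to the printed -0.038 would then be due to rounding in the display; the dictionary names are illustrative.

```python
# (mnl coefficient, wtp) pairs copied from the table above.
rows = {
    "out_opt":      (-2.751, -72.834),
    "datt1_nrank":  (-1.072, -28.385),
    "datt2_mark":   (-1.238, -32.761),
    "datt8_empl80": (-1.237, -32.754),
}

# wtp = -b / b_fees  implies  b_fees = -b / wtp
implied_fees = {name: -b / wtp for name, (b, wtp) in rows.items()}

for name, value in implied_fees.items():
    print(f"{name:>12}: implied fees coefficient = {value:.4f}")
```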

[Figures: MNL coefs; MNL WTP]

Bayesian models

Use Stata's Python interface to run models in Stan.

Stan model - MNL

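model {
    vector[NOBS] xb;
    array[NRES,NCHO] vector[NALT] V;

    // Utilities: one linear index per data row, stored by
    // respondent, choice set and alternative
    xb = x * beta;
    for (n in 1:NOBS) {
        V[r_id[n], c_id[n], a_id[n]] = xb[n];
    }

    // Log-likelihood: y[i,j] is the alternative chosen by
    // respondent i in choice set j
    for (i in 1:NRES) {
        for (j in 1:NCHO) {
            target += categorical_logit_lpmf(y[i,j] | V[i,j]);
        }
    }

    // Prior
    beta ~ normal(0, 100.0);
}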

Stan model - Mixed Logit

Only a few lines of code change.

model {
    array[NRES,NCHO] vector[NALT] V;

    // Utilities: respondent-specific coefficients b_n
    for (n in 1:NOBS) {
        V[r_id[n], c_id[n], a_id[n]] = x[n] * b_n[r_id[n]];
    }

    // Log-likelihood
    for (i in 1:NRES) {
        for (j in 1:NCHO) {
            target += categorical_logit_lpmf(y[i,j] | V[i,j]);
        }
    }

    // Priors
    b_m ~ normal(0, 10.0);  // hyperprior
    b_s ~ cauchy(0, 2.5);   // hyperprior
    for (i in 1:NRES) {
        eta_i[i] ~ normal(0, 1);
    }
}
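
The model block above does not show how b_n is built from b_m, b_s and eta_i. One common choice, assumed here and not confirmed by the slides, is a non-centred parameterisation with b_n[i] = b_m + b_s .* eta_i[i]. A small numpy sketch of that construction, with simulated stand-ins for the parameters (62 respondents as in the sample, 14 coefficients as in the variable list with fees):

```python
import numpy as np

rng = np.random.default_rng(123)
NRES, K = 62, 14   # respondents; coefficients (13 xvars + fees)

# Simulated stand-ins for the Stan parameters (assumptions, not estimates).
b_m = rng.normal(0.0, 1.0, size=K)            # population means
b_s = np.abs(rng.normal(0.0, 0.5, size=K))    # population std. deviations
eta = rng.normal(0.0, 1.0, size=(NRES, K))    # standardised deviations

# Assumed non-centred construction: one coefficient vector per respondent.
b_n = b_m + b_s * eta

print(b_n.shape)
```

With this construction the sampler works on the standard-normal eta_i rather than on b_n directly, which often eases sampling in hierarchical models.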

Run via Stata

Prepare


python:
import os
import numpy as np
from sfi import Data, Macro   # Stata's Python API: dataset and macro access
import cmdstanpy as stan
from cmdstanpy import CmdStanModel
# Point cmdstanpy at the local CmdStan installation
stan.set_cmdstan_path(os.path.join('c:\\','cmdstan'))
end

Send data to Python


python:
# Copy the variables needed by the Stan models from the Stata dataset into numpy arrays
xvars=["out_opt","datt1_top100","datt1_nrank","datt2_fin","datt2_mark","datt3_2y","datt4_ptev","datt4_ptwe","datt5_noia","datt6_noint","datt7_noms","datt8_emp80","datt8_empl80","fees"]
y=np.asarray(Data.get("choice_m"))          # choice variable
x=np.asarray(Data.get(xvars))               # covariates
ids=np.asarray(Data.get(["idy","chset"]))   # id variables
alt=np.asarray(Data.get("alt"))             # alternative variable
end
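
The sampling step further on expects a dictionary called stan_data, whose construction is not shown. A possible sketch, under strong assumptions, is given below: the data are in long format (one row per respondent, choice set and alternative), choice_m is a 0/1 indicator of the chosen alternative, chset numbers the choice sets within respondent, and the design is balanced. The helper `build_stan_data` and the entry `K` are hypothetical; the other keys follow the names used in the Stan model blocks.

```python
import numpy as np

def build_stan_data(y, x, ids, alt):
    """Hypothetical helper: assemble the dict passed to CmdStanModel.sample()."""
    y = np.asarray(y).astype(int)
    x = np.asarray(x, dtype=float)
    ids = np.asarray(ids).astype(int)
    alt = np.asarray(alt).astype(int)

    # Recode ids to consecutive 1-based indices, as Stan indexing requires.
    r_id = np.unique(ids[:, 0], return_inverse=True)[1] + 1
    c_id = np.unique(ids[:, 1], return_inverse=True)[1] + 1
    a_id = np.unique(alt, return_inverse=True)[1] + 1
    NRES, NCHO, NALT = int(r_id.max()), int(c_id.max()), int(a_id.max())

    # y_mat[i, j] = index of the alternative chosen by respondent i in set j.
    y_mat = np.zeros((NRES, NCHO), dtype=int)
    chosen = y == 1
    y_mat[r_id[chosen] - 1, c_id[chosen] - 1] = a_id[chosen]

    return {
        "NOBS": int(x.shape[0]), "NRES": NRES, "NCHO": NCHO, "NALT": NALT,
        "K": int(x.shape[1]),   # assumed name for the number of covariates
        "r_id": r_id, "c_id": c_id, "a_id": a_id,
        "x": x, "y": y_mat,
    }
```

Inside a Stata python: block this would be called as stan_data = build_stan_data(y, x, ids, alt), after the arrays have been read from the dataset.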

Compile stan models


python:
fil_mod_mnl=os.path.join('stan','mod_mnl.stan')
mod_mnl = CmdStanModel(stan_file=fil_mod_mnl)
end

python:
fil_mod_mxl=os.path.join('stan','mod_mxl.stan')
mod_mxl = CmdStanModel(stan_file=fil_mod_mxl)
end

Sample from posterior


python:
# stan_data must already exist: a dict holding the entries of the Stan data block
fit_mnl_bayes = mod_mnl.sample(
    data=stan_data,
    seed=123,
    chains=4,
    parallel_chains=4,
    refresh=500,
)
end
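
Once the sampler has run, posterior draws of the coefficients can be turned into posterior draws of willingness to pay with the same ratio used in nlcom. With cmdstanpy the draws would typically come from fit_mnl_bayes.stan_variable("beta"); the sketch below uses simulated draws instead so that it runs on its own, and `wtp_draws` is a hypothetical helper that assumes fees is the last column.

```python
import numpy as np

def wtp_draws(beta_draws, fee_col=-1):
    """Hypothetical helper: per-draw WTP = -beta_k / beta_fees for every
    coefficient except fees. beta_draws has shape (draws, coefficients)."""
    fees = beta_draws[:, fee_col]
    others = np.delete(beta_draws, fee_col, axis=1)
    return -others / fees[:, None]

# Simulated stand-in for posterior draws (assumption, not real output):
# one attribute coefficient and the fees coefficient.
rng = np.random.default_rng(123)
fake_draws = np.column_stack([
    rng.normal(-1.0, 0.1, size=4000),
    rng.normal(-0.04, 0.002, size=4000),
])

wtp = wtp_draws(fake_draws)
# Posterior median and 95% interval of WTP for each attribute.
summary = np.percentile(wtp, [2.5, 50, 97.5], axis=0)
print(summary)
```

Computing the ratio draw by draw gives the full posterior of WTP, so intervals do not rely on the delta-method approximation used by nlcom.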

Results …

[Figures: Outside option; Not ranked; Marketing; Fees]

Thank you !