Using Stata to run Discrete Choice Experiments.

Experimental design, discrete choice tools and interfaces to Bayesian statistical software - An illustration.

Introduction

Discrete Choice Experiments, a.k.a. choice-based conjoint

Statistical experimental design (within a web survey)
Decision theory - Discrete choice demand theory
Econometrics - Discrete choice models - MNL, Mixed logit, …, in a panel data setting

Revealed vs Stated preferences

Revealed Preferences

Observational data - results from consumer behavior in the marketplace
Prices result from market equilibrium - price is typically endogenous
Explanatory variables have little variability in the market.
Explanatory variables are collinear in the market.
Abundant data (in IT systems).
Complex statistical models

Stated Preferences

Stated preferences - survey data
Price variation is constructed by analyst - price is exogenous by construction.
Can calculate demand for new products or products with new features
Products for which there is no market.
Relatively inexpensive data
Relatively simple models (hopefully)
Relatively complex survey design

Example - Demand for a Master's Degree

8 attributes, each with 2 or 3 levels.

1 attribute - fees with 6 levels

Attribute	Levels
Ranking	Financial Times top 20; Financial Times top 100; Not ranked
Generic/specific levels	Generic (Economics/Management); MSc Finance; MSc Marketing

Attribute	Levels
Duration levels	1 year equivalent; 2 years equivalent
Full/Part-time	Full-time; Part-time evenings; Part-time weekends
International accreditation	Yes; No
Internship	Yes; No
Merit scholarship	Available; Not available

Attribute	Levels
Job prospects	100 % employed upon completion; 80 % employed after 3 months; < 80 % employed after 6 months
Tuition fee	5 000; 9 000; 13 000; 17 000; 21 000; 25 000 EUR

A total of \(3 \times 3 \times 2 \times 3 \times 2 \times 2 \times 2 \times 3 \times 6 = 7776\) distinct possibilities

If we combine them into groups of 3 we have a total of 78,333,933,600 choice sets
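Both counts can be checked with a few lines of Python. This is a minimal sketch: the level counts are read off the attribute tables, and the variable names are illustrative.

```python
import math

# Number of levels per attribute, read off the attribute tables:
# ranking, generic/specific, duration, full/part-time, accreditation,
# internship, merit scholarship, job prospects, tuition fee
levels = [3, 3, 2, 3, 2, 2, 2, 3, 6]

# Full factorial: every combination of attribute levels
n_profiles = math.prod(levels)

# Unordered choice sets made of 3 distinct profiles
n_choice_sets = math.comb(n_profiles, 3)

print(n_profiles)     # 7776
print(n_choice_sets)  # 78333933600
```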

There were only 24 questions !!!

Screenshot

Experimental design

Many options
Here - done by hand !!! (using Kuhfeld’s / Sloane’s orthogonal array libraries)
In Stata
- Random design
- User-written commands for fractional factorial designs
- User-written commands for DCE design
Other specific software for DCE - e.g. Ngene
Other generic software for statistical experiments
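To illustrate the "random design" option only (not the orthogonal-array design actually used here), the sketch below draws 24 questions of 3 profiles each from the full factorial. The seed and variable names are arbitrary assumptions. A purely random draw does not guarantee level balance or orthogonality.

```python
import itertools
import random

# Levels per attribute, read off the attribute tables
levels = [3, 3, 2, 3, 2, 2, 2, 3, 6]

# Full factorial: each profile is a tuple of 0-based level indices
profiles = list(itertools.product(*(range(k) for k in levels)))

rng = random.Random(2023)  # arbitrary seed, for reproducibility only
N_QUESTIONS = 24           # number of questions in the survey
N_ALTS = 3                 # profiles shown per question

# Each question: 3 distinct profiles drawn without replacement
design = [rng.sample(profiles, N_ALTS) for _ in range(N_QUESTIONS)]

# Show the first two questions
for q, choice_set in enumerate(design[:2], start=1):
    print(q, choice_set)
```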

Demand Theory

Consumer \(n\) has utility for product \(i\) given by \(U_{ni}\)

\[U_{ni} =U(x_i,p_i,v_n)\]

where \(x_i\) are product characteristics, \(p_i\) is the price, \(v_n\) are parameters that characterize consumer preferences.

Traditional specification

\[U_{ni} = - p_i\alpha_n +x_i\beta_n +\varepsilon_{ni}\]

Here

\[v_n=(\alpha_n,\beta_n,\varepsilon_{ni})\]

In the simplest case \(\alpha_n=\alpha\), \(\beta_n=\beta\) and \(\varepsilon_{ni}\) is i.i.d. with a type I (Gumbel) extreme value distribution.

The consumer chooses the product with the highest utility. The probability that product \(i\) is chosen is:

\[s_i=\frac{\exp(V_i)}{\sum_k \exp(V_k)}\]

where

\[V_i=- p_i\alpha +x_i\beta\]
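The share formula is a softmax over the deterministic utilities \(V_i\). A minimal Python sketch follows; the function name, prices, characteristics and parameter values are all hypothetical.

```python
import math

def logit_shares(prices, x, alpha, beta):
    """Choice probabilities s_i = exp(V_i) / sum_k exp(V_k),
    with V_i = -p_i * alpha + x_i * beta."""
    v = [-p * alpha + sum(xj * bj for xj, bj in zip(xi, beta))
         for p, xi in zip(prices, x)]
    v_max = max(v)  # subtract the maximum before exponentiating, for stability
    expv = [math.exp(vi - v_max) for vi in v]
    total = sum(expv)
    return [e / total for e in expv]

# Hypothetical example: three products, two dummy characteristics each
prices = [5.0, 9.0, 13.0]
x = [[0, 0], [1, 0], [1, 1]]
shares = logit_shares(prices, x, alpha=0.1, beta=[0.5, 0.3])
print(shares)
```

Subtracting the largest utility leaves the shares unchanged, because only utility differences matter in the logit formula.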

Stata commands


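* Regressors: opt-out indicator and attribute-level dummies
global xvars "out_opt datt1_top100 datt1_nrank  datt2_fin  datt2_mark datt3_2y datt4_ptev datt4_ptwe datt5_noia datt6_noint datt7_noms datt8_emp80 datt8_empl80"

* Conditional logit (MNL); gid groups the alternatives of one choice situation
clogit choice_m $xvars fees, group(gid)
est sto mnl_2018_2023

* Willingness to pay: -_b[attribute]/_b[fees], built up as a single nlcom call
local cmd "nlcom"
foreach v in $xvars {
    local cmd "`cmd' (`v':-_b[`v']/_b[fees])"
}
`cmd', post
est sto wtp_2018_2023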

Results

62 students in total.

----------------------------------------
    Variable |     mnl           wtp  
-------------+--------------------------
choice_m     |
     out_opt |  -2.751***    -72.834***           
datt1_top100 |  -0.342***     -9.063**            
 datt1_nrank |  -1.072***    -28.385***           
   datt2_fin |  -0.709***    -18.763***           
  datt2_mark |  -1.238***    -32.761***           
    datt3_2y |  -0.433***    -11.451***           
  datt4_ptev |   0.092         2.441              
  datt4_ptwe |   0.081         2.136              
  datt5_noia |  -0.301***     -7.978***           
 datt6_noint |  -0.320***     -8.483***           
  datt7_noms |  -0.112        -2.954              
 datt8_emp80 |  -0.873***    -23.100***           
datt8_empl80 |  -1.237***    -32.754***           
        fees |  -0.038***               
-------------+--------------------------
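Each entry in the wtp column is the ratio -_b[v]/_b[fees] computed by the nlcom expression. The quick Python check below uses the rounded coefficients from the mnl column. Because those inputs are rounded to three decimals, the results differ slightly from the wtp column. The helper name is illustrative.

```python
def wtp(b_attr, b_fees):
    """Ratio used in the nlcom expression: -b_attr / b_fees."""
    return -b_attr / b_fees

b_fees = -0.038  # rounded fee coefficient from the mnl column

# A few rounded coefficients from the mnl column
for name, b in [("out_opt", -2.751), ("datt1_nrank", -1.072), ("datt2_mark", -1.238)]:
    print(name, round(wtp(b, b_fees), 2))
```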

MNL coefs

Bayesian models

Use Stata's interface to Python to run models in Stan.

Stan model - MNL

model {
  vector[NOBS] xb;
  array[NRES, NCHO] vector[NALT] V;

  // Utilities
  xb = x * beta;
  for (n in 1:NOBS) {
    V[r_id[n], c_id[n], a_id[n]] = xb[n];
  }

  // Log-likelihood
  for (i in 1:NRES) {
    for (j in 1:NCHO) {
      target += categorical_logit_lpmf(y[i, j] | V[i, j]);
    }
  }

  // Prior
  beta ~ normal(0, 100.0);
}

Stan model - Mixed Logit

Only a few lines of code change.

model {
  array[NRES, NCHO] vector[NALT] V;

  // Utilities
  for (n in 1:NOBS) {
    V[r_id[n], c_id[n], a_id[n]] = x[n] * b_n[r_id[n]];
  }

  // Log-likelihood
  for (i in 1:NRES) {
    for (j in 1:NCHO) {
      target += categorical_logit_lpmf(y[i, j] | V[i, j]);
    }
  }

  // Priors
  b_m ~ normal(0, 10.0); // hyperprior
  b_s ~ cauchy(0, 2.5);  // hyperprior
  for (i in 1:NRES) {
    eta_i[i] ~ normal(0, 1);
  }
}
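The model block uses respondent-level coefficients b_n, but their definition is not shown. One common construction, assumed here and not confirmed by the source, is the non-centered form b_n[i] = b_m + b_s .* eta_i[i], which would be consistent with the priors on b_m, b_s and eta_i. The Python sketch below illustrates that assumed construction with hypothetical values.

```python
import math
import random

def individual_coefs(b_m, b_s, eta):
    """Assumed non-centered construction: b_n = b_m + b_s * eta (elementwise)."""
    return [m + s * e for m, s, e in zip(b_m, b_s, eta)]

def choice_probs(x_rows, b):
    """Logit probabilities over the alternatives of one choice set."""
    v = [sum(xk * bk for xk, bk in zip(row, b)) for row in x_rows]
    v_max = max(v)  # stabilise the exponentials
    expv = [math.exp(vi - v_max) for vi in v]
    total = sum(expv)
    return [e / total for e in expv]

rng = random.Random(123)                  # arbitrary seed
b_m = [-1.0, -0.04]                       # hypothetical population means
b_s = [0.5, 0.01]                         # hypothetical population scales
x_rows = [[0, 5.0], [1, 9.0], [1, 13.0]]  # hypothetical choice set: dummy, fee

for i in range(3):  # three hypothetical respondents
    eta = [rng.gauss(0.0, 1.0) for _ in b_m]  # standard normal draws
    b_n = individual_coefs(b_m, b_s, eta)
    print(i + 1, [round(p, 3) for p in choice_probs(x_rows, b_n)])
```

Each respondent gets a separate coefficient vector, so the predicted probabilities for the same choice set differ across respondents.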

Run via Stata

Prepare


python:
import os
import numpy as np
from sfi import Data, Macro
import cmdstanpy as stan
from cmdstanpy import CmdStanModel
stan.set_cmdstan_path(os.path.join('c:\\','cmdstan'))
end

Send data to Python


python:
# Regressors, in the same order as in the Stata estimation
xvars=["out_opt","datt1_top100","datt1_nrank","datt2_fin","datt2_mark","datt3_2y","datt4_ptev","datt4_ptwe","datt5_noia","datt6_noint","datt7_noms","datt8_emp80","datt8_empl80","fees"]
# Copy variables from the Stata dataset in memory into NumPy arrays
y=np.asarray(Data.get("choice_m"))
x=np.asarray(Data.get(xvars))
ids=np.asarray(Data.get(["idy","chset"]))
alt=np.asarray(Data.get("alt"))
end
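The sampling call later uses a dictionary named stan_data, whose construction is not shown. The sketch below is one way it might be assembled from long-format data (one row per alternative). It assumes that choice_m is a 0/1 indicator, that the design is balanced, and that the Stan data block uses the names seen in the model code. The function name and the key "K" are hypothetical.

```python
def build_stan_data(choice, x_rows, resp, chset, alt):
    """Build a data dict from long-format choice data (one row per alternative).

    Assumes a balanced design: every respondent answers the same number of
    choice sets, and every choice set has the same alternatives.
    """
    # Map raw identifiers to consecutive 1-based indices, as Stan expects
    r_index = {r: k + 1 for k, r in enumerate(sorted(set(resp)))}
    a_index = {a: k + 1 for k, a in enumerate(sorted(set(alt)))}
    sets_by_resp = {}
    for r, c in zip(resp, chset):
        sets_by_resp.setdefault(r, set()).add(c)
    c_index = {r: {c: k + 1 for k, c in enumerate(sorted(s))}
               for r, s in sets_by_resp.items()}

    r_id = [r_index[r] for r in resp]
    c_id = [c_index[r][c] for r, c in zip(resp, chset)]
    a_id = [a_index[a] for a in alt]

    nobs, nres, ncho, nalt = len(r_id), len(r_index), max(c_id), len(a_index)
    if nobs != nres * ncho * nalt:
        raise ValueError("design is not balanced")

    # y[i][j] = 1-based index of the alternative chosen by respondent i in set j
    y = [[0] * ncho for _ in range(nres)]
    for n in range(nobs):
        if choice[n] == 1:
            y[r_id[n] - 1][c_id[n] - 1] = a_id[n]

    return {"NOBS": nobs, "NRES": nres, "NCHO": ncho, "NALT": nalt,
            "K": len(x_rows[0]),  # hypothetical name for the number of covariates
            "x": x_rows, "y": y, "r_id": r_id, "c_id": c_id, "a_id": a_id}

# Toy example: 2 respondents x 2 choice sets x 2 alternatives
toy = build_stan_data(
    choice=[1, 0, 0, 1, 0, 1, 1, 0],
    x_rows=[[0.0, 5.0], [1.0, 9.0]] * 4,
    resp=[10, 10, 10, 10, 20, 20, 20, 20],
    chset=[1, 1, 2, 2, 1, 1, 2, 2],
    alt=[1, 2, 1, 2, 1, 2, 1, 2],
)
print(toy["y"])  # [[1, 2], [2, 1]]
```

With the arrays pulled from Stata, a call along the lines of build_stan_data(y, x, ids[:, 0], ids[:, 1], alt) would play the same role, provided the key names match the model's data block.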

Compile stan models


python:
fil_mod_mnl=os.path.join('stan','mod_mnl.stan')
mod_mnl = CmdStanModel(stan_file=fil_mod_mnl)
end


python:
fil_mod_mxl=os.path.join('stan','mod_mxl.stan')
mod_mxl = CmdStanModel(stan_file=fil_mod_mxl)
end

Sample from posterior


python:
fit_mnl_bayes = mod_mnl.sample(
   data = stan_data
  ,seed = 123
  ,chains = 4
  ,parallel_chains = 4
  ,refresh = 500 
)
end
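Once sampling finishes, the posterior draws can be summarized. cmdstanpy fits expose draws through stan_variable, for example fit_mnl_bayes.stan_variable("beta"). The helper below is a hypothetical pure-Python sketch for the draws of a single coefficient, shown on made-up values, not on the actual results.

```python
import statistics

def summarize_draws(draws, lower=0.025, upper=0.975):
    """Posterior mean and an equal-tailed interval from a list of draws.

    Uses a simple nearest-rank quantile; dedicated libraries interpolate.
    """
    s = sorted(draws)
    n = len(s)

    def q(p):
        # Nearest-rank position, clamped to the valid index range
        return s[min(n - 1, max(0, round(p * (n - 1))))]

    return {"mean": statistics.fmean(s), "lower": q(lower), "upper": q(upper)}

# Made-up draws: 201 evenly spaced values from -1.00 to 1.00
draws = [k / 100 for k in range(-100, 101)]
summary = summarize_draws(draws)
print(summary)
```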

Results …

Outside option

Not ranked

Marketing

Fees

Thank you !