Bayesian Econometric Analysis with Stata Applications

王群勇 (Wang Qunyong), Institute of Quantitative Economics, Nankai University, [email protected]

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Bayesian econometric methods

Bayesian methods offer a perspective on parameter estimation, model selection, and related problems that differs from the traditional frequentist approach.

Bayesian estimation obtains the distribution of the parameters through Monte Carlo simulation, which makes it well suited to complex models.

An advantage of Bayesian inference: multiple sources of uncertainty (about the data, the model, and the parameters) are combined within a single unified framework.

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Markov Chains

A stochastic process \(X_t\) takes values in the state space \(S=(1,2,\cdots, s)\); the transition probabilities are \[ p_{ij} = P(X_{t+1}=j|X_t=i), \quad i,j \in S. \]

For example, \[ P=\begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{bmatrix}. \]

Markov property: the state at time \(t\) depends only on the state at time \(t-1\). That is, where the chain goes next depends only on where it is now, not on how it got there.

Markov chain: a stochastic process that satisfies the Markov property.

Markov Chains

If \(\pi_t=(\pi_{1t}, \cdots, \pi_{st})\) denotes the distribution over states at time \(t\), then \(\pi_{t+1} = \pi_t P\): \[ \begin{bmatrix} \pi_{1,t+1} & \pi_{2,t+1} \end{bmatrix} = \begin{bmatrix} \pi_{1t} & \pi_{2t} \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \]

\(\pi_{t+n} = \pi_{t+n-1} P=\pi_{t+n-2} P^2=\pi_{t} P^n\).

Stationary distribution: if the distribution \(\pi_t\) no longer changes, i.e. \(\pi = \pi P\), then \(\pi\) is called a stationary distribution.

Since \(\pi (I-P) = 0\), \(\pi\) is the (left) eigenvector of \(P\) associated with the eigenvalue 1.

If the transition matrix is \[ \begin{bmatrix} 0.75 & 0.25 \\ 0.125 & 0.875 \end{bmatrix} \] the corresponding stationary distribution works out to (1/3, 2/3).
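
As a quick numerical check (a minimal Mata sketch added for illustration; the number of iterations is arbitrary), the stationary distribution can be approximated by repeatedly applying the transition matrix:

mata:
P    = (0.75, 0.25 \ 0.125, 0.875)    // transition matrix from the example above
pvec = (0.5, 0.5)                     // any initial distribution over the two states
for (i = 1; i <= 200; i++) {
    pvec = pvec * P                   // pi_{t+1} = pi_t * P
}
pvec                                  // approaches the stationary distribution (1/3, 2/3)
end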

Markov Chains

For a given transition matrix, is the stationary distribution unique? \[ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \]

Infinitely many stationary distributions: the transition matrix \[ \begin{bmatrix} 1/3 & 2/3 & 0 \\ 1/3 & 2/3 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]

Let \(P^{(n)} = P^n\) denote the \(n\)-step transition matrix. If \(P^{(n)}_{ij}>0\) for some \(n\), state \(j\) is said to be accessible from state \(i\).

If \(P^{(n)}_{ij}>0\) and \(P^{(m)}_{ji}>0\) for some \(n\) and \(m\), states \(i\) and \(j\) communicate. A set of mutually communicating states forms a communicating class.

A Markov chain that contains only one communicating class is called irreducible.

Markov Chains

Theorem: an irreducible Markov chain has a unique stationary distribution.

Implication: for a target distribution, if we can find a Markov chain that has it as its stationary distribution, the chain will always reach that distribution.

How do we construct such a Markov chain? MCMC.

MCMC

MCMC: Gibbs sampling (Geman and Geman, 1984); Metropolis-Hastings sampling (Metropolis et al., 1953; Hastings, 1970).

MH sampling: let the target distribution be \(f(x)\).

  1. Given \(x_t\), specify a proposal distribution \(q\) and draw a candidate \(y_t\) from \(q(y|x_t)\).

  2. Define the acceptance probability: \(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)} \frac{q(x_t|y_t)}{q(y_t|x_t)}, 1 \right]\), and set \[ x_{t+1} = \begin{cases} y_{t} & \text{with probability } \rho(x_t, y_t) \\ x_{t} & \text{with probability } 1- \rho(x_t, y_t) \\ \end{cases} \]

Symmetric case (\(q(y|x)=q(x|y)\)): \(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)}, 1 \right]\).

Independence case (\(q(y|x)=q(y)\)): \(\rho(x_t, y_t)=\min \left[ \frac{f(y_t)}{f(x_t)} \frac{q(x_t)}{q(y_t)}, 1 \right]\).

MH sampling

target: Beta(2.7, 6.3); proposal: uniform(0,1).

. clear

. set obs 5000
Number of observations (_N) was 0, now 5,000.

. gen x=.
(5,000 missing values generated)

. mata:
───────────────────────────────────────────────── mata (type end to exit) ───────────────────────────────────────────────────
: n = 5000

: x = runiform(n,1)

: for (i=2; i<=n; i++) {
>  y = runiform(1,1)
>  rho = betaden(2.7, 6.3, y)/betaden(2.7, 6.3, x[i-1])
>  x[i] = runiform(1,1)<rho ? y : x[i-1]
> }

: st_store((1,n), "x", x)

: end
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

. twoway (function y=betaden(2.7,6.3,x), range(0 1)) (kdensity x, lp(dash))


MH sampling

target: Cauchy; proposal: t(1).

. clear

. set obs 2000
Number of observations (_N) was 0, now 2,000.

. gen x=.
(2,000 missing values generated)

. mata:
───────────────────────────────────────────────── mata (type end to exit) ───────────────────────────────────────────────────
: n = 5000 // burn-in sample:3000

: x = J(n,1,0) // initializing

: for (i=2; i<=n; i++) {
>  y = rt(1,1,1) // proposal density: t(1)
>  denom = cauchyden(0,1,x[i-1])*tden(1,y)
>  rho = cauchyden(0,1,y)*tden(1,x[i-1])/denom
>  x[i] = runiform(1,1)<rho ? y : x[i-1]
> }

: st_store((1,2000), "x", x[3001..n])

: end
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

. twoway (function y=1/(c(pi)*(1+x^2)), range(-10 10)) ///
> (kdensity x if abs(x)<10, lp(dash))


Random-walk MH sampling

Given \(x_t\), draw a candidate \(y_t = x_t + u_t\), \(u_t \sim q(u)\).

target: Beta(2.7, 6.3); random-walk increments: \(u_t \sim N(0,1)\) (as in the code below).

. clear

. set obs 5000
Number of observations (_N) was 0, now 5,000.

. gen x=.
(5,000 missing values generated)

. mata:
───────────────────────────────────────────────── mata (type end to exit) ───────────────────────────────────────────────────
: n = 5000

: x = runiform(n,1)

: for (i=2; i<=n; i++) {
>  y = x[i-1] + rnormal(1,1,0,1)
>  rho = betaden(2.7, 6.3, y)/betaden(2.7, 6.3, x[i-1])
>  x[i] = runiform(1,1)<rho ? y : x[i-1]
> }

: st_store((1,n), "x", x)

: end
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

. twoway (function y=betaden(2.7,6.3,x), range(0 1)) (kdensity x, lp(dash)) , legend(off) xtitle("")


Gibbs sampling

Split the random variables into two blocks, \((x, y)\), with joint density \(f(x,y)\).

  1. Draw \(x_0\).

  2. For \(t=0,1,2,\cdots\), draw \(y_{t+1} \sim f(y|x_t)\) and then \(x_{t+1} \sim f(x|y_{t+1})\).

Example: the bivariate normal distribution (a Mata sketch follows below).

  1. Set \(x_0=0\).

  2. For \(t=0,1,2,\cdots\), draw \[ \begin{eqnarray*} y_{t+1} | x_t & \sim & N \left( \frac{\sigma_1}{\sigma_2} \rho x_t,(1-\rho^2) \sigma_1^2 \right) \\ x_{t+1} | y_{t+1} & \sim & N \left( \frac{\sigma_2}{\sigma_1} \rho y_{t+1},(1-\rho^2) \sigma_2^2 \right) \\ \end{eqnarray*} \]

This extends directly to the case where the random variables are split into more than two blocks.
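
A minimal Mata sketch of this two-block Gibbs sampler for the bivariate normal; the values of \(\rho\), \(\sigma_1\), \(\sigma_2\), and the number of draws are illustrative assumptions:

mata:
n   = 5000                            // number of Gibbs draws (assumed)
rho = 0.8                             // correlation (assumed)
s1  = 1                               // standard deviation of y (assumed)
s2  = 2                               // standard deviation of x (assumed)
x   = J(n, 1, 0)                      // x_0 = 0
y   = J(n, 1, 0)
for (t = 2; t <= n; t++) {
    y[t] = rnormal(1, 1, s1/s2*rho*x[t-1], sqrt(1-rho^2)*s1)    // y_{t+1} | x_t
    x[t] = rnormal(1, 1, s2/s1*rho*y[t],   sqrt(1-rho^2)*s2)    // x_{t+1} | y_{t+1}
}
correlation((x, y))                   // sample correlation should be close to rho
end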

Gibbs sampling

Covariance matrices are usually sampled with Gibbs sampling; the inverse Wishart is the most widely used prior distribution for them.

Adaptive MCMC

Let \(T_0\) be the burn-in size and \(T\) the MCMC sample size; the total number of MCMC iterations is \(T_{total}=T_0+(T-1)\times \text{thinning} + 1\). For example, with the defaults burnin(2500), mcmcsize(10000), and no thinning, \(T_{total}=2500+9999+1=12{,}500\), which matches the "MCMC iterations = 12,500" reported in the output below.

The MH proposal is generated as \(\theta_{*} = \theta_{t-1} + e\), \(e \sim N(0, \rho_k^2 \Sigma_k)\). Adaptive MH sampling adjusts \(\rho_k\) and \(\Sigma_k\) at fixed intervals to achieve an optimal target acceptance rate (TAR). Gelman, Gilks and Roberts (1997) show that the optimal TAR is 0.44 for a single parameter and 0.234 for multiple parameters.

Let Alen denote the length of each adaptation interval (adaptation(every())). The minimum and maximum numbers of adaptive iterations can be set with adaptation(miniter()) and adaptation(maxiter()).

Steps of the adaptive MH algorithm

Initialization: \(\theta_t = \theta_0^{(f)}\), \(\rho_0=2.38/\sqrt{d}\), where \(d\) is the number of parameters, and \(\Sigma_0 = I\) or a user-specified matrix given in the covariance() option.

  1. Generate a proposal: \(\theta_{*} = \theta_{t-1} + e\), \(e \sim N(0, \rho_k^2 \Sigma_k)\).

  2. The acceptance probability is \[ \alpha(\theta_{*}|\theta_{t-1}) = \min \left\{ \frac{p(\theta_{*}|y)}{p(\theta_{t-1}|y)}, 1 \right\}. \]

where \(p(\theta_{*}|y) \propto f(y|\theta_{*}) p(\theta_{*})\) is the posterior density of \(\theta\) evaluated at the proposal.

The updating rules for \(\rho_k\) and \(\Sigma_k\) are \[ \rho_k = \rho_{k-1} \exp( \beta_k [\Phi^{-1}(AR_k/2) - \Phi^{-1}(TAR/2) ] ) \]

where \[ AR_k = (1-\alpha) AR_{k-1} + \alpha AP \]

Steps of the adaptive MH algorithm

AP is the average acceptance probability \(\alpha(\theta_{*}|\theta_{t-1})\) over the current adaptation interval. The default value of \(\alpha\) is 0.75; it can be changed with adaptation(alpha()).

\[ \Sigma_k = (1-\beta_k) \Sigma_{k-1} + \beta_k \hat{\Sigma}_k, \beta_k = \beta_0/k^{\gamma}. \]

where \(\Sigma_0 = I\) or a user-specified matrix from the covariance() option, and \(\hat{\Sigma}_k\) is the covariance matrix of the MCMC sample. The default value of \(\beta_0\) is 0.8 (adaptation(beta())), and the default value of \(\gamma\) is 0 (adaptation(gamma())).
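
A toy numerical illustration of the \(\rho_k\) update (all values below are assumed for illustration): when the observed acceptance rate falls below the target, the proposal scale shrinks.

mata:
TAR   = 0.234                         // target acceptance rate for several parameters
rho0  = 2.38/sqrt(4)                  // initial scale with d = 4 parameters
AR    = 0.10                          // assumed current acceptance rate (too low)
betak = 0.8                           // beta_k = beta_0/k^gamma with k = 1, gamma = 0
rho0*exp(betak*(invnormal(AR/2) - invnormal(TAR/2)))    // new, smaller scale
end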

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation and testing of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Bayes' Theorem

\[ \pi(\theta|y) = \frac{f(y|\theta) \pi(\theta)}{f(y)} \propto f(y|\theta) \pi(\theta). \]

where \(f(y|\theta) \pi(\theta)\) is called the Bayesian kernel.

Bayesian estimation

If the prior \(\pi(\theta)\) has very little influence on the posterior, it is said to be noninformative (vague, diffuse, flat).

If \(\pi(\theta)\) does not integrate to 1 (in practice, the integral is infinite), it is called improper.

Jeffreys prior: \(\pi(\theta) \propto |I(\theta)|^{1/2}\), where \(I(\theta)\) is the Fisher information matrix, \[ I(\theta) = -E\left[ \frac{\partial^2 \log(p(y|\theta))}{\partial \theta^2} \right]. \]
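
For example (a standard derivation added here for illustration), for a single Bernoulli observation with success probability \(\theta\), \[ I(\theta) = -E\left[ \frac{\partial^2}{\partial \theta^2} \left\{ y\log\theta + (1-y)\log(1-\theta) \right\} \right] = \frac{1}{\theta(1-\theta)}, \qquad \pi(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}, \] which is the kernel of a Beta(1/2, 1/2) distribution.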

Bayesian estimation of the linear model

Model: \(y=x\beta + u\).

Likelihood: \(y_i \sim \text{Normal}(x_i \beta, \sigma^2)\).

Priors: \[ \begin{eqnarray*} \beta & \sim & \text{Multivariate-Normal}(\beta_0, V_0) \\ \sigma^2 & \sim & \text{Inverse-Gamma}(\alpha_0/2, \delta_0/2) \\ \end{eqnarray*} \]

Note: the inverse-gamma distribution

The general inverse-gamma distribution InverseGamma[\(\alpha, \beta,\gamma,\mu\)] has support (\(\mu, \infty\)); \(\mu\) is a location parameter, \(\alpha\) and \(\gamma\) are shape parameters, and \(\beta\) is a scale parameter.

InverseGamma[\(\alpha, \beta\)] denotes InverseGamma[\(\alpha, \beta, 1, 0\)], with pdf \[ pdf(x) = \frac{\exp(-\beta/x) (\beta/x)^{\alpha}}{x \Gamma(\alpha)} \]

\[ E(x) = \frac{\beta}{\alpha-1} (\text{ for } \alpha>1); Var(x) = \frac{\beta^2}{(\alpha-2)(\alpha-1)^2} (\text{ for } \alpha>2). \]

If \(x\sim \text{Gamma}(\alpha, \beta)\) (shape \(\alpha\), scale \(\beta\)), then \(1/x \sim \text{Inverse-Gamma}(\alpha, 1/\beta)\).
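
A quick Mata check of this relationship (the values \(\alpha=3\) and \(\beta=2\) are assumed for illustration):

mata:
g = rgamma(100000, 1, 3, 1/2)         // Gamma with shape 3 and scale 1/2, so 1/g ~ Inverse-Gamma(3, 2)
mean(1:/g), 2/(3-1)                   // simulated mean of 1/g vs. the theoretical beta/(alpha-1) = 1
end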

Linear model

Full conditional posterior distributions: \[ \beta | \sigma^2,y \sim \text{Multivariate-Normal} \left( \beta_1, V_1 \right) \]

where \[ V_1 = \left[ \sigma^{-2} X'X + V_0^{-1} \right]^{-1}, \quad \beta_1 = V_1 \left[ \sigma^{-2} X'y + V_0^{-1} \beta_0 \right] \]

\[ \sigma^2 | \beta,y \sim \text{Inverse-Gamma}\left( (\alpha_0+n)/2, [\delta_0+(y-x\beta)'(y-x\beta)]/2 \right) . \]

Gibbs sampling for the linear model

\(\beta\) is sampled in one block (Greenberg, 2013); a hand-coded Mata sketch of the sampler follows the algorithm.

  1. Set the initial value \(\sigma^{2(0)}\).

  2. At iteration \(t\), draw from \[ \begin{eqnarray*} \beta^{(t)} | \sigma^2,y & \sim & \text{MNormal} \left( \beta_1, V_1 \right) \\ \sigma^2 | \beta,y & \sim & \text{IG}\left( (\alpha_0+n)/2, [\delta_0+(y-x\beta^{(t)})'(y-x\beta^{(t)})]/2 \right) . \end{eqnarray*} \]

The Stata command

bayes [, bayesopts] : estimation_command [, estopts]

bayesopts            note
gibbs                Gibbs sampling; available only for regress, mvreg, and a few other models
normalprior(#)       standard deviation of the normal prior for the regression coefficients; default is 100
igammaprior(# #)     inverse-gamma prior (shape, scale) for the variance of the regression model; default is igammaprior(0.01 0.01)
iwishartprior(#)     inverse-Wishart prior (degrees of freedom) for the covariance matrix of the random effects
prior(priorspec)     user-specified prior distribution

where priorspec has the form {[eqname:]param[, matrix]}, priordist. For example,

prior(educ, normal(0, 9)) prior(age, uniform(-3,3))
prior({dep:}, mvnormal(3, mvec, vmat))

Example: MH sampling

. use mroz, clear

. bayes,  normalprior(10) igammaprior(0.01 0.01) : regress lwage educ age exper
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ regress(xb_lwage,{sigma2})

Priors: 
  {lwage:educ age exper _cons} ~ normal(0,100)                             (1)
                      {sigma2} ~ igamma(.01,.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .3342
                                                 Efficiency:  min =     .06385
                                                              avg =      .0982
Log marginal-likelihood =  -467.9285                          max =      .2099
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1087602   .0140838   .000557   .1092518    .081044   .1356251
         age │ -.0011846   .0048595   .000184  -.0011781  -.0108816    .008066
       exper │  .0160051   .0044727   .000167   .0162019   .0069124   .0243769
       _cons │ -.3458634   .2641602   .009615  -.3501628   -.885464   .1888002
─────────────┼────────────────────────────────────────────────────────────────
      sigma2 │  .4509639   .0315607   .000689   .4500597    .394099   .5204984
─────────────┴────────────────────────────────────────────────────────────────
Note: Default priors are used for some model parameters.

Example: Gibbs sampling

. use mroz, clear

. bayes,  gibbs normalprior(10) igammaprior(0.01 0.01) : regress lwage educ age exper
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{sigma2})

Priors: 
  {lwage:educ age exper _cons} ~ normal(0,100)                             (1)
                      {sigma2} ~ igamma(.01,.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian linear regression                       MCMC iterations  =     12,500
Gibbs sampling                                   Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =          1
                                                 Efficiency:  min =      .9761
                                                              avg =      .9952
Log marginal-likelihood = -467.84857                          max =          1
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1092317   .0142794   .000143   .1093622   .0815355   .1376194
         age │ -.0013859   .0048177   .000048  -.0013814  -.0109141   .0080745
       exper │  .0163677   .0046453   .000046   .0163429   .0072553   .0253992
       _cons │ -.3475648   .2653856   .002654  -.3446139  -.8709864   .1654805
─────────────┼────────────────────────────────────────────────────────────────
      sigma2 │  .4504266   .0311056   .000315   .4491257    .393057   .5144107
─────────────┴────────────────────────────────────────────────────────────────
Note: Default priors are used for some model parameters.

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation and testing of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

MCMC diagnostics

Convergence (has the chain reached its stationary distribution?)

Efficiency: above 0.1 is considered good; below 0.01 is a serious concern.

MCMC diagnostics

bayesgraph type which

where type is one of: diag, trace, histogram, kdensity, ac, cusum, matrix.

and which is one of: _all, {[eqname:]param}

MCMC diagnostics: combined plots

. bayesgraph diag {lwage:educ}

MCMC diagnostics: autocorrelation plots

. bayesgraph ac {lwage:}, byparm


Descriptive statistics for the Bayesian estimates

All parameters:

. bayesstats ic

Bayesian information criteria

─────────────┬────────────────────────────────
             │       DIC    log(ML)    log(BF)
─────────────┼────────────────────────────────
      active │  877.4558  -467.8486          .
─────────────┴────────────────────────────────
Note: Marginal likelihood (ML) is computed
      using Laplace–Metropolis approximation.

. bayesstats ess

Efficiency summaries    MCMC sample size =    10,000
                        Efficiency:  min =     .9761
                                     avg =     .9952
                                     max =         1
 
─────────────┬──────────────────────────────────────
             │        ESS   Corr. time    Efficiency
─────────────┼──────────────────────────────────────
lwage        │
        educ │   10000.00         1.00        1.0000
         age │   10000.00         1.00        1.0000
       exper │   10000.00         1.00        1.0000
       _cons │   10000.00         1.00        1.0000
─────────────┼──────────────────────────────────────
      sigma2 │    9761.25         1.02        0.9761
─────────────┴──────────────────────────────────────

Bayesian inference

All parameters:

bayesgraph trace {lwage:}, byparm
bayesgraph kdensity {lwage:}, byparm
bayesgraph hist {lwage:}, byparm
bayesgraph ac {lwage:}, byparm

A single parameter:

bayesgraph trace {lwage:educ}
bayesgraph kdensity {lwage:educ}
bayesgraph hist {lwage:educ}
bayesgraph ac {lwage:educ}

Model selection

bayesstats ic: three statistics (DIC, log marginal likelihood, and log Bayes factor).

Model selection

Bayes factor (Jeffreys, 1961): \[ BF_{12} = \frac{P(y|M_1)}{P(y|M_2)} = \frac{m_1(y)}{m_2(y)}. \]

Posterior odds: \[ \frac{P(M_1|y)}{P(M_2|y)} = \frac{P(y|M_1)P(M_1)}{P(y|M_2)P(M_2)} = BF_{12} \frac{P(M_1)}{P(M_2)}. \]

Model selection

Jeffreys (1961) scale for the Bayes factor:

\(\log_{10}(bf)\)   bf          conclusion
0 - 1/2             1 - 3.2
1/2 - 1             3.2 - 10    substantial
1 - 2               10 - 100    strong
> 2                 > 100       decisive

Kass and Raftery (1995) scale for the Bayes factor:

\(2\log_{e}(bf)\)   bf          conclusion
0 - 2               1 - 3
2 - 6               3 - 20      positive
6 - 10              20 - 150    strong
> 10                > 150       very strong

Model comparison

bayestest model ...

Can be used to compare different priors, different posterior distributions, different regression functions, different sets of explanatory variables, and so on.

Requirements: the models use exactly the same sample, have proper posterior distributions, and the MCMC chains have converged.
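
A sketch of the bayestest model workflow, comparing the same regression under two different priors (the prior values, file names, and seed are assumptions for illustration; saving() is needed so that the simulation results remain available for postestimation):

use mroz, clear
quietly bayes, normalprior(10) saving(sim1, replace) rseed(16): regress lwage educ age exper
estimates store m1
quietly bayes, normalprior(1) saving(sim2, replace) rseed(16): regress lwage educ age exper
estimates store m2
bayestest model m1 m2                 // posterior probabilities of m1 and m2 under equal prior odds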

Testing whether parameters fall in a given interval:

bayestest interval ({y:x1},lower(.9) upper(1.02)) ({y:x2},lower(-1.1) upper(-.8)), joint

Specifying prior distributions

Stata assigns different default priors to different types of parameters in different models.

The default prior for regression coefficients is independent normal, \(N(0, \sigma_{prior}^2)\), with \(\sigma_{prior}^2=10{,}000\).

For positive parameters (such as variances) the default prior is Inverse-Gamma(\(\alpha, \beta\)) with \(\alpha=0.01\), \(\beta=0.01\).

Cutpoints of ordered choice models: Uniform(0,1) prior.

Multilevel models: the covariance matrix of the random effects (inverse-Wishart prior).

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation and testing of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Using bayesmh

bayesmh can estimate single-equation linear models, single-equation nonlinear models, multiple-equation linear models, multiple-equation nonlinear models, multilevel models, probability distributions, and more.

bayesmh depvar [indepvarspec] [if] [in] [weight], likelihood(modelspec) prior(priorspec) llevaluator(log-likelihood) evaluator(log-postdist) [options]

Three ways to use it: likelihood() + prior(), llevaluator() + prior(), or evaluator().

likelihood()

The likelihood is simply a probability density (or mass) function, and different distributions involve different parameters. For example, the normal distribution has two parameters, a mean and a variance, while the Poisson distribution has a single parameter that serves as both mean and variance.

In bayesmh, the mean parameter of the distribution is specified through the model; the distributional form and any additional parameters are specified in likelihood(). For example,

likelihood()

likelihood()          likelihood                distribution form
normal(var)           normal                    \(y\sim N(xb, var)\)
lognormal(var)        log-normal                \(\ln(y)\sim N(xb, var)\)
exp                   exponential               \(y \sim \exp(xb)\)
probit                Bernoulli                 \(P(y=1)=\Phi(xb)\)
logit                 Bernoulli                 \(P(y=1)=\Lambda(xb)\)
oprobit               multinomial, with ordered-probit probabilities   \(P(y=j)=P(c_{j-1}<y^*<c_j)\)
ologit                multinomial, with ordered-logit probabilities    \(P(y=j)=P(c_{j-1}<y^*<c_j)\)
poisson               Poisson                   \(y\sim \text{Poisson}(xb)\)
binomial(n)           binomial                  \(y\sim \text{Binomial}(n, p(xb))\)
stexp                 exponential survival model
stgamma(lns)          gamma survival model (log-scale parameter lns)
stloglogistic(lns)    log-logistic survival model (log-scale parameter lns)
stlognormal(lnstd)    log-normal survival model (log standard deviation lnstd)
stweibull(lnp)        Weibull survival model (log-shape parameter lnp)
llf()                 expression for the log-likelihood function

Prior distributions

Stata assigns different default priors to different types of parameters in different models.

The default prior for regression coefficients is independent normal, \(N(0, \sigma_{prior}^2)\), with \(\sigma_{prior}^2=10{,}000\).

For positive parameters (such as variances) the default prior is Inverse-Gamma(\(\alpha, \beta\)) with \(\alpha=0.01\), \(\beta=0.01\).

Cutpoints of ordered choice models: Uniform(0,1) prior.

Multilevel models: the covariance matrix of the random effects (inverse-Wishart prior).

Prior distributions (univariate)

priordist             note
normal(mu, var)       normal with mean mu and variance var
t(mu, s2, df)         location-scale t with mean mu, squared scale s2, and degrees of freedom df
lognormal(mu, var)    log-normal(mu, var)
uniform(a,b)          uniform on (a,b)
gamma(a,b)            gamma with shape a and scale b
igamma(a,b)           inverse-gamma with shape a and scale b
exponential(b)        exponential with scale b
beta(a,b)             beta with shapes a and b
laplace(mu,b)         Laplace with location mu and scale b
cauchy(loc,b)         Cauchy with location loc and scale b
chi2(df)              chi-squared with df degrees of freedom
pareto(a,b)           Pareto with shape a and scale b
jeffreys              Jeffreys prior

Prior distributions (multivariate)

priordist                      note
mvnormal(d, mu, var)           d-dimensional multivariate normal with mean mu and variance var
mvnormal0(d, var)              d-dimensional multivariate normal with zero mean and variance var
zellnersg(d, g, mu, {var})     Zellner g-prior of dimension d with g degrees of freedom, mean mu, and variance {var}
zellnersg0(d, g, {var})        Zellner g-prior of dimension d with g degrees of freedom, zero mean, and variance {var}
dirichlet(a1, a2, …, ad)       Dirichlet (multivariate beta) of dimension d with shapes (a1, a2, …, ad)
wishart(d, df, V)              Wishart of dimension d with df degrees of freedom and scale matrix V
iwishart(d, df, V)             inverse-Wishart of dimension d with df degrees of freedom and scale matrix V
jeffreys(d)                    Jeffreys prior for a d-variate normal distribution

Prior distributions (discrete)

priordist note
bernoulli(p) 0 or 1 (with probability p)
geometric(p) 0, 1, 2, … with geometric(p)
index(p1,…,pk) 1, 2, …, k with prob. (p1, p2, …, pk)
poisson(mu) 0, 1, 2, … with Poisson(mu)

Prior distributions (other forms)

priordist             note
flat                  flat prior (constant density equal to 1)
density(f)            generic density f
logdensity(logf)      generic log-density logf

Note: use flat priors with caution. A flat prior is improper: it does not integrate to 1 (the integral is in fact infinite), which may yield an improper posterior distribution and make Bayesian inference impossible.

Jeffreys priors: for a mean parameter the Jeffreys prior is the flat prior, \(\pi(\mu)=1\); for a standard-deviation parameter it is the inverse prior, \(\pi(\sigma)=1/\sigma\).

likelihood() + prior()

. use mroz, clear

. bayesmh lwage educ age exper, likelihood(normal({var}))  prior({lwage:}, flat) prior({var}, igamma(0.01, 0.01))
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{var})

Priors: 
  {lwage:educ age exper _cons} ~ 1 (flat)                                  (1)
                         {var} ~ igamma(0.01,0.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .2752
                                                 Efficiency:  min =     .03065
                                                              avg =     .04804
Log marginal-likelihood = -455.02345                          max =     .07165
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1093384   .0144143   .000675   .1094341   .0807039   .1364698
         age │  -.001229   .0048322   .000221  -.0012917  -.0106893   .0081376
       exper │  .0163394   .0045813   .000171   .0162681   .0074164   .0253667
       _cons │ -.3510312   .2650894   .012581  -.3514681  -.8570201   .1873868
─────────────┼────────────────────────────────────────────────────────────────
         var │  .4498116    .030934   .001767   .4483777   .3933691   .5150745
─────────────┴────────────────────────────────────────────────────────────────
Note: Adaptation tolerance is not met.

likelihood() + prior(): an informative prior

. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, normal(0,10000)) prior({var},igamma(0.01, 0.01))
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{var})

Priors: 
  {lwage:educ age exper _cons} ~ normal(0,10000)                           (1)
                         {var} ~ igamma(0.01,0.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =       .253
                                                 Efficiency:  min =     .03411
                                                              avg =     .04046
Log marginal-likelihood = -477.13055                          max =     .05554
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1086604   .0143401   .000608   .1092476   .0799753   .1362948
         age │ -.0018785   .0048822   .000256  -.0019494  -.0110175   .0084529
       exper │  .0169774   .0044795   .000243   .0168361    .008705   .0264318
       _cons │ -.3258351   .2698549   .013326  -.3212487   -.857303   .1921515
─────────────┼────────────────────────────────────────────────────────────────
         var │   .450493   .0319438   .001698   .4494177   .3939383   .5191762
─────────────┴────────────────────────────────────────────────────────────────

likelihood() + prior()

. bayesmh inlf age educ kids*, likelihood(logit) prior({inlf:}, flat)
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  inlf ~ logit(xb_inlf)

Prior: 
  {inlf:age educ kidslt6 kidsge6 _cons} ~ 1 (flat)                         (1)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_inlf.

Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        753
                                                 Acceptance rate  =      .1607
                                                 Efficiency:  min =     .02098
                                                              avg =     .04698
Log marginal-likelihood = -475.37675                          max =     .05996
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
        inlf │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
         age │ -.0647113   .0128096   .000581  -.0650733   -.089756  -.0393152
        educ │  .2020967   .0383368   .001757   .2038444   .1280694   .2722752
     kidslt6 │ -1.496497   .1902067   .013131  -1.483813  -1.880425  -1.124643
     kidsge6 │ -.0939495   .0671387   .002742  -.0927716  -.2155951   .0316578
       _cons │  1.041166   .7975184   .033216   1.062803  -.5794387   2.618541
─────────────┴────────────────────────────────────────────────────────────────

likelihood() + prior(): specifying a separate prior for each parameter

. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:educ}, normal(0.1, 0.4)) prior({lwage:age}, normal(-0.
> 01, 1)) prior({lwage:exper}, normal(0.1, 1)) prior({lwage:_cons}, normal(2, 10)) prior({var},igamma(0.01, 0.01))
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{var})

Priors: 
   {lwage:educ} ~ normal(0.1,0.4)                                          (1)
    {lwage:age} ~ normal(-0.01,1)                                          (1)
  {lwage:exper} ~ normal(0.1,1)                                            (1)
  {lwage:_cons} ~ normal(2,10)                                             (1)
          {var} ~ igamma(0.01,0.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .2158
                                                 Efficiency:  min =    .001864
                                                              avg =     .02366
Log marginal-likelihood =   -459.584                          max =     .08343
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1080999   .0163228   .003015   .1085795    .076095   .1408135
         age │ -.0016291   .0051764   .000886  -.0016677  -.0122493   .0081089
       exper │  .0164365    .004882   .000169   .0164146   .0069865   .0262597
       _cons │ -.3244789   .3187475   .073838  -.3318201   -.977361   .3921511
─────────────┼────────────────────────────────────────────────────────────────
         var │  .4539588    .030861    .00189   .4522007   .3942803   .5171512
─────────────┴────────────────────────────────────────────────────────────────
Note: There is a high autocorrelation after 500 lags.

likelihood() + prior(): a joint prior for several parameters

. qui reg lwage educ age exper

. matrix vec=e(b)

. matrix cov=e(V)

. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, mvnormal(4,vec,cov)) prior({var},igamma(0.01, 0.01)
> )
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{var})

Priors: 
  {lwage:educ age exper _cons} ~ mvnormal(4,vec,cov)                       (1)
                         {var} ~ igamma(0.01,0.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .1926
                                                 Efficiency:  min =    .001502
                                                              avg =     .02856
Log marginal-likelihood = -442.16765                          max =     .07191
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1078397   .0078649   .001125   .1079189   .0934599   .1234291
         age │ -.0019469   .0027718   .000339  -.0018452  -.0075171   .0033644
       exper │  .0165112    .003203   .000119    .016684   .0101089   .0225106
       _cons │ -.3074617   .1048001   .027043  -.3049968  -.5511863  -.1230595
─────────────┼────────────────────────────────────────────────────────────────
         var │  .4480066   .0291848   .001213   .4463937   .3955503   .5106466
─────────────┴────────────────────────────────────────────────────────────────
Note: There is a high autocorrelation after 500 lags.

likelihood() + prior(): Zellner-g prior

prior({eq:}, normal(0,10000)) treats the coefficients as independent. Correlation among them can be allowed in two ways: a multivariate normal prior with a full covariance matrix (as in the previous example) or the Zellner g-prior:

\[ \beta | \sigma^2 \sim MVN(\mu_0, g\sigma^2 (x'x)^{-1}) \]

where \(g\) is the degrees of freedom of the g-prior.

Stata specification: zellnersg(d, g, mu, sigsq)

where sigsq is usually set directly to {var}, i.e., zellnersg(d, g, mu, {var}).

likelihood() + prior(): Zellner-g prior

. bayesmh lwage educ age exper, likelihood(normal({var})) prior({lwage:}, zellnersg(4,30,vec,{var})) prior({var},igamma(0.01,
>  0.01))
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normal(xb_lwage,{var})

Priors: 
  {lwage:educ age exper _cons} ~ zellnersg(4,30,vec,{var})                 (1)
                         {var} ~ igamma(0.01,0.01)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .1603
                                                 Efficiency:  min =     .00119
                                                              avg =     .01802
Log marginal-likelihood = -446.16778                          max =     .04551
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1012417   .0115183   .001895   .1013191   .0791962   .1233749
         age │ -.0037958   .0041686   .000711  -.0037038  -.0119122   .0040406
       exper │    .01706   .0043192   .000227    .017137   .0083099    .025188
       _cons │ -.1531522   .1799528   .052174  -.1626343    -.46208   .1529861
─────────────┼────────────────────────────────────────────────────────────────
         var │  .4450226    .032087   .001504   .4429504    .387583   .5111025
─────────────┴────────────────────────────────────────────────────────────────
Note: There is a high autocorrelation after 500 lags.

likelihood() + prior(): user-defined prior

Gelman et al. (2014): in a logit regression model, use a Cauchy prior for the coefficients with location \(a = 0\) and scale \(b = 2.5\) (scale 10 for the constant):

bayesmh y x, likelihood(logit) prior({y:x}, logdensity(ln(2.5)-ln(2.5^2+{y:x}^2)-ln(_pi))) prior({y:_cons}, logdensity(ln(10)-ln(10^2+{y:_cons}^2)-ln(_pi)))

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation and testing of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Nonlinear models

Nonlinear models require writing out the expression in the parameters and variables explicitly.

Parameters are enclosed in curly braces, e.g. {b0}, {b1}.

Example: bayesmh y={b0} + {b1}*x1^{p} + {b2}*x2 specifies \(y=b_0+b_1 x_1^{p} + b_2 x_2\).

Single-equation nonlinear model: eqname : dep = subexp

Example: bayesmh mpg = {b0}+{b1}*price^{b2}, likelihood(…) prior(…)

Multiple-equation nonlinear model: (eqname : dep = subexp) (…)

Nonlinear models

Estimate a CES production function; predict output when capital = 1.5 and labor = 2.

. use production, clear

. bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))), likelihood(normal({sig
> 2})) prior({sig2},  igamma(0.01, 0.01))  prior({b0}, normal(0, 100)) prior({delta}, uniform(0, 1))  prior({rho}, gamma(1, 1
> ))  block({b0 delta rho sig2}, split)  rseed(16) saving(ces_mcmc, replace)
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lnoutput ~ normal({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta })*labor^(-1*{rho})),{sig2})

Priors: 
   {sig2} ~ igamma(0.01,0.01)
     {b0} ~ normal(0,100)
  {delta} ~ uniform(0,1)
    {rho} ~ gamma(1,1)
──────────────────────────────────────────────────────────────────────────────

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        100
                                                 Acceptance rate  =      .4385
                                                 Efficiency:  min =     .02283
                                                              avg =      .1225
Log marginal-likelihood = -93.731032                          max =      .2461
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
          b0 │  3.829453   .1047645     .0059   3.829384   3.633927   4.027828
       delta │  .4830306   .0636595   .001283   .4820599   .3540333   .6146103
         rho │  1.838675   .8665057   .057343   1.624436   .8088482   4.053971
        sig2 │  .3098183    .043935   .001009   .3062381    .233141   .4054633
─────────────┴────────────────────────────────────────────────────────────────

file ces_mcmc.dta saved.

Multiple-equation models

Linear multiple-equation model: (eqname : dep indepvars [, noconstant]) (……)

Example:

bayesmh (eq:mpg price weight) (eq2:price weight foreign), likelihood(…) prior(…)

specifies the model below (a fuller sketch follows the bullet list):

  • mpg = {eq:price}*price + {eq:weight}*weight + {eq:_cons}

  • price = {eq2:weight}*weight + {eq2:foreign}*foreign + {eq2:_cons}
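
One possible way to complete this command is sketched below; the mvnormal() likelihood with an unstructured covariance matrix {Sigma, matrix} and its inverse-Wishart prior are assumptions in the spirit of the bayesmh documentation, not part of the original slides:

sysuse auto, clear
bayesmh (eq:mpg price weight) (eq2:price weight foreign), ///
    likelihood(mvnormal({Sigma, matrix}))                 ///
    prior({eq:} {eq2:}, normal(0, 10000))                 ///
    prior({Sigma, matrix}, iwishart(2, 12, I(2)))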

Referring to parameters in prior specifications

Prior specification: prior( parameter, dist )

Basic rule: parameters are enclosed in curly braces. For example, prior({var}, igamma(0.01, 0.01)) for a scalar parameter, or prior({lwage:educ}, normal(0, 100)) for a coefficient.

evaluator

If the likelihood or the prior is too complex to be expressed with the built-in distributions or with an expression, you can write a program that computes the log likelihood or the log posterior.

Use evaluator() to supply the log posterior and llevaluator() to supply the log likelihood.

Supplying your own log posterior:

bayesmh depvar indepvars [if] [in] , evaluator(program, spec)

Supplying your own log likelihood:

bayesmh depvar indepvars [if] [in] , llevaluator(program, spec)

Basic structure of an evaluator

Basic skeleton:

program progname
args lnden xb1 [xb2 ...] [ modelparams]
  ... computations ...
  scalar `lnden' = ...
end
  1. args is followed by the scalar that will hold the log density (lnden), the linear predictors (xb1, xb2, …), and any additional model parameters.

  2. The model specified in bayesmh must be consistent with the args of the evaluator program. For example, for args lnden xb1 var, the corresponding bayesmh command is:

. bayesmh y x1 x2, evaluator(normaljeffreys, parameters({var}))

Basic structure of an evaluator

Example: for args lnden xb1 xb2 tau var1 var2, the corresponding bayesmh command is:

. bayesmh (y1 x1 x2) (y2 x1 z2), evaluator(normaljeffreys, parameters({tau} {var1} {var2}))

Here (y1 x1 x2) (y2 x1 z2) means the model has two equations, with dependent variables y1 and y2 and regressors (x1, x2) and (x1, z2), respectively.

Global macros defined for evaluators

model               global macro                note
single equation     $MH_y                       dependent variable
                    $MH_x1, $MH_x2, …           1st, 2nd, … regressor of the equation
                    $MH_xn                      number of regressors in the equation
multiple equations  $MH_y1, $MH_y2, $MH_y3, …   dependent variable of equation 1, 2, 3, …
                    $MH_y1x1, $MH_y1x2, …       1st, 2nd, … regressor of equation 1
                    $MH_y2x1, $MH_y2x2, …       1st, 2nd, … regressor of equation 2
                    $MH_y1xn, $MH_y2xn, …       number of regressors in equation 1, 2, …
general             $MH_touse                   estimation-sample marker
                    $MH_n                       sample size
                    $MH_b                       coefficient vector
                    $MH_bn                      number of parameters
                    $MH_m1, $MH_m2, …           1st, 2nd, … matrix parameter
                    $MH_mn                      number of matrix parameters
                    $MH_extravars               extra variables

Example: a log-posterior evaluator for the normal linear model. The prior for the regression coefficients is flat (density 1, log density 0, so it can be ignored in the log posterior); the prior for the variance is the Jeffreys prior.

. type normaljeffreys.ado
program normaljeffreys
        version 16.1
        args lnp xb var
        /* compute log likelihood */
        tempname sd
        scalar `sd' = sqrt(`var')
        tempvar lnfj
        quietly generate double `lnfj' = lnnormalden($MH_y,`xb',`sd') if $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnp' = .
                exit
        }
        tempname lnf
        scalar `lnf' = r(sum)
        /* compute log prior */
        tempname lnprior
        /* case 1: variance: Jeffereys prior, coefficient: flat prior. */
        scalar `lnprior' = -2*ln(`sd')
                
        /* compute log posterior */
        scalar `lnp' = `lnf' + `lnprior'
end

Because the log posterior is already defined in the evaluator, the bayesmh command must not also specify prior() or likelihood().

. use mroz, clear

. bayesmh lwage educ age, evaluator(normaljeffreys, parameters({var}))
  
Burn-in ...
note: invalid initial state.
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Posterior: 
  lwage ~ normaljeffreys(xb_lwage,{var})
──────────────────────────────────────────────────────────────────────────────

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .2765
                                                 Efficiency:  min =     .06077
                                                              avg =     .07804
Log marginal-likelihood = -452.12848                          max =     .09761
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1099913    .013842   .000443   .1105567   .0814029   .1356931
         age │  .0067848   .0043219   .000175   .0067199  -.0016034   .0153288
       _cons │ -.4874439   .2635526   .009471  -.4873747  -1.001772   .0526065
─────────────┼────────────────────────────────────────────────────────────────
         var │  .4626251   .0312279    .00113   .4615565   .4051353   .5263745
─────────────┴────────────────────────────────────────────────────────────────

An equivalent approach: likelihood() + prior()

bayesmh lwage educ age, likelihood(normal({var})) prior({lwage:}, flat) prior({var}, jeffreys)

Example: llevaluator

You can also write only the log-likelihood function and use built-in distributions for the priors.

Example: note the lines that are commented out relative to normaljeffreys.

. type normallnf.ado
program normallnf
        version 16.1
        args lnf xb var // lnp --> lnf
        tempname sd
        scalar `sd' = sqrt(`var')
        tempvar lnfj
        quietly generate double `lnfj'=lnnormalden($MH_y,`xb',`sd') if $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = . // lnp --> lnf
                exit
        }
        // tempname lnf
        scalar `lnf' = r(sum)
        // tempname lnprior
        // scalar `lnprior' = -2*ln(`sd')
        // scalar `lnp' = `lnf' + `lnprior'
end

Example: llevaluator() + prior()

. bayesmh lwage educ age, llevaluator(normallnf, parameters({var})) prior({lwage:}, flat) prior({var}, jeffreys) 
  
Burn-in ...
note: invalid initial state.
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  lwage ~ normallnf(xb_lwage,{var})

Priors: 
  {lwage:educ age _cons} ~ 1 (flat)                                        (1)
                   {var} ~ jeffreys
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_lwage.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        428
                                                 Acceptance rate  =      .1388
                                                 Efficiency:  min =    .001144
                                                              avg =    .005087
Log marginal-likelihood = -464.96066                          max =    .009786
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
lwage        │
        educ │  .1713085    .013087   .002016   .1711292   .1469124   .1960288
         age │  .0262109   .0036357   .000504   .0263546   .0189688   .0330134
       _cons │ -2.103083   .1503325    .04444  -2.137789  -2.353303  -1.825682
─────────────┼────────────────────────────────────────────────────────────────
         var │  .5224401    .038405   .003882   .5194123    .448735    .603441
─────────────┴────────────────────────────────────────────────────────────────
Note: There is a high autocorrelation after 500 lags.

Example: Bayesian estimation of the logit model

The log-likelihood function of the logit model:

. type logitlnf.ado
program logitlnf
        version 16.1
        args lnf xb
        tempvar lnfj
        quietly generate `lnfj' = ln(invlogit(`xb')) if $MH_y == 1 & $MH_touse
        quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
        scalar `lnf' = r(sum)
end

Example: Bayesian estimation of the logit model

. bayesmh inlf kids* educ age, llevaluator(logitlnf) prior({inlf:}, flat) 
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  inlf ~ logitlnf(xb_inlf)

Prior: 
  {inlf:kidslt6 kidsge6 educ age _cons} ~ 1 (flat)                         (1)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_inlf.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        753
                                                 Acceptance rate  =      .2605
                                                 Efficiency:  min =     .03826
                                                              avg =      .0633
Log marginal-likelihood = -475.38324                          max =      .0815
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
        inlf │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
     kidslt6 │ -1.483831   .1929396   .007068  -1.476455  -1.871804  -1.096678
     kidsge6 │  -.096295   .0680707    .00348  -.0951521  -.2328876   .0339874
        educ │   .201044   .0379359   .001761    .200977    .129362    .275341
         age │ -.0638044   .0124056   .000435  -.0633938    -.08888  -.0399602
       _cons │  1.024522   .7731857   .028078     1.0043  -.4649703    2.55663
─────────────┴────────────────────────────────────────────────────────────────

Example: Bayesian estimation of a threshold regression

Likelihood evaluator program file: normalthresh.ado
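
The program file itself is not reproduced in the slides. A hypothetical sketch consistent with the call below (two linear predictors, extra parameters {tau}, {var1}, {var2}, and the threshold variable ygap1 passed via extravars()) might look as follows; the program actually used may differ:

program normalthresh
        version 16.1
        args lnf xb1 xb2 tau var1 var2
        // regime 1 applies when the threshold variable (ygap1, passed via extravars()) is <= tau
        local thresh $MH_extravars
        tempname sd1 sd2
        scalar `sd1' = sqrt(`var1')
        scalar `sd2' = sqrt(`var2')
        tempvar lnfj
        quietly generate double `lnfj' =                        ///
                cond(`thresh' <= `tau',                         ///
                     lnnormalden($MH_y1, `xb1', `sd1'),         ///
                     lnnormalden($MH_y1, `xb2', `sd2')) if $MH_touse
        quietly summarize `lnfj', meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
        scalar `lnf' = r(sum)
end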

. sysuse taylor, clear

. gen ygap1 = L.ygap
(1 missing value generated)

. bayesmh (eq1:ffr l.ffr l.pi l.ygap) (eq2: ffr l.ffr l.pi l.ygap), ///
>  llevaluator(normalthresh, parameters({tau} {var1} {var2}) extravars(ygap1)) ///
>  prior({eq1:}, normal(0,0.5)) prior({var1}, igamma(2,1))  ///
>  prior({eq2:}, normal(0,0.5)) prior({var2}, igamma(2,1))  ///
>  prior({tau}, uniform(-2,2))  ///
>  initial({eq1:} 0 {eq2:} 0 {tau} -2 {var1} 1 {var2} 1)  ///
>  block({eq1:}) block({eq2:}) block({var1}) block({var2}) blocksummary ///
>  burnin(2000) mcmcsize(2000)
  
Burn-in ...
Simulation ...

Model summary
──────────────────────────────────────────────────────────────────────────────
Likelihood: 
  ffr ffr ~ normalthresh(xb_eq1,xb_eq2,{tau},{var1},{var2})

Priors: 
  {eq1:L.ffr L.pi L.ygap _cons} ~ normal(0,0.5)                            (1)
  {eq2:L.ffr L.pi L.ygap _cons} ~ normal(0,0.5)                            (2)
                          {tau} ~ uniform(-2,2)
                    {var1 var2} ~ igamma(2,1)
──────────────────────────────────────────────────────────────────────────────
(1) Parameters are elements of the linear form xb_eq1.
(2) Parameters are elements of the linear form xb_eq2.

Block summary
──────────────────────────────────────────────────────────────────────────────
   1:  {eq1:L.ffr L.pi L.ygap _cons}
   2:  {eq2:L.ffr L.pi L.ygap _cons}
   3:  {var1}
   4:  {var2}
   5:  {tau}
──────────────────────────────────────────────────────────────────────────────

Bayesian regression                              MCMC iterations  =      4,000
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,000
                                                 MCMC sample size =      2,000
                                                 Number of obs    =        113
                                                 Acceptance rate  =      .3433
                                                 Efficiency:  min =     .03825
                                                              avg =     .08808
Log marginal-likelihood =  -132.2379                          max =       .261
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
eq1          │
         ffr │
         L1. │  .4263503   .1543339   .013155   .4467987   .1165634   .7892567
             │
          pi │
         L1. │  1.005484   .2931961   .025911   .9806879   .3473836   1.622831
             │
        ygap │
         L1. │  .1055092   .2932414   .032391   .0986647  -.4638702   .6225725
             │
       _cons │ -.3022746   .5587455   .063881  -.2833094  -1.369574   .7045184
─────────────┼────────────────────────────────────────────────────────────────
eq2          │
         ffr │
         L1. │  .9153278   .0236862   .002067   .9171459   .8683087   .9619808
             │
          pi │
         L1. │  .3238712   .0551187   .004669   .3243831   .2182006   .4343945
             │
        ygap │
         L1. │  .0801297   .0641431    .00581   .0773504  -.0436599   .1989345
             │
       _cons │ -.3283278   .1315534   .011183  -.3253656  -.6088118  -.0825758
─────────────┼────────────────────────────────────────────────────────────────
         tau │ -1.144866   .0370001   .002212  -1.157775  -1.182468  -1.030312
        var1 │  2.230633   .7411225   .055102   2.107737   1.227005   4.183789
        var2 │  .2537051   .0384719   .001684   .2511057   .1879426   .3328634
─────────────┴────────────────────────────────────────────────────────────────
Note: Adaptation continues during simulation.

Example: Bayesian estimation of a threshold regression

. bayesgraph diagnostics {tau}

Example: Bayesian estimation of a threshold regression

. bayesstats ess

Efficiency summaries    MCMC sample size =     2,000
                        Efficiency:  min =    .03825
                                     avg =    .08808
                                     max =      .261
 
─────────────┬──────────────────────────────────────
             │        ESS   Corr. time    Efficiency
─────────────┼──────────────────────────────────────
eq1          │
         ffr │
         L1. │     137.64        14.53        0.0688
             │
          pi │
         L1. │     128.04        15.62        0.0640
             │
        ygap │
         L1. │      81.96        24.40        0.0410
             │
       _cons │      76.50        26.14        0.0383
─────────────┼──────────────────────────────────────
eq2          │
         ffr │
         L1. │     131.38        15.22        0.0657
             │
          pi │
         L1. │     139.38        14.35        0.0697
             │
        ygap │
         L1. │     121.88        16.41        0.0609
             │
       _cons │     138.39        14.45        0.0692
─────────────┼──────────────────────────────────────
         tau │     279.67         7.15        0.1398
        var1 │     180.90        11.06        0.0905
        var2 │     521.93         3.83        0.2610
─────────────┴──────────────────────────────────────

Contents

Introduction

Markov chain Monte Carlo (MCMC)

Bayesian estimation and testing of econometric models

Likelihood functions or posterior distributions for specific models

Bayesian prediction

Bayesian prediction

Because the parameters are random, prediction draws R values from their posterior distribution and computes a prediction for each observation from each draw, so every observation has R predicted (simulated) values.

bayespredict {_ysim#[numlist]}

The simulated outcome variables are named _ysim1_1, _ysim1_2, …, corresponding to the predictions for observation 1, 2, … of the first equation's dependent variable.

numlist: selects the observations to predict.

. use production, clear

. qui bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))), likelihood(normal(
> {sig2})) prior({sig2},  igamma(0.01, 0.01))  prior({b0}, normal(0, 100)) prior({delta}, uniform(0, 1))  prior({rho}, gamma(
> 1, 1))  block({b0 delta rho sig2}, split)  rseed(16) saving(ces_mcmc, replace)

. 
. bayespredict {_ysim}, saving(ces_pred, replace) rseed(16)

Computing predictions ...

file ces_pred.dta saved.
file ces_pred.ster saved.

. describe _ysim1_1 _ysim1_2 _ysim1_3  using ces_pred

Variable      Storage   Display    Value
    name         type    format    label      Variable label
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
_ysim1_1        double  %10.0g                Simulated lnoutput, obs #1
_ysim1_2        double  %10.0g                Simulated lnoutput, obs #2
_ysim1_3        double  %10.0g                Simulated lnoutput, obs #3

Bayesian prediction

Summary statistics for the first 5 simulated observations:

. bayesstats summary {_ysim[1/5]} using ces_pred

Posterior summary statistics                      MCMC sample size =    10,000
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
    _ysim1_1 │  3.111305    .564188   .005642   3.114486   2.013625   4.230432
    _ysim1_2 │   4.47848   .5647274   .006264   4.487651   3.362394   5.576457
    _ysim1_3 │  2.033675   .5562368   .005692   2.033444   .9360682   3.115028
    _ysim1_4 │  2.234464   .5692011   .005783   2.234235   1.130026   3.361621
    _ysim1_5 │  2.894212   .5655813   .005948   2.894129   1.789959   4.013839
─────────────┴────────────────────────────────────────────────────────────────

Bayesian prediction

Other statistics computed from the predictions:

bayesgraph histogram [label:]@func(arg1 [, arg2]), saving(file)

bayesgraph histogram [label:]@userprog, arg1 [arg2], saving(file)

where func is a Mata function that maps a column vector to a scalar, such as mean, variance, etc.

arg1 and arg2 can be {_ysim[#]}, {_resid[#]} (the residuals), or {_mu[#]} (the expected value of the dependent variable). A function of the predictions (e.g. the mean, variance, or skewness):

. bayesgraph histogram (resvar:@variance({_resid})) using ces_pred

Bayesian prediction

You can directly obtain summary statistics of the R predicted values, such as the mean, median, standard deviation, or credible interval.

bayespredict newvar, mean median std cri outcome(depvar)

. bayespredict pmean, mean rseed(123)

Computing predictions ...

. twoway (kdensity lnoutput) (kdensity pmean, lp(dash)), legend(off) title(Observed vs replicated data)

Bayesian prediction

Bayesian prediction: for each observation, generate R random draws of the corresponding dependent variable and compute their mean.

. set obs 101
Number of observations (_N) was 100, now 101.

. qui replace capital = 1.5 in 101

. qui replace labor = 2 in 101

. bayespredict pmean2, mean rseed(123)

Computing predictions ...

. bayespredict cilow ciupp, cri rseed(123)

Computing predictions ...

. list labor capital lnoutput pmean2 cilow ciupp in 96/101

     ┌─────────────────────────────────────────────────────────────────┐
     │    labor    capital   lnoutput     pmean2      cilow      ciupp │
     ├─────────────────────────────────────────────────────────────────┤
 96. │ .4023916   .2614668   2.737468   2.678401   1.561842   3.798991 │
 97. │ 3.801785   .6329353   3.619768   3.782753   2.692533   4.906854 │
 98. │ .3353206   .2557932   2.292713   2.597483   1.527138   3.708973 │
 99. │  31.5595    2.72437   5.026271   5.273558      4.144   6.378542 │
100. │  3.80443    .946865   3.363704   4.154781   3.045156   5.251562 │
     ├─────────────────────────────────────────────────────────────────┤
101. │        2        1.5          .   4.362791   3.242146   5.457026 │
     └─────────────────────────────────────────────────────────────────┘

Bayesian prediction

Directly predicting other user-defined quantities:

bayespredict [label:]@func(arg1 [, arg2]), saving(file)

bayespredict [label:]@userprog, arg1 [arg2], saving(file)

. bayespredict (mu:@mean({_mu})), saving(mc2, replace)

Computing predictions ...

file mc2.dta saved.
file mc2.ster saved.

Bayesian prediction

Computing posterior predictive p-values for test statistics; the statistics can be computed with Mata functions.

. bayesstats ppvalues (mean:@mean({_ysim})) (min:@min({_ysim})) (max:@max({_ysim})) using ces_pred

Posterior predictive summary   MCMC sample size =    10,000

─────────────┬─────────────────────────────────────────────
           T │      Mean   Std. dev.  E(T_obs)  P(T>=T_obs)
─────────────┼─────────────────────────────────────────────
        mean │  3.045143   .0787588   3.044554        .5026
         min │  .5130189   .3401942   1.049675        .0365
         max │   5.84806   .3703789   5.703145         .626
─────────────┴─────────────────────────────────────────────
Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.

The closer these p-values are to 0.5, the better the simulated values agree with the observed data.

Bayesian prediction

You can define your own statistics with a Mata function, an ado-program, or a combination of the two.

Example: skewness and kurtosis.

. mata
───────────────────────────────────────────────── mata (type end to exit) ───────────────────────────────────────────────────
: real scalar skew(real colvector x)
> {
>         return (sqrt(length(x))*sum((x:-mean(x)):^3)/(sum((x:-mean(x)):^2)^1.5))
> }

: end
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
. capture prog drop kurt

. program kurt
  1.   version 17
  2.   args kurtosis x
  3.   quietly summarize `x', detail
  4.   scalar `kurtosis' = r(kurtosis)
  5. end

Bayesian prediction

. bayespredict (skewness:@skew({_ysim})) (kurtosis:@kurt {_ysim}), saving(ces_teststat, replace) rseed(16)

Computing predictions ...

file ces_teststat.dta saved.
file ces_teststat.ster saved.

Bayesian prediction

. bayesstats summary {skewness} {kurtosis} using ces_teststat

Posterior summary statistics                      MCMC sample size =    10,000
 
─────────────┬────────────────────────────────────────────────────────────────
             │                                                Equal-tailed
             │      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
─────────────┼────────────────────────────────────────────────────────────────
    skewness │  .1352403   .1606844   .001607   .1350271   -.181804   .4555612
    kurtosis │  2.646662   .2969499   .003685   2.613873   2.165103   3.318124
─────────────┴────────────────────────────────────────────────────────────────

. bayesstats ppvalues {skewness} {kurtosis} using ces_teststat

Posterior predictive summary   MCMC sample size =    10,000

─────────────┬─────────────────────────────────────────────
           T │      Mean   Std. dev.  E(T_obs)  P(T>=T_obs)
─────────────┼─────────────────────────────────────────────
    skewness │  .1352403   .1606844   .1562801        .4449
    kurtosis │  2.646662   .2969499   2.252134        .9359
─────────────┴─────────────────────────────────────────────
Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.

Bayesian prediction

The first 3 replicated outcome variables generated from the posterior predictive distribution:

. bayesreps yrep*, nreps(3) rseed(123)

Computing predictions ...

. list lnoutput yrep1 yrep2 yrep3 in 1/5

     ┌───────────────────────────────────────────┐
     │ lnoutput      yrep1      yrep2      yrep3 │
     ├───────────────────────────────────────────┤
  1. │ 2.933451   3.397611   3.147219   4.039656 │
  2. │ 4.613716   4.896418   4.197628   5.014223 │
  3. │ 1.654005   1.758073   1.464527   2.596067 │
  4. │ 2.025361    2.30658    1.71803   2.535869 │
  5. │ 3.165065   2.221415   2.840843   3.330061 │
     └───────────────────────────────────────────┘

Bayesian prediction can also be used to assess model fit: simulate values of the dependent variable from the posterior distribution and compare them with the observed values.