
Re: st: discriminatory power of a probit model

From   Maarten buis <>
Subject   Re: st: discriminatory power of a probit model
Date   Sun, 2 Mar 2008 19:36:43 +0000 (GMT)

--- Martin Weiss <> wrote:
> *********
> sysuse auto, clear
> probit foreign weight, nolog
> estat clas
> probit foreign weight mpg, nolog
> estat clas
> *********
> can I prefer one specification of covariates in a probit model
> over the other on the basis of the correctly classified cases as
> provided at the bottom of the classification table? If so, is there
> a confidence interval that would let me judge whether the difference 
> between two models is significant? 

There is another option: model 1 is model 2 with the coefficient of mpg
constrained to 0, so model 1 is nested within model 2. This is an
assumption you can test using a Wald test (the z test reported for each
coefficient in the output of -probit- is a Wald test), or, if you add
multiple variables, a likelihood ratio test (-lrtest-).
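A minimal sketch of both tests on the auto data from Martin's example,
using the official -test-, -estimates store-, and -lrtest- commands. The
names full and restricted are arbitrary labels chosen for this
illustration. With a single added variable the two tests address the
same hypothesis, so their p-values should be similar, though not
identical.

*------------ begin example ---------------------
sysuse auto, clear

* Wald test: -test- reports the same hypothesis as the z test for mpg
* in the -probit- output, as a chi2 with 1 degree of freedom
probit foreign weight mpg, nolog
test mpg

* Likelihood ratio test: store the larger model, refit without mpg on
* the same observations, store that too, and compare the two
estimates store full
probit foreign weight if e(sample), nolog
estimates store restricted
lrtest full restricted
*---------------- end example ----------------------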

The problem with the proportion correctly classified is that it depends
on the distribution of your dependent variable: if success is rare and
everybody is classified as a failure, then the proportion correctly
classified is still large. In that case, adding an explanatory variable
isn't going to raise it by much. This characteristic of the proportion
correctly classified is illustrated in the simulation below. The effect
of x is the same in each probit; all that differs is the constant, that
is, the proportion of successes. This dramatically influences how much
adding x to the model increases the proportion correctly classified,
even though the effect of x is the same in all models.
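The mechanism is easy to see in Martin's auto data. 22 of the 74 cars
are foreign, so a constant-only probit gives every car the same
predicted probability, 22/74 (about 0.30). With the default cutoff of
0.5 used by -estat classification-, every car is then classified as
domestic, and the proportion correctly classified is simply the share
of domestic cars, 52/74 or about 70 percent, before any covariate has
been added. A minimal sketch:

*------------ begin example ---------------------
sysuse auto, clear

* Constant-only probit: the predicted probability is the same for all
* cars and equals the sample share of foreign cars (22/74)
probit foreign, nolog
display normal(_b[_cons])

* That probability is below the 0.5 cutoff, so every car is classified
* as domestic and r(P_corr) equals 100*52/74, about 70.27 percent
estat classification
*---------------- end example ----------------------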

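*------------ begin example ---------------------
set more off
set seed 12345
capture program drop sim
program define sim, rclass
	drop _all
	set obs 500
	gen x = rnormal()

	* y1: constant 0, about half of the observations are successes
	gen byte y1 = runiform() < normal(x)
	quietly probit y1
	quietly estat classification
	local p1 = r(P_corr)
	quietly probit y1 x
	quietly estat classification
	return scalar diff1 = r(P_corr) - `p1'

	* y2: constant -1, roughly a quarter are successes
	gen byte y2 = runiform() < normal(x-1)
	quietly probit y2
	quietly estat classification
	local p2 = r(P_corr)
	quietly probit y2 x
	quietly estat classification
	return scalar diff2 = r(P_corr) - `p2'

	* y3: constant -2, roughly 8 percent are successes
	gen byte y3 = runiform() < normal(x-2)
	quietly probit y3
	quietly estat classification
	local p3 = r(P_corr)
	quietly probit y3 x
	quietly estat classification
	return scalar diff3 = r(P_corr) - `p3'
end

* gain in percent correctly classified from adding x, per scenario
simulate diff1=r(diff1)  ///
         diff2=r(diff2)  ///
         diff3=r(diff3), ///
         reps(100):  sim
summarize diff1 diff2 diff3
*---------------- end example ----------------------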
(For more on how to use examples I sent to the Statalist, see )

In general, when it comes to selecting a model I would not rely on a
single statistic. Some quotes along these lines can be found here:
The book "Regression Models for Categorical Dependent Variables Using
Stata" by J. Scott Long and Jeremy Freese contains a good
discussion of all the things you should take into account when
selecting a model.
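As an illustration of looking at more than one statistic, the sketch
below reports information criteria (-estat ic-) and the area under the
ROC curve (-lroc-), a measure of discriminatory power that does not
depend on a single classification cutoff, for Martin's two
specifications. These are just two possible choices among many, not a
recommended procedure.

*------------ begin example ---------------------
sysuse auto, clear

* specification 1: AIC/BIC and area under the ROC curve
probit foreign weight, nolog
estat ic
lroc, nograph

* specification 2: the same statistics, for comparison
probit foreign weight mpg, nolog
estat ic
lroc, nograph
*---------------- end example ----------------------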

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715
