Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: High-dimensional product/sum combinations in likelihood calculations


From   Matthew Baker <matthew.baker@hunter.cuny.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: High-dimensional product/sum combinations in likelihood calculations
Date   Thu, 27 Oct 2011 11:45:57 -0400

Dear Listers --

I'm curious if anyone has insight into the following problem in which
in the course of a likelihood computation. The problem is one caused
by a requirement that probabilities be computed and then averaged
before being inserted into a likelihood function. To take an example,
consider grouped data in which there are some amount of groups, and
the mean in each group is chosen to be either m1 or m2 with some
probability p. The group log-likelihood will then be something like
L[g]=ln( p*prob(x|m1)+(1-p)*prob(x|m2)). The problem is that the
probabilities inside the log can be extremely small in problems where
the group sizes are large. To take an example:

/* Begin Example */

clear all
set seed 8675309

/* Begin by generating fictional data with 20 groups and 10000 total
observations    */
/* with probability .4, the group mean is 1, while with prob .6, the
group mean is 1.5 */

mata

	groups=20
	obs=10000
	
	p=.4
	m1=1
	m2=1.5
	
	G=1:+sort(round(runiform(obs,1)*(groups-1)),1)
	id=panelsetup(G,1)
	Data=J(obs,1,1)
	for (i=1;i<=rows(id);i++) {
		D=panelsubmatrix(G,i,id)
		if (runiform(1,1)>p) Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m1,1)
		else Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m2,1)
							  }

/* Now, write a function that evaluates the log-likelihood of the data */

real colvector obj_fun_2(b,x,G)
{
	real scalar i,p,m1,m2
	real matrix id,D1,D2
	p=b[1]
	m1=b[2]
	m2=b[3]
	dens1=normalden(x:-m1)
	dens2=normalden(x:-m2)
	id=panelsetup(G,1)
	LL=J(rows(id),1,.)
	for (i=1;i<=rows(id);i++) {
		D1=panelsubmatrix(dens1,i,id)
		D2=panelsubmatrix(dens2,i,id)
		P1=exp(colsum(ln(D1)))
		P2=exp(colsum(ln(D2)))
		LL[i]=p*P1+(1-p)*P2
							  }
	return(ln(LL))
}

/* All is not well, as LL has a series of blank entries */

obj_fun_2((p,m1,m2),Data,G)

end

/* End example */

The final command issued in Mata above returns the result, which I am
fairly certain is occurring due to small numbers:

	+----------------+
1	-406.0532458
2	.
3	.
4	.
5	.
6	.
7	.
8	.
9	.
10	.
11	-668.8969238
12	.
13	.
14	.
15	.
16	.
17	.
18	.
19	.
20	-372.8149279
	+----------------+

Does anyone have any ideas about dealing with these sorts of problems?

Thanks in advance for the help!

Matt Baker

-- 
Dr. Matthew J. Baker
Department of Economics
Hunter College and the Graduate Center, CUNY
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index