Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Matthew Baker <matthew.baker@hunter.cuny.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
st: High-dimensional product/sum combinations in likelihood calculations |

Date |
Thu, 27 Oct 2011 11:45:57 -0400 |

Dear Listers -- I'm curious if anyone has insight into the following problem in which in the course of a likelihood computation. The problem is one caused by a requirement that probabilities be computed and then averaged before being inserted into a likelihood function. To take an example, consider grouped data in which there are some amount of groups, and the mean in each group is chosen to be either m1 or m2 with some probability p. The group log-likelihood will then be something like L[g]=ln( p*prob(x|m1)+(1-p)*prob(x|m2)). The problem is that the probabilities inside the log can be extremely small in problems where the group sizes are large. To take an example: /* Begin Example */ clear all set seed 8675309 /* Begin by generating fictional data with 20 groups and 10000 total observations */ /* with probability .4, the group mean is 1, while with prob .6, the group mean is 1.5 */ mata groups=20 obs=10000 p=.4 m1=1 m2=1.5 G=1:+sort(round(runiform(obs,1)*(groups-1)),1) id=panelsetup(G,1) Data=J(obs,1,1) for (i=1;i<=rows(id);i++) { D=panelsubmatrix(G,i,id) if (runiform(1,1)>p) Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m1,1) else Data[id[i,1]::id[i,2],.]=rnormal(rows(D),1,m2,1) } /* Now, write a function that evaluates the log-likelihood of the data */ real colvector obj_fun_2(b,x,G) { real scalar i,p,m1,m2 real matrix id,D1,D2 p=b[1] m1=b[2] m2=b[3] dens1=normalden(x:-m1) dens2=normalden(x:-m2) id=panelsetup(G,1) LL=J(rows(id),1,.) for (i=1;i<=rows(id);i++) { D1=panelsubmatrix(dens1,i,id) D2=panelsubmatrix(dens2,i,id) P1=exp(colsum(ln(D1))) P2=exp(colsum(ln(D2))) LL[i]=p*P1+(1-p)*P2 } return(ln(LL)) } /* All is not well, as LL has a series of blank entries */ obj_fun_2((p,m1,m2),Data,G) end /* End example */ The final command issued in Mata above returns the result, which I am fairly certain is occurring due to small numbers: +----------------+ 1 -406.0532458 2 . 3 . 4 . 5 . 6 . 7 . 8 . 9 . 10 . 11 -668.8969238 12 . 13 . 14 . 15 . 16 . 17 . 18 . 19 . 20 -372.8149279 +----------------+ Does anyone have any ideas about dealing with these sorts of problems? Thanks in advance for the help! Matt Baker -- Dr. Matthew J. Baker Department of Economics Hunter College and the Graduate Center, CUNY * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: NLSUR Quaids** - Next by Date:
**st: ztnb model** - Previous by thread:
**st: NLSUR Quaids** - Next by thread:
**st: ztnb model** - Index(es):