
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



RE: st: Assistance regarding a maximum likelihood model


From   Cameron McIntosh <cnm100@hotmail.com>
To   STATA LIST <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Assistance regarding a maximum likelihood model
Date   Tue, 13 Sep 2011 23:42:31 -0400

Hi Jeremy,
Thanks for the clarification. I asked about correlation partly because of the usual multicollinearity issues in regression, but also because I think non-independence of counts (hours, in this case) tends to make Poisson outcomes overdispersed. That would call for an alternative approach, for example a negative binomial model (you could run some statistical tests for overdispersion):
Hilbe, J.M. (2011). Negative Binomial Regression (2nd ed.). Cambridge, UK: Cambridge University Press.
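As a rough illustration of the overdispersion check mentioned above, the following is a minimal sketch in Stata. It borrows the variable names vol_hrs and x1-x6 from Jeremy's do-file and assumes, for illustration only, that hours are being treated as a count outcome in a single-equation model.

```stata
* Sketch only: vol_hrs and x1-x6 are taken from the do-file quoted below;
* treating vol_hrs as a count outcome is an assumption.
poisson vol_hrs x1 x2 x3 x4 x5 x6
estat gof        // goodness-of-fit tests; small p-values suggest the Poisson fits poorly

* Negative binomial fit. The output includes a likelihood-ratio test of
* alpha = 0, that is, a test of the Poisson restriction.
nbreg vol_hrs x1 x2 x3 x4 x5 x6
```

If the test of alpha = 0 rejects, that points toward the negative binomial rather than the Poisson for this kind of outcome.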
And yes, ideally the starting values would be close approximations of the coefficient values, which depend to some extent on the link function you are using. I assume it must be Poisson and not linear, logit, or probit? Anyway, you could run univariate Poisson regressions for each of your activity outcomes separately, and then plug those coefficients into the multivariate model as start values. Perhaps even OLS estimates for your multivariate model, based on regressing log(yi + c) on xi, where c is a small constant added to eliminate zeroes if needed, might serve as reasonable starting values too. You should be aware that Poisson regression is prone to convergence problems, which may be exacerbated in a complex model such as yours:
Santos Silva, J.M.C., & Tenreyro, S. (January 6, 2011). poisson: Some convergence issues. http://personal.lse.ac.uk/tenreyro/poisson.pdf
Santos Silva, J.M.C., & Tenreyro, S. (2010). On the existence of the maximum likelihood estimates in Poisson regression. Economics Letters, 107, 310-312. http://personal.lse.ac.uk/TENREYRO/poissonpml.pdf
So your start values will be important. Hopefully the search procedure is able to optimize your starting point. 
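To make the start-value idea concrete, here is a hedged sketch of how OLS estimates might be fed to ml init for the three-component model in the do-file quoted below. The equation names (xb1-xb3, lsd1-lsd3) and variable names come from that do-file. The matrix and scalar names (b0, b1, b2, b3, s0) and the choice to shift the intercepts by one residual SD are my own assumptions, not anything Stata prescribes.

```stata
* Step 1 (run before -ml model-): OLS fit for rough slopes and residual scale
regress vol_hrs x1 x2 x3 x4 x5 x6
matrix b0 = e(b)
scalar s0 = e(rmse)
local k = colsof(b0)              // _cons is the last column of e(b)

* Step 2: run the -ml model- statement from the do-file here

* Step 3: give each component the OLS vector, shifting the intercepts apart
* so the three components do not start out identical
matrix b1 = b0
matrix b1[1, `k'] = b0[1, `k'] - s0
matrix coleq b1 = xb1             // label every column as equation xb1
ml init b1

matrix b2 = b0
matrix coleq b2 = xb2
ml init b2

matrix b3 = b0
matrix b3[1, `k'] = b0[1, `k'] + s0
matrix coleq b3 = xb3
ml init b3

* The lsd equations are on the log scale, so start them at ln(residual SD)
local lns = ln(s0)
ml init /lsd1=`lns' /lsd2=`lns' /lsd3=`lns'

ml search
ml maximize
```

The lo1 and lo2 equations are left at their default of zero, which corresponds to equal mixing proportions. Rerunning with a few different intercept shifts is one way to see whether the maximizer keeps returning to the same optimum.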
Cam 

----------------------------------------
> From: jkea1937@uni.sydney.edu.au
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Assistance regarding a maximum likelihood model
> Date: Wed, 14 Sep 2011 02:07:52 +0000
>
> Thanks Cam for your response.
>
> From the Stata help file:
> - ml check: verifies that the log-likelihood evaluator you have written works. We strongly recommend using this command.
> - ml search: searches for (better) initial values. We recommend using this command.
> - ml maximize: maximizes the likelihood function and reports results.
>
> The activities are not highly correlated, the highest between two (of four) is 25%. There may be some extent of increases across the board but this isn't guaranteed by any stretch.
>
> ml initial is where you can specify each of the starting values, but it asks for a very large number. Should the starting values be what I expect the coefficient to be? I can determine values from expectations but I'm not sure what the initial value should represent.
>
> Thanks again,
> Jeremy
>
>
> ________________________________________
> From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Cameron McIntosh [cnm100@hotmail.com]
> Sent: Wednesday, 14 September 2011 11:51 AM
> To: STATA LIST
> Subject: RE: st: Assistance regarding a maximum likelihood model
>
> Hi Jeremy,
> I don't exactly know what "ml check/search/maximize" is supposed to do, but it sounds like you are talking about a Poisson model. Is that what you're estimating below? Are the different activities highly correlated (i.e., those who contribute more hours to one activity tend to contribute more to all of them)? Or is there no such pattern?
> If "initial values" means the same thing as starting values for the estimates, these should ideally be a best guess informed by theory and prior empirical work. But in my experience with different programs, it often doesn't matter and the default values work OK. You might want to try a range of plausible starting values and see if you always get convergence to the same optimum, and then you have some confidence that it's the global max.
> Best,
> Cam
> ----------------------------------------
> > From: jkea1937@uni.sydney.edu.au
> > To: statalist@hsphsun2.harvard.edu
> > Subject: st: Assistance regarding a maximum likelihood model
> > Date: Wed, 14 Sep 2011 01:23:05 +0000
> >
> > I am completing a study on contributions of volunteer hours in a not-for-profit organisation, factoring in a number of motives and also controls (e.g. time constraints). These hours have been disaggregated into different activities, with potentially different motives for the hours in each.
> >
> > I am not a statistics genius but I'm trying to get my head around the model. I'm looking at a maximum likelihood model in order to try to predict where someone would sit (in terms of hours contributed).
> >
> > I'm having some issues with the 'ml search' command. I understand that helping the model find initial values would help, but I'm not sure what to look for in terms of the initial values (code below for reference).
> >
> > Any assistance is greatly appreciated, and I can try to clarify any questions others have.
> >
> > Many thanks.
> > Jeremy
> >
> > Do-file:
> >
> > capture program drop mixing3
> > program mixing3
> > version 9.1
> > args lj xb1 xb2 xb3 lo1 lo2 ls1 ls2 ls3
> > tempvar f1 f2 f3 p p1 p2 p3 s1 s2 s3
> > quietly {
> > gen double `s1'=exp(`ls1')
> > gen double `s2'=exp(`ls2')
> > gen double `s3'=exp(`ls3')
> > gen double `p'= 1 + exp(`lo1') + exp(`lo2')
> > gen double `p1'=1/`p'
> > gen double `p2'=exp(`lo1')/`p'
> > gen double `p3'=exp(`lo2')/`p'
> > gen double `f1' = normden($ML_y1, `xb1', `s1')
> > gen double `f2' = normden($ML_y1, `xb2', `s2')
> > gen double `f3' = normden($ML_y1, `xb3', `s3')
> > replace `lj'=ln(`p1'*`f1' + `p2'*`f2' + `p3'*`f3')
> > }
> > end
> > ml model lf mixing3 /*
> > */(xb1: vol_hrs= x1 x2 x3 x4 x5 x6 ) /*
> > */(xb2: vol_hrs= x1 x2 x3 x4 x5 x6 ) /*
> > */(xb3: vol_hrs= x1 x2 x3 x4 x5 x6 ) /*
> > */(lo1: motive1 motive2 motive3 motive4)/*
> > */(lo2: motive1 motive2 motive3 motive4)/*
> > */(lsd1: ) /*
> > */(lsd2: ) /*
> > */(lsd3: )
> > ml check
> > ml search
> > ml maximize
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/help.cgi?search
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/

