# Re: st: MLE - numerical derivatives problem

From: Jason DeBacker <[email protected]>
To: [email protected]
Subject: Re: st: MLE - numerical derivatives problem
Date: Tue, 4 Apr 2006 16:27:21 -0500

I am only interested in 200 of the more than 2500 parameters I estimate, but I need to estimate all of them simultaneously in order to identify those 200.

The nonlinearity is not severe. Here is my code for the model:

mleval `atc' = `b', eq(1)
mleval `btc1' = `b', eq(2)
mleval `btc2' = `b', eq(3)
...
mleval `btc98' = `b', eq(99)
mleval `btc99' = `b', eq(100)
mleval `x' = `b', eq(101)
mleval `sigma' = `b', eq(102)
quietly {
gen double `res' = $ML_y1 - `atc' ///
    - `btc1'*`x' - `btc2'*`x' - `btc3'*`x' - `btc4'*`x' - `btc5'*`x' ///
    - `btc6'*`x' - `btc7'*`x' - `btc8'*`x' - `btc9'*`x' - `btc10'*`x' ///
    - `btc11'*`x' - `btc12'*`x' - `btc13'*`x' - `btc14'*`x' - `btc15'*`x' ///
    - `btc16'*`x' - `btc17'*`x' - `btc18'*`x' - `btc19'*`x' - `btc20'*`x' ///
    - `btc21'*`x' - `btc22'*`x' - `btc23'*`x' - `btc24'*`x' - `btc25'*`x' ///
    - `btc26'*`x' - `btc27'*`x' - `btc28'*`x' - `btc29'*`x' - `btc30'*`x' ///
    - `btc31'*`x' - `btc32'*`x' - `btc33'*`x' - `btc34'*`x' - `btc35'*`x' ///
    - `btc36'*`x' - `btc37'*`x' - `btc38'*`x' - `btc39'*`x' - `btc40'*`x' ///
    - `btc41'*`x' - `btc42'*`x' - `btc43'*`x' - `btc44'*`x' - `btc45'*`x' ///
    - `btc46'*`x' - `btc47'*`x' - `btc48'*`x' - `btc49'*`x' - `btc50'*`x' ///
    - `btc51'*`x' - `btc52'*`x' - `btc53'*`x' - `btc54'*`x' - `btc55'*`x' ///
    - `btc56'*`x' - `btc57'*`x' - `btc58'*`x' - `btc59'*`x' - `btc60'*`x' ///
    - `btc61'*`x' - `btc62'*`x' - `btc63'*`x' - `btc64'*`x' - `btc65'*`x' ///
    - `btc66'*`x' - `btc67'*`x' - `btc68'*`x' - `btc69'*`x' - `btc70'*`x' ///
    - `btc71'*`x' - `btc72'*`x' - `btc73'*`x' - `btc74'*`x' - `btc75'*`x' ///
    - `btc76'*`x' - `btc77'*`x' - `btc78'*`x' - `btc79'*`x' - `btc80'*`x' ///
    - `btc81'*`x' - `btc82'*`x' - `btc83'*`x' - `btc84'*`x' - `btc85'*`x' ///
    - `btc86'*`x' - `btc87'*`x' - `btc88'*`x' - `btc89'*`x' - `btc90'*`x' ///
    - `btc91'*`x' - `btc92'*`x' - `btc93'*`x' - `btc94'*`x' - `btc95'*`x' ///
    - `btc96'*`x' - `btc97'*`x' - `btc98'*`x' - `btc99'*`x'
by i: gen `T' = cond(_n==_N,_N,.)
mlsum `lnf' = ln(norm(`res'/`sigma'))+ln(1/`sigma') if `T' ~= .
}

And for the ml command:

ml model d0 mladj (nominal = ch1-ch99, nocons) (nominal = ch1, nocons) (nominal = ch2, nocons) ... (nominal = ch98, nocons) (nominal = ch99, nocons) (nominal = name1-name2494, nocons) /sigma ;

nominal is a continuous variable; ch1-ch99 and name1-name2494 are dummy variables.

I'm not 100% sure I've written the code right, but I've had success estimating a similar model. I think the approach is feasible as I am just trying to replicate the results of a paper.
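One possible cause worth ruling out, assuming the error term is meant to be normal: in Stata releases of this period, norm() is the cumulative standard normal and normden() is the density, so the mlsum line above sums the log of a cumulative probability. Whether that is intended cannot be told from the code alone. The sketch below is written in Python rather than Stata so it can be run anywhere, and its function names are illustrative only. It compares numerical derivatives of the two functions: the log of the cumulative normal goes flat as its argument grows, while the log density does not.

```python
import math

def log_normal_cdf(z):
    """ln Phi(z): log of the standard normal cumulative distribution."""
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def log_normal_pdf(z):
    """ln phi(z): log of the standard normal density."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * z * z

def central_diff(f, z, h=1e-4):
    """Two-sided numerical derivative of f at z."""
    return (f(z + h) - f(z - h)) / (2.0 * h)

# Derivatives at a few standardized residuals. For large z the
# cumulative version is numerically flat; the density version is not.
for z in (0.0, 2.0, 10.0):
    print(z, central_diff(log_normal_cdf, z), central_diff(log_normal_pdf, z))
```

If the density is what is wanted, an objective built on the cumulative normal can be increased by pushing residuals upward without bound, and the numerical derivatives vanish once they are large. That is one way, though not the only way, to reach a "flat or discontinuous region" message.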

Sincerely,
Jason DeBacker

On Apr 4, 2006, at 2:01 PM, Stas Kolenikov wrote:

Sounds like a not-so-practical model. How much time is it going to
take you to look at all 2500 coefficients???

Also, what kind of nonlinearity do you have? Generally, diagnosing
nonlinearity requires a lot of data; if you want to deal with 2500
variables, then you would probably need 100 thousand observations to
identify both linear and nonlinear effects.

Do remember that introducing dummy variables leads to fixed-effects
estimation only in linear models. The underlying statistical principle
is that of sufficient statistics: when one exists, fixed-effects
estimation is feasible; otherwise it is not. Thus the class of models
is restricted to the linear, logit and Poisson models.

Finally, your model may not be (empirically) identified in the region
near the maximum. In the simplest textbook case, you may have all the
dummies along with the constant; the random values that -ml check-
picks may then indeed give different likelihoods, but not near the
top. In more complex cases, your likelihood may collapse for
certain combinations of parameters: suppose you have something like
b1*b2*b3 in your likelihood, and it is estimated to be zero. If the
true value of b1 is zero, then neither b2 nor b3 is identified. This
does happen in nonlinear models sometimes, and it can be cured by
reparameterization. You need to do your paper-and-pencil work on that,
though.
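The b1*b2*b3 case can be sketched numerically. This is a minimal illustration in Python rather than Stata, with made-up data and a Gaussian sum-of-squares objective standing in for the negative log likelihood; the function names are hypothetical. When b1 is zero the objective does not move with b2 at all, so a numerical derivative in that direction is exactly zero.

```python
def neg_loglik(b1, b2, b3, xs, ys):
    """Sum of squared residuals for y = b1*b2*b3*x + e.

    This equals the negative normal log likelihood up to constants,
    with sigma held fixed (an assumption made for the illustration).
    """
    return sum((y - b1 * b2 * b3 * x) ** 2 for x, y in zip(xs, ys))

def d_wrt_b2(b1, b2, b3, xs, ys, h=1e-4):
    """Two-sided numerical derivative of the objective with respect to b2."""
    up = neg_loglik(b1, b2 + h, b3, xs, ys)
    down = neg_loglik(b1, b2 - h, b3, xs, ys)
    return (up - down) / (2.0 * h)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.1, -0.2, 0.05, 0.0]  # made-up data, for illustration only

print(d_wrt_b2(0.0, 0.7, 1.3, xs, ys))  # b1 = 0: flat in b2
print(d_wrt_b2(0.5, 0.7, 1.3, xs, ys))  # b1 != 0: b2 moves the objective
```

A reparameterization that estimates the product as a single coefficient removes the flat direction in this toy case.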

On 4/4/06, Jason DeBacker <[email protected]> wrote:
Hi Statalisters,

I'm having trouble with the estimation of a non-linear likelihood
function on panel data. My program passes all the tests in ml check
and goes through ml search, but in ml max I get the following error:
"could not calculate numerical derivatives
flat or discontinuous region encountered"

The model I'm trying to estimate is in Groseclose, Levitt, and Snyder's
1999 article in the APSR. I think the problem might be that since
there are a lot of dummy variables in the model and each one seldom
takes on a value other than zero (at most a particular dummy variable =
1 in only 1% of the obs), there are problems calculating the numerical
derivatives. I've been able to estimate a simplified version of the
model when I have dummy variables that take on a value of one 50% of
the time.

Does it sound like I have a correct diagnosis of the problem?
Suggestions to fix this?
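A rough way to reason about this diagnosis, under simplifying assumptions (a single dummy, a linear mean, sigma fixed at 1, made-up data, and Python rather than Stata): in a normal model the derivative of the log likelihood with respect to a dummy's coefficient is the sum of the residuals over the observations where the dummy equals 1, divided by sigma^2. The second derivative is minus the number of such observations divided by sigma^2.

```python
import math

def loglik(beta, ys, ds, sigma=1.0):
    """Normal log likelihood for y = beta*d + e, with sigma held fixed."""
    return sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
        - 0.5 * ((y - beta * d) / sigma) ** 2
        for y, d in zip(ys, ds)
    )

def numeric_score(beta, ys, ds, h=1e-3):
    """Two-sided numerical derivative of loglik with respect to beta."""
    return (loglik(beta + h, ys, ds) - loglik(beta - h, ys, ds)) / (2.0 * h)

def analytic_score(beta, ys, ds, sigma=1.0):
    """Sum of residuals over observations with d = 1, divided by sigma^2."""
    return sum(d * (y - beta * d) for y, d in zip(ys, ds)) / sigma ** 2

n = 200
ds = [1.0 if i % 100 == 0 else 0.0 for i in range(n)]  # dummy = 1 in 2 of 200 obs
ys = [1.5 * d + (0.3 if i % 2 == 0 else -0.3) for i, d in enumerate(ds)]  # made up

curvature = -sum(d * d for d in ds)  # second derivative when sigma = 1

print(numeric_score(0.0, ys, ds), analytic_score(0.0, ys, ds), curvature)
```

In this toy setting a rare dummy gives a direction with little curvature, not a zero derivative, so rarity alone would make the coefficient imprecise and not necessarily unestimable. The full model has products of parameters and may behave differently.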

I'd like to run this in Stata so I could get standard errors more
easily, but I have started writing a Matlab program to do the
estimation. Additionally, Matlab is a bit cumbersome with the number
of dummy variables I have (over 2500). Will Matlab be able to estimate
the model faster than Stata?

Thanks very much for any help.

Sincerely,
Jason

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov
http://stas.kolenikov.name
