Statalist The Stata Listserver



Re: st: nlogitrum and nlogit: unbalanced data


From   John Fulton <[email protected]>
To   [email protected]
Subject   Re: st: nlogitrum and nlogit: unbalanced data
Date   Tue, 09 Jan 2007 10:07:15 -0500

Anders Alexandersson requested a .do file and an example, and suggested I try nlogitdn.
(I'm resending this because it never arrived on my statalist, so I assume something prevented it from arriving in anyone else's. Apologies for any double-posting.)

Thank you very much Anders.
I apologize for the vagueness of my code example - I didn't even point out which nest was degenerate.

Here is the set of data manipulation commands on my actual data.
One thing I should mention is that into the degenerate branch I group a bunch of different covariate values: I am studying migrants, and people who do not change residence over a time period are lumped together in the degenerate branch. People who do change residence are modeled as choosing one of 48 U.S. state destinations. The states are nested for unobservable correlates of choice.

One effect of this is that the degenerate nest contains lots of different values on the covariates, as opposed to the classic examples where the degenerate branch is, e.g., "train." I don't think this is a problem, because degenerate "train," e.g., also is modeled with different choice- and chooser-specific characteristics; so my approach is more a question of degree than of kind.

Another effect is obviously that the choice sets vary between choosers. Each person has C-1 non-degenerate choices, but any two individuals in the population can share as few as C-1-1 choices in the nondegenerate nests, and over the total population there are no mutually completely shared choice sets.

That said, here's the code. I've put pseudo-code in <> for the sake of saving space and promoting clarity:

#delimit ;
nlogitgen top = bottom(nonmovers3:99, movers:<a lot of values>);
/* Note there are 49 values. "Nonmovers" contains 1 value (99), and
"movers" contains the other 48. */;
nlogitgen middle = bottom(nonmovers2:99,
group2_1:<31 or 32 unique values>,
group2_2:<17 or 16 unique values>)
/* Note the group2_1 and group2_2 values are mutually exclusive and
exhaustive subsets of all the "movers" values listed in the "top" nest.
The number of values summed across group2_1 and group2_2 is 48,
with one choice from either nest categorized in the degenerate nest
for each case. */;
gen var2=(middle==2)*black1;
gen var3=(top==2)*cpi1965;
nlogit chosen (bottom=var1) (middle=var2) (top=var3) [fw=count],
group(group_id);

nlogitrum chosen var1 var2 var3 [fw=count], group(group_id) nests(bottom middle top);

nlogit appears to run the job fine. At least, it gives no errors; whether the results are useful is another matter I'm trying to determine.
nlogitrum gives an "unbalanced data" error.
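
One way to look into the "unbalanced data" message is to inspect how the choice sets are laid out. What follows is only a sketch. It assumes the message is triggered by choice sets that differ in size or contain repeated alternatives, which is a guess and not something confirmed from the nlogitrum code. group_id, chosen and bottom are the variables used above; n_alts and n_chosen are hypothetical helper names, and the lines assume the default (carriage-return) delimiter.

```stata
* Sketch only: n_alts and n_chosen are hypothetical helper variables.
* Number of alternatives (rows) in each choice set
bysort group_id: gen n_alts = _N
* Number of rows flagged as chosen in each choice set
bysort group_id: egen n_chosen = total(chosen)
* More than one distinct value of n_alts means the choice sets differ in size
tabulate n_alts
tabulate n_chosen
* Report whether any alternative appears twice within the same choice set
duplicates report group_id bottom
```

If n_alts takes a single value and there are no duplicates, the choice sets are at least the same size, and the differing composition of the sets across choosers would be the next thing to examine.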

I tried to recreate the problem using a degenerate nest in the restaurant data:

nlogitgen type = restaurant(fast: 1, family: 2 | 3 | 4 | 5, fancy: 6 | 7)
gen incFast=(type==1)*income
gen incFancy=(type==3)*income
gen kidFast=(type==1)*kids
gen kidFancy=(type==3)*kids
nlogitrum chosen cost rating distance incFast incFancy kidFast kidFancy, group(family_id) nests(rest type)

The result is not an "unbalanced data" error but a "not concave" message during the likelihood maximization. I didn't let it run very long, so I don't know whether it ever converges, and I didn't adjust the maximization options. nlogit runs the model fine.

I didn't run nlogitdn because I was looking at the degenerate models in table 5, for which Heiss doesn't use nlogitdn either (as far as I can tell).
In fact, unless I have this completely backwards, nlogitdn is part of the method one uses to "trick" nlogit (i.e. NNNL) into producing RUM-consistent estimates (i.e. RUMNL). At least that's what Heiss (SJ'02) appears to say. It is not what one uses to structure data to get nlogitrum to run. But I am not very familiar with the "ins and outs" of this.

Again, thank you very much.
John.

Anders Alexandersson wrote:

John Fulton <[email protected]> wrote:

I ran a model using nlogit, but nlogitrum kicks up an "unbalanced data"
error.
I've read Heiss's 2002 article but haven't gleaned much in the way of
what the problem might be, let alone the solution.

I don't think I've made a syntax error, but I've included the relevant
command lines below.
This model has a degenerate nest and three levels. My indepvars for the
top two nests are dummy interactions between chooser-specific
characteristics and the nest variable - as is done in the "restaurant"
example - so "var2" is "var2*(middle==1)" for all values. Similarly,
"var3" is "var3*(top==1)."

Here's the syntax for nlogitrum. It outputs the correct tree, then gives
an "unbalanced data" error.
nlogitrum chosen var1 var2 var3 [fw=count], group(group_id) nests(bottom middle top)

Here's the nlogit command that works:
nlogit chosen (bottom=var1) (middle=var2) (top=var3) [fw=count], group(group_id)


NB that my "levels" are reverse coded from Stata's, so that "altsetb"
variable is my "1" level, altset2 is "2" and altset1 is "3".

Because you didn't seem to use -nlogitdn- but have a degenerate nest,
I guess that is the problem. Please provide a do-file that reproduces
the problem, e.g. using the Stata restaurant data, because I'm not
sure what exactly you typed.

Anders Alexandersson
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


