Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: gologit2 model

| From | To | Subject | Date |
|---|---|---|---|
| Maarten buis | statalist@hsphsun2.harvard.edu | Re: st: gologit2 model | Wed, 13 Oct 2010 21:26:24 +0100 (BST) |

```
--- On Wed, 13/10/10, Nilam Prasai wrote:
> Furthermore, what would be the good sample size to run
> maximum likelihood estimates

Depends on a lot of things, to name two: the number of
parameters and the variance in your dependent variable,
i.e. how much information you are trying to extract from
the data, and how much information is present in the data.

-gologit2- in its default formulation tries to estimate
a lot of parameters. Say your dependent variable contains
5 categories, then you are trying to estimate 4 equations,
that is 4 parameters for every explanatory variable in
your model plus 4 constants. Say you have 4 explanatory
variables, then you are estimating 20 parameters. An
absolute minimum would be 10 observations per parameter,
and often you need a lot more, so in that case we would
require a minimum of 200 observations. I don't think that
that would be enough to trust the standard errors, but in
very well behaved data, this may be enough to get reasonable
point estimates. For the standard errors to be correct, the
asymptotics need to start kicking in. I would not be
surprised if that requires 100 observations per
parameter or more, leading to a minimum sample size of
2000.

A common problem with models like -gologit2- and -mlogit-
is categories of the dependent variable that contain
few observations. This is a variation on low variance
in the dependent variable. In that case you probably
do not have enough information to estimate the parameters
of the corresponding equation.

Anyhow, the real minimum number of observations depends
on all the details of your model and data, and the best
way of getting an idea about that is to run a simulation.
The simulation below is one example. You just need to
adapt the two -gologit2- models and the references to the
parameter of interest, in my case the parameter of male
in the first equation. Then play around with the sample
size to see when trouble starts occurring (this is the
number after -sim- in the -simulate- command). The example
below also requires Ian White's -simsum-, which is
described in:

Ian White (2010) "simsum: Analyses of simulation studies
including Monte Carlo error". The Stata Journal, 10(3):
369--385.
<http://www.stata-journal.com/article.html?article=st0200>

*----------------------- begin example ------------------
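* Fit the model on the full data; its estimate serves as the "true" value
use http://www.indiana.edu/~jslsoc/stata/spex_data/ordwarm2.dta, clear
save c:/temp/dat, replace
gologit2 warm yr89 male white age ed prst
tempname true
scalar `true' = [#1]_b[male]

* One replication: draw N observations with replacement and refit the model
program drop _all
program define sim, rclass
    args N
    use c:/temp/dat, clear
    bsample `N'
    gologit2 warm yr89 male white age ed prst
    return scalar b = [#1]_b[male]
    return scalar se = [#1]_se[male]
end

* 1000 replications at a sample size of 500
simulate b=r(b) se=r(se), reps(1000): sim 500

* Compare the estimates and standard errors with the "true" value
simsum b, se(se) true(`true') mcse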
*----------------------- end example -----------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

```
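
The parameter count and the two sample-size rules of thumb in the post can be written out as a short worked calculation. This is only a sketch under the post's own assumptions (the default -gologit2- formulation, 5 outcome categories, 4 explanatory variables). The symbols J (outcome categories), k (explanatory variables), and p (parameters) are introduced here for illustration and do not appear in the original message.

```latex
% J = 5 outcome categories, k = 4 explanatory variables (values from the post)
% Each of the J - 1 equations has k slopes plus 1 constant.
\begin{align*}
p &= (J - 1)(k + 1) = (5 - 1)(4 + 1) = 20 \\
% "absolute minimum" of 10 observations per parameter
N_{\text{point estimates}} &\ge 10\,p = 10 \times 20 = 200 \\
% roughly 100 observations per parameter for trustworthy standard errors
N_{\text{standard errors}} &\ge 100\,p = 100 \times 20 = 2000
\end{align*}
```

Both thresholds are the rough rules of thumb given in the post, not formal results, and the simulation approach described above remains the better guide for a specific model and data set.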