Re: st: gologit2 model

From   Maarten Buis
Subject   Re: st: gologit2 model
Date   Wed, 13 Oct 2010 21:26:24 +0100 (BST)

--- On Wed, 13/10/10, Nilam Prasai wrote:
> Furthermore, what would be the good sample size to run
> maximum likelihood estimates

Depends on a lot of things, to name two: the number of
parameters and the variance in your dependent variable, 
i.e. how much information you are trying to extract from
the data, and how much information is present in the data.

-gologit2- in its default formulation tries to estimate
a lot of parameters. Say your dependent variable contains
5 categories, then you are trying to estimate 4 equations,
that is 4 parameters for every explanatory variable in 
your model plus 4 constants. Say you have 4 explanatory 
variables, then you are estimating 20 parameters. An 
absolute minimum would be 10 observations per parameter,
and often you need a lot more, so in that case we would
require a minimum of 200 observations. I don't think that
that would be enough to trust the standard errors, but in
very well behaved data, this may be enough to get reasonable
point estimates. For the standard errors to be correct, the
asymptotics need to start kicking in. I would not be 
surprised if that required 100 observations per 
parameter or more, leading to a minimum sample size of
2000.
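
As a rough sketch of the arithmetic above, you can let Stata do
the counting. The local macro names (ncat, nvar, npar) are made
up for this illustration, and the 10 and 100 observations per
parameter are only the rules of thumb mentioned above, not hard
limits.

*----------------------- begin example ------------------
* illustration only: ncat, nvar and npar are hypothetical names
* number of categories of the dependent variable
local ncat = 5
* number of explanatory variables
local nvar = 4
* (ncat - 1) equations, each with nvar slopes and 1 constant
local npar = (`ncat' - 1)*(`nvar' + 1)
display "number of parameters:     " `npar'
display "N at  10 obs / parameter: " 10*`npar'
display "N at 100 obs / parameter: " 100*`npar'
*----------------------- end example --------------------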

A common problem with models like -gologit2- and -mlogit-
is categories of the dependent variable that contain 
few observations. This is a variation on low variance
in the dependent variable. In that case you probably
do not have enough information to estimate the parameters
of the corresponding equation.
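
A quick way to spot such sparse categories is to tabulate the
dependent variable before fitting the model. The sketch below
assumes the variable names warm and male from the example 
further down; replace them with your own.

*----------------------- begin example ------------------
* number of observations in each category of the dependent variable
tabulate warm
* the same, split by a key explanatory variable
tabulate warm male
*----------------------- end example --------------------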

Anyhow, the real minimum number of observations depends
on all the details of your model and data, and the best
way of getting an idea about that is to run a simulation.
Below is one such simulation. You just need
to replace the example data with your data, and adjust the
two -gologit2- models and the references to the 
parameter of interest, in my case the parameter of male
in the first equation. Then play around with the sample
size to see when trouble starts occurring (this is the 
number after -sim- in the -simulate- command). The example
below also requires Ian White's -simsum-, which you can
download by typing -ssc install simsum- and which is 
described in:

Ian White (2010) "simsum: Analyses of simulation studies 
including Monte Carlo error". The Stata Journal, 10(3):

*----------------------- begin example ------------------
use, clear
save c:/temp/dat, replace
gologit2 warm yr89 male white age ed prst, 
tempname true
scalar `true' = [#1]_b[male]

program drop _all
program define sim, rclass
	* draw a bootstrap sample of size N and refit the model
	args N
	use c:/temp/dat, clear
	bsample `N' 
	gologit2 warm yr89 male white age ed prst, 
	return scalar b = [#1]_b[male]
	return scalar se = [#1]_se[male]
end

* 1000 replications at a sample size of 500
simulate b=r(b) se=r(se), reps(1000): sim 500

simsum b, se(se) true(`true') mcse
*----------------------- end example -----------------------
(For more on examples I sent to the Statalist see: )
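
To play around with the sample size you could wrap the last two
commands of the example in a loop, along the following lines. 
This is only a sketch: the sample sizes are arbitrary, it has to
be run in the same do-file as the example above so that the 
program -sim- and the scalar `true' still exist, and -bsample- 
cannot draw more observations than are present in the saved 
data.

*----------------------- begin example ------------------
* the sample sizes below are arbitrary, chosen for illustration
foreach n of numlist 100 250 500 1000 {
	display as text _newline "sample size: `n'"
	simulate b=r(b) se=r(se), reps(1000): sim `n'
	simsum b, se(se) true(`true') mcse
}
*----------------------- end example --------------------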

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

