The Stata listserver

st: How to simulate data from ordered probit --abnormality of results?


From   Nblunch@worldbank.org
To   statalist@hsphsun2.harvard.edu
Subject   st: How to simulate data from ordered probit --abnormality of results?
Date   Sun, 30 May 2004 16:03:12 -0400



Dear Statalisters,

As part of a larger simulation exercise, I am starting out with simple probits
and ordered probits (I have never tried this before and therefore wanted to get
the basics right before moving on to something more complicated).

While the results for the ordered probit are largely consistent with the
data-generating process I specified, in the sense that the parameter values I
specified fall within the estimated 95 percent confidence intervals (except for
one, namely the one for X1 below), I still thought the estimates a bit "off"
compared to what I had expected. I realize that, because of the (pseudo)
randomness of the variables, the results will likely deviate a bit.
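The claim about the confidence intervals can be checked mechanically. The Python sketch below is only an illustration (the thread itself uses Stata): it copies the specified coefficients and the 95 percent intervals from the oprobit output in the log below and lists the coefficients whose specified value falls outside its interval. The dictionary names and the helper `outside_ci` are hypothetical.

```python
# Specified ("true") coefficients from `matrix betas` in the log below
true_betas = {"X1": -0.005, "X2": -0.04, "X3": 0.14,
              "X4": 0.03, "X5": 0.01, "X6": 0.07}

# 95% confidence intervals copied from the oprobit output in the log below
conf_intervals = {
    "X1": (-0.026202, -0.0143201),
    "X2": (-0.0550036, -0.0372336),
    "X3": (0.0868115, 0.1623606),
    "X4": (0.0192128, 0.0532741),
    "X5": (-0.4062881, 0.0187731),
    "X6": (-0.0951948, 0.1796481),
}

def outside_ci(truth, intervals):
    """Return the names whose specified value lies outside its reported 95% CI."""
    return [name for name, value in truth.items()
            if not intervals[name][0] <= value <= intervals[name][1]]

print(outside_ci(true_betas, conf_intervals))  # ['X1']
```

If the six intervals were independent, the chance that at least one of them misses its true value would be about 1 - 0.95^6, roughly 0.26, so a single miss is not alarming by itself. The X1 estimate, however, sits about five standard errors away from -0.005 ((-.0202611 + 0.005) / .0030312 is about -5.0), which suggests something systematic rather than bad luck.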

As a newbie in this area, I wonder whether these results are "normal", or am I
doing something wrong? For example, the standard errors reported for the cut
points (about 1.05) are much larger than the standard deviations I specified for
them (0.1 and 0.2), and, again, while the specified parameters mostly fall within
the 95 percent confidence intervals, the estimates do seem a bit "off" compared
to the values I specified...? I realize that the fit is not that good (pseudo R2
of 0.0335); maybe that is part of the problem? If so, how do I go about ensuring
a good (but not too good, since then observations will drop out!!) fit of the
regression?

Your help and insights will be greatly appreciated -- Thanks!!

Cheers,

Niels-Hugo



Here are the relevant parts of the log file:

. /* Create the residual (u) and the X's: */
.
. matrix m =   (0, 31.4, 10.4, 6.85, 10.8, 8.5, 6.4)

. matrix sdm = (1, 7.3, 4.7, 1.13, 2.4, 0.2, 0.3)

. drawnorm u X1 X2 X3 X4 X5 X6, n(3000) seed(19712004) means(m) sds(sdm)
(obs 3000)
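For readers who want to reproduce this setup outside Stata, here is a rough Python analogue of the drawnorm call above. It assumes, as drawnorm does when no corr() or cov() option is given, that the variables are independent. Python's random module is used, so the draws will not match Stata's; the helper name `draw_independent_normals` is hypothetical. One thing worth noticing in the specified values is that X5 and X6 have standard deviations of only 0.2 and 0.3 around means of 8.5 and 6.4, so they are nearly constant across observations.

```python
import random
import statistics

def draw_independent_normals(n, means, sds, seed):
    """Draw n observations of independent normals with the given means and SDs.

    Illustrative analogue of `drawnorm ..., means(m) sds(sdm)` without corr();
    it uses Python's Mersenne Twister, so the numbers differ from Stata's.
    """
    rng = random.Random(seed)
    return [[rng.gauss(mu, sd) for _ in range(n)] for mu, sd in zip(means, sds)]

names = ["u", "X1", "X2", "X3", "X4", "X5", "X6"]
m = [0, 31.4, 10.4, 6.85, 10.8, 8.5, 6.4]      # matrix m from the log
sdm = [1, 7.3, 4.7, 1.13, 2.4, 0.2, 0.3]       # matrix sdm from the log
cols = dict(zip(names, draw_independent_normals(3000, m, sdm, seed=19712004)))

# Compare sample moments with the specified ones, as `summarize` does
for name in names:
    col = cols[name]
    print(f"{name:>3}  mean={statistics.fmean(col):9.4f}  sd={statistics.stdev(col):8.4f}")
```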

. /* Create the cut points: */
.
. matrix cp = (0.77, 1.29)

. matrix sdcp = (0.1, 0.2)

. drawnorm Cut1 Cut2, n(3000) seed(19712004) means(cp) sds(sdcp)

. summarize

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           u |      3000   -.0027312    1.006306  -3.117829     3.4684
          X1 |      3000    31.18638    7.139536   5.131144   55.13468
          X2 |      3000    10.37326    4.799704  -7.207477   29.47523
          X3 |      3000    6.857711    1.119025   2.623326    11.1235
          X4 |      3000    10.82347    2.468898   1.783469    18.8689
-------------+--------------------------------------------------------
          X5 |      3000    8.505496    .1961301   7.764948   9.130346
          X6 |      3000    6.387869    .3051463    5.26484   7.604205
        Cut1 |      3000    .7697269    .1006306   .4582171    1.11684
        Cut2 |      3000    1.284147    .1956037   .5703053   1.940265
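The summary statistics above hint at a problem in how the cut points were drawn. Both drawnorm calls use seed(19712004), and the Cut1 and Cut2 columns are consistent with exact rescalings of u and X1: .7697269 = 0.77 + 0.1 × (-.0027312), .1006306 = 0.1 × 1.006306 and .4582171 = 0.77 + 0.1 × (-3.117829); likewise Cut2's mean 1.284147 = 1.29 + 0.2 × (31.18638 - 31.4) / 7.3. If so, Cut1 moves one-for-one with the error u, and Cut2 is a linear function of X1 with slope 0.2 / 7.3 (about 0.027), which would plausibly explain why the X1 coefficient misses its specified value. The Python sketch below (Python's random module, not Stata's generator; the helper `std_normals` is hypothetical) only illustrates the mechanism: reseeding with the same value replays the same standard-normal stream.

```python
import random

SEED = 19712004  # same seed value as both drawnorm calls in the log

def std_normals(n, seed):
    """n standard-normal draws from a freshly seeded generator."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

n = 3000
u = [0.0 + 1.0 * e for e in std_normals(n, SEED)]       # like u: mean 0, sd 1
cut1 = [0.77 + 0.1 * e for e in std_normals(n, SEED)]   # like Cut1: mean 0.77, sd 0.1

# Reusing the seed replays the same draws, so Cut1 is an exact linear function of u
max_gap = max(abs(c - (0.77 + 0.1 * e)) for c, e in zip(cut1, u))
print(max_gap)  # 0.0

# Drawing the cut points from a different stream breaks that dependence
cut1_indep = [0.77 + 0.1 * e for e in std_normals(n, SEED + 1)]
```

In the Stata code, dropping the seed() option from the second drawnorm call (or setting the seed once with `set seed` before the first call) would presumably give cut-point draws that are independent of u and X1. Even then, oprobit treats the cut points as fixed constants, so observation-specific random cut points act as extra noise in the latent equation.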

.
. /* Specify the regression parameters: */
.
. matrix betas = (-.005, -.04, .14, 0.03, 0.01, 0.07)

. matrix colnames betas = X1 X2 X3 X4 X5 X6

.
. /* Generate the dependent variable: */
.
. matrix score z = betas
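`matrix score` builds the linear index z = b1·X1 + ... + b6·X6; since betas has no _cons column, no constant is added. The Python arithmetic below checks two quantities implied by the specified values alone: the mean of z, and its standard deviation assuming independent X's, i.e. the square root of the sum of (b_k·sd_k)^2. The names `linear_index` and `terms` are hypothetical.

```python
import math

# Specified coefficients (matrix betas), means (matrix m) and SDs (matrix sdm) from the log
betas = {"X1": -0.005, "X2": -0.04, "X3": 0.14, "X4": 0.03, "X5": 0.01, "X6": 0.07}
means = {"X1": 31.4, "X2": 10.4, "X3": 6.85, "X4": 10.8, "X5": 8.5, "X6": 6.4}
sds = {"X1": 7.3, "X2": 4.7, "X3": 1.13, "X4": 2.4, "X5": 0.2, "X6": 0.3}

def linear_index(row, coefs):
    """z = sum of x_k * b_k with no constant term, as when betas has no _cons column."""
    return sum(row[k] * b for k, b in coefs.items())

# Mean of z: the index evaluated at the means of the X's
mean_z = linear_index(means, betas)

# SD of z for independent X's: sqrt(sum_k (b_k * sd_k)^2)
terms = {k: abs(betas[k]) * sds[k] for k in betas}
sd_z = math.sqrt(sum(t * t for t in terms.values()))

print(round(mean_z, 3), round(sd_z, 3))  # 1.243 0.259
print({k: round(t, 4) for k, t in terms.items()})
```

The implied standard deviation of z, about 0.26, is small next to the unit standard deviation of u, which seems consistent with the low pseudo R2 reported by oprobit below. The X5 term (0.01 × 0.2 = 0.002) is especially tiny, so the data carry very little information about that coefficient.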

. gen y = 0 if z + u <= Cut1
(2082 missing values generated)

. replace y = 1 if z + u > Cut1 & z + u <= Cut2
(652 real changes made)

. replace y = 2 if z + u > Cut2
(1430 real changes made)
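The gen/replace lines implement the usual ordered-probit threshold rule. The Python sketch below restates that rule and the category probabilities for the textbook model with fixed cut points and u ~ N(0, 1), which is the model oprobit fits: P(y = 0) = Φ(c1 - z), P(y = 1) = Φ(c2 - z) - Φ(c1 - z), P(y = 2) = 1 - Φ(c2 - z). The helper names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def categorize(latent, cut1, cut2):
    """Map the latent value z + u to y in {0, 1, 2}, mirroring the gen/replace lines."""
    if latent <= cut1:
        return 0
    if latent <= cut2:
        return 1
    return 2

def oprobit_probs(xb, cut1, cut2):
    """P(y = 0, 1, 2 | xb) when u ~ N(0, 1) and the cut points are fixed constants."""
    p0 = norm_cdf(cut1 - xb)
    p1 = norm_cdf(cut2 - xb) - p0
    p2 = 1.0 - norm_cdf(cut2 - xb)
    return p0, p1, p2

# Probabilities at the mean of z (about 1.243) with the specified cut points
print([round(p, 3) for p in oprobit_probs(1.243, 0.77, 1.29)])
```

Evaluated at the mean of z, these probabilities come out at roughly 0.32, 0.20 and 0.48, broadly in line with the 30.6, 21.7 and 47.7 percent in the tab y output. This is only a rough comparison, since it ignores the spread of z and the random cut points.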

.
. tab y

          y |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        918       30.60       30.60
          1 |        652       21.73       52.33
          2 |      1,430       47.67      100.00
------------+-----------------------------------
      Total |      3,000      100.00

.
. oprobit y X1 X2 X3 X4 X5 X6

Iteration 0:   log likelihood = -3141.7719
Iteration 1:   log likelihood = -3036.6214
Iteration 2:   log likelihood =  -3036.493
Iteration 3:   log likelihood =  -3036.493

Ordered probit estimates                          Number of obs   =       3000
                                                  LR chi2(6)      =     210.56
                                                  Prob > chi2     =     0.0000
Log likelihood =  -3036.493                       Pseudo R2       =     0.0335

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |  -.0202611   .0030312    -6.68   0.000     -.026202   -.0143201
          X2 |  -.0461186   .0045333   -10.17   0.000    -.0550036   -.0372336
          X3 |   .1245861   .0192731     6.46   0.000     .0868115    .1623606
          X4 |   .0362435   .0086893     4.17   0.000     .0192128    .0532741
          X5 |  -.1937575    .108436    -1.79   0.074    -.4062881    .0187731
          X6 |   .0422267   .0701143     0.60   0.547    -.0951948    .1796481
-------------+----------------------------------------------------------------
       _cut1 |  -1.776959   1.053154          (Ancillary parameters)
       _cut2 |  -1.184145   1.053026
------------------------------------------------------------------------------
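The cut-point estimates (-1.78 and -1.18) are far from the specified 0.77 and 1.29. Because the model has no constant, a change d_k in a coefficient can be roughly offset by shifting both cut points by d_k times the mean of X_k. The Python arithmetic below (numbers copied from the log; variable names are hypothetical) adds up these shifts; it is a back-of-the-envelope approximation, not a re-estimation.

```python
# Specified values (matrix betas) and oprobit estimates from the log
specified = {"X1": -0.005, "X2": -0.04, "X3": 0.14, "X4": 0.03, "X5": 0.01, "X6": 0.07}
estimated = {"X1": -0.0202611, "X2": -0.0461186, "X3": 0.1245861,
             "X4": 0.0362435, "X5": -0.1937575, "X6": 0.0422267}
# Sample means from the summarize output
sample_means = {"X1": 31.18638, "X2": 10.37326, "X3": 6.857711,
                "X4": 10.82347, "X5": 8.505496, "X6": 6.387869}

# With no constant, moving b_k by d_k shifts the average index by d_k * mean_k,
# and moving both cut points by the same amount leaves the fit nearly unchanged.
contrib = {k: (estimated[k] - specified[k]) * sample_means[k] for k in specified}
shift = sum(contrib.values())
print(round(shift, 2))  # -2.49
print(round(contrib["X5"], 2))  # -1.73

for name, cp, est in [("_cut1", 0.77, -1.776959), ("_cut2", 1.29, -1.184145)]:
    print(f"{name}: specified {cp:.2f}, specified + shift {cp + shift:.2f}, estimated {est:.2f}")
```

The total shift of about -2.49 puts the specified cut points near -1.72 and -1.20, close to the estimates, and it is dominated by X5, whose mean of 8.5 is large relative to its standard deviation of 0.2. That near-collinearity between X5 (and to a lesser degree X6) and the implicit intercept is one plausible reason the cut points have standard errors near 1.05. Those standard errors measure sampling uncertainty about fixed thresholds and are not directly comparable to the 0.1 and 0.2 standard deviations used to draw Cut1 and Cut2.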

.



