


st: Re: Polychoric PCA error message

From   Yashin
Subject   st: Re: Polychoric PCA error message
Date   Fri, 22 Feb 2013 18:50:34 -0500

Dear Statalisters,

While executing polychoric PCA to produce an asset/wealth index, three
iterations of the following message appeared:

  numerical derivatives are approximate
  nearby values are missing

I understand that the first principal component should be the wealth
index; it contains negative values, with proportion of explained
variance = ~25%.

Principal component analysis

 k  |  Eigenvalues  |  Proportion explained  |  Cum. explained
----+---------------+------------------------+----------------
  1 |    7.955020   |    0.248594            |   0.248594

Question: Are there any methods to double check that the procedure
performed reasonably well and that I can use this 1st PC as an asset
index? 25% explained variance and persistence of error messages
suggest that I should be cautious.
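
One quick consistency check on the table above, offered as a sketch
rather than a validation of the index: for a PCA run on a correlation
matrix, the eigenvalues sum to the number of variables, so the
proportion explained by a component is its eigenvalue divided by that
number. The count of 32 variables used here is an assumption inferred
from the thread (37 variables before winnowing, five removed); it is
not reported in the output itself.

```python
# Figures copied from the PCA output above.
eigenvalue_1 = 7.955020
reported_share = 0.248594

# Assumption: 32 variables entered the PCA (37 before winnowing, 5 removed).
n_vars = 37 - 5

# For a correlation-matrix PCA the trace equals the number of variables,
# so the share of variance for a component is eigenvalue / n_vars.
share = eigenvalue_1 / n_vars
print(f"computed share = {share:.6f}, reported share = {reported_share:.6f}")
```

Agreement between the two numbers only shows that the output is
internally consistent with a 32-variable correlation matrix; it says
nothing about whether the individual correlations were well estimated.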

In case it helps, I have copied below a previous issue I have had with
the same dataset, which led me to remove five variables, resulting in
no zeros in the correlation matrix.

Thank you for any comments or suggestions!



On Tue, 29 Jan 2013 22:05:25, Stas Kolenikov wrote:

A zero cell means that the underlying two normal variables have a
correlation of 1 -- or at least that's the maximum likelihood
estimate. Visually, the normal distribution is degenerately
concentrated on a line that passes outside of zero. With a correlation
of 1, ML estimation breaks down: the maximizer runs out of the sample
space, and produces missing values for negative definite matrices that
have correlations > 1; and if a solution is claimed to exist, it cannot
be stable, as we work with poorly defined matrices that are unstable
to invert (in the sense of finite-precision arithmetic). In your
example, not only do you have zero cells that make estimation of the
correlations difficult; the small marginal proportions will not make
-polychoric- very happy, either.

Vika Savalei wrote about the existing tweaks, but
I don't have anything implemented in the code.


On 29 January 2013 22:34, Yashin wrote:
Dear Statalisters:

I am trying to run polychoric PCA from Stas Kolenikov on a data subset
(wealth index) that--pre-winnowing--has 32 dichotomous variables, four
ordinal variables, and one continuous variable. I am getting the
following error messages repeatedly:

could not calculate numerical derivatives
missing values encountered
numerical derivatives are approximate
nearby values are missing

I found an earlier thread addressing this issue, and similarly I also
found that for those coefficients in the
correlation matrix that are either zero or > 0.9, the 2x2 tables
invariably have a cell with small numbers (usually 0, and in other
cases 1, 2, 3 and in one case a 7). In this case, this would not be a
structural zero but a sampling zero.

I have related questions I am hoping someone might help shed light on:

1) When I examined the six 2x2 tables for variable pairs with
correlation coefficients > 0.9, they did not appear to be highly
correlated, and further, each included one cell with 0.

I'm copying a couple of examples below:

. /* tabulate high correlation pairs */

. tab vacuum carpet

           |        carpet
    vacuum |         0          1 |     Total
-----------+----------------------+----------
         0 |        21        835 |       856
         1 |         0        342 |       342
-----------+----------------------+----------
     Total |        21      1,177 |     1,198

. tab computer stove

           |         stove
  computer |         0          1 |     Total
-----------+----------------------+----------
         0 |        12      1,033 |     1,045
         1 |         0        146 |       146
-----------+----------------------+----------
     Total |        12      1,179 |     1,191
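
The gap between "does not look highly correlated" and a coefficient
above 0.9 can be illustrated with a small sketch. It compares the
ordinary Pearson (phi) correlation of the 0/1 variables with Pearson's
cos-pi approximation to the tetrachoric correlation, using the counts
from the two tables above. The cos-pi formula is a textbook
approximation chosen here for illustration; it is not the maximum
likelihood estimator that -polychoric- uses, and the function names
are hypothetical.

```python
import math

def phi(a, b, c, d):
    """Pearson correlation of two 0/1 variables from the table [[a, b], [c, d]]."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

def tetrachoric_cos_pi(a, b, c, d):
    """Pearson's cos-pi approximation to the tetrachoric correlation.

    Written as cos(pi * sqrt(bc) / (sqrt(ad) + sqrt(bc))) so that a zero
    in b or c gives exactly 1 instead of a division by zero. It is
    undefined when both products a*d and b*c are zero.
    """
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return math.cos(math.pi * root_bc / (root_ad + root_bc))

# Cell counts copied from the two tabulations above (row 0 first).
tables = {
    "vacuum x carpet": (21, 835, 0, 342),
    "computer x stove": (12, 1033, 0, 146),
}

results = {}
for name, cells in tables.items():
    results[name] = (phi(*cells), tetrachoric_cos_pi(*cells))
    print(f"{name}: phi = {results[name][0]:.3f}, "
          f"cos-pi tetrachoric = {results[name][1]:.3f}")
```

In both tables phi is small because the margins are very unbalanced,
while the latent-normal approximation sits on the boundary at 1 as soon
as one off-diagonal cell is empty.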

2) When I run the polychoric with only the dichotomous variables, and
then with the same variables plus the additional 5 variables described
above (ordinal and continuous), I get different correlation
coefficients in the correlation matrix for the same variable pairs.
How could this be? Sometimes the values are similar and yet different,
and in other cases the values are quite different (some of the
correlations > 0.9 when binary, ordinal and continuous variables are
included in the matrix become zero when only binary variables are
included in the matrix).

3) To address the issue of 2x2's with zeros, one colleague suggested
flattening in the previous thread--I
wondered if there are other options.
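
As a rough sketch of what flattening does, assuming it means replacing
an empty cell with a small positive count before estimating the
correlation: the constant 0.5 below is an illustrative choice, not a
value taken from the earlier thread, and the cos-pi formula is again
only an approximation to the tetrachoric correlation rather than what
-polychoric- computes. The helper names are hypothetical.

```python
import math

def flatten(cells, eps=0.5):
    """Replace zero counts with eps (eps=0.5 is an assumed, illustrative value)."""
    return tuple(eps if n == 0 else n for n in cells)

def tetrachoric_cos_pi(a, b, c, d):
    """Pearson's cos-pi approximation to the tetrachoric correlation."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return math.cos(math.pi * root_bc / (root_ad + root_bc))

# vacuum x carpet counts from the tabulation above (row 0 first).
raw = (21, 835, 0, 342)
adjusted = flatten(raw)

r_raw = tetrachoric_cos_pi(*raw)
r_adjusted = tetrachoric_cos_pi(*adjusted)
print(f"raw table:       r = {r_raw:.3f}")
print(f"flattened table: r = {r_adjusted:.3f}")
```

The flattened estimate moves off the boundary, but it depends directly
on the constant added, so it is worth trying more than one value of eps
before relying on the result.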

Many thanks for any thoughts!


