

From: "JVerkuilen (Gmail)" <jvverkuilen@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: Polychoric PCA error message
Date: Sat, 23 Feb 2013 00:36:06 -0500

I would recommend looking at the original (non-polychoric) correlations
and at the one- and two-way marginal tables. If there is an excess of
zero cells or other pathological patterns, polychoric won't converge.
Sometimes you can fix this with judicious assumptions, such as adding
one to all cells in the table, but this needs to be handled with serious
care. Search the archives; this was discussed recently.

On Fri, Feb 22, 2013 at 6:50 PM, Yashin <yashin5@gmail.com> wrote:
> Dear Statalisters,
>
> While executing polychoric PCA to produce an asset/wealth index, three
> iterations of the following message appeared:
>
> numerical derivatives are approximate
> nearby values are missing
>
> I understand that the first principal component should be the wealth
> index; it contains negative values, with proportion of explained
> variance = ~25%.
>
> Principal component analysis
>
>   k |  Eigenvalues  |  Proportion explained  |  Cum. explained
> ----+---------------+------------------------+------------------
>   1 |   7.955020    |       0.248594         |    0.248594
>
> Question: Are there any methods to double check that the procedure
> performed reasonably well and that I can use this 1st PC as an asset
> index? 25% explained variance and persistence of error messages
> suggest that I should be cautious.
>
> In case it helps, I have copied below a previous issue I have had with
> the same dataset, which led me to remove five variables, resulting in
> no zeros in correlation matrix.
>
> Thank you for any comments or suggestions!
>
> Yashin
>
> ________________________________
>
> On Tue, 29 Jan 2013 22:05:25, Stas Kolenikov <skolenik@gmail.com> wrote:
>
> A zero cell means that the underlying two normal variables have a
> correlation of 1 -- or at least that's the maximum likelihood
> estimate. Visually, the normal distribution is degenerately
> concentrated on a line that passes outside of zero. With a correlation
> of 1, ML estimation breaks down: the maximizer runs out of the sample
> space, and produces missing values for negative definite matrices that
> have correlations > 1; and if a solution is claimed to exist, it cannot
> be stable, as we work with poorly defined matrices that are unstable
> to invert (in the sense of finite accuracy arithmetics). In your
> example, not only do you have zero cells that make estimation of the
> correlations difficult; the small marginal proportions will not make
> -polychoric- very happy, either.
>
> Vika Savalei wrote about the existing tweaks
> (http://www.mat.ulaval.ca/fileadmin/Cours/STT-7620/Savalei11.pdf), but
> I don't have anything implemented in the code.
>
> ________________________________
>
> On 29 January 2013 22:34, Yashin <yashin5@gmail.com> wrote:
>
> Dear Statalisters:
>
> I am trying to run polychoric PCA from Stas Kolenikov on a data subset
> (wealth index) that--pre-winnowing--has 32 dichotomous variables, four
> ordinal variables, and one continuous variable. I am getting the
> following error messages repeatedly:
>
> could not calculate numerical derivatives
> missing values encountered
> numerical derivatives are approximate
> nearby values are missing
>
> I found the following thread addressing this issue,
>
> http://www.stata.com/statalist/archive/2012-11/msg00826.html
>
> and similarly I also found that for those coefficients in the
> correlation matrix that are either zero or > 0.9, the 2x2 tables
> invariably have a cell with small numbers (usually 0, and in other
> cases 1, 2, 3 and in one case a 7). In this case, this would not be a
> structural zero but a sampling zero.
>
> I have related questions I am hoping someone might help shed light on:
>
> 1) When I examined the six 2x2 tables for variable pairs with
> correlation coefficients > 0.9, they did not appear to be highly
> correlated, and further, included one cell with 0
>
> I'm copying a couple of examples below:
>
> . /* tabulate high correlation pairs */
>
> . tab vacuum carpet
>
>            |        carpet
>     vacuum |         0          1 |     Total
> -----------+----------------------+----------
>          0 |        21        835 |       856
>          1 |         0        342 |       342
> -----------+----------------------+----------
>      Total |        21      1,177 |     1,198
>
> . tab computer stove
>
>            |         stove
>   computer |         0          1 |     Total
> -----------+----------------------+----------
>          0 |        12      1,033 |     1,045
>          1 |         0        146 |       146
> -----------+----------------------+----------
>      Total |        12      1,179 |     1,191
>
> 2) When I run the polychoric with only the dichotomous variables, and
> then with the same variables plus the additional 5 variables described
> above (ordinal and continuous), I get different correlation
> coefficients in the correlation matrix for the same variable pairs.
> How could this be? Sometimes the values are similar and yet different,
> and in other cases the values are quite different (some of the
> correlations > 0.9 when binary, ordinal and continuous variables are
> included in the matrix become zero when only binary variables are
> included in the matrix).
>
> 3) To address the issue of 2x2's with zeros, one colleague suggested
> flattening in the previous thread (
> http://www.stata.com/statalist/archive/2012-11/msg00829.html )--I
> wondered if there are other options.
>
> Many thanks for any thoughts!
>
> Yashin
>
> --
> ysl
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/

--
JVVerkuilen, PhD
jvverkuilen@gmail.com
http://lesswrong.com/

"Everybody loves progress but nobody likes change."
---Fortune cookie, 1/13/13.
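
The zero-cell problem and the "add a constant to every cell" fix discussed in this message can be checked by hand on the quoted vacuum/carpet table. The following is only a minimal sketch in Python (standard library only), not the code inside -polychoric-. The function names `bvn_cdf` and `tetrachoric`, the `add` and `cap` parameters, and the Simpson-rule integration are all assumptions made for illustration. It relies on the fact that, for a single 2x2 table, the tetrachoric correlation is the value of rho at which a bivariate normal, cut at the thresholds implied by the margins, reproduces the observed cell proportion.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal


def bvn_cdf(h, k, rho, steps=2000):
    """P(X <= h, Y <= k) for a standard bivariate normal, |rho| < 1.

    Integrates pdf(x) * cdf((k - rho*x) / sqrt(1 - rho^2)) over x in [-8, h]
    with Simpson's rule; the mass below -8 is negligible.
    """
    lo = -8.0
    s = sqrt(1.0 - rho * rho)

    def f(x):
        return STD.pdf(x) * STD.cdf((k - rho * x) / s)

    width = (h - lo) / steps  # steps must be even for Simpson's rule
    total = f(lo) + f(h)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(lo + i * width)
    return total * width / 3.0


def tetrachoric(n00, n01, n10, n11, add=0.0, cap=0.999):
    """Tetrachoric correlation of a 2x2 table (hypothetical helper).

    `add` is a constant added to every cell before estimation.
    """
    a, b, c, d = (x + add for x in (n00, n01, n10, n11))
    total = a + b + c + d
    if min(a + b, c + d, a + c, b + d) == 0:
        raise ValueError("a margin is empty; the correlation is undefined")
    if min(a, b, c, d) == 0:
        # Zero cell: the likelihood has no interior maximum, so the
        # estimate sits on the boundary (+1 or -1).
        return 1.0 if b * c == 0 else -1.0
    h = STD.inv_cdf((a + b) / total)  # threshold of the row variable
    k = STD.inv_cdf((a + c) / total)  # threshold of the column variable
    target = a / total                # observed P(row = 0, col = 0)
    lo, hi = -cap, cap
    for _ in range(40):               # bisection: bvn_cdf increases in rho
        mid = (lo + hi) / 2.0
        if bvn_cdf(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # vacuum x carpet table quoted above: cells 21, 835, 0, 342
    for constant in (0.0, 0.5, 1.0):
        print(constant, tetrachoric(21, 835, 0, 342, add=constant))
```

With no constant added, the zero cell pushes the estimate to the boundary, which matches the explanation quoted from Stas Kolenikov. With a constant added, the estimate becomes finite but moves noticeably depending on which constant is chosen, which is one reason the adjustment needs to be handled with care.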

**Follow-Ups**: **Re: st: Re: Polychoric PCA error message** *From:* Nick Cox <njcoxstata@gmail.com>

**References**: **st: Re: Polychoric PCA error message** *From:* Yashin <yashin5@gmail.com>
