Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Model identification in Stata sem()

From: John Antonakis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Model identification in Stata sem()
Date: Sat, 08 Dec 2012 10:40:01 +0100

On a basic level, given the number of observed variables in the model and the constraints one imposes, one can also work out by hand whether the model is under-identified, just-identified, or over-identified. This counting rule is a necessary but not sufficient condition for identification. For example:
```
* Simulate two noisy indicators (x1, x2) of a common factor x
clear
set seed 100
set obs 1000
gen x = rnormal()
gen x1 = x + rnormal()
gen x2 = x + rnormal()
* Two-indicator CFA with latent variable X
sem (X -> x1 x2)
```
In the case of a two-indicator confirmatory factor analysis, the model is not identified. The variance-covariance matrix of the v = 2 observed variables has v(v+1)/2 = 3 unique elements (two variances and one covariance), and what is being estimated is:
```
1. 1 loading for one of the indicators (the other is constrained to 1)
2. 1 variance of the latent variable
3. 2 variances of the disturbances

```
3 - 4 = -1 degrees of freedom, so the model is under-identified. It could be made just-identified, though, by imposing constraints (e.g., constraining the two loadings to be equal, i.e., tau-equivalence, and fixing the variance of X to unity):
```
sem (X ->x1@a x2@a), var(X@1)

```
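The arithmetic behind both cases can be written out as a worked count. The symbol t for the number of free parameters is notation introduced here, and the count covers the covariance structure only. It also assumes that, with var(X@1) specified, sem does not additionally anchor a loading at 1, which is what the just-identified claim above implies:

```latex
\begin{align*}
df &= \frac{v(v+1)}{2} - t \\
\text{unconstrained, } v = 2:\quad df &= 3 - (1 + 1 + 2) = -1 \\
\text{constrained, } v = 2:\quad df &= 3 - (1 + 2) = 0
\end{align*}
```

In the constrained model the three free parameters are the common loading a and the two disturbance variances; the variance of X is fixed at 1 and so is not counted.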
Now, if we introduce a third variable (e.g., y, predicted by the latent variable), the variance-covariance matrix has 6 unique elements (v(v+1)/2 with v = 3), and what is being estimated is:
```
1. 1 loading for one of the indicators (the other is constrained to 1)
2. 1 variance of the latent variable
3. 3 variances of the disturbances
4. 1 structural coefficient

6-6=0 (the model is just-identified)

```
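A minimal sketch of this three-variable model, in the same style as the two-indicator example. How y is generated here (x plus unit-variance noise) is an assumption for illustration only; the post does not say how y is constructed.

```stata
* Simulate two indicators and one outcome that all depend on x
clear
set seed 100
set obs 1000
gen x  = rnormal()
gen x1 = x + rnormal()
gen x2 = x + rnormal()
* Hypothetical outcome y (data-generating process assumed, not from the post)
gen y  = x + rnormal()
* Measurement part (X -> x1 x2) plus one structural path (X -> y)
sem (X -> x1 x2) (X -> y)
```

If the count above is right, the model-versus-saturated test in the output should be reported with zero degrees of freedom.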
Thus, a model can always be made to satisfy this counting condition by adding more variables or making constraints.
Of course, there are other identification issues, such as checking for local identification (as Jay suggested below) and empirical underidentification. For the latter, see page 50 of:
Kenny, D. A. (1979). Correlation and causality. New York, Wiley-Interscience. Kenny has made this book freely available here: http://davidakenny.net/books.htm
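As for the local identification check that Jay describes (rank of the Jacobian), here is a rough Mata sketch for the two-indicator model. The Jacobians are derived by hand from the model-implied covariances, and the names lam, vX, a, J_free, and J_con, as well as the evaluation point (all values set to 1), are choices made here for illustration. Run it from a do-file so the // comments are accepted.

```stata
mata:
// Unconstrained model: x1 = X + e1, x2 = lam*X + e2
// Parameters (columns): lam, var(X), var(e1), var(e2)
// Implied moments (rows): var(x1), cov(x1,x2), var(x2)
lam = 1
vX  = 1
J_free = (0, 1, 1, 0 \ vX, lam, 0, 0 \ 2*lam*vX, lam^2, 0, 1)
// Rank 3 with 4 columns: not full column rank, so not locally identified
rank(J_free), cols(J_free)

// Constrained model: both loadings equal a, var(X) fixed at 1
// Parameters (columns): a, var(e1), var(e2)
a = 1
J_con = (2*a, 1, 0 \ 2*a, 0, 0 \ 2*a, 0, 1)
// Rank 3 with 3 columns: full column rank at this point
rank(J_con), cols(J_con)
end
```

Full column rank at one evaluation point speaks only to local identification there; it does not rule out empirical underidentification in a given sample.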
See also McDonald, R. P. (1982). A note on the investigation of local and global identifiability. Psychometrika, 47, 101-103. http://link.springer.com/article/10.1007%2FBF02293855

Best,
J.

__________________________________________

Prof. John Antonakis
Department of Organizational Behavior
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland
Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305
http://www.hec.unil.ch/people/jantonakis

Associate Editor
__________________________________________

On 08.12.2012 07:26, JVerkuilen (Gmail) wrote:

> On Fri, Dec 7, 2012 at 1:19 PM, William Buchanan
> <william@williambuchanan.net> wrote:
>> Hi Robert,
>>
>> On the slide (32) that you referenced, there may not be a "formal" warning in terms of any blaring error messages but the output that they show includes information (or more accurately a lack thereof) that would indicate problems with the model. If you look at "chi2(-1)" and "Prob > chi2 = ." that serves as a subtle indication that the model is not identified. Any time "." shows up in the output, it generally is an indication that there were problems fitting the model to the data and it should be investigated further.

> There are no truly general tests of identification of a model. A
> number of algebraic tests exist in many cases and I suspect that other
> SEM packages are checking them. You can check local identification by
> computing the Jacobian matrix and checking its rank, which must be
> full. Bekker, Merckens and Wansbeek (1994) wrote a nice book on the
> topic and there are some other nice articles around which I can dig up
> references to if desired.
>
> A while back someone posted an example of a model fit by -sem- (an
> exploratory factor analysis) and I showed that it was unidentified.
> The sign was that the standard errors were whack, so one of the best
> signs is that the standard errors are massive compared to what you'd
> expect. It's easiest to see this in a standardized solution, because
> in that case the standard errors should be proportional to 1/sqrt(n).
> If they are not, that's a sure sign that one or more parameters is
> unidentified, either in the population or empirically.
>
> http://www.stata.com/statalist/archive/2012-10/msg00525.html
> http://www.stata.com/statalist/archive/2012-10/msg00526.html
>
> Bekker, P., Merckens, A., Wansbeek, T. (1994). Identification,
> Equivalent Models and Computer Algebra. Academic Press.

```
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```