Re: st: Does there exist measurement error when I got high Cronbach's alpha?

From   William Hauser <>
Subject   Re: st: Does there exist measurement error when I got high Cronbach's alpha?
Date   Mon, 18 Jul 2011 13:08:25 -0400

To reply to the list simply reply to this email and note that the
addressee is and do not change the
subject line (doing so will mess up the archiving/threading of the
message).  It is fine to address me personally but I have made my
reply public in case our discussion is helpful to others in the

My response to your question is as follows:

Your negative numbers reflect missing data.  It is missing in the
sense that the respondents could have answered the question (as
everyone 'has' personality traits) but did not for some reason.  Since
this sounds like survey data I would assume that the respondent either
refused to answer, felt like they 'didn't know', or quit the survey
before answering these questions.  In any case the missing responses
are probably not missing at random.  That is, they probably 'chose'
not to answer and did so for some purposeful reason and thus the
exclusion of these cases potentially biases your results.  For
example, it may be that the least extroverted respondents
systematically refused to answer the questions about their
personality.  In any case I would heartily recommend you do some
additional reading about missing data and the ways it may or may not
bias your results.  If the data are missing at random - for example,
as a function of the survey design (i.e. if only a random subset of
respondents were asked this set of questions) then you shouldn't worry
about the missing data.  Even if the data are not missing at random,
listwise deletion of cases is one solution, although you should be
forthcoming about potential bias in your results.  Alternatively, you
may impute the missing data (Stata command -mi-), but this is, no
matter the sophistication, making up data under the assumption that
the patterns that hold in the data you have also hold in the data
that are missing (which may be a rather dubious assumption).
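
To make the bias argument concrete, here is a minimal simulation
sketch in Python (not Stata).  Everything in it is an assumption made
up for illustration: the 1-50 score range, the 30% and 80% skip
rates, and the cutoff of 15.  It compares listwise deletion when
skipping is unrelated to the score with listwise deletion when low
scorers tend to refuse.

```python
import random
import statistics

rng = random.Random(2011)  # fixed seed so the sketch is reproducible

# Hypothetical "true" trait scores on a 1-50 scale for 5,000 respondents.
true_scores = [rng.randint(1, 50) for _ in range(5000)]
true_mean = statistics.mean(true_scores)

# Case 1: skipping is unrelated to the score. Every respondent has the
# same 30% chance of not answering.
random_observed = [s for s in true_scores if rng.random() >= 0.30]

# Case 2: skipping depends on the score. Low scorers (<= 15) refuse
# 80% of the time; everyone else always answers.
refusal_observed = [s for s in true_scores if s > 15 or rng.random() >= 0.80]

# Listwise deletion simply analyzes whoever is left.
random_bias = statistics.mean(random_observed) - true_mean
refusal_bias = statistics.mean(refusal_observed) - true_mean

print(f"true mean:                   {true_mean:.2f}")
print(f"bias, skipping at random:    {random_bias:+.2f}")
print(f"bias, low scorers refusing:  {refusal_bias:+.2f}")
```

Under these assumptions the first bias should sit near zero, while
the second should be clearly positive, because the people who are
dropped differ systematically from those who remain.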

The bottom line is that a lot of respondents didn't answer and thus
scored a '-2', although their true scores for those questions may be
quite disparate from one another (some may be 5's and others 50's).
The -2 and -3 designations are substantively meaningless and cannot
be included in the analysis as is.  Either impute values for these
cases or drop the cases from the analysis by coding them as missing.
Including them as is inflates the alpha coefficient because a whole
lot of people who scored a -2 on q1 also scored a -2 on the other
questions although, again, their true score is not a -2 (the -2 is
just a catch-all for those who did not answer).  By including the -2
and -3 scores you're in effect saying that the -2's all have the same
level of 'intellect' or conscientiousness when, in reality, they
probably do not - what they do have is a similar propensity to not
answer questions.
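
The inflation can be sketched numerically.  The Python snippet below
(not Stata) is an illustration under invented assumptions: five items
on a 1-50 scale that are generated independently of one another, and
30% of respondents carrying the -2 code on every item.  The helper
cronbach_alpha is a hypothetical name and computes the raw
(unstandardized) alpha, not the -alpha ..., std- variant quoted in
the thread.

```python
import random
import statistics

def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [statistics.variance(col) for col in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

rng = random.Random(2011)  # fixed seed so the sketch is reproducible
NO_ANSWER = -2             # catch-all code for nonresponse, as in the thread

rows = []
for _ in range(2000):
    if rng.random() < 0.30:
        # Nonrespondent: every item carries the same sentinel code.
        rows.append([NO_ANSWER] * 5)
    else:
        # Respondent: five items drawn independently of each other, so
        # they measure nothing in common and alpha should be near zero.
        rows.append([rng.randint(1, 50) for _ in range(5)])

# Alpha with the -2 codes treated as if they were real scores.
alpha_with_codes = cronbach_alpha(rows)

# Alpha after dropping the coded cases (listwise deletion).
answered = [row for row in rows if NO_ANSWER not in row]
alpha_answered_only = cronbach_alpha(answered)

print(f"alpha with -2 kept as a score: {alpha_with_codes:.2f}")
print(f"alpha among those who answered: {alpha_answered_only:.2f}")
```

Even though the simulated items share nothing among people who
answered, keeping the -2 codes should produce a high alpha, because
the only thing the items then "agree" on is who did not respond.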

Looking at the item table, it looks like the 'emo' and 'con' items
have less in common than the other items in the index.  They are more
unique or different from the other items, although there are far
better ways to look at the unidimensionality of the items (see
-help factor-).

Lastly, what I meant by including the items as is: why are you
creating this index in the first place?  I'm assuming you're creating
the index for use as a predictor variable in a regression analysis.
If that's the case then why not just plug the variables in as is (but
with the -2 and -3 coded as missing)?  They are associated but
probably not enough to be treated as a unidimensional personality
trait (the alpha coefficient is borderline low - it really should be
.8 or higher, maybe .7).  If you are just calculating the alpha
coefficient to see if they are unidimensional or not then there are
more precise methods to see how the items covary (again, see
-factor-).
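
For a sense of how far the scale is from those thresholds, the
standardized alpha can be written in terms of the number of items k
and the average interitem correlation.  The Python sketch below (not
Stata) assumes that this standard formula is what the std option in
the quoted output reports; the function names are hypothetical.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from k items with average interitem correlation mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

def mean_r_needed(k, target_alpha):
    """Average interitem correlation required to reach target_alpha with k items."""
    return target_alpha / (k - target_alpha * (k - 1))

# Values taken from the quoted item table.
scale_alpha = standardized_alpha(5, 0.2517)   # "Test scale" row
without_ext = standardized_alpha(4, 0.2266)   # nd8ext row: the other four items

# Average interitem correlation that five items would need.
r_for_080 = mean_r_needed(5, 0.80)
r_for_070 = mean_r_needed(5, 0.70)

print(f"alpha, all five items:      {scale_alpha:.4f}")
print(f"alpha, nd8ext left out:     {without_ext:.4f}")
print(f"mean r needed for alpha .8: {r_for_080:.3f}")
print(f"mean r needed for alpha .7: {r_for_070:.3f}")
```

Both computed alphas agree with the quoted table to within rounding
of the reported correlations.  The per-item alpha column is therefore
consistent with the alpha of the four-item scale that leaves that
item out, rather than a separate alpha for each trait.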

> Hi Will Hauser,
> Thanks very much for your response.
> Sorry to reply to your personal email, since I don't know how to give a response to you on the Statalist; it seems like you were not posting your response there. Could you please tell me how to make my response publicly available like you did?
> For my research question, firstly, I would like to emphasize that the original responses for those people who didn't answer the question or didn't provide complete information in the data set were coded as "-3" and "-2", and I recoded them to "missing values". In fact, I am not quite sure whether I should include these observations with negative values when computing the alpha coefficients.
> In Stata, I used the -item suffix, and got 0.54, 0.55, 0.6, 0.55, 0.62 as alpha coefficients for the five personality measures (extraversion,  agreeableness,  conscientiousness, emotional stability, intellect), when excluding those observations with "-3" or "-2" as responses.
> Here is the command and the result table:
> alpha nd8ext nd8agr nd8con nd8emo nd8int, item std
> Test scale = mean(standardized items)
>                                                             average
>                              item-test     item-rest       interitem
> Item         |  Obs  Sign   correlation   correlation     correlation     alpha
> -------------+-----------------------------------------------------------------
> nd8ext       | 2112    +       0.6811        0.4466          0.2266      0.5395
> nd8agr       | 2112    +       0.6649        0.4236          0.2351      0.5515
> nd8con       | 2112    +       0.5885        0.3197          0.2755      0.6033
> nd8emo       | 2112    +       0.5629        0.2866          0.2890      0.6191
> nd8int       | 2112    +       0.6702        0.4310          0.2323      0.5477
> -------------+-----------------------------------------------------------------
> Test scale   |                                               0.2517      0.6271
> -------------------------------------------------------------------------------
> Lastly, what do you mean by "including the variables in your model as is"? Should I include all the other variables in my model to compute the alpha coefficients? Why should I do this?
> Many thanks.
> Best,
> Sharon
