Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | William Hauser <whauseriii@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Does there exist measurement error when I got high Cronbach's alpha? |
Date | Mon, 18 Jul 2011 13:08:25 -0400 |
Sharon,

To reply to the list, simply reply to this email, check that the addressee is statalist@hsphsun2.harvard.edu, and do not change the subject line (doing so will mess up the archiving/threading of the message). It is fine to address me personally, but I have made my reply public in case our discussion is helpful to others in the future.

My response to your question is as follows: Your negative numbers reflect missing data. It is missing in the sense that the respondents could have answered the question (as everyone 'has' personality traits) but did not for some reason. Since this sounds like survey data, I would assume that the respondent either refused to answer, felt like they 'didn't know', or quit the survey before answering these questions. In any case the missing responses are probably not missing at random. That is, the respondents probably 'chose' not to answer and did so for some purposeful reason, and thus the exclusion of these cases potentially biases your results. For example, it may be that the least extroverted respondents systematically refused to answer the questions about their personality. I would heartily recommend you do some additional reading about missing data and the ways it may or may not bias your results.

If the data are missing at random - for example, as a function of the survey design (if only a random subset of respondents were asked this set of questions) - then you shouldn't worry about the missing data. Even if the data are not missing at random, listwise deletion of cases is one solution, although you should be forthcoming about potential bias in your results. Alternatively you may impute the missing data (Stata command -mi-), but this is, no matter the sophistication, making up data under the assumption that the patterns that hold in the data you have also hold in the data that are missing (which may be a rather dubious assumption).
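To make the listwise-deletion route concrete, here is a minimal sketch. It assumes the five items carry the names from your -alpha- command and that -2 and -3 are the only nonresponse codes. The toy data are invented purely so the example runs on its own, and the variable nmiss is just an illustrative name.

```stata
* Toy data, invented for illustration only (not the real survey)
clear
input nd8ext nd8agr nd8con nd8emo nd8int
32 40 35 28 37
-2 -2 -2 -2 -2
25 33 -3 30 29
41 38 36 22 40
-2 -2 -2 -2 -2
18 27 31 35 24
36 42 39 31 38
22 30 27 -3 26
end

* Turn the nonresponse codes into extended missing values,
* keeping -2 and -3 distinguishable as .a and .b
mvdecode nd8ext nd8agr nd8con nd8emo nd8int, mv(-2=.a \ -3=.b)

* Count how many of the five items each respondent left unanswered
egen nmiss = rowmiss(nd8ext nd8agr nd8con nd8emo nd8int)
tabulate nmiss

* Alpha using only respondents who answered all five items
alpha nd8ext nd8agr nd8con nd8emo nd8int, item std casewise
```

The -tabulate- step is there so you can see how many cases listwise deletion would cost you before you commit to it. With your real data you would skip the -input- block and start at -mvdecode-.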
The bottom line is, there are a lot of respondents who didn't answer and thus scored a '-2', although their true scores for those questions may differ widely from one another (some may be 5's and others 50's). The -2 and -3 designations are substantively meaningless and cannot be included in the analysis as is. Either impute values for these cases or drop the cases from the analysis by coding them as missing. Including them as is inflates the alpha coefficient because a whole lot of people who scored a -2 on q1 also scored a -2 on the other questions although, again, their true score is not a -2 (the -2 is just a catch-all for those who did not answer). By including the -2 and -3 scores you're in effect saying that the -2's all have the same level of 'intellect' or conscientiousness when, in reality, they probably do not - what they do have is a similar propensity to not answer questions.

Looking at the item table, it looks like the 'emo' and 'con' items have less in common than the other items in the index. They are more unique or different from the other items, although there are far better ways to look at the unidimensionality of the items (see -help factor-).

Lastly, what I meant by including the items as is was this: why are you creating this index in the first place? I'm assuming you're creating the index for use as a predictor variable in a regression analysis. If that's the case then why not just plug the variables in as is (but with the -2 and -3 coded as missing)? They are associated but probably not enough to be treated as a unidimensional personality trait (the alpha coefficient is borderline low - it really should be .8 or higher, maybe .7). If you are just calculating the alpha coefficient to see if they are unidimensional or not then there are more precise methods to see how the items covary (again, see -factor-).

> Hi Will Hauser,
>
> Thanks very much for your response.
>
> Sorry to reply to your personal email, since I don't know how to give a response to you on the Statalist (http://statalist.1588530.n2.nabble.com/), it seems like you were not posting your response there. Could you please tell me how to make my response publicly available like you did?
>
> For my research question, firstly, I would like to emphasize that the original responses for those people who didn't answer the question or didn't provide complete information in the data set were coded as "-3" and "-2", and I recoded them to "missing values". In fact, I am not quite sure whether I should include these observations with negative values when computing the alpha coefficients.
>
> In Stata, I used the -item suffix, and got 0.54, 0.55, 0.6, 0.55, 0.62 as alpha coefficients for the five personality measures (extraversion, agreeableness, conscientiousness, emotional stability, intellect), when excluding those observations with "-3" or "-2" as responses.
>
> Here is the command and the result table:
>
> alpha nd8ext nd8agr nd8con nd8emo nd8int, item std
>
> Test scale = mean(standardized items)
>
>                                                             average
>                              item-test     item-rest       interitem
> Item         |  Obs  Sign   correlation   correlation     correlation     alpha
> -------------+-----------------------------------------------------------------
> nd8ext       | 2112    +       0.6811        0.4466          0.2266      0.5395
> nd8agr       | 2112    +       0.6649        0.4236          0.2351      0.5515
> nd8con       | 2112    +       0.5885        0.3197          0.2755      0.6033
> nd8emo       | 2112    +       0.5629        0.2866          0.2890      0.6191
> nd8int       | 2112    +       0.6702        0.4310          0.2323      0.5477
> -------------+-----------------------------------------------------------------
> Test scale   |                                               0.2517      0.6271
> -------------------------------------------------------------------------------
>
> Lastly, what do you mean by "including the variables in your model as is"? Should I include all the other variables in my model to compute the alpha coefficients? Why should I do this?
>
> Many thanks.
>
> Best,
> Sharon

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/