From: Saifuddin Ahmed <sahmed@jhsph.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Re: creating composite measures
Date: Fri, 23 Aug 2002 12:37:53 -0400
Another option is to use Latent Class Analysis (LCA) and estimate a "latent score" as the composite score. LCA may be considered a variant of factor analysis for categorical manifest variables. One problem, however, is that Stata does not have an LCA facility. A free downloadable program called LEM is available at:

http://www.kub.nl/faculteiten/fsw/organisatie/departementen/mto/software 2.html

As Nick has pointed out, you have to be careful about masking the variables: be guided by theory; first examine each variable separately to see the associations and effect directions; check whether any variables are inversely associated with the outcomes; and so on. Another point: LEM cannot handle missing values.

Recently, principal component analysis has attracted interest as a way to generate composite indices (e.g., creating a wealth index from household items). Although pc/factor analysis assumes that the manifest variables are continuous on a metric scale, that assumption could be relaxed when the pc score is generated as a means of "data reduction". A natural ordering of the categories, however, is still required.

Besides the two methods mentioned above, "classification of individuals by attributes" (if viewed as "clustering of individuals") can potentially be done with other statistical methods. One such method is implemented in Stata as the "cluster analysis commands". [This paragraph is for information only. Considering the distribution of the 4 variables mentioned, this may not be a good choice; the method is more suitable with several manifest variables, especially when there is little theory about the common variables underlying the clustering.]

Best wishes,
Saifuddin

Saifuddin Ahmed, MD, PhD
Johns Hopkins Bloomberg School of Public Health

----- Original Message -----
From: Nick Cox <n.j.cox@durham.ac.uk>
Date: Friday, August 23, 2002 5:53 am
Subject: st: RE: Re: creating composite measures

> Seth D. Hannah asked
>
> > Can someone help me with creating a composite measure of prejudice from
> > four individual variables in my data set which measure prejudice.
> > the variables are:
> >
> > deasyblk: perception of blacks as easy to get along with
> > dwelfblk: perception of blacks as likely to be on welfare
> > dintlblk: perception of blacks as intelligent
> > drichblk: perception of blacks as rich or poor
> >
> > the variables are distributed as follows:
> >
> > . tab deasyblk
> >
> >    easy to get along |
> >             w/blacks |      Freq.     Percent        Cum.
> > ---------------------+-----------------------------------
> > easy to get along w/ |        915       10.26       10.26
> >                    2 |       1052       11.80       22.06
> >                    3 |       1379       15.47       37.53
> >              neither |       2722       30.53       68.06
> >                    5 |       1143       12.82       80.88
> >                    6 |        638        7.16       88.03
> > hard to get along w/ |        547        6.14       94.17
> >        don't know... |        418        4.69       98.86
> >              missing |        102        1.14      100.00
> > ---------------------+-----------------------------------
> >                Total |       8916      100.00
> >
> > . tab dwelfblk
> >
> >    self-supporting: |
> >              blacks |      Freq.     Percent        Cum.
> > --------------------+-----------------------------------
> > prefer self-support |        754        8.46        8.46
> >                   2 |        521        5.84       14.30
> >                   3 |        879        9.86       24.16
> >             neither |       2132       23.91       48.07
> >                   5 |       1723       19.32       67.40
> >                   6 |       1332       14.94       82.34
> >      prefer welfare |       1046       11.73       94.07
> >       don't know... |        425        4.77       98.83
> >             missing |        104        1.17      100.00
> > --------------------+-----------------------------------
> >               Total |       8916      100.00
> >
> > . tab dintlblk
> >
> > intelligence: |
> >        blacks |      Freq.     Percent        Cum.
> > --------------+-----------------------------------
> >   intelligent |        723        8.11        8.11
> >             2 |        807        9.05       17.16
> >             3 |       1597       17.91       35.07
> >       neither |       3259       36.55       71.62
> >             5 |       1255       14.08       85.70
> >             6 |        479        5.37        91.0
> > unintelligent |        207        2.32       93.39
> > don't know... |        481        5.39       98.79
> >       missing |        108        1.21      100.00
> > --------------+-----------------------------------
> >         Total |       8916      100.00
> >
> > . tab drichblk
> >
> >    rich-poor: |
> >        blacks |      Freq.     Percent        Cum.
> > --------------+-----------------------------------
> >          rich |         59        0.66        0.66
> >             2 |        193        2.16        2.83
> >             3 |        499        5.60        8.42
> >       neither |       2101       23.56       31.99
> >             5 |       2506       28.11       60.09
> >             6 |       2137       23.97       84.06
> >          poor |        970       10.88        94.9
> > don't know... |        371        4.16       99.10
> >       missing |         80        0.90      100.00
> > --------------+-----------------------------------
> >         Total |       8916      100.00
> >
> > What I want to do is combine these four variables into one measure of
> > prejudice, which will become a dependent variable in some of my models.
> >
> > The only way I could think to do it was to create a new variable prejblk
> > with numerical values 1 through 7 that equal the sums of the respective
> > 1 through 7's from my four variables...
> >
> > gen prejblk=.
> > replace prejblk=1 if drichblk==1|dwelfblk==1|deasyblk==1|dintlblk==1
> > replace prejblk=2 if drichblk==2|dwelfblk==2|deasyblk==2|dintlblk==2
> > etc.
> >
> > somehow this doesn't seem right, please help!
>
> Bo Cutter
>
> > As a first step you may want to look at a factor analysis (Principal
> > components). This analysis will look at how and whether you can reduce
> > your 5 variables into one or more variables.
>
> Nick Winter
>
> > I would consider averaging the variables, after reversing the coding for
> > the ones that are coded with opposite "sense". (e.g., so that higher
> > scores on each indicates more tolerant attitudes)
> >
> > Look at egen rmean(...)
>
> Why do you need a composite measure? It is often a good way
> of blurring important distinctions. If in fact these measures
> are highly related, then one will serve as well as any other.
> If, as seems a little more likely, they measure rather
> different things, it is not clear that any composite measure
> will add much to looking separately at your different responses.
>
> In any case, any kind of averaging (means or PCA) has to be smart
> about don't knows and missings, which I guess are coded higher
> than the other values. At first sight, the only clean way
> to deal with those is to omit any observation with any don't know
> or missing from the averaging.
>
> Also contemplate
>
> gra deasyblk dwelfblk dintlblk drichblk, matrix j(1)
>
> Nick
> n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
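
The averaging approach suggested in the quoted thread, together with the caution about don't knows and missings, might be sketched in current Stata roughly as follows. This is only an illustration under assumptions that are not in the thread: the toy data are made up, substantive answers are assumed to be coded 1 through 7, and "don't know" and "missing" are assumed to be coded 8 and 9. The value labels suggest that all four items already run in the same direction, so nothing is reversed here; an item running the other way could be flipped with something like `replace x = 8 - x`. Note that `rowmean()` and `rowmiss()` are the current names of the egen functions that were called `rmean()` and `rmiss()` in 2002.

```stata
* Toy data standing in for the four items (hypothetical values).
* Assumption: 1-7 are substantive answers, 8 = don't know, 9 = missing.
clear
input deasyblk dwelfblk dintlblk drichblk
1 2 3 4
4 4 4 5
7 6 5 6
8 3 2 5
2 9 4 4
end

* Turn the non-substantive codes into Stata missing values.
foreach v of varlist deasyblk dwelfblk dintlblk drichblk {
    replace `v' = . if !inrange(`v', 1, 7)
}

* Count missing items per respondent, then average the four items.
egen nmiss   = rowmiss(deasyblk dwelfblk dintlblk drichblk)
egen prejblk = rowmean(deasyblk dwelfblk dintlblk drichblk)

* rowmean() averages over whatever is nonmissing, so blank out the
* composite for any respondent with a don't know or missing item.
replace prejblk = . if nmiss > 0

list
```

With these toy data the last two respondents end up with a missing composite, because each has one item coded as don't know or missing. Whether an average of this kind is meaningful at all is a separate question, as raised in the quoted message.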
