Dr. Buis,
Thanks for the additional, detailed advice. I greatly appreciate
the command structure for using the z-scores (your first suggestion). The
percentile approach also seems well worth exploring (your second
suggestion), and thanks for clarifying the distinction between raw and
standardized variables when checking for correlation. You have offered
more than I anticipated. Again, many thanks.
Joon.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Maarten buis
Sent: Friday, November 17, 2006 1:04 AM
To: [email protected]
Subject: Re: st: composite index
--- "Joon G. Park" <[email protected]> wrote:
> I have the following three variables:
>
> Variable "A" (Scale: millions; range: from hundreds of thousands to
> millions; Unit: dollar)
>
> Variable "B" (Scale: hundreds; Range: from single digits to three
> double digits; Unit: "size")
>
> Variable "C" (Scale: hundreds; Range: from less than ten to tens of
> thousands; Unit: personnel)
>
> I would like to have equal weights attached for each variable that
> will comprise the composite index
A common solution is to make new variables that have the same unit,
i.e. standardize your variables. Normality is not an issue here:
standardizing a variable says nothing about normality, it only makes
sure that the mean is zero and the standard deviation is one. Notice
that the sum (or mean) of several standardised variables is itself not
standardised, because its standard deviation is generally not one. So it
would be good to standardize that new variable again. You could do that
as follows:
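* z-score each variable: subtract the mean, divide by the sd
sum A
gen za = (A-r(mean))/r(sd)
sum B
gen zb = (B-r(mean))/r(sd)
sum C
gen zc = (C-r(mean))/r(sd)
* equal-weight sum (missing if any component is missing)
gen composite = za + zb + zc
* re-standardize the composite itself
sum composite
replace composite = (composite-r(mean))/r(sd)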
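The same steps can probably be written more compactly with -egen-. This is only a sketch: the variable names z_A, z_B, z_C, composite2 and zcomposite2 are made up for illustration, and it uses the row mean rather than the sum. After re-standardizing, the two give the same result when no values are missing, but rowmean() averages over whatever components are present instead of returning a missing value.

```stata
* z-score each variable with egen's std() function
* (the z_* names are hypothetical)
foreach v of varlist A B C {
    egen z_`v' = std(`v')
}

* equal-weight mean of the z-scores; rowmean() ignores missing
* components, unlike the plain sum za + zb + zc
egen composite2 = rowmean(z_A z_B z_C)

* re-standardize the composite so it has mean 0 and sd 1
egen zcomposite2 = std(composite2)

* quick check: mean should be about 0 and sd about 1
sum zcomposite2
```

If you want an observation dropped whenever one component is missing, the explicit sum is the safer choice.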
An alternative to standardisation is to use percentile scores. This too
will ensure that the variables have the same unit. Notice that the
interpretation of standardised variables is usually also in terms of
percentiles, but with the added assumption that the normal distribution
is reasonably approximated: for instance a z-score close to -2 is
interpreted as small because it would be near the bottom 2.5% if the
distribution were normal. Percentile scores don't need this assumption,
but they will change the form of the distribution (percentile scores are
roughly uniformly distributed, whatever the shape of the raw variable).
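A rough sketch of the percentile approach, assuming the same three variables A, B and C. The names r_*, n_*, p_* and pcomposite are made up for illustration, and the (rank - 0.5)/N convention is just one of several common ways to turn ranks into percentile scores, so treat it as an assumption rather than the only option.

```stata
* percentile score for each variable (all new names are hypothetical)
foreach v of varlist A B C {
    egen r_`v' = rank(`v')      // rank; tied values share the average rank
    egen n_`v' = count(`v')     // number of non-missing observations
    * assumed convention: (rank - 0.5)/N, scaled to run from 0 to 100
    gen p_`v' = 100 * (r_`v' - 0.5) / n_`v'
}

* equal-weight composite on the percentile scale
* (missing if any component is missing)
gen pcomposite = (p_A + p_B + p_C) / 3
```

Each p_* variable is on the same 0 to 100 scale, so the three components get equal weight without any appeal to normality.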
> An additional note: I am not sure if a factor analysis approach would
> also work, since a cursory exam of the data does not suggest a high
> level of positive correlation between variable C and the other two
> variables
The correlation of the raw variables does not need to be positive, as
long as you make sure that a large positive number means the same thing
in all your standardised variables, by reversing the sign for variables
that were coded in the reverse order. The variables standardised in this
way should be strongly positively correlated. You want to add them up
into one composite variable because they measure the same thing. If they
measure the same thing, then they must be strongly correlated; if they
are not, then they don't measure the same thing and you don't want to
add them to your composite index.
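One way to check this, assuming za, zb and zc have been created as above. The sign reversal is commented out because it is purely hypothetical: nothing in your description says that C is coded in the reverse order.

```stata
* hypothetical: if a high value of C meant "less" of the underlying
* concept, you would flip its z-score before combining:
* replace zc = -zc

* pairwise correlations of the standardised variables; all three
* should be clearly positive if they measure the same thing
correlate za zb zc
```

If zc turns out to be only weakly correlated with za and zb, that is a sign it does not belong in the same index.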
HTH,
Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/