Statalist The Stata Listserver

Re: st: composite index

From   Maarten buis <[email protected]>
To   [email protected]
Subject   Re: st: composite index
Date   Fri, 17 Nov 2006 07:03:58 +0000 (GMT)

--- "Joon G. Park" <[email protected]> wrote:
> I have the following three variables:  
> Variable "A" (Scale: millions; range: from hundreds of thousands to
> millions; Unit: dollar)
> Variable "B" (Scale: hundreds; Range: from single digits to three
> double digits; Unit: "size")
> Variable "C" (Scale: hundreds; Range: from less than ten to tens of
> thousands; Unit: personnel)
> I would like to have equal weights attached for each variable that
> will comprise the composite index

A common solution is to create new variables that share the same unit,
i.e. to standardize your variables. Normality is not an issue here:
standardizing a variable assumes nothing about its distribution, it only
ensures that the mean is zero and the standard deviation is one. Notice
that the sum (or mean) of several standardized variables is itself not
standardized: its mean is zero, but its standard deviation is generally
not one. So it would be good to standardize that new variable again. You
could do that as follows:

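* standardize each variable: subtract the mean, divide by the standard deviation
* (r(mean) and r(sd) are left behind by the preceding summarize)
summarize A
generate za = (A - r(mean))/r(sd)

summarize B
generate zb = (B - r(mean))/r(sd)

summarize C
generate zc = (C - r(mean))/r(sd)

* equal weights: add the z-scores, then standardize the sum again
generate composite = za + zb + zc
summarize composite
replace composite = (composite - r(mean))/r(sd)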
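
The same steps can also be written more compactly with the std()
function of -egen-, which returns a variable with mean zero and standard
deviation one. What follows is only a sketch: the data are simulated
stand-ins for A, B and C (the distributions, seed and sample size are my
assumptions, not the poster's data), and the names z_A, z_B, z_C, total
and zcomposite are made up for illustration.

```stata
* Simulated stand-ins for the three variables (assumption, not real data)
clear
set seed 2006
set obs 200
generate A = exp(rnormal(13, 1))       // dollar amounts, strongly skewed
generate B = ceil(100*runiform())      // "size", 1 to 100
generate C = ceil(exp(rnormal(5, 1)))  // personnel counts

* Standardize each variable with egen's std() function
foreach v of varlist A B C {
    egen z_`v' = std(`v')
}

* Equal weights: add the z-scores, then standardize the sum again
generate total = z_A + z_B + z_C
egen zcomposite = std(total)

* Mean should be (close to) 0 and standard deviation (close to) 1
summarize zcomposite
```

As in the code above, an observation with a missing value on any of the
three variables gets a missing value on the composite.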

An alternative to standardization is to use percentile scores. This too
will ensure that the variables have the same unit. Notice that
standardized variables are usually also interpreted in terms of
percentiles, but with the added assumption that the normal distribution
is a reasonable approximation: for instance, a z-score close to -2 is
interpreted as small because it would be near the bottom 2.5% if the
distribution were normal. Percentile scores do not need this assumption,
but they do change the shape of the distribution.
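
One way to compute percentile scores is to rank each variable and divide
the rank by the number of non-missing observations. This is a sketch
under assumptions: the data are again simulated stand-ins for A, B and C,
the "rank minus a half" convention is just one of several possible
definitions of a percentile score, and the names rank_*, n_*, pct_* and
pcomposite are made up for illustration.

```stata
* Simulated stand-ins for the three variables (assumption, not real data)
clear
set seed 2006
set obs 200
generate A = exp(rnormal(13, 1))
generate B = ceil(100*runiform())
generate C = ceil(exp(rnormal(5, 1)))

foreach v of varlist A B C {
    egen rank_`v' = rank(`v')     // ties receive the average rank
    egen n_`v'    = count(`v')    // number of non-missing observations
    * percentile score strictly between 0 and 100
    generate pct_`v' = 100*(rank_`v' - 0.5)/n_`v'
}

* Equal weights: average the three percentile scores
generate pcomposite = (pct_A + pct_B + pct_C)/3
summarize pct_A pct_B pct_C pcomposite
```

Each pct_ variable is roughly uniformly distributed whatever the shape of
the original variable, which is the change in the form of the
distribution mentioned above.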

> An additional note: I am not sure if a factor analysis approach would
> also work, since a cursory exam of the data does not suggest a high
> level of positive correlation between variable C and the other two
> variables

The correlation of the raw variables does not need to be positive, as
long as you make sure that a large positive number means the same thing
in all your standardized variables, by reversing the sign of variables
that were coded in the reverse order. The variables standardized in this
way should be strongly positively correlated. You want to add them up
into one composite variable because they measure the same thing. If they
measure the same thing, then they must be strongly correlated; if they
are not, then they do not measure the same thing and you do not want to
add them to your composite index.
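
A quick way to check this is to flip the sign of any reverse-coded
variable and then look at the correlation matrix of the standardized
variables. The sketch below uses simulated data in which, by
construction, all three variables share a common factor f and C is coded
in the reverse direction; that setup, and the names f, z_A, z_B and z_C,
are my assumptions for illustration and say nothing about the poster's
data.

```stata
* Simulated data: A, B and C all reflect a common factor f,
* but C is coded in the reverse direction (assumption)
clear
set seed 2006
set obs 500
generate f = rnormal()
generate A =  f + rnormal()
generate B =  f + rnormal()
generate C = -f + rnormal()

foreach v of varlist A B C {
    egen z_`v' = std(`v')
}

* Reverse the sign so that a large value means the same thing everywhere
replace z_C = -z_C

* All pairwise correlations should now be positive
correlate z_A z_B z_C
```

If one of the variables is still only weakly correlated with the others
after any sign reversal, that is the signal not to include it in the
index.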


Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715
