
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: Constructing socio-economic status scale using Principal Components Analysis

From	"MacQuarrie, Kerry" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Constructing socio-economic status scale using Principal Components Analysis
Date	Wed, 28 Nov 2012 15:21:46 +0000

Dear Ameya,

It sounds like you're building your SES measure off of an index of household assets, building materials, etc.  You may want to look at the following resources:

Filmer, D and LH Pritchett (2001) "Estimating Wealth Effects without Expenditure Data or Tears: An Application to Educational Enrollments in States of India." Demography 38(1):115-132.

Rutstein, SO and K Johnson (2004) The DHS Wealth Index. DHS Comparative Reports 6. Calverton, MD: ORC Macro.

Rutstein, SO (2008) The DHS Wealth Index: Approaches for Rural and Urban Areas. DHS Working Papers No. 60. Calverton, MD: Macro International.

The latter two are available on the Measure DHS website (http://measuredhs.com), and they describe the process the DHS undertakes.  In this case, the DHS used PCA and retained the first, largest component (which is constructed separately for each country).  From these component scores, however, households are divided into wealth quintiles, and it is often these quintiles that enter analyses as a categorical independent variable (or rather as a series of dichotomous variables, one per quintile).  This may be a useful approach to model yours after.
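
A minimal Stata sketch of that kind of workflow, run on simulated data, might look like the following. The variable names (radio, tv, bicycle, cement_floor, piped_water, malnourished, ses_score, ses_q) and the data-generating step are illustrative assumptions only. This is not the DHS procedure itself, which involves further steps described in the reports above.

```stata
clear
set seed 12345
set obs 1000

* Simulated (hypothetical) data: a latent wealth level drives five 0/1 asset indicators
generate wealth = rnormal()
foreach v in radio tv bicycle cement_floor piped_water {
    generate byte `v' = (wealth + rnormal()) > 0
}

* Hypothetical binary outcome, made less likely at higher wealth
generate byte malnourished = runiform() < invlogit(-0.5 - 0.8*wealth)

* Principal components of the asset indicators, keeping only the first
pca radio tv bicycle cement_floor piped_water, components(1)
predict ses_score

* Cut the score into five groups of roughly equal size
xtile ses_q = ses_score, nquantiles(5)

* Quintile indicators as predictors, with the lowest group as the base category
logit malnourished i.ses_q
```

The sign of a principal component is arbitrary, so it is worth checking the loadings in the -pca- output to confirm that higher scores correspond to more assets before reading quintile 1 as the poorest group. With a handful of binary indicators the scores are heavily tied, so the five groups may not be exactly equal in size.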

I agree with others who have stated that the practical differences between -pca- and -factor, pcf- are small and stem mostly from convention: whether one is attempting data reduction or trying to uncover the underlying structure in the data.
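
One way to see this for the first component is to score the same variables both ways and correlate the results. The sketch below uses Stata's bundled auto dataset purely as a stand-in (an assumption; it has nothing to do with SES), and pc1 and f1 are hypothetical variable names. The two scores differ in scale, and possibly in sign, but should otherwise carry the same information, so their correlation should be 1 in absolute value up to rounding.

```stata
sysuse auto, clear

* First principal component score from -pca-
pca price mpg weight length
predict pc1

* First factor score from the principal-component factor method
factor price mpg weight length, pcf
predict f1

* Compare the two scores
correlate pc1 f1
display abs(r(rho))
```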

Good luck!
Kerry

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Ameya Bondre
Sent: Wednesday, November 28, 2012 9:49 AM
To: [email protected]
Subject: Re: st: Constructing socio-economic status scale using Principal Components Analysis

thanks Nick and Maarten for the inputs...

The aim here is to assess the effect of SES on the probability that a child is malnourished, or that a mother feeds her children a more diversified diet or attends a growth monitoring session. So, I want to use SES as a binary or categorical variable in logistic regressions. In some regressions, I would use it to control for SES, alongside caste and other such "background variables". I am sorry, the data set has many more variables; 37 of those can potentially measure SES.

So, can factor1 as a continuous variable (ranging from -2 to 1.8) be used in the regression? I am finding that a bit difficult to interpret, so I thought I would construct an SES scale from factor1 instead.

Regarding "selecting a group of variables" which can predict SES, do you mean I can do that by just looking at the factor1 variable weights/scores? Is there a criterion for choosing weights, such as treating variables with weights above 0.10 as pointing to high SES, which would make SES a binary variable?

Thank you,
Ameya

On Wed, Nov 28, 2012 at 2:03 AM, Nick Cox <[email protected]> wrote:
> For once I disagree partially with Maarten.
>
> On reading this again I have further comments:
>
> 1. The difference between -factor, pcf- and -pca- is small and 
> arguably immaterial as far as the results here are concerned. In 
> practice, the techniques are associated, however,  with very different 
> attitudes, -factor- often with a theology of latent variables and
> -pca- often with a mechanistic aim of data reduction.
>
> 2. However, it doesn't seem much of a gain for interpretation to 
> discard interpretable variables and replace them with a very fuzzy 
> concept of socio-economic status (SES), even if numbers are attached.
>
> 3. This is not just an attitude, as the factor analysis results show 
> that the technique has not been especially successful (18% of variance 
> captured by first factor).
>
> 4. If Ameya's variables are typical of data like this that I have 
> seen, most marginal distributions will be skewed and clumpy and the 
> correlation structure extremely sensitive to whether data are left as 
> they come or transformed in some suitable way(s).
>
> 5. Ameya's main concern is presumably to do the best job with the 
> dataset in hand, but this kind of procedure is not highly reproducible 
> by others working in similar territory, except naturally with the same 
> dataset of "about 37 variables". It is usually better to try to 
> identify say 5-10 socio-economic variables and use those as predictors 
> in a regression-like model.
>
> That said, much depends on the main aim of this project, which is not 
> clear. (Presumably, the measure of SES is not an end in itself.)
>
> Nick
>
> On Wed, Nov 28, 2012 at 9:18 AM, Maarten Buis <[email protected]> wrote:
>> On Wed, Nov 28, 2012 at 3:59 AM, Ameya Bondre wrote:
>>> I have a data-set with about 37 variables that can assess household 
>>> socio-economic status in a sample of about 6000 households. These 
>>> include variables measuring household wealth, access to water and 
>>> sanitation, rural households owning animals, etc.
>>>
>>> I used factor analysis (factor var1, var2, ...., pcf)
>>
>> I would say that factor analysis is incorrect for this problem. 
>> Factor analysis assumes that the latent concepts influence the 
>> observed variables. This makes sense for something like an intelligence test:
>> someone is more or less smart (the latent variable) and that 
>> influences the probability of answering a set of questions correctly 
>> (the observed variables). Conceptually, socio-economic status is just 
>> a pool of resources available to a person, family, or household: so 
>> it is the number and kind of animals, the wealth, a house with a 
>> concrete floor, etc. (the observed variables) that influence, or add 
>> up to, the socio-economic status (the latent variable).
>>
>> Some of the possible solutions available in Stata are discussed here:
>> <http://www.maartenbuis.nl/wp/prop.html>.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

--
Dr. Ameya Bondre
Research Analyst, Tufts University, Boston, MA
Master of Science in Public Health (MSPH) Johns Hopkins Bloomberg School of Public Health, Baltimore, MD
MBBS, G.S Medical College and KEM Hospital, Mumbai, India
Phone: (781) 298-1668
Email: [email protected]

Follow-Ups:
- Re: st: Constructing socio-economic status scale using Principal Components Analysis
  - From: Stas Kolenikov <[email protected]>

References:
- st: Constructing socio-economic status scale using Principal Components Analysis
  - From: Ameya Bondre <[email protected]>
- Re: st: Constructing socio-economic status scale using Principal Components Analysis
  - From: Maarten Buis <[email protected]>
- Re: st: Constructing socio-economic status scale using Principal Components Analysis
  - From: Nick Cox <[email protected]>
- Re: st: Constructing socio-economic status scale using Principal Components Analysis
  - From: Ameya Bondre <[email protected]>
