The Stata listserver

st: extract variables from big datasets


From   htseng@uchicago.edu
To   statalist@hsphsun2.harvard.edu
Subject   st: extract variables from big datasets
Date   Fri, 21 May 2004 11:31:13 -0500

Dear all, 

I am wondering how people extract variables from a large panel data set. 
Specifically, I am wondering whether it is better to extract all of the 
variables that I need at once, or to extract some variables first 
and add more variables later. I describe my situation below, and I would 
appreciate it if the following two questions could be discussed. 

(1) How do people extract variables in this situation? 
(2) Is there a rule or rule of thumb that I should follow?  

Situation:

I am working on a big panel data set (the Health and Retirement Study). This 
data set has variables on insurance status and on characteristics of insurance 
plans. My analysis has two goals. First, I want to track the evolution of 
respondents' insurance status across 4 waves. Second, for insured 
respondents, I want to learn about the characteristics of their insurance plans 
across 4 waves.

I have to extract variables from 4 cross-sectional waves to generate a panel 
dataset, and I am wondering whether I should extract all of the variables 
(variables on insurance status and on characteristics of insurance plans) at 
once or I should extract the variables in two steps. That is, extract variables 
on insurance status first and after I am done with the analysis on insurance 
status, extract variables on the characteristics of insurance plans. 

I list below the tradeoffs of these two approaches, and I would appreciate 
your comments. 

(1) Extract all of the variables at once

Good: I don't have to merge datasets twice. 

Bad: I extract too many variables, and the analysis becomes overwhelming. 
I also don't yet know exactly which characteristics variables I should extract. 

(2) Extract the variables in two steps. 

Good: I feel less overwhelmed. I can do one thing at a time. After I am done 
with the analysis on insurance status, I will know more about the context and 
therefore more about which characteristics variables I should extract. 

Bad: I have to merge datasets twice. Would merging the datasets twice be a 
waste of time? 
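
To make approach (2) concrete, here is a rough sketch of what the two steps might look like in Stata. Everything in it is an assumption for illustration: the toy data stand in for the real wave files, and the names id, wave, insured, and premium are hypothetical, not actual HRS variable names. It also uses the merge 1:1 and append, keep() syntax of Stata 11 and later, so it would need adapting for older versions.

```stata
* Sketch only: toy data and hypothetical variable names, not the real HRS files.
clear
tempfile wave1 wave2 status

* Toy stand-ins for two cross-sectional wave files
input id insured premium
1 1 120
2 0 .
3 1 95
end
generate wave = 1
save `wave1'

clear
input id insured premium
1 1 130
2 1 80
3 0 .
end
generate wave = 2
save `wave2'

* Step 1: read only the insurance-status variables and stack the waves
use id wave insured using `wave1', clear
append using `wave2', keep(id wave insured)
save `status'

* Step 2 (later): read only the plan characteristics and attach them
use id wave premium using `wave1', clear
append using `wave2', keep(id wave premium)
merge 1:1 id wave using `status'
assert _merge == 3      // every person-wave should match
drop _merge

* Declare the panel structure and check the result
xtset id wave
assert _N == 6
list, sepby(id)
```

In this sketch the second merge is a single command keyed on id and wave, so the extra cost of the two-step route is mostly the bookkeeping needed to keep the identifiers consistent between the two extracts.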
