st: extract variables from big datasets
I am wondering how people extract variables from a large panel data set.
Specifically, is it better to extract all of the variables that I need at
once, or to extract some variables first and add more later? I describe my
situation below, and I would appreciate it if the following two questions
could be discussed.
(1) How do people extract variables in this situation?
(2) Is there a rule, or a rule of thumb, that I should follow?
I am working on a big panel data set (the Health and Retirement Study). This
data set has variables on insurance status and on characteristics of insurance
plans. My analysis has two goals. First, I want to know how respondents'
insurance status evolves across 4 waves. Second, for insured respondents, I
want to know the characteristics of their insurance plans across 4 waves.
I have to extract variables from 4 cross-sectional waves to generate a panel
dataset, and I am wondering whether I should extract all of the variables
(variables on insurance status and on characteristics of insurance plans) at
once or in two steps. That is, first extract the variables on insurance status
and, once I am done with the analysis of insurance status, extract the
variables on the characteristics of insurance plans.
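To make the two-step option concrete, here is a minimal sketch in plain Python
of what I have in mind. It is only an illustration under assumptions: the
records, the ID name (hhidpn), and the variable names (insured, premium) are
hypothetical stand-ins, not actual HRS variable names, and extract() and
merge() are helper names made up for this example.

```python
# Hypothetical wave files: each wave is a list of respondent records.
# The ID and variable names are illustrative, not actual HRS names.
WAVES = {
    1: [{"hhidpn": 1, "insured": 1, "premium": 120, "other": "x"},
        {"hhidpn": 2, "insured": 0, "premium": None, "other": "y"}],
    2: [{"hhidpn": 1, "insured": 1, "premium": 130, "other": "x"},
        {"hhidpn": 2, "insured": 1, "premium": 90, "other": "y"}],
}


def extract(waves, variables, key="hhidpn"):
    """Keep only `variables` from each wave; return wide records keyed by ID."""
    panel = {}
    for wave, records in waves.items():
        for rec in records:
            row = panel.setdefault(rec[key], {})
            for var in variables:
                # Suffix each variable with its wave, e.g. insured_w1.
                row[f"{var}_w{wave}"] = rec[var]
    return panel


def merge(left, right):
    """One-to-one merge of two extracts on the respondent ID (outer join)."""
    return {
        rid: {**left.get(rid, {}), **right.get(rid, {})}
        for rid in left.keys() | right.keys()
    }


# Step 1: extract insurance status only and analyze it.
status = extract(WAVES, ["insured"])

# Step 2 (later): extract plan characteristics and merge them on by ID.
plans = extract(WAVES, ["premium"])
panel = merge(status, plans)
```

In this toy case, the panel built in two steps holds the same values as one
built by extracting both variables at once, so the only cost of the two-step
route is the extra merge on the respondent ID.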
I list below the tradeoffs of these two approaches, and I would appreciate
(1) Extract all of the variables at once
Good: I don't have to merge datasets twice.
Bad: I extract too many variables, and the analysis becomes overwhelming.
I also don't know exactly which characteristics variables I should extract.
(2) Extract the variables in two steps.
Good: I feel less overwhelmed. I can do one thing at a time. After I am done
with analysis on insurance status, I know more about the context and therefore
know more about the characteristics variables that I should extract.
Bad: I have to merge datasets twice. Do I waste my time in merging the datasets