
Re: st: pool cross-section survey data

From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: pool cross-section survey data
Date   Fri, 10 Oct 2008 10:00:11 -0400


1. Austin and Stas have already given good advice about doing a treatment comparison. Austin referred to a "super-stratum". By that he means, I believe, the following: Suppose survey 1 had I strata, survey 2 had J strata, and survey 3 had K strata. You should create a new "super-stratum" variable, so that the total number of strata is I+J+K. One way is: "egen new_strat = group(survey stratum)". If the original surveys were not stratified, then survey will be your new stratum variable.
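
A minimal sketch of the super-stratum step, run on made-up data. The variable names survey, stratum, psu, and pw, and all the values, are assumptions for illustration only; they are not taken from the poster's files. Because PSU numbers usually repeat from one survey to the next, the sketch also builds a pooled PSU identifier before declaring the design.

```stata
* Toy pooled file: three surveys with 2, 3, and 2 strata (hypothetical values)
clear
input survey stratum psu pw
1 1 1 10
1 1 2 10
1 2 3 12
1 2 4 12
2 1 1 8
2 1 2 8
2 2 3 9
2 2 4 9
2 3 5 9
2 3 6 9
3 1 1 15
3 1 2 15
3 2 3 15
3 2 4 15
end

* One super-stratum per (survey, stratum) pair: 2 + 3 + 2 = 7 groups
egen new_strat = group(survey stratum)

* PSU numbers repeat across surveys, so make them unique in the pooled file
egen new_psu = group(survey psu)

* Declare the pooled design and list strata and PSU counts
svyset new_psu [pweight = pw], strata(new_strat)
svydes
```

With unstratified surveys, strata(survey) would take the place of strata(new_strat).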

2. I would like to know more detail about the treatment assignments and the two-stage and three-stage designs: What were the units at each stage of each survey? How do the survey design and treatment design interact? For example, were different treatments applied to different survey strata? Did each PSU receive only one treatment? Were treatments randomly assigned to areas? Were there any certainty units? This will have implications for the "degrees of freedom" for your tests. There was a thread about this at: statalist/archive/2008-01/msg00187.html. Although at the time I agreed with Jeff Pitblado's (and Stata's) default recommendation, now I am not so sure. See Korn and Graubard, 1999, Analysis of Health Surveys, Wiley, Section 5.1.
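
For orientation, the sketch below shows where Stata's default design degrees of freedom come from: the number of PSUs minus the number of strata. The data are simulated and the names new_psu, new_strat, pw, and y are assumptions. It does not settle which degrees of freedom are appropriate for a particular treatment layout, which is the open question raised above.

```stata
* Simulated data: 14 PSUs of 10 observations each, nested in 7 strata
clear
set seed 2008
set obs 140
gen new_psu   = ceil(_n / 10)
gen new_strat = ceil(new_psu / 2)
gen pw = 1
gen y  = rnormal()

svyset new_psu [pweight = pw], strata(new_strat)
svy: mean y

* Default design df = #PSUs - #strata = 14 - 7 = 7
display "PSUs = " e(N_psu) "  strata = " e(N_strata) "  design df = " e(df_r)
```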

3. You have asked if you need to aggregate your data, but I'm not sure what you mean by "area", "cohort," and "district".

4. Stata's survey analysis programs will answer the question: How different is the change (in means or proportions) over time in areas with treatment A from the change in areas with treatment B? The p-values will be based on the survey design. Do you wish your conclusions to also generalize beyond the areas in this study? If so, then perhaps standard errors should be based on "between-area" differences in response profiles, just as in a longitudinal study of individuals. This brings the analysis back to the longitudinal, model-based ("super-population") approaches that Austin and Stas have written about.
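
One way to pose the design-based question in point 4 is a survey-weighted regression with a treatment-by-survey interaction; the interaction coefficient estimates how much the change over time differs between the two treatment groups. The sketch below uses simulated data with two survey rounds, and every name in it (area, treat, survey, pw, y) is an assumption. It uses factor-variable notation, which requires Stata 11 or later, and it treats PSUs as drawn anew at each survey.

```stata
* Simulated pooled file: 2 survey rounds, 20 areas, 10 respondents per area and round
clear
set seed 12345
set obs 400
gen survey = 1 + (_n > 200)
gen area   = 1 + mod(_n - 1, 20)
* Areas 11-20 receive treatment B (treat = 1); areas 1-10 receive treatment A
gen treat  = area > 10
gen pw     = 1
gen y      = 1 + 0.5*treat + 0.3*(survey == 2) + 0.4*treat*(survey == 2) + rnormal()

* PSUs are selected anew at each survey, so they are distinct across rounds
egen new_psu = group(survey area)
svyset new_psu [pweight = pw], strata(survey)

* The interaction term is the difference in change between the treatment groups
svy: regress y i.treat##i.survey
lincom 1.treat#2.survey
```

The test from lincom here rests on the survey design only; it does not address generalizing beyond the study areas.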


On Oct 9, 2008, at 4:52 AM, Ana Gabriela Guerrero Serdan wrote:

Dear Steven,

Yes, PSUs were randomly selected in each survey. One survey design was done in two stages, the others in three stages. However, the sampling frame is the same and based on the census.

I want to see if outcomes (Yi, e.g. school/health) change over time for people who are living in some areas (dt) that are exposed to a certain treatment. So the main thing I am looking for is the effect of residing in a certain region at a certain time on outcomes (assuming there is no migration).

I am also wondering if I would need to aggregate variables to a higher level, maybe cohort or district, because I do not have panel data but repeated cross-section surveys.

How do I deal with the difference of the sample designs?


--- On Tue, 10/7/08, Steven Samuels <[email protected]> wrote:

From: Steven Samuels <[email protected]>
Subject: Re: st: pool cross-section survey data
To: [email protected]
Date: Tuesday, October 7, 2008, 2:07 PM
You might find useful some of the advice at

You probably need a -survey- enabled analysis, or at least one that can handle weights and clustering.  To advise you further, we would need details of the survey design (strata, stages, units at each stage, weights).  Of particular interest: were primary sampling units (PSUs) selected anew at each survey? Also, what exactly is the goal of your analysis?  The suffix "dt" in your equation suggests to me that you want to look at changes.


On Oct 7, 2008, at 1:47 PM, Clive Nicholas wrote:

Gaby Guerrero Serdan wrote:

I wonder if you could point me to readings on the main issues when trying to pool two or three independent cross-sectional surveys. N is large and T is small. The data is not panel in the sense that I do not observe the same individuals in the three surveys, but they are representative at the provincial and urban/rural areas.

I am trying to see if I can model something like

Yidt = a + b Xidt + c Zt + d Pidt + u

where Xidt are characteristics that might vary over time for each individual, Zt is a time effect common to all individuals, and Pidt is a dummy for individuals treated in region d at time t.
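
As a rough illustration only, the equation above maps onto a pooled regression with survey-round dummies standing in for Zt. The data below are simulated and every variable name (y, x, t, region, P) is an assumption. The sketch ignores sampling weights and strata, and uses cluster-robust standard errors by region as a stand-in for the design.

```stata
* Simulated repeated cross-sections: 3 rounds, 10 regions, 600 individuals
clear
set seed 42
set obs 600
gen t      = 1 + mod(_n - 1, 3)
gen region = 1 + mod(ceil(_n / 3) - 1, 10)
gen x      = rnormal()
* Treatment dummy: regions 1-4 are treated in round 3 only
gen P      = (region <= 4) & (t == 3)
gen y      = 2 + 0.5*x + 0.2*t + 1.0*P + rnormal()

* Pooled OLS: b on x, round dummies for Zt, d on P; SEs clustered by region
regress y x i.t P, vce(cluster region)
```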

I have been reading Wooldridge on cross-sectional and panel data but would like to know if you know of any other sources or have in mind any applied examples and/or econometric problems I may encounter.

John Micklewright's chapter on analysing pooled cross-sectional data in Dale and Davies (1994) might be a very useful starting point for
Clive Nicholas

[Please DO NOT mail me personally here, but at <[email protected]>. Please respond to contributions I make in a list thread here. Thanks!]

Dale A and Davies RB (1994) Analysing Social and Political Change: A Casebook of Methods, London: Sage.
*   For searches and help try:

