# st: How to analyze repeated cross-sectional data in which units are not truly followed?

| From | Misha Spisok |
|---|---|
| To | statalist@hsphsun2.harvard.edu |
| Subject | st: How to analyze repeated cross-sectional data in which units are not truly followed? |
| Date | Mon, 19 Oct 2009 13:09:59 -0700 |

```
Hello, Statalist!

This is a modified (simplified, I think) version of an unanswered question.

In brief, are -xt- commands appropriate for repeated cross-sectional
data in which different units are observed over time?  For the data
below, I considered the following:

/* First error - repeated time values within panel */
. xtset state year

repeated time values within panel

/* Attempt to correct first error, followed by second error - weight
must be constant within panel variable */
. bysort state year: generate type = _n
. egen newid = group(state type)
. xtset newid year
. xtreg y x1 x2 [fw=n], fe

weight must be constant within newid

/* Attempt to correct second error, followed by third "error" */
. egen newerid = group(state year type)
. xtset newerid year
. xtreg y x1 x2 [fw=n], fe

The third "error" is that I get something like the following for output:

           y |      Coef.   Std. Err.        t    P>|t|     [95% Conf. Interval]
-------------+------------------------------------------------------------------
          x1 |  (dropped)
          x2 |  (dropped)
       _cons |   2.979312   2.12e-19  1.4e+19   0.000     2.979312    2.979312

This last "error" seems to be simply a consequence of having no
"cross-sectional" units being "sampled" more than once.
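
/* A minimal sketch of how that explanation could be checked (not part
of the run above; "groupsize" is a hypothetical variable name): count
the observations in each newerid group.  If every group holds exactly
one observation, the within transformation leaves no variation in x1
or x2, which would be consistent with both being dropped. */
. bysort newerid: generate groupsize = _N
. tabulate groupsize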

Using -expand- or -expandcl- results in a dataset that is too large
for my memory constraints, even with up to 100g.

What is the appropriate way of analyzing repeated cross-sectional data
in which different units are observed in each period (in general) and,
in particular, grouped cross-sectional data of the variety that I have
(where the same units are probably observed, but they are not uniquely
identified and cannot be followed in the sense of panel data with a
unique identifier)?

all (any) k.

My data look like the following:

State	Year	Population	y	n	x1	x2
1	1990	25261069	2.57	1070121	-1.33	11.4
1	1990	25261069	1.19	1810912	-0.57	9.98
1	1990	25261069	1.8	4748773	0.16	8.44
1	1990	25261069	4.07	3289300	-0.08	7.66
1	1990	25261069	5.53	4125362	1.85	7.84
1	1990	25261069	4.03	10216601	-0.46	6.26
…	…	…	…	…	…	…
50	1990	11092381	4.74	332842	-1.41	13.43
50	1990	11092381	2.9	1233123	0.96	12.2
50	1990	11092381	4.56	1922374	1.75	13.41
50	1990	11092381	5.17	1218358	-0.26	9.6
50	1990	11092381	2.18	423648	-2.09	10.48
50	1990	11092381	2.97	5962036	-0.51	6.52
…	…	…	…	…	…	…
1	2000	27787176	3.56	1769078	0.4	9.84
1	2000	27787176	2.04	2083925	0.32	9.93
1	2000	27787176	4.01	3338879	-0.1	8.4
1	2000	27787176	2.83	5401349	-1.28	11.65
1	2000	27787176	6.81	3204418	1.04	9.27
1	2000	27787176	2.33	11989527	0.15	10.4
…	…	…	…	…	…	…
50	2000	12201619	6.52	701923	0.39	12.31
50	2000	12201619	5.02	2224842	-1.62	7.55
50	2000	12201619	4.6	713768	0.02	11.61
50	2000	12201619	2.75	1172416	-0.43	12.94
50	2000	12201619	6.95	858296	1	10.48
50	2000	12201619	4.27	6530374	-2.14	11.58

Thank you for your time and attention.

Two thoughts that came to mind (without any basis) are (1) using
seemingly unrelated regressions (even though the dependent variable is
the same, but the year of observation would be different) or (2) using
a meta-analysis.

Misha