Statalist


st: How to analyze repeated cross-sectional data in which units are not truly followed?


From   Misha Spisok <misha.spisok@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: How to analyze repeated cross-sectional data in which units are not truly followed?
Date   Mon, 19 Oct 2009 13:09:59 -0700

Hello, Statalist!

This is a modified (simplified, I think) version of an unanswered question.

In brief, are -xt- commands appropriate for repeated cross-sectional
data in which different units are observed over time?  For the data
below, I considered the following:

/* First error - repeated time values within panel */
. xtset state year

repeated time values within panel

/* Attempt to correct first error, followed by second error - weight
must be constant within panel variable */
. by state year: generate type = _n
. egen newid = group(state type)
. xtset newid year
. xtreg y x1 x2 [fw=n], fe

weight must be constant within newid

/* Attempt to correct second error, followed by third "error" */
. egen newerid = group(state year type)
. xtset newerid year
. xtreg y x1 x2 [fw=n], fe

The third "error" is that I get something like the following for output:

           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |  (dropped)
          x2 |  (dropped)
       _cons |   2.979312   2.12e-19  1.4e+19   0.000     2.979312    2.979312

This last "error" seems to be simply a consequence of no
cross-sectional unit being observed more than once: each newerid
appears in a single year, so there is no within-panel variation.
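
A minimal way to check this, sketched with the variable names above
(obs_in_panel is a hypothetical helper variable, not part of the
original data):

```stata
* Count how many observations fall under each panel identifier.
* obs_in_panel is a hypothetical helper variable introduced for this check.
bysort newerid: generate obs_in_panel = _N

* If every panel has exactly one observation, the within (fe) transformation
* subtracts each value from its own panel mean, leaving zeros for y, x1, and x2.
tabulate obs_in_panel
```

If the only value tabulated is 1, every panel is a singleton and there
is no within-panel variation left for -xtreg, fe- to use.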

Using -expand- or -expandcl- results in a dataset that is too large
for my memory constraints, even with up to 100g of memory.
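
For scale, the number of rows that -expand n- would produce is the sum
of n over all rows. A quick sketch of that check, assuming n holds the
group counts as in the data shown later in this message:

```stata
* Sum the frequency weights; -summarize- stores the total in r(sum).
quietly summarize n
* This total is the number of observations the expanded dataset would contain.
display %15.0fc r(sum)
```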

What is the appropriate way to analyze repeated cross-sectional data
in which, in general, different units are observed in each period? In
particular, how should I handle grouped cross-sectional data like mine,
where the same units are probably observed in each period but are not
uniquely identified, so they cannot be followed as in panel data with a
unique identifier?

Ultimately, the problem seems to be that no value of newerid observed
in period t appears again in any other period t+k (k != 0).

My data look like the following:

State	Year	Population	y	n	x1	x2
1	1990	25261069	2.57	1070121	-1.33	11.4
1	1990	25261069	1.19	1810912	-0.57	9.98
1	1990	25261069	1.8	4748773	0.16	8.44
1	1990	25261069	4.07	3289300	-0.08	7.66
1	1990	25261069	5.53	4125362	1.85	7.84
1	1990	25261069	4.03	10216601	-0.46	6.26
…	…	…	…	…	…	…
50	1990	11092381	4.74	332842	-1.41	13.43
50	1990	11092381	2.9	1233123	0.96	12.2
50	1990	11092381	4.56	1922374	1.75	13.41
50	1990	11092381	5.17	1218358	-0.26	9.6
50	1990	11092381	2.18	423648	-2.09	10.48
50	1990	11092381	2.97	5962036	-0.51	6.52
…	…	…	…	…	…	…
1	2000	27787176	3.56	1769078	0.4	9.84
1	2000	27787176	2.04	2083925	0.32	9.93
1	2000	27787176	4.01	3338879	-0.1	8.4
1	2000	27787176	2.83	5401349	-1.28	11.65
1	2000	27787176	6.81	3204418	1.04	9.27
1	2000	27787176	2.33	11989527	0.15	10.4
…	…	…	…	…	…	…
50	2000	12201619	6.52	701923	0.39	12.31
50	2000	12201619	5.02	2224842	-1.62	7.55
50	2000	12201619	4.6	713768	0.02	11.61
50	2000	12201619	2.75	1172416	-0.43	12.94
50	2000	12201619	6.95	858296	1	10.48
50	2000	12201619	4.27	6530374	-2.14	11.58

Thank you for your time and attention.

Two thoughts that came to mind (without any basis) are (1) using
seemingly unrelated regressions (the dependent variable would be the
same across equations, but the year of observation would differ) or
(2) using a meta-analysis.

Misha
Using Stata 10.1 but with access to Stata 11
