Statalist



Re: st: MORG data aggregation


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: MORG data aggregation
Date   Thu, 10 Apr 2008 14:47:21 -0400

Jimmy Verner <jverner@earthlink.net>:
That's the US Census Bureau you mean, I presume, and the survey is the
Current Population Survey (CPS). You don't mention what years and
months you are using--the file format changes over time. Individuals
are weighted not "so that the data is nationally aggregated" but so
that the data can be made (more or less) representative of the
resident noninstitutionalized population.

Jean Roth has a very nice collection of materials to begin with at
 http://www.nber.org/data/cps_index.html
and you should also try:
 ssc install ddf2dct
 help ddf2dct

The CPS has a somewhat odd hierarchical structure, with three
different kinds of records (household, family, and person) stacked on
top of each other.  It's possible you have neglected to copy the data
from the household and family records into new variables on the person
records, and then to drop those extraneous records.
If so, you might see:

  +----------------------------+
  | h_seq   precord   a_fnlwgt |
  |----------------------------|
  |     1         1       3200 |
  |     1         2          1 |
  |     1         3        628 |
  |     1         3      539.4 |
  |     1         3     506.81 |
  |     1         3     611.33 |
  |     1         3     491.08 |
  |----------------------------|
  |     2         1       3200 |
  |     2         2          2 |
  |     2         3     464.15 |
  |----------------------------|
  |     3         1       3200 |
  |----------------------------|
  |     4         1       3200 |
  |     4         2          2 |
  |     4         3      518.7 |
  |     4         3     534.48 |
  +----------------------------+

where the only real person records are those with precord==3, and
adding up the false weights for observations with precord==1 or
precord==2 would result in too-large estimated population sizes.

If you look in e.g.
http://www.nber.org/data/progs/cps/cpsmar07.do
you will see a bunch of -replace- statements followed by a -keep if
precord==3- (this is one way to turn the hierarchical file with 3
kinds of records into a person-level file).
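
As a minimal sketch of that step on made-up data (the values below are
invented for illustration; only the variable names h_seq, precord,
gestfips and a_fnlwgt come from the CPS file, and the real do-file
copies many more variables), something like this carries the state
code down from the household record and then keeps person records:

```stata
* Toy hierarchical file: precord 1=household, 2=family, 3=person
* (all values invented; the state code is present only on household records)
clear
input h_seq precord gestfips a_fnlwgt
1 1 1 3200
1 2 . 1
1 3 . 628
1 3 . 539.4
2 1 1 3200
2 2 . 2
2 3 . 464.15
end
* -replace- works through the data in order, so this carries each
* household's state code down to its family and person records
replace gestfips = gestfips[_n-1] if precord > 1
* keep only the real person records
keep if precord == 3
* the weights now sum over persons only
summarize a_fnlwgt, meanonly
display %9.2f r(sum)
```

The carry-down relies on the file still being in its original order,
so it has to happen before any sort.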

But the calculations below suggest that may not be your problem, since
the labor force total comes out the same with or without the household
and family records.  Perhaps instead you have a file with implied
decimal places in the weight variable, and you have forgotten to
divide by the appropriate power of ten (usually there are "two implied
decimal places" so you must divide the weight by 100)?
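
A minimal sketch of that rescaling, assuming two implied decimal
places (check the documentation for your file and year; the variable
names rawwgt and wgt and the values are hypothetical):

```stata
* Raw weights as read from a file with two implied decimal places,
* so that 62800 stands for 628.00 (values invented)
clear
input rawwgt
62800
53940
50681
end
* divide by 10^2 to restore the decimal point
generate double wgt = rawwgt/100
format wgt %9.2f
list
```

If state totals come out too large by a constant power of ten, a
rescaling like this is the first thing to check.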

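clear all
* read the raw March 2007 file using the dictionary cpsmar07.dct
qui infile using cpsmar07, using(cpsmar07.dat)
* carry the state code from each household record down to the
* family and person records that follow it
replace gestfips=gestfips[_n-1] if precord>1
* Alabama (gestfips==1): sum of "weights" over all three record types
su a_fnlw if gestf==1, meanonly
di %14.0f r(sum)
     479069661
* person records only: the estimated population
su a_fnlw if gestf==1 & precord==3, meanonly
di %14.0f r(sum)
       4555061
* labor force (pemlr 1-4: employed or unemployed), all record types
su a_fnlw if gestf==1 & inlist(pemlr,1,2,3,4), meanonly
di %14.0f r(sum)
       2180305
* labor force, person records only: same answer
su a_fnlw if gestf==1 & precord==3 & inlist(pemlr,1,2,3,4), meanonly
di %14.0f r(sum)
       2180305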

This last is the estimate of Alabama's labor force in March 2007, about
2.2 million, and that estimate is not affected by having the household
(HH) and family records on the file, presumably because pemlr is filled
in only on person records.  In general, the total labor force is
about half the total population (here 2.18 million out of the 4.56
million on person records), and population totals are available
in published tables for you to check or at e.g.
http://quickfacts.census.gov/qfd/
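
The same check can be run for every state at once.  A minimal sketch
on made-up person-level data (values invented; inlf, lfwgt, pop, lf
and lfshare are hypothetical names introduced here):

```stata
* Toy person-level file (values invented for illustration)
clear
input gestfips pemlr a_fnlwgt
1 1 628
1 4 539.4
1 5 506.81
2 1 464.15
2 7 518.7
end
* labor force indicator: pemlr 1-4 is employed or unemployed
generate byte inlf = inlist(pemlr, 1, 2, 3, 4)
* weight that counts only people in the labor force
generate double lfwgt = a_fnlwgt * inlf
* weighted population and labor force totals by state
collapse (sum) pop = a_fnlwgt lf = lfwgt, by(gestfips)
* share of the population in the labor force
generate double lfshare = lf / pop
list, clean
```

On real data, a state whose pop is far from its published population,
or whose lfshare is far from one half, points to a weighting problem.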

The CPS survey design variables are not on the public-use files, only
weights, but you can get reasonable approximate standard errors with:

egen psu=group(gestcen gtcsa)
svyset [pw=mars], strat(gestcen) psu(psu)

and see e.g.
http://www.amstat.org/Sections/Srms/Proceedings/papers/1992_127.pdf
for more detail.
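
A minimal sketch of how those settings might then be used to get a
labor force total with a standard error, written in the -svyset- and
-svy:- syntax of Stata 9 or later (in Stata 8 the corresponding
command would be -svytotal-).  The data are invented, with two
pseudo-PSUs per stratum, and marsupwt is assumed to be the full name
of the weight abbreviated as mars above:

```stata
* Toy person-level file (all codes and weights invented)
clear
input gestcen gtcsa pemlr marsupwt
63 0 1 500
63 0 5 450
63 142 1 600
63 142 4 550
74 0 1 480
74 0 7 520
74 206 3 610
74 206 1 590
end
* pseudo-PSUs: one per state-by-area combination
egen psu = group(gestcen gtcsa)
* declare the approximate design: strata are states
svyset psu [pw=marsupwt], strata(gestcen)
* labor force indicator: pemlr 1-4 is employed or unemployed
generate byte inlf = inlist(pemlr, 1, 2, 3, 4)
* weighted labor force totals by state, with design-based standard errors
svy: total inlf, over(gestcen)
```

A stratum that contains only one pseudo-PSU will produce missing
standard errors, so it is worth running -svydes- first.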

On Thu, Apr 10, 2008 at 12:28 PM, Jimmy Verner <jverner@earthlink.net> wrote:
> The Census publishes monthly MORG files.  Individuals are weighted so that
> the data is nationally aggregated.  I'm trying to pull monthly observations
> by state from the files, but I'm not doing something right.  I've tried the
> various svy commands but my results just don't make sense (e.g., Alabama
> does not have a labor force of 22 million!).
>
> Does anyone have any do files on this subject?  Any other input would be
> much appreciated.
>
> I'm running Intercooled Stata 8.0 on OS X 5.
>
> TIA.
>
> Jimmy Verner
> Graduate Student
> School of Economic, Political & Policy Sciences
> University of Texas - Dallas