st: Health Survey for England
So I received a reply from the keepers of the data, ESDS Government:
"Thanks for your query. Yes, you have understood correctly that postal
code is used as the PSU. Unfortunately you won't find this or strata in
the HSE datasets because of concerns over confidentiality. This is
something that we are going to raise with ONS and other data providers
as it is definitely one of the shortfalls with the datasets so thank you
for raising the issue. I'm sorry I can't bring you any better news."
So knowing that the data are from a complex multistage sampling design,
but having no access to the PSU information, what would be the best way
to proceed with analysis?
I am using Stata 8 and trying to investigate the association between
colon symptoms (code 28 in the HSE variables illsm1 through illsm6) and
various indicators of child behavior (HSE variables sdq*, for example).
For instance, I would generate a new variable colon=1 if any illsm*
==28, and zero otherwise. Then regress sdq_hyp (a hyperactivity scale)
on colon and several other control variables like age, household income,
educational attainment of parent, and so forth.
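A minimal sketch of that setup in Stata might look like the following. The names age, hhinc, and pareduc are placeholders for the control variables, not actual HSE variable names, and the sketch assumes illsm1 through illsm6 are numeric.

```stata
* Flag children reporting a colon symptom (code 28) in any of illsm1-illsm6.
* Children with no code 28, including those with missing values, are set to 0.
generate byte colon = 0
forvalues i = 1/6 {
    replace colon = 1 if illsm`i' == 28
}

* Quick check of how many children are flagged
tabulate colon

* Regress the hyperactivity scale on the indicator plus controls.
* age, hhinc, and pareduc are hypothetical names standing in for the
* age, household income, and parental education variables.
regress sdq_hyp colon age hhinc pareduc
```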
Is this doomed to failure without the sampling design variables?
Each selected household could contribute up to two children as
subjects. For households with two or fewer children, all of them were
subjects. For households with three or more children, two were selected.
Would simply using -regress- and clustering on hserial (household serial
number) be beneficial?
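For concreteness, the clustered version could be sketched as below, assuming colon has already been created as described above. The control names age, hhinc, and pareduc are again placeholders, not actual HSE variable names. Clustering on hserial only allows for correlation between children in the same household; it does not stand in for the unavailable PSU or strata.

```stata
* Same regression, with standard errors that allow for correlation
* between children from the same household (hserial).
* The cluster() option implies robust (sandwich) standard errors.
regress sdq_hyp colon age hhinc pareduc, cluster(hserial)
```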
Christopher W. Ryan, MD
SUNY Upstate Medical University Clinical Campus at Binghamton
and Wilson Family Practice Residency, Johnson City, NY
GnuPG and PGP public keys available at http://pgp.mit.edu
"If you want to build a ship, don't drum up the men to gather wood,
divide the work and give orders. Instead, teach them to yearn for the
vast and endless sea." [Antoine de St. Exupery]