Manipulating large datasets in Stata: how do we manage?
Speakers
Michael Rosato, Office for National Statistics
Seeromanie Harding, Office for National Statistics
E. McVey, Office for National Statistics
Date
5 June 1997
This paper discusses the manipulation of large datasets in Stata using the
Office for National Statistics Longitudinal Study. The study is based on a
one per cent sample of the population of England and Wales (about 650,000
persons) with 22 years of follow-up and is the largest cohort study in this
country.
We present findings on socio-economic variations in health using two
examples of current analyses. For both of these analyses we used Cox
regression models in Stata. The first shows the impact of social class
mobility on mortality among middle-aged men, and the second examines the
incidence of cancers among second-generation Irish people living in England
and Wales.
Previously, analysis of Longitudinal Study data was mainly limited to
descriptive statistics, because use of individual-level data was restricted
to mainframe computing. This made it difficult to take advantage of advances
in software for statistical modelling. Recent changes in protocols have
enabled analysis of individual-level data in a PC environment using Stata.
This has brought new problems associated with the large size of the datasets
and the limited capacity of the machines. We discuss the problems encountered
and the methods used to overcome the difficulties involved in analysing such
large national datasets in Stata.
The handout for this presentation consists of one page, which we have scanned
in. It is readable when printed.
Page 1 (56K)