SYMPOSIUM ON LARGE DATA SETS
November 6th, 2003
Amsterdam, The Netherlands
http://www.vvs-ssp.nl/symposium2003.html
Organization
________________________________
Section Statistical Software of The Netherlands Society for Statistics and Operations Research
Program Committee
________________________________
Dr. Ruud Koning, University of Groningen
Prof.dr. Arno Siebes, University of Utrecht
Dr. Siem Heisterkamp, National Institute of Public Health and the Environment (RIVM)
Prof.dr. Patrick Groenen, Erasmus University Rotterdam
Large Data Sets
________________________________
Fifteen years ago, handling large datasets, let alone analyzing them, was a nearly impossible task for researchers. The data were often stored on tape, and even reading a dataset into the memory of a mainframe was slow. Memory was scarce, so it was difficult to save intermediate results. Such datasets were analyzed using either tailor-made statistical software or self-written programs that called routines from numerical libraries like NAG or IMSL. Maximum-likelihood estimation of non-linear models was non-trivial, if not impossible, and researchers often had to be satisfied with one-step improvements over some consistent estimator.
Things have changed for the better, from a technical point of view. Huge datasets are routinely available to researchers in different fields, such as finance, marketing, biomedical sciences, particle physics, astronomy, life sciences, and social sciences. Datasets used to be large in the sense of containing many observations on a small number of variables. Nowadays, for example in the life sciences, we are also confronted with datasets with a small number of observations and a huge number of variables. Data can be transported on media that can be read by most personal computers, and the computing power on the desk of a statistical researcher is impressive. Instead of focusing on the mechanics of analyzing datasets, researchers can focus on the actual statistical analysis. Thus the question has turned into: Now that we have a lot of data, what could we do with it?
This conference addresses the analysis of very large datasets, both from the point of view of a statistician who works with such datasets and from the point of view of practitioners from various fields. By presenting several applications and tools available to a modern-day statistical researcher, we want to show that large datasets offer unique opportunities for researchers to answer questions that were difficult to tackle before. The program committee is delighted to be able to present a selection of the top researchers on this topic.
Registration
________________________________
Please register via email to [email protected] or online via:
http://www.vvs-ssp.nl/symposium2003registration.html
Program
________________________________
9:30 registration and coffee
10:00 opening
10:05 Yoav Benjamini
Tel-Aviv University
Multiplicity issues related to complex research questions
in microarrays analysis
10:55 Philip Hans Franses
Erasmus University, Rotterdam
More, but also better?
11:40 Paul Eilers
Leiden University Medical Centre
Low Memory, High Speed Smoothing on Large
Multidimensional Grids
12:30 Lunch
13:30 Andreas Buja
University of Pennsylvania
Hands-On Experiences with Mining Telecom Data
14:15 Jos Roerdink
University of Groningen
Visualization of large data sets with applications in
life science
15:00 coffee/tea break
15:15 Geert Wets
Limburg University, Belgium
Large data sets in traffic safety
16:30 Drinks
VVS-SSP
Nieuwpoortkade 25
1055 RX Amsterdam
The Netherlands
T +31 (0)20 5608410
F +31 (0)20 5608448
E [email protected]
U www.vvs-ssp.nl
Our Sponsors
CANdiensten
Your Partner in Mathematics and Statistics
Nieuwpoortkade 23-25
1055 RX Amsterdam
The Netherlands
http://www.candiensten.nl/english
SAS Institute B.V.
The Power to Know
Postbus 3053
1270 EB Huizen
The Netherlands
http://www.sas.com/nl
Cosinus
Powerful and Reliable Programs for Statistics and Numerics
Grotestraat 401a, Waalwijk
Postbus 220
5150 AE Drunen
The Netherlands
http://www.cosinus.nl