Stata Users Group meetings: 4th UK meeting, abstract

Tools for longitudinal data management

Speaker: Michael Hills

In Stata, longitudinal data are usually stored in long form: each set of measurements at a new time point constitutes a new record, and all records for a subject share the same subject id. When such data are explored interactively, most simple operations refer to records, but the answers required often refer to subjects. The most obvious example is the question: how many subjects are there? The answer is the number of unique codes for the subject id, which the Stata command codebook id reports, along with much else. A simple alternative is the new command unique id, which generalizes to, for example, unique id visit, reporting the number of unique combinations of id and visit. In general, the command
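The counting that unique performs can be illustrated outside Stata. The following Python is a minimal sketch of the idea, not the command's implementation. It assumes long-format data held as a list of dicts, one per record. The function name count_unique and the example data are hypothetical.

```python
def count_unique(records, varlist):
    """Number of distinct combinations of the variables in varlist.

    records: list of dicts, one per record (long format).
    varlist: list of variable names, e.g. ["id"] or ["id", "visit"].
    """
    # Build a set of tuples; duplicates collapse automatically.
    return len({tuple(rec[v] for v in varlist) for rec in records})


# Hypothetical long-format data: three subjects, one duplicated visit.
data = [
    {"id": 1, "visit": 1}, {"id": 1, "visit": 2},
    {"id": 2, "visit": 1}, {"id": 2, "visit": 1},
    {"id": 3, "visit": 1},
]

print(count_unique(data, ["id"]))           # number of subjects: 3
print(count_unique(data, ["id", "visit"]))  # distinct (id, visit) pairs: 4
```

Counting subjects is the special case in which varlist holds only the id variable.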

        . unique varlist, by(varname) gen(newvar)

reports the number of unique combinations of varlist. When by() is specified, it also creates a new variable, newvar, containing the number of unique combinations of varlist for each level of varname. For example,

        . unique job, by(id) gen(jobvar)

reports the overall number of unique values of job and creates the variable jobvar, which contains the number of different job codes for each subject.
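The by() and gen() behaviour can be sketched the same way. The per-subject count is computed once for each level of the by variable and then copied to every record of that subject, as a Stata variable would be. This is a hypothetical Python illustration under the same list-of-dicts assumption, not the Stata code. The function name unique_by and the job data are invented for the example.

```python
from collections import defaultdict


def unique_by(records, varlist, by, newvar):
    """Attach to every record the number of distinct varlist combinations
    within its level of `by` (modifies records in place)."""
    seen = defaultdict(set)
    for rec in records:
        seen[rec[by]].add(tuple(rec[v] for v in varlist))
    for rec in records:
        rec[newvar] = len(seen[rec[by]])
    return records


# Hypothetical job histories: subject 1 changes job once, subject 2 never.
data = [
    {"id": 1, "job": "A"}, {"id": 1, "job": "A"}, {"id": 1, "job": "B"},
    {"id": 2, "job": "C"}, {"id": 2, "job": "C"},
]
unique_by(data, ["job"], by="id", newvar="jobvar")
print(len({rec["job"] for rec in data}))  # overall distinct job codes: 3
print([rec["jobvar"] for rec in data])    # [2, 2, 2, 1, 1]
```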

Slightly more complex questions take the form: how many records satisfy the condition C, where C refers to a single variable? An example is the condition height == . (height is missing). The command longch takes the form

        . longch id, c(height == .)

where id is the name of the subject id variable and c() contains the condition. The output looks like this:

        71  records fulfill the condition height == .
        
        some : 46  subjects have height == . in at least one record
        none : 51  subjects have height == . in no records
        every: 0  subjects have height == . in every record
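The classification in this output can be sketched as follows. A subject counts under "some" if at least one of its records satisfies the condition, so "some" and "none" together cover all subjects (46 + 51 = 97 above), and subjects counted under "every" are also counted under "some". This is a minimal Python illustration, not longch itself. It assumes that a missing height is stored as None, and the function name longch_summary is hypothetical.

```python
from collections import defaultdict


def longch_summary(records, idvar, cond):
    """Return (records satisfying cond, subjects with some, none, every)."""
    n_records = 0
    per_subject = defaultdict(list)  # id -> True/False for each record
    for rec in records:
        hit = bool(cond(rec))
        n_records += hit
        per_subject[rec[idvar]].append(hit)
    some = sum(any(h) for h in per_subject.values())
    none = sum(not any(h) for h in per_subject.values())
    every = sum(all(h) for h in per_subject.values())
    return n_records, some, none, every


# Hypothetical data; None stands in for Stata's missing value ".".
data = [
    {"id": 1, "height": None}, {"id": 1, "height": 170},
    {"id": 2, "height": 165}, {"id": 2, "height": 166},
    {"id": 3, "height": None},
]
print(longch_summary(data, "id", lambda r: r["height"] is None))  # (2, 2, 1, 1)
```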

In addition, three indicator variables, _some, _none and _every, are created for convenience in further manipulation (e.g. dropping or keeping records). They flag, respectively, all records belonging to subjects with at least one record satisfying the condition, with no record satisfying it, and with every record satisfying it.
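Because these flags are subject-level properties copied to every record, they allow whole subjects to be kept or dropped in a single step. The Python below is a hypothetical sketch of how such flags could be derived and used. It is not the variables' actual construction in longch. The function name add_flags and the data are illustrative only, and None again stands for a missing value.

```python
from collections import defaultdict


def add_flags(records, idvar, cond):
    """Add 0/1 indicators _some, _none and _every to each record,
    computed per subject and copied to all of that subject's records."""
    hits = defaultdict(list)
    for rec in records:
        hits[rec[idvar]].append(bool(cond(rec)))
    for rec in records:
        h = hits[rec[idvar]]
        rec["_some"] = int(any(h))
        rec["_none"] = int(not any(h))
        rec["_every"] = int(all(h))
    return records


data = [
    {"id": 1, "height": None}, {"id": 1, "height": 170},
    {"id": 2, "height": 165}, {"id": 2, "height": 166},
    {"id": 3, "height": None},
]
add_flags(data, "id", lambda r: r["height"] is None)

# Keep only subjects whose height is recorded in every record.
complete = [rec for rec in data if rec["_none"] == 1]
print([rec["id"] for rec in complete])  # [2, 2]
```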

© Copyright 1996–2008 StataCorp LP