Speaker: Michael Hills
In Stata, longitudinal data are usually stored in long form: each set of measurements at a new time point is a separate record, and all records for a subject share the same subject id. When exploring such data interactively, most simple operations refer to records, but the answers required often refer to subjects. The most obvious example is: how many subjects are there? This is the number of unique values of the subject id, and it is reported by the Stata command codebook id, along with much else. A simpler alternative is the new command unique id, which generalizes to, for example, unique id visit, which reports the number of unique combinations of id and visit. In general, the command
. unique varlist, by(varname) gen(newvar)
reports the number of unique combinations of the values of the variables in varlist. When by() is specified, it also creates a new variable, newvar, which contains the number of unique combinations of varlist within each level of varname. For example,
. unique job, by(id) gen(jobvar)
reports the overall number of unique values for the variable job, and creates the variable jobvar which contains the number of different job codes for each subject.
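For readers who do not have unique installed, the same counts can be sketched with built-in commands. This is not a description of how unique is implemented; it is a minimal equivalent using egen's tag() function, and the toy data and the variable names first_id, first_idvisit, first_idjob, and njobs are invented for illustration.

```stata
* Hypothetical toy data in long form: 3 subjects, one record per visit
clear
input id visit job
1 1 10
1 2 10
1 3 20
2 1 30
2 2 30
3 1 10
3 2 20
3 3 30
end

* Number of subjects: tag exactly one record per id, then count the tags
egen byte first_id = tag(id)
count if first_id

* Number of unique combinations of id and visit
egen byte first_idvisit = tag(id visit)
count if first_idvisit

* Number of different job codes for each subject,
* repeated on every record belonging to that subject
egen byte first_idjob = tag(id job)
bysort id: egen njobs = total(first_idjob)
list id visit job njobs, sepby(id)
```

With these toy data the first count is the number of subjects (3), and njobs takes the values 2, 1, and 3 for subjects 1, 2, and 3.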
Slightly more complex questions take the form: how many records satisfy the condition C, and how many subjects have records that satisfy it, where C refers to a single variable? An example is the condition height == . (height is missing). The command longch answers such questions and takes the form
. longch id, c(height == .)
where id is the subject id variable name and c( ) contains the condition. The output looks like this:
71 records fulfill the condition height == .
some : 46 subjects have height == . in at least one record
none : 51 subjects have height == . in no records
every: 0 subjects have height == . in every record
In addition, three logical variables, _some, _none, and _every, are created for convenience in further manipulation (e.g. dropping or keeping records). Each flags all records belonging to the subjects in the corresponding group: _some marks every record of a subject with at least one record satisfying the condition, and so on.
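The same subject-level summary can be approximated with built-in commands, which may help show what the three flags mean. This is only a sketch under assumed toy data, not the longch source code; the names is_miss, any_miss, all_miss, no_miss, and first_id are hypothetical stand-ins, with any_miss, no_miss, and all_miss playing the roles of _some, _none, and _every.

```stata
* Hypothetical toy data: 3 subjects, some heights missing
clear
input id visit height
1 1 170
1 2 .
2 1 165
2 2 166
3 1 .
3 2 .
end

* Record-level indicator for the condition height == .
generate byte is_miss = (height == .)

* Subject-level flags, copied onto every record of the subject
bysort id: egen byte any_miss = max(is_miss)   // at least one record
bysort id: egen byte all_miss = min(is_miss)   // every record
generate byte no_miss = !any_miss              // no records

* Records satisfying the condition
count if is_miss

* Subjects in each group: count one tagged record per subject
egen byte first_id = tag(id)
count if first_id & any_miss
count if first_id & no_miss
count if first_id & all_miss

* Example of further manipulation: keep only subjects
* whose height is recorded at every visit
keep if no_miss
```

Because the flags are constant within a subject, a single keep or drop removes all of a subject's records at once, which is the kind of manipulation the created variables are intended to make convenient.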