Stata 11 help for xtset


Title

[XT] xtset -- Declare data to be panel data

Syntax

Declare data to be panel

xtset panelvar

xtset panelvar timevar [, tsoptions]

Display how data are currently xtset

xtset

Clear xt settings

xtset, clear

In the declare syntax, panelvar identifies the panels and the optional timevar identifies the times within panels. tsoptions concern timevar.

tsoptions        description
-------------------------------------------------------------------------
unitoptions      specify units of timevar
deltaoption      specify periodicity of timevar
-------------------------------------------------------------------------

unitoptions      description
-------------------------------------------------------------------------
(default)        obtain timevar's units from timevar's display format

clocktime        timevar is %tc, 0 = 1jan1960 00:00:00.000, 1 = 1jan1960 00:00:00.001, ...
daily            timevar is %td, 0 = 1jan1960, 1 = 2jan1960, ...
weekly           timevar is %tw, 0 = 1960w1, 1 = 1960w2, ...
monthly          timevar is %tm, 0 = 1960m1, 1 = 1960m2, ...
quarterly        timevar is %tq, 0 = 1960q1, 1 = 1960q2, ...
halfyearly       timevar is %th, 0 = 1960h1, 1 = 1960h2, ...
yearly           timevar is %ty, 1960 = 1960, 1961 = 1961, ...
generic          timevar is %tg, 0 = ?, 1 = ?, ...

format(%fmt)     specify timevar's format and then apply default rule
-------------------------------------------------------------------------
In all cases, negative timevar values are allowed.

deltaoption specifies the period between observations in timevar units and may be specified as

deltaoption          example
-------------------------------------------------------------------------
delta(#)             delta(1) or delta(2)
delta((exp))         delta((7*24))
delta(# units)       delta(7 days) or delta(15 minutes) or delta(7 days 15 minutes)
delta((exp) units)   delta((2+3) weeks)
-------------------------------------------------------------------------

Allowed units for %tc and %tC timevars are

------------------------------
seconds    secs    sec
minutes    mins    min
hours      hour
days       day
weeks      week
------------------------------

and for all other %t timevars are

------------------------------
days       day
weeks      week
------------------------------

Menu

Statistics > Longitudinal/panel data > Setup and utilities > Declare dataset to be panel data

Description

xtset declares the data in memory to be a panel. You must xtset your data before you can use the other xt commands. If you save your data after xtset, Stata will remember that the data are a panel, and you will not have to xtset again.

There are two syntaxes for setting the data:

xtset panelvar

xtset panelvar timevar

In the first syntax -- xtset panelvar -- the data are set to be a panel and the order of the observations within panel is considered to be irrelevant. For instance, panelvar might be country and the observations within be city.

In the second syntax -- xtset panelvar timevar -- the data are set to be a panel and the observations within panel are considered to be ordered by timevar. For instance, in data collected from repeated surveying of the same people over various years, panelvar might be person and timevar, year. When you specify timevar, you may then use Stata's time-series operators such as L. and F. (lag and lead) in other commands. The operators will be interpreted as lagged and lead values within panel.
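As a minimal sketch of the second syntax, the following builds a tiny made-up panel (the variable names person, year, and income and all values are assumptions for illustration) and uses L. and F. after xtset. The lag in each person's first year and the lead in each person's last year should be missing, because the operators do not reach across panels.

```stata
* Hypothetical two-person panel observed over three years
clear
input person year income
1 2001 10
1 2002 12
1 2003 15
2 2001 20
2 2002 21
2 2003 25
end

* Declare person as the panel variable and year as the time variable
xtset person year

* Lag and lead are taken within person, never across persons
generate lag_income  = L.income
generate lead_income = F.income
list, sepby(person)
```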

xtset without arguments -- xtset -- displays how the data are currently xtset. If the data are set with a panelvar and a timevar, xtset also sorts the data by panelvar timevar. If the data are set with a panelvar only, the sort order is not changed.

xtset, clear is a rarely used programmer's command to declare that the data are no longer to be considered a panel.

Options

unitoptions clocktime, daily, weekly, monthly, quarterly, halfyearly, yearly, generic, and format(%fmt) specify the units in which timevar is recorded, if timevar is specified.

timevar will often simply be a variable that counts 1, 2, ..., and is to be interpreted as first year of survey, second year, ..., or first month of treatment, second month, .... In these cases, you do not need to specify a unitoption.

In other cases, timevar will be a year variable or the like such as 2001, 2002, ..., and is to be interpreted as year of survey or the like. In those cases, you do not need to specify a unitoption.

In still other, more complicated cases, timevar will be a full blown %t variable; see [D] dates and times. If timevar already has a %t display format assigned to it, you do not need to specify a unitoption; xtset will obtain the units from the format. If you have not yet bothered to assign the appropriate %t format to the %t variable, however, you can use the unitoptions to tell xtset the units. Then xtset will set timevar's display format for you. Thus, the unitoptions are convenience options; they allow you to skip formatting the time variable. The following all have the same net result:

Alternative 1      Alternative 2          Alternative 3
--------------------------------------------------------------
format t %td       (t not formatted)      (t not formatted)
xtset pid t        xtset pid t, daily     xtset pid t, format(%td)

timevar is not required to be a %t variable; it can be any variable of your own concocting so long as it takes on integer values. When you xtset a time variable that is not %t, the display format does not change unless you specify the unitoption generic or use the format() option.

delta() specifies the periodicity of timevar and is commonly used when timevar is %tc. delta() is only sometimes used with the other %t formats or with generic time variables.

If delta() is not specified, delta(1) is assumed. This means that at timevar = 5, the previous time is timevar = 5-1=4 and the next time would be timevar = 5+1=6. Lag and lead operators, for instance, would work this way. This would be assumed regardless of the units of timevar.

If you specified delta(2), then at timevar = 5, the previous time would be timevar = 5-2 = 3 and the next time would be timevar = 5+2 = 7. Lag and lead operators would work this way. In an observation with timevar = 5, L.income would be the value of income in the observation for which timevar = 3 and F.income would be the value of income in the observation for which timevar = 7. If you then add an observation with timevar = 4, the operators will still work appropriately; i.e., at timevar = 5, L.income will still have the value of income at timevar = 3.
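The delta(2) behavior can be sketched with a small made-up dataset (id, t, income, and the values are assumptions for illustration). With timevar recorded at 1, 3, 5, 7, the lag at t = 5 should pick up the row at t = 3 and the lead should pick up the row at t = 7.

```stata
* Hypothetical single panel observed every 2 time units
clear
input id t income
1 1 100
1 3 110
1 5 125
1 7 130
end

* Tell Stata that consecutive periods are 2 units apart
xtset id t, delta(2)

* L. steps back by 2 units and F. steps forward by 2 units
generate prev_income = L.income
generate next_income = F.income
list
```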

There are two aspects of timevar: its units and its periodicity. The unitoptions set the units. delta() sets the periodicity. You are not required to specify one to specify the other. You might have a generic timevar that counts by 12: 0, 12, 24, .... You would skip specifying unitoptions but would specify delta(12).

We mentioned that delta() is commonly used with %tc timevars because Stata's %tc variables have units of milliseconds. If delta() is not specified and in some model you refer to L.price, you will be referring to the value of price 1 ms ago. Few people have data with periodicity of a millisecond. Perhaps your data are hourly. You could specify delta(3600000). Or you could specify delta((60*60*1000)), because delta() will allow expressions if you include an extra pair of parentheses. Or you could specify delta(1 hour). They all mean the same thing: timevar has periodicity of 3,600,000 ms. In an observation for which timevar = 1,489,572,000,000 (corresponding to 15mar2007 10:00:00), L.price would be the value of price in the observation for which timevar = 1,489,572,000,000 - 3,600,000 = 1,489,568,400,000 (corresponding to 15mar2007 9:00:00).
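The millisecond arithmetic above can be checked directly with the tc() pseudofunction, and the same idea can be tried on a small hourly panel. This is a sketch only: the variables pid, tod, and price and the generated values are assumptions for illustration. Note that %tc values need to be stored as doubles to hold millisecond counts of this size exactly.

```stata
* Millisecond value of the timestamp used in the text
display %18.0fc tc(15mar2007 10:00:00)

* One hour earlier, as a raw millisecond count and as a formatted %tc value
display %18.0fc tc(15mar2007 10:00:00) - 60*60*1000
display %tc tc(15mar2007 10:00:00) - 60*60*1000

* Hypothetical hourly panel: 2 patients with 3 hourly readings each
clear
set obs 6
generate pid = ceil(_n/3)
generate double tod = tc(15mar2007 08:00:00) + mod(_n - 1, 3)*60*60*1000
format tod %tc
generate price = 100 + _n

* tod already has a %tc format, so only the periodicity is needed
xtset pid tod, delta(1 hour)

* L.price now refers to the reading one hour earlier within pid
generate lag_price = L.price
list, sepby(pid)
```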

When you xtset the data and specify delta(), xtset verifies that all the observations follow the specified periodicity. For instance, if you specified delta(2), then timevar could contain any subset of {..., -4, -2, 0, 2, 4, ...}, or it could contain any subset of {..., -3, -1, 1, 3, ...}. If timevar contained a mix of values, xtset would issue an error message. The check is made on each panel independently, so one panel might contain timevar values from one set and the next, another, and that would be fine.
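A rough way to see this check in action is sketched below with made-up data (id, t, and the values are assumptions for illustration). Panel 1 uses even times and panel 2 uses odd times, which should be accepted; mixing even and odd times inside one panel should then make xtset fail. The exact wording of the error message is not reproduced here.

```stata
* Hypothetical data: panel 1 on even times, panel 2 on odd times
clear
input id t
1 0
1 2
1 4
2 1
2 3
2 5
end

* Each panel stays within one of the two sets, so this should succeed
xtset id t, delta(2)

* Put an odd time into panel 1 so that it holds 0, 3, 4
replace t = 3 in 2

* capture keeps a do-file running; noisily still shows any message
capture noisily xtset id t, delta(2)

* A nonzero return code indicates that xtset rejected the data
display _rc
```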

clear -- used in xtset, clear -- makes Stata forget that the data ever were xtset. This is a rarely used programmer's option.

Examples

For a panel dataset with no time variable such as a dataset with variable country and observations on cities within country, type

. xtset country

Variable country must be numeric. If the variable is string, type

. egen cntry = group(country)
. xtset cntry

or

. encode country, gen(cntry)
. xtset cntry

The first will generate numeric variable cntry containing 1, 2, ..., for the various countries. The second will do the same but will also create a value label and label the new variable, so that when you list the variable, it will look like the original.
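The two approaches can be compared side by side on a small made-up dataset (the country names, the pop variable, and the new variable names cntry1 and cntry2 are assumptions for illustration). Listing with and without nolabel shows that both variables hold numeric codes, while only the encoded one displays the original names.

```stata
* Hypothetical city-level data with a string country identifier
clear
input str10 country pop
"France" 10
"France" 12
"Brazil" 30
"Brazil" 31
end

* Numeric group codes without a value label
egen cntry1 = group(country)

* Numeric codes plus a value label carrying the original strings
encode country, gen(cntry2)

* First listing shows labels, second shows the underlying numbers
list country cntry1 cntry2
list country cntry1 cntry2, nolabel

* Either numeric variable can serve as panelvar
xtset cntry2
```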

For an annual panel dataset such as a dataset with variables country and year, type

. xtset country year

or

. xtset country year, yearly

It makes little difference which you use; only the output will be formatted differently. In the second case, variable year must contain values such as 1990 and 2006. In the first case, year may contain any year encoding, including 1990 and 2006.

For a quarterly panel on company and quarter, type

. xtset company quarter

If quarter is encoded 0=1960q1, 1=1960q2, etc., you may type

. xtset company quarter, quarterly

and output will look better.

For a daily panel in which pid is the numeric person identification number and date is a %td variable that has already been assigned a %td format, type

. xtset pid date

If date has not yet been given a format:

. format date %td
. xtset pid date

or

. xtset pid date, daily

For an hourly panel in which pid is the patient ID and tod is a %tc variable:

. xtset pid tod, clocktime delta(1 hour)

If tod already had a %tc display format, the above could be reduced to

. xtset pid tod, delta(1 hour)

Saved results

xtset saves the following in r():

Scalars
    r(imin)        minimum panel ID
    r(imax)        maximum panel ID
    r(tmin)        minimum time
    r(tmax)        maximum time
    r(tdelta)      delta

Macros
    r(panelvar)    name of panel variable
    r(timevar)     name of time variable
    r(tdeltas)     formatted delta
    r(tmins)       formatted minimum time
    r(tmaxs)       formatted maximum time
    r(tsfmt)       %fmt of time variable
    r(unit)        units of time variable: Clock, clock, daily, weekly, monthly, quarterly, halfyearly, yearly, or generic
    r(unit1)       units of time variable: C, c, d, w, m, q, h, y, or ""
    r(balanced)    unbalanced, weakly balanced, or strongly balanced; a set of panels are strongly balanced if they all have the same time values, otherwise balanced if same number of time values, otherwise unbalanced
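As a sketch of how these saved results might be used in a do-file, the following sets a small made-up panel (id, year, and the values are assumptions for illustration) and reads back a few of the r() entries listed above. Scalars are referenced directly as r(name); macros are expanded inside quotes.

```stata
* Hypothetical balanced panel: 2 units observed in the same 2 years
clear
input id year
1 2001
1 2002
2 2001
2 2002
end

* quietly suppresses the printed summary but still saves r()
quietly xtset id year, yearly

* Saved macros
display "panel variable: `r(panelvar)'"
display "time variable:  `r(timevar)'"
display "units:          `r(unit)'"
display "balance:        `r(balanced)'"

* Saved scalars
display "time range: " r(tmin) " to " r(tmax) ", delta " r(tdelta)
```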

Also see

Manual: [XT] xtset

Help: [TS] tsset


© Copyright 1996–2009 StataCorp LP