This page contains only historical information and is not about the current release of Stata. Please see our features page for information on the current version of Stata.

Data-management features in Stata 8

New data-management features include

  • ODBC support (Stata for Windows)
  • 26 new missing-value codes (.a, .b, ..., .z)
  • More convenient syntax for generate
  • merge and append improved
  • tsappend added
  • More

ODBC support

New command odbc allows Stata for Windows to act as an ODBC client, meaning that you can fetch data directly from ODBC sources.

odbc accepts full SQL SELECT statements, so the query itself can choose which tables, rows, and columns to load.

26 new missing-value codes

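Stata now has multiple missing values! In addition to the existing system missing value ., there are now 26 extended missing values, .a, .b, ..., .z. They let you record why a value is missing (for example, "refused" versus "not applicable"), and you can attach value labels to the new missing codes.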

Matrices can now contain missing values, both standard (.) and extended (.a, .b, ..., .z).
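Here is a short sketch of extended missing values. It uses a small hypothetical variable income in which .a stands for "refused" and .b for "don't know"; those meanings and the label name inc_lbl are illustrative assumptions. The example relies on Stata's ordering: every nonmissing number is less than ., and . < .a < .b < ... < .z. As a result, income >= . selects all missing values, whatever their code.

```stata
* Hypothetical data: one reported value, two coded reasons, one plain missing
clear
set obs 4
generate income = 100 in 1
replace income = .a in 2
replace income = .b in 3

* Value labels can be attached to the extended missing codes
label define inc_lbl .a "Refused" .b "Don't know"
label values income inc_lbl
tabulate income, missing

* Every missing code sorts above all numbers, so >= . catches all of them
count if income >= .
local nmiss = r(N)
count if income == .a
local nrefused = r(N)

* Matrices accept both standard and extended missing values
matrix A = (1, ., .a \ 2, 3, .z)
matrix list A
```

To test for nonmissing values, use income < . rather than income != . because the second form would still let .a through .z pass.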

More convenient syntax for generate

Existing command generate has a new, more convenient syntax. Now you can type
        . generate a = 2 + 3
or
        . generate b = "this" + "that"
without specifying whether the new variable is numeric or, if it is a string, how long it must be. If you wish, you can also type
        . generate str b = "this" + "that"
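which asserts that b is a string but leaves it to generate to determine the string's length. This is useful in programs because it helps prevent bugs: if the expression unexpectedly evaluates to a number, generate stops with a type-mismatch error instead of silently creating a numeric variable. Of course, you can continue to type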
        . generate double a = _pi/2
and
        . generate str8 b = "this" + "that"
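The following sketch shows how the plain str assertion guards against bugs. The variable names are arbitrary. First, generate picks the storage type and string length on its own. Then, asking for str with a numeric expression makes generate refuse. The error is caught with capture so that the return code can be inspected.

```stata
clear
set obs 1

* Type and string length are chosen automatically
generate a = 2 + 3
generate b = "this" + "that"
local btype : type b

* Asserting str catches an expression that is unexpectedly numeric
capture noisily generate str c = 2 + 3
local rc = _rc
```

Here b ends up as str8 because "thisthat" has eight characters, and c is never created.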

merge and append improved

Existing command merge has been improved:

  • New options unique, uniqmaster, and uniqusing ensure that the merge goes as you intend. These options are assertions; if an assertion is false, merge stops. unique specifies that the match variables take no repeated values: if you type "merge id using myfile", it requires one observation per id value in the master data (the data in memory) and one observation per id value in the using data. If the observations are not unique, merge stops with an error.

    Options uniqmaster and uniqusing make the same assertion for only the master data or only the using data; unique is equivalent to specifying both uniqmaster and uniqusing.

  • merge no longer limits the number of match (key) variables.

  • merge has new option keep(varlist) that specifies the variables to be kept from the using data.

    Similarly, keep(varlist) has been added to append.
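Here is a minimal sketch of the new merge options on two small made-up datasets. The variables id, score, grade, and age and all their values are hypothetical. Both files are sorted by the match variable id before merging, as this version of merge expects. In the first merge:

- keep(score) brings in score but not grade.
- unique confirms that there is one observation per id on both sides.

In the second merge, a duplicated id in the master makes the uniqmaster assertion fail.

```stata
* Using data: one observation per id (hypothetical values)
clear
input id score grade
1 10 7
2 20 8
3 30 9
end
sort id
tempfile scores
save `scores'

* Master data: one observation per id as well
clear
input id age
1 40
2 50
4 60
end
sort id
tempfile people
save `people'

* unique: stop unless id is unique in both files
* keep(score): take only score (not grade) from the using data
merge id using `scores', unique keep(score)
tabulate _merge
count if _merge == 3
local nboth = r(N)
local nobs = _N
capture confirm variable grade
local hasgrade = (_rc == 0)

* Duplicate the id-1 observation in the master: uniqmaster now fails
use `people', clear
expand 2 if id == 1
sort id
capture noisily merge id using `scores', uniqmaster
local rc = _rc
```

In the first merge, ids 1 and 2 match (_merge == 3), id 4 comes only from the master, and id 3 comes only from the using data. The result has four observations.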

tsappend added

New command tsappend appends observations in a time-series context. tsappend uses the information set by tsset, automatically fills in the time variable, and, if a panel variable was set, fills in the panel variable too.
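Here is a sketch of extending a single annual series so that there is room for, say, out-of-sample predictions. The series y and its values are hypothetical. The example assumes the add(#) form of tsappend, which appends a given number of periods after the last one. The panel case is not shown.

```stata
* A hypothetical annual series for 1996-2000
clear
set obs 5
generate year = 1995 + _n
generate y = _n
tsset year

* Add three periods; tsappend fills in year, and y is left missing
tsappend, add(3)
list
```

After tsappend, year runs from 1996 through 2003, and the three new observations have y missing.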

More

Other improvements include the following:

  • Existing command list has been completely redone. Its output is far more readable, even pretty, and programmers will want to use list to format tables.

  • New command isid verifies that a variable or set of variables uniquely identifies the observations and so is suitable for use with merge.

  • Existing command describe using now allows you to specify a varlist, so you can check whether a variable exists in a dataset before merging or appending. Programmers will be interested in the new varlist option, which leaves the names of the variables in the dataset in r().

  • Existing command codebook has new option problems to report potential problems in the data.

  • New command labelbook is like codebook, but is for value labels. In addition to providing documentation, the output includes a list of potential problems.

  • New command numlabel adds numeric values as prefixes to value labels, and removes them again. For example, the mapping of 2 to "Catholic" becomes "2. Catholic", and vice versa.

  • New command duplicates reports, gives examples of, lists, browses, tags, and/or drops duplicate observations.

  • Existing command recode now allows a varlist rather than a varname, so several variables can be recoded at once.

  • Existing command recode has new option generate() to specify that the transformed variables be stored under different names than the originals.

  • Existing command recode has new option prefix(), an alternative to generate(), which specifies that the transformed variables be given their original names with a prefix.

  • Existing command sort has new option stable, which keeps observations that have equal values of the sort keys in the same order they had before the sort.

  • New command webuse loads the specified dataset, obtaining it over the web. By default, datasets are obtained from http://www.stata-press.com/data/r8/, but you can reset that.

  • New command sysuse loads the specified dataset that was shipped with Stata, plus any other datasets stored along the ado-path.

  • Existing command insheet has a new delimiter(char) option that allows you to specify an arbitrary character as the value separator.
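Several of the commands above can be combined to check keys and variables before a merge. This sketch uses auto.dta, which ships with Stata and can therefore be loaded with sysuse. It assumes that the varlist option of describe stores the variable names in r(varlist). The names autofile and dupcount and the probe variable nosuchvar are hypothetical.

```stata
* auto.dta ships with Stata, so sysuse can load it
sysuse auto, clear

* isid is silent when make uniquely identifies the observations...
isid make
* ...and exits with an error for foreign, which takes only two values
capture isid foreign
local isid_rc = _rc

* Report duplicate values of foreign, then tag the surplus copies
duplicates report foreign
duplicates tag foreign, generate(dupcount)
summarize dupcount
local maxdup = r(max)

* Scan for potential problems in the data and in the value labels
codebook, problems
labelbook origin

* Check a saved dataset's variables without loading it
tempfile autofile
save `autofile'
clear
capture describe rep78 using `autofile'
local has_rep78 = (_rc == 0)
capture describe nosuchvar using `autofile'
local has_nosuchvar = (_rc == 0)

* varlist leaves the dataset's variable names in r(varlist)
describe using `autofile', varlist
local firstvar : word 1 of `r(varlist)'
```

auto.dta contains 52 domestic cars, so duplicates tag gives each of them the value 51: the number of other observations that share their value of foreign.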
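The recode, sort, and numlabel changes can also be seen on auto.dta. In this sketch:

- The cutoffs and the names rep3, big_, and pricerank are arbitrary choices for illustration.
- origin is the value label that auto.dta attaches to foreign.
- numlabel is assumed to use its default mask, "#. ".

```stata
sysuse auto, clear

* Collapse rep78 into three groups, keeping the original variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* Recode two variables at once; prefix() names the copies big_trunk, big_turn
recode trunk turn (min/20 = 0) (20/max = 1), prefix(big_)

* Record the order after sorting by price, then re-sort by foreign;
* stable keeps the price order within each value of foreign
sort price
generate long pricerank = _n
sort foreign, stable

* Prefix the numeric codes to the origin value labels, then remove them
numlabel origin, add
local lab_added : label origin 0
numlabel origin, remove
local lab_removed : label origin 0
```

Values that no rule matches, such as the missing values of rep78, are copied unchanged into the new variables.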