Describe data in current H2O frame

Syntax

    _h2oframe describe [columnlist] [, options]

columnlist is a list of column names in the H2O frame; see Specifying a list of columns for more information.

 options               Description
 -----------------------------------------------------------------------------------
 simple                display only column names
 fullnames             do not abbreviate column names
 numbers               display column number along with name
 replace               make dataset of description instead of the default written report
 clear                 replace the data in memory; only valid with replace

 columnlist            programmer's option; store r(columnlist) in addition to
                         usual stored results
 -----------------------------------------------------------------------------------

Description

_h2oframe describe produces a summary of the data in the current H2O frame.

For a compact listing of column names, use _h2oframe describe, simple.

Options

simple displays only the column names in a compact format. simple may not be combined with other options.

fullnames specifies that _h2oframe describe display the full names of the columns. By default, column names longer than 15 characters are abbreviated. The fullnames and numbers options may not be specified together.

numbers specifies that _h2oframe describe present the column number with the column name. If numbers is specified, column names longer than eight characters are abbreviated. The numbers and fullnames options may not be specified together.

replace and clear are alternatives to the options above. _h2oframe describe usually produces a written report, and the options above specify what the report is to contain. If you specify replace, however, no report is produced; the information about the H2O frame that the report would have presented will be loaded into Stata as the current dataset. Each observation of the new data describes a column in the H2O frame; see _h2oframe describe, replace below.

clear may be specified only when replace is specified. clear specifies that the data in memory be cleared and replaced with the description information, even if the original data have not been saved to disk.

columnlist, an option for programmers, specifies that r(columnlist) be stored in addition to the usual stored results. r(columnlist) will contain the names of the columns described.

Remarks

Remarks are presented under the following headings:

        _h2oframe describe
        _h2oframe describe, replace

_h2oframe describe

If _h2oframe describe is typed without a columnlist, all columns in the current H2O frame are described.

_h2oframe describe, replace

_h2oframe describe with the replace option is rarely used. It replaces the data in memory with a dataset in which each observation describes one column of the current H2O frame. The new variables are the following:

  1. position, a variable containing the numeric position of the original column (1, 2, 3, …).

  2. column, a variable containing the name of the original column, such as “make”, “price”, “mpg”, and so on.

  3. type, a variable containing the storage type of the original column, such as “real”, “int”, “enum”, and “string”. See What is an H2O frame? for more information about the data types in an H2O frame.

  4. missing, a variable containing the number of missing values in the original column.

  5. zeros, a variable containing the number of zeros in the original column.

  6. pinf, a variable containing the number of values set to positive infinity in the original column.

  7. ninf, a variable containing the number of values set to negative infinity in the original column.

  8. cardinality, a variable containing the number of categorical levels in the original column if the column is of type enum.
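
For example, the following session assumes that the auto H2O frame from the Setup example below is the current frame. It loads the description into memory and lists the columns that contain missing values, followed by the enum columns and their cardinality. clear discards the data in memory even if they have unsaved changes. The last command assumes that type is stored as a string variable.

     . _h2oframe describe, replace clear
     . list position column type missing if missing > 0
     . list column cardinality if type == "enum"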

Examples

 Setup
     . sysuse auto
     . _h2oframe put, into(auto)
     . _h2oframe change auto

 Describe dataset in current H2O frame
     . _h2oframe describe

 Describe all columns in the current H2O frame whose names begin with t
     . _h2oframe describe t*

 Describe dataset in current H2O frame, displaying full column names
     . _h2oframe describe, fullnames
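
 List only the column names of the current H2O frame in a compact format
     . _h2oframe describe, simple

 Describe dataset in current H2O frame, displaying column numbers along with names
     . _h2oframe describe, numbers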

 Replace the dataset in memory with a description of the columns of the current H2O frame
     . _h2oframe describe, replace
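
 Describe the current H2O frame, store the described column names in r(columnlist), and display the stored results
     . _h2oframe describe, columnlist
     . display "`r(columnlist)'"
     . display r(N) " rows and " r(k) " columns"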

Stored results

 _h2oframe describe stores the following in r():

 Scalars
   r(N)                number of rows in the H2O frame
   r(k)                number of columns in the H2O frame

 Macros
   r(columnlist)       columns described (if columnlist specified)

 _h2oframe describe, replace stores nothing in r().