Stata 15 help for dataex


dataex -- Generate a properly formatted data example for Statalist


dataex is community-contributed software. It is written and maintained by its authors, Robert Picard and Nicholas J. Cox. It is installed with official Stata for the convenience of users posting on online forums such as Statalist. It is also available in identical form from the Statistical Software Components (SSC) archive (see help ssc).


dataex [varlist] [if] [in] [, varlabel elsewhere count(#)]


dataex is for producing a data example to include in a post on Statalist. Make sure that you have read the FAQ before posting. Users who read your post will be able to copy the code generated by dataex and re-create the dataset shown.

The input command is used to enter the data into Stata variables of the same type as the original variables in memory. All numeric datetime variables will be correctly formatted, and all numeric variables with associated value labels will also be re-created. If the varlabel option is specified, the results will include commands to regenerate all variable labels.
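
As a rough illustration of the kind of code a reader would end up running, here is a hand-written sketch. It is not verbatim dataex output, and the exact layout dataex produces may differ. It re-creates two observations from the auto dataset with their original storage types, attaches a value label to foreign, and adds a variable label of the kind the varlabel option would include.

* hand-written sketch, not actual dataex output
clear
* storage types are declared so that variables match the originals
input str18 make int price byte foreign
"AMC Concord" 4099 0
"AMC Pacer"   4749 0
end
* re-create the value label and attach it to foreign
label define origin 0 "Domestic" 1 "Foreign"
label values foreign origin
* the kind of command the varlabel option would add
label variable price "Price"

Run from a do-file, these lines leave a two-observation dataset in memory with the same variable names and types as the source.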

Copy what is produced by dataex in the Stata Results window to your post on Statalist. Make sure to include the [CODE] and [/CODE] lines. You can use the Preview button, just to the left of the Post Reply button, to verify within Statalist that the data example is correctly formatted.


The output produced by dataex may also be useful outside Statalist in other forums, or even privately, say, in communicating with StataCorp technical support. In other forums or privately, the [CODE] and [/CODE] lines will not be useful and may be omitted. As a convenience, the option elsewhere may be used to suppress display of such lines.

General advice on posting example data includes the following:

1. It should be evident that readers can understand your dataset only to the extent that you explain it clearly. A detailed verbal explanation is likely to be too long to read and too hard for readers to absorb. So, use examples!

2. Aim for a minimal, complete, and verifiable example.

3. The word "minimal" underlines that small examples (say, 5 to 10 observations) may be quite sufficient to explain your data structure, variable types, and names. It is also true that your example should be "complete" enough to make your question clear. By providing data that you have used, you make your question "verifiable", too.

4. Even if you use a mutually accessible dataset (say, one read in with sysuse or webuse), providing code that others can run quickly will be very helpful.

dataex is not offered as a "one size fits all" solution to providing example data. Depending on your problem, explaining other facts about your dataset may be crucial, say, on its size, what you have tsset or xtset, and so forth.
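
One possible way to report such facts alongside the data example is sketched below. The dataset (grunfeld, read in with webuse, which needs internet access) and the choice of describe, short are assumptions made for illustration only; substitute your own data and whichever settings matter for your question.

. webuse grunfeld, clear
. * declare the panel structure; the output states the panel and time variables
. xtset company year
. * report the number of observations and variables
. describe, short
. dataex company year invest in 1/5

Copying the xtset and describe output into your post together with the dataex output tells readers how the data are declared and how large the full dataset is.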


varlabel specifies that commands to produce variable labels are also to be shown.

elsewhere indicates that your example is for use somewhere other than Statalist. Display of CODE delimiters intended for Statalist will therefore be suppressed.

count(#) specifies a limit to the number of observations listed. The default is count(100).
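
The options can be combined with a varlist and an in or if qualifier as usual. The following sketch assumes the auto dataset; the particular variables and the limit of 10 are arbitrary choices for illustration.

. sysuse auto, clear
. * suppress the CODE delimiters for use outside Statalist
. dataex make price mpg in 1/5, elsewhere
. * lower the limit on the number of observations listed to 10
. dataex make price mpg, count(10)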


Prepare a small example from the standard auto dataset.

. sysuse auto
. dataex make price mpg rep78 in 1/5

You present the variables in the order you want. If some variables have value labels, the results will include commands to re-create them.

. dataex make rep78 price foreign if rep78 == 5

You can use the varlabel option to include commands to regenerate variable labels.

. dataex make rep78 price foreign if rep78 == 5, var

Numeric datetime variables will also be correctly formatted. In the following example, the daily date variable date is regenerated using Stata's internal numeric values and then formatted using the %td format. The next example shows a quarterly date variable.

. sysuse sp500
. dataex in 1/5
. sysuse gnp96
. dataex in 1/5

If the dataset is large, consider choosing a random sample. The following example uses randomtag (from SSC) to select 10 random observations.

. ssc install randomtag
. sysuse icd9_cod.dta, clear
. randomtag if length(__code9) == 4, count(10) gen(pick)
. dataex __code9 __desc9 if pick
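
If you would rather not install randomtag, a random sample can also be drawn with official commands only. The sketch below swaps in sample with the count option for randomtag; the seed value and the use of the auto dataset are arbitrary assumptions for illustration. Because sample drops observations, the data are preserved first and restored afterward.

. sysuse auto, clear
. * set a seed so that the same sample can be drawn again
. set seed 2018
. preserve
. * keep 10 randomly chosen observations
. sample 10, count
. dataex make price mpg
. restore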


Many thanks to William Lisowski for his observation that some users may inadvertently trigger a large data dump and for his thoughtful suggestions on how to handle the issue.


Robert Picard

Nicholas J. Cox, Durham University, UK

Also see

SSC: listsome, randomtag

Help: [D] input, [D] data types, [D] datetime, [D] label, [D] encode, [D] list
