dataex -- Generate a properly formatted data example for Statalist
dataex is community-contributed software. It is written and maintained
by its authors, Robert Picard and Nicholas J. Cox. It is installed with
official Stata for the convenience of users posting on online forums such
as Statalist. It is also available in identical form from the
Statistical Software Components (SSC) archive (see help ssc).
dataex [varlist] [if] [in] [, varlabel elsewhere count(#)]
dataex is for producing a data example to include in a post on Statalist.
Make sure that you have read the FAQ before posting. Users who read your
post will be able to copy the code generated by dataex and re-create the
example data in Stata.
The input command is used to enter the data into Stata variables of the
same type as the original variables in memory. All numeric datetime
variables will be correctly formatted, and all numeric variables with
associated value labels will also be re-created. If the varlabel option
is specified, the results will include commands to regenerate all
variable labels.
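For illustration, code of the kind a reader would paste might look like the following. This is a hand-written sketch based on the first five observations of the auto dataset, not verbatim dataex output; the header comments and exact layout that dataex emits may differ.

```stata
* Hand-written sketch of an input block (not verbatim dataex output)
clear
* Declare each variable with its storage type, then list the observations
input str18 make int(price mpg rep78)
"AMC Concord"   4099 22 3
"AMC Pacer"     4749 17 3
"AMC Spirit"    3799 22 .
"Buick Century" 4816 20 3
"Buick Electra" 7827 15 4
end
```

Running the block in a do-file or pasting it into the Command window leaves a five-observation dataset in memory, with the missing value of rep78 preserved as a period.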
Copy what is produced by dataex in the Stata Results window to your post
on Statalist. Make sure to include the [CODE] and [/CODE] lines. You
can use the Preview button, just to the left of the Post Reply button, to
verify within Statalist that the data example is correctly formatted.
The output produced by dataex may also be useful outside Statalist in
other forums, or even privately, say, in communicating with StataCorp
technical support. In other forums or privately, the [CODE] and [/CODE]
lines will not be useful and may be omitted. As a convenience, the
option elsewhere may be used to suppress display of such lines.
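A minimal sketch of the elsewhere option, assuming the auto dataset shipped with Stata; the choice of variables and observations is arbitrary.

```stata
sysuse auto, clear
* Same data example, but without the [CODE] and [/CODE] delimiter lines
dataex make price mpg in 1/5, elsewhere
```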
General advice on posting example data includes the following:
1. It should be evident that readers can understand your dataset
only to the extent that you explain it clearly. A detailed
verbal explanation is likely to be too long to read and too hard
for readers to absorb. So, use examples!
2. Aim for a minimal, complete, and verifiable example.
3. The word "minimal" underlines that small examples (say, 5 to 10
observations) may be quite sufficient to explain your data
structure, variable types, and names. It is also true that your
example should be "complete" enough to make your question clear.
By providing data that you have used, you make your question
verifiable.
4. Even if you use a mutually accessible dataset (say, one read in
with sysuse or webuse), providing code that others can run
quickly will be very helpful.
dataex is not offered as a "one size fits all" solution to providing
example data. Depending on your problem, explaining other facts about
your dataset may be crucial, say, on its size, what you have tsset or
xtset, and so forth.
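As a sketch of how such facts might be reported alongside the example, the commands below use the grunfeld dataset available through webuse; the dataset and variable choices are only an illustration. Posting the output of xtset and describe together with the dataex listing tells readers the panel structure and the dataset size.

```stata
webuse grunfeld, clear
* Declare the panel and time variables; the output can be pasted into the post
xtset company year
* Report the number of observations and variables without listing them all
describe, short
* Then give a small data example
dataex company year invest mvalue in 1/5
```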
varlabel specifies that commands to produce variable labels are also to
be included.
elsewhere indicates that your example is for use somewhere other than
Statalist. Display of CODE delimiters intended for Statalist will
therefore be suppressed.
count(#) specifies a limit to the number of observations listed. The
default is count(100).
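A minimal sketch of count(), assuming the auto dataset. How dataex reports observations beyond the limit may depend on the installed version, so no output is shown here.

```stata
sysuse auto, clear
* Cap the listing at 10 observations instead of the default 100
dataex make price mpg, count(10)
```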
Prepare a small example from the standard auto dataset.
. sysuse auto
. dataex make price mpg rep78 in 1/5
You present the variables in the order you want. If some variables have
value labels, the results will include commands to re-create them.
. dataex make rep78 price foreign if rep78 == 5
You can use the varlabel option to include commands to regenerate
variable labels.
. dataex make rep78 price foreign if rep78 == 5, var
Numeric datetime variables will also be correctly formatted. In the
following example, the daily date variable date is regenerated using
Stata's internal numeric values and then formatted using the %td format.
The next example shows a quarterly date variable.
. sysuse sp500
. dataex in 1/5
. sysuse gnp96
. dataex in 1/5
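The pattern described above can be sketched by hand as follows. This is an illustration rather than dataex output: the variable name and values are chosen for the example, with 14977 being the number of days from 01jan1960 to 02jan2001. A quarterly variable would follow the same pattern with the %tq format.

```stata
clear
* Enter the dates as Stata's internal numeric values (days since 01jan1960)
input float date
14977
14978
14979
end
* Apply the daily date format so that the values display as dates
format %td date
list
```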
If the dataset is large, consider choosing a random sample. The
following example uses randomtag (from SSC) to select 10 random
observations from those whose code is 4 characters long.
. ssc install randomtag
. sysuse icd9_cod.dta, clear
. randomtag if length(__code9) == 4, count(10) gen(pick)
. dataex __code9 __desc9 if pick
Many thanks to William Lisowski for his observation that some users may
inadvertently trigger a large data dump and for his thoughtful
suggestions on how to handle the issue.
Nicholas J. Cox, Durham University, UK
SSC: listsome, randomtag
Help: [D] input, [D] data types, [D] datetime, [D] label, [D] encode,