Stata 15 help for datasignature

[D] datasignature -- Determine whether data have changed



Syntax

datasignature

datasignature set [, reset ]

datasignature confirm [, strict ]

datasignature report

datasignature set, saving(filename[, replace]) [ reset ]

datasignature confirm using filename [, strict ]

datasignature report using filename

datasignature clear


Menu

Data > Other utilities > Manage data signature


Description

These commands calculate, display, save, and verify checksums of the data, which taken together form what is called a signature. An example signature is 162:11(12321):2725060400:4007406597. That signature is a function of the values of the variables and their names, and thus the signature can be used later to determine whether a dataset has changed.

datasignature without arguments calculates and displays the signature of the data in memory.

datasignature set does the same, and it stores the signature as a characteristic in the dataset. You should save the dataset afterward so that the signature becomes a permanent part of the dataset.

datasignature confirm verifies that, were the signature recalculated this instant, it would match the one previously set. datasignature confirm displays an error message and returns a nonzero return code if the signatures do not match.

datasignature report displays a full report comparing the previously set signature to the current one.

In the above, the signature is stored in the dataset and accessed from it. The signature can also be stored in a separate, small file.

datasignature set, saving(filename) calculates and displays the signature and, in addition to storing it as a characteristic in the dataset, also saves the signature in filename.

datasignature confirm using filename verifies that the current signature matches the one stored in filename.

datasignature report using filename displays a full report comparing the current signature with the one stored in filename.

In all the above, if filename is specified without an extension, .dtasig is assumed.

datasignature clear clears the signature, if any, stored in the characteristics of the dataset in memory.


Options

reset is used with datasignature set. It specifies that even though you have previously set a signature, you want to erase the old signature and replace it with the current one.

strict is for use with datasignature confirm. It specifies that, in addition to requiring that the signatures match, you also wish to require that the variables be in the same order and that no new variables have been added to the dataset. (If any variables were dropped, the signatures would not match.)
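As a rough illustration of the difference strict makes, the following do-file sketch adds a variable and then confirms the signature with and without the option. It assumes the auto dataset shipped with Stata; the variable name price2 is made up for the example.

```stata
* Sketch only: auto.dta and price2 are illustrative choices
sysuse auto, clear
datasignature set, reset

* Add a variable without touching the original ones
generate double price2 = price^2

* Without strict, the confirmation succeeds and reports the added variable
datasignature confirm
display "variables added: " r(k_added)

* With strict, added variables count as a mismatch, so capture the error
capture noisily datasignature confirm, strict
display "return code with strict: " _rc
```

The capture prefix keeps the do-file running so that the return code can be inspected.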

saving(filename[, replace]) is used with datasignature set. It specifies that, in addition to storing the signature in the dataset, you want a copy of the signature saved in a separate file. If filename is specified without a suffix, .dtasig is assumed. The replace suboption allows filename to be replaced if it already exists.


Remarks

Example 1: Verification at a distance

You load the data and type

    . datasignature
      74:12(71728):3831085005:1395876116

Your coworker does the same with his or her copy. You compare the two signatures.
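If you would rather not compare the two strings by eye, one possible approach is to compare the stored result r(datasignature) with the value your coworker reports. The sketch below assumes a do-file and the auto dataset; the string in theirs is a placeholder, not a real signature of that dataset.

```stata
* Sketch only: the value of theirs is a hypothetical signature sent by a coworker
sysuse auto, clear
quietly datasignature
local mine "`r(datasignature)'"
local theirs "74:12(71728):3831085005:1395876116"

if "`mine'" == "`theirs'" {
    display "signatures match"
}
else {
    display as error "signatures differ"
}
```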

Example 2: Protecting yourself from yourself

You load the data and type

    . datasignature set
      74:12(71728):3831085005:1395876116       (data signature set)

    . save, replace

From then on, you periodically type

    . datasignature confirm
      (data unchanged since 19feb2017 14:24)

One day, however, you check and see the message:

    . datasignature confirm
      (data unchanged since 19feb2017 14:24, except 2 variables have been added)

You can find out more by typing

    . datasignature report
      (data signature set on Sunday 19feb2017 14:24)

      Data signature summary

        1.  Previous data signature        74:12(71728):3831085005:1395876116
        2.  Same data signature today      (same as 1)
        3.  Full data signature today      74:14(113906):1142538197:2410350265

      Comparison of current data with previously set data signature

        Variables                     No.   Notes
        ------------------------------------------------------------
        Original # of variables        12   (values unchanged)
        Added variables                 2   (note 1)
        Dropped variables               0
        ------------------------------------------------------------
        Resulting # of variables       14

        (1) Added variables are agesquared logincome.

You could now either drop the added variables or decide to incorporate them:

    . datasignature set
    data signature already set -- specify option -reset-
    r(110)

    . datasignature set, reset
      74:14(113906):1142538197:2410350265       (data signature reset)

Concerning the detailed report, three data signatures are reported: (1) the stored signature, (2) the signature that would be calculated today on the basis of the same variables in their original order, and (3) the signature that would be calculated today on the basis of all the variables in their current order.

datasignature confirm knew that the only change was the addition of new variables because signature 1 matched signature 2: the original variables, taken in their original order, still produce the stored signature. If some variables had been dropped, however, datasignature confirm would not be able to determine whether the remaining variables had changed.

Example 3: Working with assistants

You give your dataset to an assistant to have variable labels and the like added. You wish to verify that the returned data are the same data.

Saving the signature with the dataset is inadequate here. Your assistant, having your dataset, could change both your data and the signature and might even do that in a desire to be helpful. The solution is to save the signature in a separate file that you do not give to your assistant:

    . datasignature set, saving(mycopy)
      74:12(71728):3831085005:1395876116       (data signature set)
    (file mycopy.dtasig saved)

You keep file mycopy.dtasig. When your assistant returns the dataset to you, you use it and compare the current signature to what you have stored in mycopy.dtasig:

    . datasignature confirm using mycopy
      (data unchanged since 19feb2017 15:05)

By the way, the signature is a function of the following:

o The number of observations and number of variables in the data

o The values of the variables

o The names of the variables

o The order in which the variables occur in the dataset

o The storage types of the individual variables

The signature is not a function of variable labels, value labels, notes, and the like.
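The list above can be checked informally. The following do-file sketch, which assumes the auto dataset shipped with Stata, records the signature, changes only a label and a note, and then renames a variable. The label text, the note text, and the new name cost are arbitrary choices for illustration.

```stata
* Sketch only: auto.dta, the label, the note, and the name cost are illustrative
sysuse auto, clear
quietly datasignature
local before "`r(datasignature)'"

* Labels and notes are not part of the signature
label variable price "Price in 1978 dollars"
notes _dta: reviewed by assistant
quietly datasignature
assert "`r(datasignature)'" == "`before'"

* Variable names are part of the signature
rename price cost
quietly datasignature
assert "`r(datasignature)'" != "`before'"
```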

Example 4: Working with shared data

You work on a dataset served on a network drive, which means that others could change the data. You wish to know whether this occurs.

The solution here is the same as when working with an assistant: you save the signature in a separate, private file on your computer,

    . datasignature set, saving(private)
      74:12(71728):3831085005:1395876116       (data signature set)
    (file private.dtasig saved)

and then you periodically check the signature by typing

    . datasignature confirm using private
      (data unchanged since 15mar2017 11:22)

Stored results

datasignature without arguments and datasignature set store the following in r():

    Macros
      r(datasignature)       the signature

datasignature confirm stores the following in r():

    Scalars
      r(k_added)             number of variables added

    Macros
      r(datasignature)       the signature

datasignature confirm aborts execution if the signatures do not match; in that case, it returns nothing except a return code of 9.
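In a do-file, the nonzero return code can be trapped with capture so that the script decides what to do next. The sketch below is one way to do that, assuming the auto dataset; the deliberate edit to price simply forces a mismatch.

```stata
* Sketch only: auto.dta and the edit to price are illustrative
sysuse auto, clear
datasignature set, reset

* Change one value so that the stored signature no longer matches
replace price = price + 1 in 1

* capture suppresses the abort and leaves the return code in _rc
capture datasignature confirm
if _rc {
    display as error "data have changed (return code " _rc ")"
}
```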

datasignature report stores the following in r():

    Scalars
      r(datetime)            %tc date-time when set
      r(changed)             . if r(k_dropped)!=0, otherwise
                               0 if data have not changed,
                               1 if data have changed
      r(reordered)           1 if variables reordered,
                               0 if not reordered,
                               . if r(k_added)!=0 | r(k_dropped)!=0
      r(k_original)          number of original variables
      r(k_added)             number of added variables
      r(k_dropped)           number of dropped variables

    Macros
      r(origdatasignature)   original signature
      r(curdatasignature)    current signature on same variables,
                               if it can be calculated
      r(fulldatasignature)   current full-data signature
      r(varsadded)           variable names added
      r(varsdropped)         variable names dropped
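These results can be used programmatically instead of reading the printed report. The following do-file sketch assumes the auto dataset and adds a made-up variable named expensive before querying the stored results.

```stata
* Sketch only: auto.dta and the variable expensive are illustrative
sysuse auto, clear
datasignature set, reset
generate byte expensive = price > 10000

* Run the report quietly and work with what it leaves in r()
quietly datasignature report
display "signature set on:  " %tc r(datetime)
display "original values changed (0 = no): " r(changed)
display "variables added:   " r(k_added) "  (`r(varsadded)')"
display "variables dropped: " r(k_dropped)
```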

datasignature clear stores nothing in r() but does clear it.

datasignature set stores the signature in the following characteristics:

    Characteristic
      _dta[datasignature_si]     signature
      _dta[datasignature_dt]     %tc date-time when set in %21x format
      _dta[datasignature_vl1]    part 1, original variables
      _dta[datasignature_vl2]    part 2, original variables, if necessary
      etc.

To access the original variables stored in _dta[datasignature_vl1], etc., from an ado-file, code

    mata: ado_fromlchar("vars", "_dta", "datasignature_vl")

Thereafter, the original variable list would be found in `vars'.
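For a quick look at what datasignature set stored, the characteristics can also be read directly. This is a minimal sketch assuming the auto dataset; it uses only the characteristic names listed above.

```stata
* Sketch only: auto.dta is an illustrative choice
sysuse auto, clear
datasignature set, reset

* Read the stored signature through the char extended macro function
local sig : char _dta[datasignature_si]
display "stored signature: `sig'"

* List every dataset-level characteristic, including the variable list parts
char list _dta[]
```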

© Copyright 1996–2018 StataCorp LLC