Hi everyone,
I have the following data:
id date var
A 1.1.90 10.1
A 1.2.90 11.2
A 1.3.90 12.3
...
A 1.11.04 3.1
A 1.12.04 4.2
A 1.1.05 4.2
A 1.2.05 4.2
A 1.3.05 4.2
... (only -date- changes, with var fixed at 4.2)
B 1.1.92 100
B 1.2.92 110
B 1.3.92 120
...
B 1.11.03 30.1
B 1.12.03 40.5
B 1.1.04 40.5
B 1.2.04 40.5
... (only -date- changes, with var fixed at 40.5)

When -var- becomes fixed, it means that the id stopped being updated. Given
that I have thousands of ids, checking this one by one is cumbersome. One way
of determining this is, for each observation within each id, to calculate the
average of the remaining (later) values and check whether this average equals
the value in var. I did the following:
. gsort id -date
. by id: gen n=_n
. by id: gen sum=sum(var)
. by id: gen avg=sum/n
. sort id date
. by id: gen ddate=1 if avg==var
Because ddate came back missing for every observation, I took the
difference between avg and var:
. drop ddate
. gen diff=avg-var
When checking the results in diff, I realized that diff yielded values close
to 0 but not exactly 0 (something like 8.179e-07). Even for the last
observation, where avg should be exactly equal to var, the result was along
the lines of 8.179e-07 (for instance: var=111.2499, sum=111.2499,
avg=111.2499, n=1, and diff=8.179e-07). I understand that 8.179e-07 is close
to 0, and I could do something like:
. replace diff=0 if abs(diff)<0.00001
But I'm afraid I could then misclassify some observations. Any ideas about
why this happens and how to solve it? The values of var are truncated to 4
decimal places by the database download.
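
For reference, here is a minimal sketch of a variant I am considering. It is
not verified on my full data and rests on two assumptions: that var is stored
as double (to be checked with -describe var-), and that the mismatch comes
from -generate- storing new variables as float by default. The name runsum and
the tolerance of 1e-9 are my own illustrative choices, not anything given by
the database.

```stata
* Sketch only. Assumes var is stored as double; check with -describe var-.
gsort id -date
by id: gen long n = _n

* Store the running sum and the average in double precision,
* rather than in -generate-'s default float type.
by id: gen double runsum = sum(var)
gen double avg = runsum/n

* Compare with a small tolerance instead of exact equality.
* 1e-9 is an assumed value: far below the 4-decimal resolution of var,
* but above the rounding error expected from summing doubles.
gen byte ddate = abs(avg - var) < 1e-9

sort id date
```

If var turned out to be stored as float instead, my understanding is that the
comparison would need to be made at float precision, for example
-avg == float(var)-, but I have not tried that either.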
Kind regards,
Nuno