# st: Programing "by" and creating a new dataset

From: Cameron Hooper
To: statalist@hsphsun2.harvard.edu
Subject: st: Programing "by" and creating a new dataset
Date: Wed, 09 Mar 2005 16:41:04 -0500

Consider the following data:

. use http://www-personal.umich.edu/~chooper/stata/kptest1
. list

     +---------------------+
     | id   year   x1   x2 |
     |---------------------|
  1. |  1   1990   10    8 |
  2. |  1   1990    7   11 |
  3. |  1   1990    6    4 |
  4. |  1   1990    9   12 |
  5. |  1   1990    8    6 |
     +---------------------+
.

I want to create a new variable based on the pairwise comparisons of -sign(x2 - x1)-. (I've included a brief explanation of these comparisons as a postscript to this post. I don't think it is necessary to read it to understand my question.)

To achieve this I've written the following program. I don't think it is important to follow the code inside the -forvalues- loop.

-------------------------------------

capture program drop kpmetric
program kpmetric , rclass
    version 8
    * direction of each revision: -1, 0, or +1
    tempvar chng
    generate `chng' = sign(x2 - x1)
    qui count
    local n = r(N)
    local n1 = r(N) - 1
    local p = 0

    * loop over every pair of observations (i, j) with i < j
    forvalues i = 1/`n1' {
        local k = `i' + 1
        forvalues j = `k'/`n' {

            local opposite = `chng'[`i'] != `chng'[`j'] & ///
                `chng'[`i'] != 0 & `chng'[`j'] != 0
            local flip = sign(x1[`i'] - x1[`j']) != sign(x2[`i'] - x2[`j'])
            local diverge = abs(x2[`i'] - x2[`j']) > abs(x1[`i'] - x1[`j'])

            if `opposite' & (`flip' | `diverge') {
                local p = `p' + 1
            }
        }
    }

    * proportion of inconsistent pairs among all n-choose-2 pairs
    local p = `p'/comb(`n',2)
    return scalar p = `p'
end

--------------------------------------------------------------------------
Here is a sample run:

. use http://www-personal.umich.edu/~chooper/stata/kptest1
. kpmetric
. return list

scalars:
r(p) = .6

In practice my data are more complex: I have multiple years and companies, so I need to make my program understand the -by- prefix. I also need a way of capturing the results of the program. This is what I want to achieve:

. use http://www-personal.umich.edu/~chooper/stata/kptest2
. sort id year
. list

     +---------------------+
     | id   year   x1   x2 |
     |---------------------|
  1. |  1   1990   10    8 |
  2. |  1   1990    7   11 |
  3. |  1   1990    6    4 |
  4. |  1   1990    9   12 |
  5. |  1   1990    8    6 |
     |---------------------|
  6. |  1   1991    5    8 |
  7. |  1   1991    7    9 |
  8. |  1   1991    4    4 |
  9. |  2   1988    2    6 |
 10. |  2   1988    5    3 |
     |---------------------|
 11. |  2   1988    9    7 |
 12. |  2   1988    4    1 |
 13. |  2   1989    7    7 |
 14. |  2   1989    8    8 |
 15. |  2   1989    3    3 |
     +---------------------+

. by id year: kpmetric
. list

     id   year      p   n
  1.  1   1990   0.60   5
  2.  1   1991   0.00   3
  3.  2   1988   0.33   4
  4.  2   1989   0.00   3

Any suggestions on how I can achieve this?
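
One possible route, offered as an untested sketch rather than a definitive answer: since kpmetric as written ignores -if- and -in-, -statsby- with its -forcedrop- option could drop everything except the current by-group before each call and collect the returned scalars into a new dataset. The wrapper name kpwrap and the extra result r(n) are hypothetical additions made for illustration, and the prefix syntax shown assumes Stata 9 or later.

```stata
* Hypothetical wrapper (not part of the original program): runs kpmetric
* on the data currently in memory and also returns the number of
* observations, so the group size can be collected alongside p.
capture program drop kpwrap
program kpwrap, rclass
    version 9
    kpmetric
    return add              // copy r(p) from kpmetric into the results
    return scalar n = _N    // observations currently in memory
end

use http://www-personal.umich.edu/~chooper/stata/kptest2, clear

* forcedrop keeps only the current by-group in memory before each call;
* clear replaces the data with one row per id-year holding p and n.
statsby p=r(p) n=r(n), by(id year) clear forcedrop: kpwrap
list
```

This leaves kpmetric itself untouched, at the cost of not supporting the -by- prefix directly; making the program truly byable would mean rewriting the explicit subscripts so they respect the by-group.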

Thanks,

Cameron

PS. For those interested in what this code is trying to achieve, here is a brief explanation. Consider the following data:

analyst   f1   f2
      1   10   12
      2   11    9
      3    8   10

This represents the forecasts of three analysts. f1 is the forecast at time 1 and f2 is the revised forecast at time 2, after the analysts observe a publicly available signal. Define a measure of differential interpretations of the public signal to be the proportion of inconsistent revisions. Inconsistent revisions move in opposite directions AND either flip or end up further apart than the original forecasts. For example, analyst 1 revised his forecast upwards (from 10 to 12) while analyst 2 revised her forecast downwards (11 to 9). Since the revisions are in opposite directions and cross, they are defined as inconsistent. In contrast, both analysts 1 and 3 revise their forecasts upwards and are thus consistent revisions. While analysts 2 and 3 revise in different directions, they are not inconsistent because they converge as a result of observing the common signal. Of the 3 analyst pairs only 1 is inconsistent (1v2), so the value of the differential interpretations metric equals 1/3 (#inconsistent pairs / total pairs).
