# st: RE: counting the cases for which the observations of two different variables are equal

 From: "Nick Cox"
 Subject: st: RE: counting the cases for which the observations of two different variables are equal
 Date: Wed, 26 Mar 2008 20:29:51 -0000

```
Expressions of the form a == b == c are not necessarily illegal but they
won't reliably do what you want here.

You want

a == b == c

to mean

(a == b) & (b == c)

but to Stata == is a binary operator and this interpretation does not
hold. Otherwise put, two == within a trio of arguments don't define a
ternary operator.

Variables aside, the main issue can be seen by considering

. di 1 == 1 == 1
1

. di 0 == 0 == 0
0

The first behaves as you are hoping, but not the second. Why?

Let's guess that Stata evaluates left to right here. Even if that's the
wrong way round, the examples will come out the same. Then

1 == 1 == 1

is treated as

(1 == 1) == 1

which is 1 == 1, which is 1. But

0 == 0 == 0

is treated as

(0 == 0) == 0

which is 1 == 0, which is 0.
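
* A small check of that reading (a sketch only). The parentheses make the
* guessed grouping explicit; the second line uses & to state what was
* actually intended.
* Displays 0, the same as 0 == 0 == 0:
di (0 == 0) == 0
* Displays 1:
di (0 == 0) & (0 == 0)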

There are various ways forward that I can suggest. One is that you have
to spell out all compound true and false statements in terms of atomic
binary comparisons using (e.g.) & as well as ==.
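
* A sketch of that first route for the three-year question, as it might
* appear in a do-file. The variable names respnr and aa001 come from the
* original post; the data are assumed to be sorted already by respondent
* and year.
count if respnr == respnr[_n+1] & respnr[_n+1] == respnr[_n+2] ///
    & aa001 == aa001[_n+1] & aa001[_n+1] == aa001[_n+2]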

Another is to use some quite different approach. For example,
consistency of x within groups of y is explored by

. bysort y (x) : gen same_x = x[1] == x[_N]
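
* The same idea applied to the variables in the original post (a sketch;
* respnr and aa001 are from the question, same_sex and first are names
* made up here). After sorting on aa001 within respnr, the first and last
* values agree only if every value in the group agrees. Note that a
* missing value of aa001 would count as a different value.
bysort respnr (aa001) : gen byte same_sex = aa001[1] == aa001[_N]
* Count respondents, not observations, whose sex is recorded inconsistently.
egen byte first = tag(respnr)
count if first & !same_sex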

Another is to tag duplicates using -duplicates- and then look at the
others.
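
* One possible reading of the -duplicates- route, again with the variable
* names from the original post (multi is a name made up here). The data
* are changed along the way, hence preserve and restore.
preserve
* Keep one observation per distinct (respnr, aa001) pair.
duplicates drop respnr aa001, force
* A respondent still appearing more than once has more than one sex recorded.
duplicates tag respnr, generate(multi)
list respnr aa001 if multi > 0
restore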

There is more at

How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

How do I compute the number of distinct observations?
http://www.stata.com/support/faqs/data/distinct.html

and also various other data management FAQs.

Nick
n.j.cox@durham.ac.uk

minimus

I would like to count the number of cases for which the observations
inside two different variables are equal.
That is: Suppose you have a panel dataset where the variable 'respondent
number' repeats itself for some years (because the same person is
observed for several years) and therefore the variable 'sex' repeats
itself too.

So if the respondent number column reads, for two observations,

1011
1011

then the corresponding sex column reads

male
male

OK, now I would like to check whether the 'sex' variable is consistent in
the data. To do this I use the following command:

count if respnr[_n] == respnr[_n+1] & aa001[_n] == aa001[_n+1]

and it returns a number, which is fine.

This command returns the number of cases where a respondent is observed
in two consecutive rows and the sex of the respondent is the same in
both.

Now, I also ask the same question for '3 consecutive' years and use the
following command:

count if respnr[_n] == respnr[_n+1] == respnr[_n+2] & aa001[_n] == aa001[_n+1] == aa001[_n+2]

Although I can see in the data browser that this condition holds for
many observations, Stata returns "0" cases.
That is, although I can identify cases where an individual is observed
for three years and his sex is male in all of those years,
Stata does not see that and returns "0".

Why?

```