
st: RE: counting the cases for which the obervations of two different variables are equal

From   "Nick Cox" <>
To   <>
Subject   st: RE: counting the cases for which the obervations of two different variables are equal
Date   Wed, 26 Mar 2008 20:29:51 -0000

Expressions of the form a == b == c are not necessarily illegal but they
won't reliably do what you want here. 

You want 

a == b == c 

to mean 

(a == b) & (b == c) 

but to Stata == is a binary operator and this interpretation does not
hold. Otherwise put, two == within a trio of arguments don't define a
ternary operator. 

Variables aside, the main issue can be seen by considering

. di 1 == 1 == 1

. di 0 == 0 == 0

The first behaves as you are hoping, but not the second. Why? 

Let's guess that Stata evaluates left to right here. Even if that's the
wrong way round, the examples will come out the same. Then 

1 == 1 == 1 

is treated as 

(1 == 1) == 1 

which is 1 == 1, which is 1. But 

0 == 0 == 0 

is treated as 

(0 == 0) == 0 

which is 1 == 0, which is 0. 
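
To check this reasoning directly, here is a small sketch with the parentheses written out. The last two lines are additions for illustration: one shows the spelled-out form with &, the other shows that the problem is not special to zeros.

```stata
* Explicit grouping, matching the left-to-right reading above
display (1 == 1) == 1          // 1 == 1, so 1
display (0 == 0) == 0          // 1 == 0, so 0

* What was actually intended for the zeros case
display (0 == 0) & (0 == 0)    // 1

* Any value other than 1 fails in the same way
display 2 == 2 == 2            // (2 == 2) == 2 is 1 == 2, so 0
```

So the chained form only appears to work when the values being compared happen to equal 1.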

There are various ways forward that I can suggest. One is that you have
to spell out all compound true and false statements in terms of atomic
binary comparisons using (e.g.) & as well as ==. 
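
As a sketch of the spelled-out form, applied to the variable names in the question quoted below: the toy data and the name year are assumptions made up for illustration, not taken from the original dataset. The /// line continuation works in a do-file; interactively, type the command on one line.

```stata
* Hypothetical toy data: respondent 101 is observed three times
clear
input respnr year aa001
101 2001 1
101 2002 1
101 2003 1
102 2001 2
102 2002 2
103 2001 1
end

* Subscripts such as [_n+1] depend on the sort order, so fix it first
sort respnr year

* Each equality is its own binary comparison, joined by &
count if respnr == respnr[_n+1] & respnr[_n+1] == respnr[_n+2] ///
    & aa001 == aa001[_n+1] & aa001[_n+1] == aa001[_n+2]
```

With these made-up data only the first observation starts a run of three matching rows, so the count should be 1, whereas the chained a == b == c version counts 0 because (101 == 101) == 101 is 1 == 101.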

Another is to use some quite different approach. For example,
consistency of x within groups of y is explored by 

. bysort y (x) : gen same_x = x[1] == x[_N] 
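
The same device could be applied to the question quoted below along these lines. This is only a sketch: the toy data, the variable year and the new variable names same_sex and tag are assumptions for illustration.

```stata
* Hypothetical toy data: respondent 102 has inconsistent sex codes
clear
input respnr year aa001
101 2001 1
101 2002 1
102 2001 2
102 2002 1
end

* Within each respondent, sort on sex; if the first and last values
* agree, every value in the group agrees
bysort respnr (aa001) : gen byte same_sex = aa001[1] == aa001[_N]

* Tag one observation per respondent so each person is counted once
egen byte tag = tag(respnr)
count if tag & !same_sex

* Inspect the inconsistent respondents
list respnr year aa001 if !same_sex, sepby(respnr)
```

Note that missing values sort last, so a respondent whose sex is missing in some years would also be flagged as inconsistent by this check.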

Another is to tag duplicates using -duplicates- and then look at the

There is more at 

How do I list observations in a group that differ on a variable?

How do I compute the number of distinct observations?

and also various other data management FAQs.



I would like to count the number of cases for which the observations of
two different variables are equal.
That is: Suppose you have a panel dataset where the variable 'respondent
number' repeats itself for some years (because the same person is observed
for several years) and therefore the variable 'sex' repeats itself too.

So if the respondent number column reads, for two observations:


then the corresponding sex column reads


OK, now I would like to check whether the 'sex' variable is consistent in
the data. To do this I use the following command:

count if respnr[_n] == respnr[_n+1] & aa001[_n] == aa001[_n+1]

and it returns a number, which is fine.

This command returns the number of cases where a respondent is observed
for any two years and the sex of the respondent was the same for those two
years.
Now, I also ask the same question for '3 consecutive' years and use the 
following command:

count if respnr[_n] == respnr[_n+1] == respnr[_n+2] & aa001[_n] == 
aa001[_n+1] == aa001[_n+2]

Although I can see in the data browser that this condition holds for
observations, Stata returns "0" cases.
That is, although I can identify cases where an individual is observed for 
three years and his sex is male for those years,
Stata does not see that and returns "0".

