Statalist



st: RE: counting the cases for which the observations of two different variables are equal


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: counting the cases for which the observations of two different variables are equal
Date   Wed, 26 Mar 2008 20:29:51 -0000

Expressions of the form a == b == c are not necessarily illegal but they
won't reliably do what you want here. 

You want 

a == b == c 

to mean 

(a == b) & (b == c) 

but to Stata == is a binary operator and this interpretation does not
hold. Otherwise put, two == within a trio of arguments don't define a
ternary operator. 

Variables aside, the main issue can be seen by considering

. di 1 == 1 == 1
1

. di 0 == 0 == 0
0

The first behaves as you are hoping, but not the second. Why? 

Let's guess that Stata evaluates left to right here. Even if that's the
wrong way round, the examples will come out the same: 0 == (0 == 0) is
0 == 1, which is also 0. Then 

1 == 1 == 1 

is treated as 

(1 == 1) == 1 

which is 1 == 1, which is 1. But 

0 == 0 == 0 

is treated as 

(0 == 0) == 0 

which is 1 == 0, which is 0. 
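
A quick check along the same lines, using only -display-, shows that
the trouble is not special to 0 and that the spelled-out form behaves
as hoped:

. di 2 == 2 == 2
0

. di (0 == 0) & (0 == 0)
1

In the first, (2 == 2) is 1, and 1 == 2 is 0.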

There are various ways forward that I can suggest. One is that you have
to spell out all compound true and false statements in terms of atomic
binary comparisons using (e.g.) & as well as ==. 
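
As a sketch of that, the three-observation count in the question might
be written with each comparison spelled out. This assumes that the data
are sorted by respondent and year; the variable name year is an
assumption, while respnr and aa001 are from the original question:

. sort respnr year
. count if respnr == respnr[_n+1] & respnr == respnr[_n+2] & aa001 == aa001[_n+1] & aa001 == aa001[_n+2]

Note that x[_n] is just x, and that a subscript beyond the end of the
data yields missing, so the last two observations can only satisfy this
if their own values are missing.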

Another is to use some quite different approach. For example,
consistency of x within groups of y is explored by 

. bysort y (x) : gen same_x = x[1] == x[_N] 
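
Applied to the data in the question, a minimal sketch might be as
follows. It assumes that respnr identifies respondents and aa001 holds
sex, as in the original post; same_sex and first are illustrative
names:

. bysort respnr (aa001) : gen byte same_sex = aa001[1] == aa001[_N]
. egen byte first = tag(respnr)
. count if first & !same_sex

Sorting on aa001 within respnr puts the lowest and highest values first
and last, so they are equal only if all values in the group are equal.
Missing values count as distinct values here, so a respondent with sex
missing in some years but not others is counted as inconsistent. The
-egen- function tag() marks one observation per respondent, so the
count is of respondents rather than observations.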

Another is to tag duplicates using -duplicates- and then look at the
others. 
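
A sketch with -duplicates-, again using the variable names from the
question (dup and nobs are illustrative names):

. duplicates tag respnr aa001, generate(dup)
. bysort respnr : gen nobs = _N
. list respnr aa001 if dup != nobs - 1

-duplicates tag- records for each observation how many other
observations agree with it on respnr and aa001. If sex is consistent
for a respondent, that number is one less than the respondent's number
of observations, so anything listed belongs to a respondent with
differing values.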

There is more at 

How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

How do I compute the number of distinct observations?
http://www.stata.com/support/faqs/data/distinct.html

and also various other data management FAQs.

Nick
[email protected] 

minimus

I would like to count the number of cases for which the observations
in two different variables are equal.
That is: suppose you have a panel dataset where the variable 'respondent
number' repeats itself for some years (because the same person is
observed for several years) and therefore the variable 'sex' repeats
itself too.

So if the respondent number column reads, for two observations:

1011
1011

then the corresponding sex column reads

male
male
.

OK, now I would like to check whether the 'sex' variable is consistent
in the data. To do this I use the following command:

count if respnr[_n] == respnr[_n+1] & aa001[_n] == aa001[_n+1]

and it returns a number, which is fine.

This command returns the number of cases where a respondent is
observed for any two years and the sex of the respondent was the same
for those two years.

Now, I also ask the same question for '3 consecutive' years and use the 
following command:

count if respnr[_n] == respnr[_n+1] == respnr[_n+2] & aa001[_n] == aa001[_n+1] == aa001[_n+2]

Although I can see in the data browser that this condition holds for
many observations, Stata returns "0" cases.
That is, although I can identify cases where an individual is observed
for three years and his sex is male for those years,
Stata does not see that and returns "0".

Why?

