I think this is easiest through a
-reshape-
do stuff
-reshape-
sequence, otherwise known as the Stata twostep.
First we -rename- variables, so that
they have a common prefix, say
foreach v of var B-Y {
rename `v' S_`v'
}
Then we -reshape- to long:
reshape long S_ , i(A) string
Now our count of distinct strings is
bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]
Now we -reshape- back
reshape wide S_ , i(A) string
and then -Z- is an extra variable
in the dataset.
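
As a check, here is the whole sequence on a toy dataset. This is only a sketch: the three string columns B-D and the values in them are made up for illustration and stand in for B-Y.

```stata
* Toy data (hypothetical): an ID plus three string columns
clear
input str2 A str5 B str5 C str5 D
"r1" "red" "red"  "."
"r2" "red" "blue" "green"
"r3" "."   "."    "."
end

* Give the string variables a common prefix
foreach v of var B-D {
    rename `v' S_`v'
}

* Long layout: one observation per respondent and original column
reshape long S_ , i(A) string

* Tag the first occurrence of each distinct string within respondent,
* then turn the running sum of tags into a per-respondent total
bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]

* Back to one observation per respondent
reshape wide S_ , i(A) string
list A Z
```

With these data I would expect Z to be 2, 3 and 1 for r1, r2 and r3, because "." is counted as a string in its own right.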
Note that this counts "."
as a value like any other. (And
indeed also "", " ", "  ", etc.)
If you want to subtract 1 because "."
is not of interest, then one
way to do that is
* after the -rename- above, the variables are S_B-S_Y, not B-Y
gen countperiod = 0
foreach v of var S_B-S_Y {
replace countperiod = countperiod + (`v' == ".")
}
replace Z = Z - (countperiod > 0)
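
An alternative, which is not what is done above, is to leave "." (and blanks) out of the count while the data are still long, so that no subtraction is needed afterwards. A minimal sketch, again on made-up data that already carry the S_ prefix:

```stata
* Toy data (hypothetical), already renamed with the S_ prefix
clear
input str2 A str5 S_B str5 S_C str5 S_D
"r1" "red" "red"  "."
"r2" "red" "blue" "green"
"r3" "."   ""     "."
end

reshape long S_ , i(A) string

* Tag the first occurrence of each distinct string, but never tag
* "." or strings that are empty once leading/trailing blanks are trimmed
bysort A S_ : gen Z = (_n == 1) & !inlist(trim(S_), "", ".")
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]

reshape wide S_ , i(A) string
list A Z
```

Here I would expect Z to be 1, 3 and 0. Note that the grouping is still on the untrimmed strings, so "red" and "red " would count as two distinct values.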
Nick
n.j.cox@durham.ac.uk
> Hi all,
>
> I checked findit but don't believe I found what I
> need.
>
> Each row in my data represents a respondent. Besides
> the first column "A" representing ID, the other
> columns (call them B thru Y) contain strings or "." I
> need to create a variable in column Z that counts the
> number of unique strings found for any given
> respondent in B thru Y. Advice?
>
> Thanks in advance,
> CM
