st: RE: Counting unique values across a set of variables: Re-sent

 From: "Nick Cox"
 Subject: st: RE: Counting unique values across a set of variables: Re-sent
 Date: Mon, 24 May 2004 17:15:40 +0100

```I think this is easiest through a

-reshape-
do stuff
-reshape-

sequence, otherwise known as the Stata twostep.

First we -rename- variables, so that
they have a common prefix, say S_:
foreach v of var B-Y {
rename `v' S_`v'
}

Then we -reshape- to long:

reshape long S_ , i(A) string

Now our count of distinct strings is

bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]
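
An alternative sketch for the same count, still in the long
layout, assuming a Stata recent enough to have the -egen-
functions -tag()- and -total()-. The names -first- and -Z2-
are made up here for illustration:

* tag one observation per distinct (A, S_) pair
egen first = tag(A S_)
* add up the tags within each respondent
egen Z2 = total(first), by(A)
* first varies within A, so drop it before reshaping back
drop first

Unless its -missing- option is given, -tag()- skips empty
strings "", so Z2 can be smaller than Z for respondents
with blanks. "." is still counted.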

Now we -reshape- back

reshape wide S_ , i(A) string

and then -Z- is an extra variable
in the dataset.
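
As a check, here is the whole sequence so far on a made-up
toy dataset (three respondents, columns B-D only; the values
are invented for illustration):

clear
input A str3 B str3 C str3 D
1 "x" "y" "x"
2 "." "z" "z"
3 "q" "q" "q"
end

* common prefix, then to long
foreach v of var B-D {
rename `v' S_`v'
}
reshape long S_ , i(A) string

* count distinct strings within each respondent
bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]

* back to wide; the variables are now S_B S_C S_D
reshape wide S_ , i(A) string
list

Z should come out as 2, 2 and 1. Respondent 2 has "." and
"z", so "." has been counted as a value.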

Note that this counts "."
as a value like any other. (And
indeed also "", " ", "  ", etc.)

If you want to subtract 1 because "."
is not of interest, then one
way to do that is

gen countperiod = 0
* after the -reshape- back the variables are S_B-S_Y
foreach v of var S_* {
replace countperiod = countperiod + (`v' == ".")
}

replace Z = Z - (countperiod > 0)
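
Another way, sketched here on the assumption that you are
still in the long layout (before the second -reshape-), is
to leave "." out of the count in the first place. These
lines would stand in for the three counting lines earlier:

* tag the first of each distinct string, but never "."
bysort A S_ : gen Z = (_n == 1) & (S_ != ".")
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]

To ignore blanks as well, the condition could become
!inlist(trim(S_), ".", "") instead of S_ != ".".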

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of CM
> Sent: 24 May 2004 16:53
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Counting unique values across a set of variables: Re-sent
>
>
> Hi all,
>
> I checked findit but don't believe I found what I
> need.
>
> Each row in my data represents a respondent.  Besides
> the first column "A" representing ID, the other
> columns (call them B thru Y) contain strings or "."  I
> need to create a variable in column Z that counts the
> number of unique strings found for any given
> respondent in B thru Y.  Advice?
>