
st: RE: Counting unique values across a set of variables: Re-sent

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: Counting unique values across a set of variables: Re-sent
Date	Mon, 24 May 2004 17:15:40 +0100

I think this is easiest through a 

-reshape- 
do stuff 
-reshape- 

sequence, otherwise known as the Stata twostep. 

First we -rename- the variables so that 
they share a common prefix, say S_: 

foreach v of var B-Y { 
	rename `v' S_`v' 
} 

Then we -reshape- to long: 

reshape long S_ , i(A) string 
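To see what the long layout looks like, here is a minimal sketch on hypothetical toy data, with three string variables B, C, D standing in for B-Y. It spells out j(_j) explicitly so the name of the new variable holding the suffixes "B", "C", "D" is visible.

```stata
* Hypothetical toy data: ID A plus string variables B, C, D
clear
input A str3 B str3 C str3 D
1 "x" "y" "x"
2 "." "z" "."
end

* give B-D the common prefix S_
foreach v of varlist B-D {
    rename `v' S_`v'
}

* one observation per (A, original variable); _j holds "B", "C", "D"
reshape long S_ , i(A) j(_j) string
sort A _j
list, sepby(A)
```

Each respondent now occupies three observations, one per original variable, so distinct values can be counted with -by:-.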

Now our count of distinct strings is 

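* tag the first observation of each distinct string within each A 
bysort A S_ : gen Z = _n == 1 
* running sum of the tags within A 
by A : replace Z = sum(Z) 
* copy the last (total) value to every observation of A 
by A : replace Z = Z[_N] 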
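An equivalent count can be sketched with -egen-, assuming a Stata version that provides -egen, total()-. This is an alternative to the three lines above, not the method in the post. The -missing- option matters: without it, -egen, tag()- ignores observations where S_ is "" (Stata's string missing), so those would not be counted as the -bysort- approach counts them.

```stata
* Hypothetical long-form toy data, as after -reshape long-
clear
input A str3 S_
1 "x"
1 "y"
1 "x"
2 "."
2 "z"
2 "."
end

* tag exactly one observation per distinct (A, S_) pair; -missing- keeps ""
egen tag = tag(A S_), missing
* add up the tags within each A
egen Z = total(tag), by(A)
drop tag
```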

Now we -reshape- back 

reshape wide S_ , i(A) string 

and then -Z- is an extra variable 
in the dataset. Note that the original 
variables are now named S_B ... S_Y. 
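Putting the whole twostep together, here is a minimal end-to-end sketch on hypothetical toy data (B, C, D standing in for B-Y). The final loop, which strips the S_ prefix to restore the original names, is an optional extra not in the post.

```stata
* Hypothetical toy data
clear
input A str3 B str3 C str3 D
1 "x" "y" "x"
2 "." "z" "."
3 "w" "w" "w"
end

foreach v of varlist B-D {
    rename `v' S_`v'
}
reshape long S_ , i(A) j(_j) string

* count distinct strings per A, as in the post
bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]

* Z is constant within A, so -reshape wide- carries it along
reshape wide S_ , i(A) j(_j) string

* optional: drop the "S_" prefix (characters 1-2) to restore B, C, D
foreach v of varlist S_* {
    local new = substr("`v'", 3, .)
    rename `v' `new'
}
sort A
list
```

Respondent 1 has two distinct strings ("x", "y"), respondent 2 has two (".", "z") and respondent 3 has one ("w").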

Note that this counts "." 
as a value like any other, and 
likewise "", " ", "  ", etc., each 
as a distinct value. 
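A small sketch makes the point concrete, assuming a hypothetical long-form dataset for one respondent whose values are "x", ".", "", " " and "x" again. The values are set with -replace- so that the empty string and the single space are stored exactly.

```stata
* Hypothetical long-form data for a single respondent
clear
set obs 5
gen A = 1
gen str3 S_ = ""
replace S_ = "x" in 1
replace S_ = "." in 2
* observation 3 stays ""
replace S_ = " " in 4
replace S_ = "x" in 5

bysort A S_ : gen Z = _n == 1
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]
```

Z is 4 here: "", " ", "." and "x" are all counted, while the repeated "x" is counted once.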

If "." is not of interest and you 
want to subtract 1 whenever it occurs, 
then one way to do that is 

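* after the second -reshape- the variables are named S_B ... S_Y 
gen countperiod = 0 
foreach v of var S_* { 
	* add 1 for each variable that holds "." 
	replace countperiod = countperiod + (`v' == ".") 
} 

* subtract 1 if "." occurred at least once 
replace Z = Z - (countperiod > 0) 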
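A different route, swapped in here rather than taken from the post, is to leave "." (and, if wanted, blank strings) out while tagging in the long form, before reshaping back. A minimal sketch on hypothetical long-form toy data follows; drop the trim() condition if "", " ", etc. should still count.

```stata
* Hypothetical long-form toy data, as after -reshape long-
clear
input A str3 S_
1 "x"
1 "."
1 "y"
2 "."
2 "."
3 "z"
end

* tag the first occurrence of each distinct (A, S_), skipping "." and
* blanks; trim() turns "", " ", "  " etc. all into ""
bysort A S_ : gen Z = _n == 1 & S_ != "." & trim(S_) != ""
by A : replace Z = sum(Z)
by A : replace Z = Z[_N]
```

Respondent 2, who has only ".", gets a count of 0, so no separate correction is needed afterwards.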

Nick 
[email protected] 

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of CM
> Sent: 24 May 2004 16:53
> To: [email protected]
> Subject: st: Counting unique values across a set of variables: Re-sent
> 
> 
> Hi all,
> 
> I checked findit but don't believe I found what I
> need.
> 
> Each row in my data represents a respondent.  Besides
> the first column "A" representing ID, the other
> columns (call them B thru Y) contain strings or "."  I
> need to create a variable in column Z that counts the
> number of unique strings found for any given
> respondent in B thru Y.  Advice?
> 
> Thanks in advance,
> CM

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
