The Stata listserver

st: RE: RE: Counting unique values across a set of variables: Re-sent


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: Counting unique values across a set of variables: Re-sent
Date   Mon, 24 May 2004 17:34:52 +0100

Another way to do it: 

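gen ZZ = 0 

* loop over observations, collecting the values of B-Y in a macro list
qui forval i = 1/`=_N' { 
	* start each observation with an empty list
	local list 
	foreach v of var B-Y { 
		* compound double quotes keep a value with spaces as one element
		local list `"`list' `"`=`v'[`i']'"'"' 
	} 
	* drop duplicate elements, then store how many remain
	local uniq : list uniq list 
	replace ZZ = `: list sizeof uniq' in `i' 
}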

The single, double, and compound double quotes
require a little care here. 
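
To see why the quoting matters, here is a minimal sketch on made-up data (the three variables and their values are assumptions for illustration only). Each value is wrapped in compound double quotes, so "red car" stays a single list element rather than being split into "red" and "car":

```stata
* Toy data, invented for illustration: one respondent, three string variables
clear
input str10 B str10 C str10 D
"red car" "red" "car"
end

* Build a macro list in which each value is one quoted element
local list
foreach v of var B-D {
	local list `"`list' `"`=`v'[1]'"'"'
}

* Remove duplicate elements, then show how many elements remain
local uniq : list uniq list
display `: list sizeof uniq'
```

With the quotes in place the three values should be treated as three distinct elements; without them the list would be read as the words red, car, red, car.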

This is the sometimes deprecated loop over
observations, which nevertheless has a certain charm 
in this case. 
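
If "." is only a placeholder and should not be counted, one possible variant of the loop skips it when building the list. This is a sketch only: the variable name ZZ2 is hypothetical, and it assumes the placeholder is exactly the string "." with no surrounding blanks.

```stata
* ZZ2 is a hypothetical name, chosen so as not to clash with ZZ
gen ZZ2 = 0

qui forval i = 1/`=_N' {
	* start each observation with an empty list
	local list
	foreach v of var B-Y {
		* add the value to the list unless it is the placeholder
		if `v'[`i'] != "." {
			local list `"`list' `"`=`v'[`i']'"'"'
		}
	}
	* count the distinct elements that were kept
	local uniq : list uniq list
	replace ZZ2 = `: list sizeof uniq' in `i'
}
```

An observation in which every variable holds "." ends up with an empty list and so a count of 0.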

Nick 
n.j.cox@durham.ac.uk 

P.S. in the previous message, add a final -renpfix- 
to get your variable names back to the status quo ante. 
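
As a sketch of that last step, assuming the variables still carry the S_ prefix added before the -reshape- in the previous message:

```stata
* Strip the S_ prefix, so that S_B ... S_Y become B ... Y again
renpfix S_
```

Called with a single argument, -renpfix- removes that prefix from every variable name that starts with it.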

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Nick Cox
> Sent: 24 May 2004 17:16
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Counting unique values across a set of variables:
> Re-sent
> 
> 
> I think this is easiest through a 
> 
> -reshape- 
> do stuff 
> -reshape- 
> 
> sequence, otherwise known as the Stata twostep. 
> 
> First we -rename- variables, so that 
> they have a common prefix, say 
> 
> foreach v of var B-Y { 
> 	rename `v' S_`v' 
> } 
> 
> Then we -reshape- to long: 
> 
> reshape long S_ , i(A) string 
> 
> Now our count of distinct strings is 
> 
> bysort A S_ : gen Z = _n == 1 
> by A : replace Z = sum(Z) 
> by A : replace Z = Z[_N] 
> 
> Now we -reshape- back 
> 
> reshape wide S_ , i(A) string 
> 
> and then -Z- is an extra variable
> in the dataset. 
> 
> Note that this counts "." 
> as a value like any other. (And 
> indeed also "", " ", "  ", etc.) 
> 
> If you want to subtract 1 because "." 
> is not of interest, then one 
> way to do that is 
> 
> gen countperiod = 0 
> foreach v of var B-Y { 
> 	replace countperiod = countperiod + (`v' == ".") 
> } 
> 
> replace Z = Z - (countperiod > 0) 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> > -----Original Message-----
> > From: owner-statalist@hsphsun2.harvard.edu
> > [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of CM
> > Sent: 24 May 2004 16:53
> > To: statalist@hsphsun2.harvard.edu
> > Subject: st: Counting unique values across a set of variables: Re-sent
> > 
> > 
> > Hi all,
> > 
> > I checked findit but don't believe I found what I
> > need.
> > 
> > Each row in my data represents a respondent.  Besides
> > the first column "A" representing ID, the other
> > columns (call them B thru Y) contain strings or "."  I
> > need to create a variable in column Z that counts the
> > number of unique strings found for any given
> > respondent in B thru Y.  Advice?
> > 
> > Thanks in advance,
> > CM
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 



