
st: RE: Counting Unique Values by Year


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Counting Unique Values by Year
Date   Mon, 2 Jun 2003 11:27:04 +0100

Jennifer S. Earl
> 
> I have a data set with cases spread out over a number of years. I
> have a numeric variable called CLMS. I want to create a new variable
> UNIQCLMS that equals the number of unique values that CLMS took on
> each year.
> 
> I have thought of some very long-winded ways to do this, such as
> creating a counter using a lag-comparison and then harvesting the
> last value of this counter, but it seems like it should be easier.
> In particular, Stata already calculates the number of unique values
> in lots of operations, including INSPECT (e.g., "by year: inspect
> clms" will produce the number of unique values for CLMS, unless that
> number exceeds 99, but it won't write that value out to another
> variable as far as I know), and the number of unique values should
> also equal the number of rows produced using "by year: tab clms".
> 
> So, I am hoping someone might be able to think of a quick and/or
> elegant way to get Stata to produce a new variable, UNIQCLMS that
> contains the number of unique values that CLMS takes on in each
> year. If I could dream up a new egen command, the format would be
> something like:
> 
> by year: egen uniqclm=unique(CLMS)
> 

If you look in the -egenmore- package on SSC 
you will find a (perhaps not well named) -nvals()- 
function for -egen- which does this. The syntax you 
want is similar to your dream, but not identical. 
After 

ssc inst egenmore

you want 

egen uniqclm = nvals(CLMS), by(year) 

But let's suppose this didn't exist. How 
would you get your variable using just official Stata? 
Your intuition is correct: in Stata this 
is not very difficult at all. 

In the simplest case, the code would be 

bysort year CLMS: gen uniqclms = _n == 1 
by year: replace uniqclms = sum(uniqclms) 
by year: replace uniqclms = uniqclms[_N] 

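So we tag every distinct value within each year with 1, 
just once, the first time it occurs. Then we take the 
running sum of the 1s within each year, and finally 
copy the last value of that sum, which is the total, 
to every observation in the year. 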
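
To see the three steps at work, here is a minimal sketch 
on a made-up data set. The six observations are invented 
purely for illustration; only the three commands in the 
middle come from the code above. 

```stata
* toy data, invented for illustration only
clear
input year CLMS
2001 5
2001 5
2001 7
2002 3
2002 3
2002 3
end

* step 1: tag the first occurrence of each distinct CLMS value within year
bysort year CLMS: gen uniqclms = _n == 1

* step 2: running sum of the tags within year
by year: replace uniqclms = sum(uniqclms)

* step 3: copy the last running sum (the total) to every observation
by year: replace uniqclms = uniqclms[_N]

list, sepby(year)
```

In this toy data set 2001 has two distinct values (5 and 7) 
and 2002 has one (3), so uniqclms ends up as 2 in every 
2001 observation and 1 in every 2002 observation. 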

However, that code would need to be modified if 
you had missing values, as missing would be counted 
as one more distinct value, or if you wanted to tack 
on -if- or -in- conditions. 
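
One way to make that modification, offered as a sketch 
rather than the only solution, is to mark the observations 
that should count in an indicator variable and to tag first 
occurrences only among those. The variable name touse, the 
condition year >= 2000 and the toy data are all assumptions 
made up for this example. 

```stata
* toy data, invented for illustration only
clear
input year CLMS
1999 4
2001 5
2001 .
2001 7
2001 5
2002 3
2002 .
end

* touse is a hypothetical name: 1 for observations to be counted.
* Here we exclude missing CLMS and apply an example condition, year >= 2000.
gen byte touse = !missing(CLMS) & year >= 2000

* tag first occurrences, but only among the observations to be counted
bysort year touse CLMS: gen uniqclms = (_n == 1) & touse

* running sum within year, then copy the total to every observation
by year: replace uniqclms = sum(uniqclms)
by year: replace uniqclms = uniqclms[_N]

list, sepby(year)
```

Written this way, observations that are excluded still 
receive the count for their year, and a year with no 
included observations gets 0. If you would rather have 
missing there, follow with a -replace- restricted to 
those observations. 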

There was a tutorial on -by:- in Stata Journal 
2(1), 86-102 (2002) with lots of explanation
and examples. 

Nick 
[email protected] 
