st: RE: Counting Unique Values by Year
Jennifer S. Earl
> I have a data set with cases spread out over a number of years.
> I have a numeric variable called CLMS. I want to create a new
> variable UNIQCLMS that equals the number of unique values that
> CLMS took on each year.
> I have thought of some very long-winded ways to do this, such as
> creating a counter using a lag-comparison and then harvesting the
> last value of this counter, but it seems like it should be easier.
> In particular, Stata already calculates the number of unique values
> in lots of places, including INSPECT (e.g., "by year: inspect clms"
> will produce the number of unique values for CLMS, unless that
> number exceeds 99, but it won't write that value out to another
> variable as far as I know), and the number of unique values should
> also equal the number of rows produced using "by year: tab clms".
> So, I am hoping someone might be able to think of a quick and/or
> elegant way to get Stata to produce a new variable, UNIQCLMS, that
> contains the number of unique values that CLMS takes on in each
> year. If I could dream up a new egen command, the format would be
> something like:
> by year: egen uniqclm=unique(CLMS)
If you look in the -egenmore- package on SSC
you will find a (perhaps not well named) -nvals()-
function for -egen- which does this. The syntax you
want is similar to your dream, but not identical.
ssc inst egenmore
egen uniqclm = nvals(CLMS), by(year)
But let's suppose this didn't exist. How
would you get your variable using just official Stata?
Your intuition is correct: in Stata this
is not very difficult at all.
In the simplest case, the code would be
bysort year CLMS: gen uniqclms = _n == 1
by year: replace uniqclms = sum(uniqclms)
by year: replace uniqclms = uniqclms[_N]
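So we tag each distinct value of CLMS within a year with 1,
just once, the first time it occurs; every other observation
gets 0. Then we take the running sum of the 1s within each
year, and finally copy the last value of that running sum,
which is the number of distinct values, to every observation
in that year.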
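To see what those three lines do at each step, here is a minimal sketch on a made-up toy dataset. The years and CLMS values are invented purely for illustration; only the three commands come from the code above.

```stata
* Toy data, invented for illustration
clear
input year CLMS
2001 5
2001 5
2001 7
2002 3
2002 3
2002 4
2002 9
end

* Step 1: 1 for the first occurrence of each value within a year, else 0
bysort year CLMS: gen uniqclms = _n == 1
* uniqclms is now 1 0 1 in 2001 and 1 0 1 1 in 2002
list, sepby(year)

* Step 2: running sum of the tags within each year
by year: replace uniqclms = sum(uniqclms)
* uniqclms is now 1 1 2 in 2001 and 1 1 2 3 in 2002

* Step 3: spread the last running total to every observation in the year
by year: replace uniqclms = uniqclms[_N]
* uniqclms is now 2 throughout 2001 and 3 throughout 2002
list, sepby(year)
```

Note that -by year:- works in steps 2 and 3 without re-sorting because data sorted on year and CLMS are also sorted on year.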
However, that code would need to be modified if
you had missing values or wanted to tack on
-if- or -in- conditions.
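One way such a modification might look is sketched below; it is an illustration under stated assumptions, not the only way to do it. The helper variables first, firstok and touse, the condition CLMS > 0, and the toy data are all hypothetical.

```stata
* Toy data, invented for illustration; one CLMS value is missing
clear
input year CLMS
2001 5
2001 5
2001 7
2001 .
2002 3
2002 -1
2002 3
end

* Missing values: tag a first occurrence only if CLMS is not missing,
* so that missing does not count as a distinct value
bysort year CLMS: gen byte first = (_n == 1) & !missing(CLMS)
by year: gen uniqclms = sum(first)
by year: replace uniqclms = uniqclms[_N]

* An -if- condition (hypothetical: count only positive values of CLMS).
* Put the condition in an indicator and sort on it as well, so that
* an excluded observation can never be the one that gets tagged.
gen byte touse = CLMS > 0 & !missing(CLMS)
bysort year touse CLMS: gen byte firstok = (_n == 1) & touse
by year: gen uniqpos = sum(firstok)
by year: replace uniqpos = uniqpos[_N]

drop first firstok touse
list, sepby(year)
```

On this toy data uniqclms is 2 in both years (the missing value in 2001 is not counted), while uniqpos is 2 in 2001 and 1 in 2002. The !missing() check in touse matters because Stata treats numeric missing as larger than any number, so CLMS > 0 alone would be true for missing values.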
There was a tutorial on -by:- in Stata Journal
2(1), 86-102 (2002) with lots of explanation