
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Counting Unique Values by Year
Date: Mon, 2 Jun 2003 11:27:04 +0100

Jennifer S. Earl

> I have a data set with cases spread out over a number of years. I have a
> numeric variable called CLMS. I want to create a new variable UNIQCLMS that
> equals the number of unique values that CLMS took on each year.
>
> I have thought of some very long-winded ways to do this, such as creating a
> counter using a lag-comparison and then harvesting the last value of this
> counter, but it seems like it should be easier. In particular, Stata
> already calculates the number of unique values in lots of operations,
> including INSPECT (e.g., "by year: inspect clms" will produce the number of
> unique values for CLMS, unless that number exceeds 99, but it won't write
> that value out to another variable as far as I know), and the number of
> unique values should also equal the number of rows produced using "by year:
> tab clms".
>
> So, I am hoping someone might be able to think of a quick and/or elegant
> way to get Stata to produce a new variable, UNIQCLMS, that contains the
> number of unique values that CLMS takes on in each year. If I could dream
> up a new egen command, the format would be something like:
>
> by year: egen uniqclm=unique(CLMS)

If you look in the -egenmore- package on SSC you will find a (perhaps not well named) -nvals()- function for -egen- which does this. The syntax you want is similar to your dream, but not identical. After

    ssc inst egenmore

you want

    egen uniqclm = nvals(CLMS), by(year)

But let's suppose this didn't exist. How would you get your variable using just official Stata? Your intuition is correct: in Stata this is not very difficult at all. In the simplest case, the code would be

    bysort year CLMS: gen uniqclms = _n == 1
    by year: replace uniqclms = sum(uniqclms)
    by year: replace uniqclms = uniqclms[_N]

So we tag every distinct value by 1, just once, the first time it occurs within a year. Then we take the running sum of those 1s within each year, and finally copy the last running total, which is the number of distinct values, to every observation in that year.
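The three-line official-Stata solution above can be traced on a small example. This is only a sketch: the six observations below are made up for illustration and are not from the original poster's data. The -assert- lines simply check that each year ends up with the expected count.

```stata
* Toy data: hypothetical values, for illustration only
clear
input year CLMS
2001 5
2001 5
2001 7
2002 3
2002 3
2002 3
end

* Step 1: tag the first occurrence of each distinct CLMS value within year
bysort year CLMS: gen uniqclms = _n == 1

* Step 2: running sum of the tags within each year
by year: replace uniqclms = sum(uniqclms)

* Step 3: copy the final running total to every observation in the year
by year: replace uniqclms = uniqclms[_N]

* 2001 has two distinct values (5 and 7); 2002 has one (3)
assert uniqclms == 2 if year == 2001
assert uniqclms == 1 if year == 2002

list, sepby(year)
```

The second and third steps can use plain -by year:- because -bysort year CLMS:- has already left the data sorted by year and then CLMS.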
However, that code would need to be modified if you had missing values or wanted to tack on -if- or -in- conditions. There was a tutorial on -by:- in Stata Journal 2(1), 86-102 (2002) with lots of explanation and examples.

Nick
n.j.cox@durham.ac.uk
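As one possible illustration of the missing-values caveat, here is a minimal sketch of a modification that counts only nonmissing distinct values. It is not taken from the post: the variable name first and the toy data are hypothetical, and it assumes a version of Stata in which -egen- has the -total()- function. Handling -if- or -in- conditions would need further changes and is not attempted here.

```stata
* Toy data with missing CLMS values: hypothetical, for illustration only
clear
input year CLMS
2001 5
2001 .
2001 7
2002 3
2002 .
end

* Tag the first occurrence of each distinct value, but never a missing one.
* Missingness is constant within a year-CLMS group, so testing it on the
* first observation of the group is enough.
bysort year CLMS: gen byte first = (_n == 1) & !missing(CLMS)

* Total the tags within year to get the count of nonmissing distinct values
egen uniqclms = total(first), by(year)

* Missing values are not counted as a distinct value
assert uniqclms == 2 if year == 2001
assert uniqclms == 1 if year == 2002
```

Without the -!missing(CLMS)- condition, the group of missing values in each year would be tagged once and counted as one more distinct value.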


References: st: Counting Unique Values by Year, from "Jennifer S. Earl" <jearl@soc.ucsb.edu>
