# Re: st: how to generate sum of distinct id1, by id2, in the last n years

From: "Austin Nichols"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to generate sum of distinct id1, by id2, in the last n years
Date: Tue, 18 Sep 2007 11:06:42 -0400

```
Pierre Azoulay <pierre.azoulay@gmail.com>:
The language setting up the problem seems perversely unclear: "create
a variable that records the sum of distinct [id values] in the last 3
years" does not seem to be what you want at all, though a sum can help
you get there. If you want the number of distinct values of id across
years t, t-1, and t-2 saved in a new variable at t, try this:

clear
input star_id  id  year nbpapers
1     2   1972    1
1     2   1973    0
1     2   1974    2
1     2   1975    3
1     2   1976    0
1     2   1977    4
1     3   1970    1
1     3   1971    0
1     3   1972    0
1     3   1973    2
1     4   1978    2
1     4   1979    1
1     5   1977    4
1     5   1978    1
1     5   1979    0
1     5   1980    1
1     5   1981    1
end
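* Tag each observation, then make 3 copies so that a row for year t
* also counts toward the windows ending in t+1 and t+2
gen obs = _n
expand 3
bysort obs: gen n = _n
gen yr = year + n - 1
* Flag one row per distinct id within each star-by-window-year cell
bysort star_id yr id: gen d = _n == 1
* Sum the flags: number of distinct ids in years yr-2, yr-1, and yr
* (total() is the documented name of the older egen function sum())
egen ndistinct = total(d), by(star_id yr)
* Keep only the original rows (where yr equals year), one row per star-year
drop if n > 1
collapse ndistinct, by(star_id year)
* Add rows for any star-year combinations absent from the data
* (ndistinct is missing on those rows)
fillin star_id year
list, noobs clean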
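* Optional check, added as a sketch and not part of the original reply:
* with the example data above, the counts should reproduce the
* stk_nbcoauth_it column requested in the quoted message below
assert ndistinct == 1 if inlist(year, 1970, 1971, 1976)
assert ndistinct == 3 if inlist(year, 1978, 1979)
assert ndistinct == 2 if !inlist(year, 1970, 1971, 1976, 1978, 1979)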

On 9/17/07, Pierre Azoulay <pierre.azoulay@gmail.com> wrote:
> Dear Statalisters,
>
> I have what I believe is a simple programming question that I can't quite solve.
> I have a panel of dyads, where each member of the dyad is a coauthor.
> Each dyad is composed of a "superstar" and a "simple joe/jane."
>
> For instance:
>
> star_id         id              year            nbpapers
> ---------------------------------------------------------
> 1               2               1972            1
> 1               2               1973            0
> 1               2               1974            2
> 1               2               1975            3
> 1               2               1976            0
> 1               2               1977            4
> 1               3               1970            1
> 1               3               1971            0
> 1               3               1972            0
> 1               3               1973            2
> 1               4               1978            2
> 1               4               1979            1
> 1               5               1977            4
> 1               5               1978            1
> 1               5               1979            0
> 1               5               1980            1
> 1               5               1981            1
>
> So superstar #1 has 4 "simple joe collaborators" numbered 2,3,4, and 5.
> In each year, the data records how many publications exist for
> superstar i and simple joe/jane j.
>
>
> I would like to collapse this data at the superstar/year level and
> create a variable that records the sum of distinct "simple joes" in
> the last 3 years.
> In other words, I'd like to create the variable stk_nbcoauth_it that is:
>
> star_id year    stk_nbcoauth_it
> ---------------------------------
> 1       1970    1
> 1       1971    1
> 1       1972    2
> 1       1973    2
> 1       1974    2
> 1       1975    2
> 1       1976    1
> 1       1977    2
> 1       1978    3
> 1       1979    3
> 1       1980    2
> 1       1981    2
>
> I have fiddled with bysort star_id id (year), but without clear
> success. Could anyone help?
>
```
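
The subject line asks about the last n years, while the reply hard-codes a window of 3. The following is a minimal sketch of one way to generalize it, assuming the same example data. The local macro `w` and the variables `k`, `lastyr`, and `ndistinct` are names chosen here for illustration. Unlike the reply, this version collapses on the window-ending year rather than on the observed year, so a year with no rows of its own would still get a count if an id appears earlier in its window. That is a design choice of this sketch, not something stated in the thread.

```stata
clear
input star_id id year nbpapers
1 2 1972 1
1 2 1973 0
1 2 1974 2
1 2 1975 3
1 2 1976 0
1 2 1977 4
1 3 1970 1
1 3 1971 0
1 3 1972 0
1 3 1973 2
1 4 1978 2
1 4 1979 1
1 5 1977 4
1 5 1978 1
1 5 1979 0
1 5 1980 1
1 5 1981 1
end

* Window length in years (hypothetical name; 3 matches the thread)
local w = 3
* Last observed year per star, used to trim windows that run past the data
egen lastyr = max(year), by(star_id)
* One copy of each row per window it falls into (assumes one row per
* star_id-id-year combination, as in the example data)
expand `w'
bysort star_id id year: gen k = _n - 1
gen yr = year + k
drop if yr > lastyr
* Keep one row per distinct id within each star-by-window-year cell
bysort star_id yr id: keep if _n == 1
* Count the remaining rows: distinct ids in years yr-(w-1) through yr
collapse (count) ndistinct = id, by(star_id yr)
rename yr year
list, noobs clean
```

Setting `w` to 5 would count distinct ids over years t-4 through t instead. Window years in which no id appears at all produce no row rather than a zero, so `fillin` followed by replacing missing counts with 0 may still be wanted.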