
st: Re: SSC archive stats

From   Kit Baum <>
To   David Roodman <DRoodman@CGDEV.ORG>
Subject   st: Re: SSC archive stats
Date   Tue, 18 Oct 2005 18:23:34 -0400


I'm afraid that my prior explanations of this issue have been lost in the mists of time; my earliest post on Statalist @ HSPH (Jan 2001) says 'please see prior month's explanation', but the archives do not go back to 2000, and not even Google can find them. So I will try to reconstruct the logic.

The web server records every "hit" on an ado-file. Ado-files are part of a package, which may contain a single ado- (and hlp-) file or may contain many (e.g. egenmore). A 'score' should not be inflated by an author's inclusion of many files in a package, but we don't track packages; we track hits on individual ado-files. Many ado-files are multiply authored, and we want each author to receive credit for his or her work. So we have one file--an extract from the web server log--which says that, e.g., the single file xtabond2.ado was requested 478 times last month.

From the RePEc templates that define the SSC Archive, we generate another file (with perl)* in which each record contains the name of one ado-file, the SSC package of which it is a part, the number of ado-files in that package and an author's name. There is a record for each author|ado combination. So for instance your xtabond2 records in this file look like
XTABOND2 /repec/bocode/x/xtabond2.ado 2 David Roodman
XTABOND2 /repec/bocode/x/xtab2_p.ado 2 David Roodman
defining a package which has two components and one author.
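As a rough illustration of that record layout (in Python, not the perl program mentioned above), the two sample records can be split into their four fields. The whitespace-delimited layout and the names AdoRecord and parse_record are assumptions made for this sketch; the actual file format is not shown in this post.

```python
from typing import NamedTuple


class AdoRecord(NamedTuple):
    """One author|ado combination (field names are hypothetical)."""
    package: str
    url: str
    nmods: int   # number of ado-files in the package
    author: str


def parse_record(line: str) -> AdoRecord:
    # Assumption: fields are whitespace-separated. Author names contain
    # spaces, so split off only the first three fields.
    package, url, nmods, author = line.split(None, 3)
    return AdoRecord(package, url, int(nmods), author.strip())


sample = [
    "XTABOND2 /repec/bocode/x/xtabond2.ado 2 David Roodman",
    "XTABOND2 /repec/bocode/x/xtab2_p.ado 2 David Roodman",
]
records = [parse_record(line) for line in sample]

for rec in records:
    print(rec)
```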

Stata then reads the first file (containing the web server log excerpt) and merges the second file on the URL field shared by both files (the URL from which that ado may be downloaded at SSC). npkghit is generated as nhit/nmods for each file -- so if xtabond2.ado and xtab2_p.ado were both downloaded 478 times, each would contribute 239 and the package's npkghit would sum to 478 as well. But last month, xtab2_p.ado was downloaded only 393 times, so the package's npkghit is now potentially fractional (and for xtabond2 is (478 + 393)/2 = 435.5). We then collapse this file to compute the sum of npkghit by(author package). This gives us the first listing distributed in my monthly emails, which shows e.g.
2. 435.50 David Roodman XTABOND2
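The arithmetic behind that 435.50 can be checked with a small sketch. This is a Python illustration of the merge-and-collapse logic described above, not the Stata program itself; the function name package_scores and the choice to count unmatched files as zero hits are assumptions.

```python
from collections import defaultdict

# Hit counts quoted in this post, keyed by URL (the web server log extract).
hits = {
    "/repec/bocode/x/xtabond2.ado": 478,
    "/repec/bocode/x/xtab2_p.ado": 393,
}

# (package, url, nmods, author) records quoted in this post.
records = [
    ("XTABOND2", "/repec/bocode/x/xtabond2.ado", 2, "David Roodman"),
    ("XTABOND2", "/repec/bocode/x/xtab2_p.ado", 2, "David Roodman"),
]


def package_scores(hits, records):
    """Sum nhit/nmods over each (author, package) pair."""
    totals = defaultdict(float)
    for package, url, nmods, author in records:
        nhit = hits.get(url, 0)  # assumption: a file with no log entry counts as 0
        totals[(author, package)] += nhit / nmods  # npkghit for this file
    return dict(totals)


scores = package_scores(hits, records)
print(scores[("David Roodman", "XTABOND2")])  # 478/2 + 393/2 = 435.5
```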

We then collapse to compute the sum of npkghit by(author) to generate the second listing, e.g.
6. David Roodman 594.25
which reflects the totals from the several packages you have authored on SSC.

In contrast to the citation-count literature fashionable among tenure and promotion committees, I do not give each author of a package with N authors 1/Nth of the hits; I give each author all the hits.
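To make the crediting rule concrete, here is a sketch in Python with entirely hypothetical data (the packages ONE and TWO, their URLs, the hit counts, and the authors are invented for illustration and do not come from the SSC logs). Because there is one record per author|ado combination, each co-author of ONE picks up the package's full score rather than half of it.

```python
from collections import defaultdict

# Hypothetical hit counts, keyed by URL.
hits = {"/demo/one.ado": 100, "/demo/two.ado": 60, "/demo/two_p.ado": 40}

# Hypothetical (package, url, nmods, author) records:
# ONE is a single-file package with two authors;
# TWO is a two-file package by Author A alone.
records = [
    ("ONE", "/demo/one.ado", 1, "Author A"),
    ("ONE", "/demo/one.ado", 1, "Author B"),
    ("TWO", "/demo/two.ado", 2, "Author A"),
    ("TWO", "/demo/two_p.ado", 2, "Author A"),
]


def author_totals(hits, records):
    """Collapse to (author, package) scores, then to per-author totals."""
    by_package = defaultdict(float)
    for package, url, nmods, author in records:
        by_package[(author, package)] += hits.get(url, 0) / nmods
    by_author = defaultdict(float)
    for (author, _package), score in by_package.items():
        by_author[author] += score
    return dict(by_author)


totals = author_totals(hits, records)
print(totals)  # Author A: 100 + (60 + 40)/2 = 150.0; Author B: 100.0
```

Under a 1/N rule each author of ONE would instead receive 50; the scheme described here deliberately does not divide by the number of authors.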

So where are these fractions coming from? Recall that when you 'ssc install xtabond2, replace' Stata is smart--it only downloads the files which have changed. You updated xtabond2.ado but did not update xtab2_p.ado recently. Those updating their copies of xtabond2 installed only one file, while those installing it for the first time installed two. That explains why the 'hit counts' are not equal for all files in a package. (If people are downloading files from a web browser--even to just look at them on the screen--this would also be the case; they might look at the main .ado and not be interested in ancillary files).

Historical note: the one-page Stata program that does this manipulation is named '', written to satisfy a UK economist who thought these stats would be appreciated by the Dean.

Now that I have explained this, I should be able to find the explanation in the Statalist archives for the next five years or so!


* Note to Bill Gould: this perl program predated Stata's -file- command. If I wrote it today, I'd do it in Stata. It couldn't readily be done at the time I started crunching these numbers.

Kit Baum, Boston College Economics

On Oct 18, 2005, at 5:45 PM, David Roodman wrote:

Kit, is there documentation somewhere for how you massage the SSC download stats? How do the fractions come about in the number of hits? Thanks much.


David Roodman
Research Fellow
Center for Global Development
1776 Massachusetts Ave. NW
Washington, DC 20036
+1 (202) 416-0723
fax: +1 (202) 416-0750
