st: Re: SSC archive stats
I'm afraid that my prior explanations of this issue have been lost in
the mists of time; my earliest post on Statalist @ HSPH (Jan 2001)
says 'please see prior month's explanation', but the archives do not
go back to 2000, and not even Google can find them. So I will try to
reconstruct the logic.
The web server records every "hit" on an ado-file. ado-files are part
of a package, which may contain a single ado- (and hlp-) file or may
contain several or indeed many (e.g. egenmore). A 'score' should not
be inflated by an author's inclusion of many files in a package, but
we don't track packages; we track hits on individual ado-files. Many
ado-files are multiply authored, and we want each author to receive
credit for his or her work. So we have one file, an extract from the
web server log, which says that, e.g., the single file xtabond2.ado
was requested 478 times last month.
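A minimal Python sketch of this first step, for readers who want to see it concretely. It assumes (hypothetically; the post does not describe the SSC server's log format) an Apache-style access log in which each request appears as "GET /path HTTP/x.y", and it counts every request for a path ending in .ado, whatever the response status.

```python
import re
from collections import Counter

# Hypothetical log format: Apache-style lines containing "GET /path HTTP/x.y".
# Only requests for paths ending in .ado are counted; .hlp files are ignored.
REQUEST = re.compile(r'"GET (\S+\.ado) HTTP/[\d.]+"')


def count_ado_hits(log_lines):
    """Return a Counter mapping ado-file URL path -> number of requests."""
    hits = Counter()
    for line in log_lines:
        m = REQUEST.search(line)
        if m:
            hits[m.group(1)] += 1
    return hits


sample = ['1.2.3.4 - - [01/Sep/2005:10:00:00 -0400] '
          '"GET /repec/bocode/x/xtabond2.ado HTTP/1.1" 200 1234']
print(count_ado_hits(sample))
```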
From the RePEc templates that define the SSC Archive, we generate
another file (with perl)* in which each record contains the name of
one ado-file, the SSC package of which it is a part, the number of
ado-files in that package and an author's name. There is a record for
each author|ado combination. So for instance your xtabond2 records in
this file look like
XTABOND2 /repec/bocode/x/xtabond2.ado 2 David Roodman
XTABOND2 /repec/bocode/x/xtab2_p.ado 2 David Roodman
defining a package which has two components and one author.
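To make the record layout concrete, here is a rough Python sketch of expanding package definitions into one record per author|ado combination. The PACKAGES dictionary is a hypothetical stand-in for what the RePEc templates provide (the real step is done in perl); its printed output matches the two xtabond2 records shown above.

```python
from typing import NamedTuple


class Record(NamedTuple):
    package: str  # SSC package name, upper-cased as in the listing
    url: str      # path of one ado-file within the package
    nmods: int    # number of ado-files in that package
    author: str


# Hypothetical stand-in for the information in the RePEc templates.
PACKAGES = {
    "XTABOND2": {
        "files": ["/repec/bocode/x/xtabond2.ado", "/repec/bocode/x/xtab2_p.ado"],
        "authors": ["David Roodman"],
    },
}


def author_ado_records(packages):
    """Yield one Record per author|ado combination."""
    for name, pkg in packages.items():
        nmods = len(pkg["files"])
        for url in pkg["files"]:
            for author in pkg["authors"]:
                yield Record(name, url, nmods, author)


for r in author_ado_records(PACKAGES):
    print(r.package, r.url, r.nmods, r.author)
```

A package with two files and two authors would yield four records, one for each pairing.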
Stata then reads the first file (containing the web server log
excerpt) and merges the second file on the URL field shared by both
files (the URL from which that ado-file may be downloaded at SSC). npkghit
is generated as nhit/nmods, i.e. each file's hits divided by the number of
ado-files in its package. So if xtabond2.ado and xtab2_p.ado were both
downloaded 478 times, each would contribute 239 and the package total
would be 478 as well. But last month xtab2_p.ado was downloaded only 393
times, so the total is now potentially fractional (for xtabond2 it is
478/2 + 393/2 = 239 + 196.5 = 435.5). We then collapse this file to
compute the sum of npkghit by(author package). This gives us the first
listing distributed in my monthly emails, which contains lines such as
2. 435.50 David Roodman XTABOND2
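The merge and first collapse can be mimicked in a few lines of Python. This is only a sketch of the arithmetic: dictionaries stand in for the two files, the join keeps every author|ado record (a file with no hits contributes 0), and the hit counts are last month's figures quoted above.

```python
from collections import defaultdict

# Last month's hit counts, as quoted above.
hits = {
    "/repec/bocode/x/xtabond2.ado": 478,
    "/repec/bocode/x/xtab2_p.ado": 393,
}

# author|ado records: (package, url, nmods, author)
records = [
    ("XTABOND2", "/repec/bocode/x/xtabond2.ado", 2, "David Roodman"),
    ("XTABOND2", "/repec/bocode/x/xtab2_p.ado", 2, "David Roodman"),
]


def hits_by_author_package(hits, records):
    """Merge records with hits on URL, then sum npkghit = nhit/nmods
    by (author, package)."""
    totals = defaultdict(float)
    for package, url, nmods, author in records:
        nhit = hits.get(url, 0)  # a file with no hits contributes nothing
        totals[(author, package)] += nhit / nmods
    return dict(totals)


print(hits_by_author_package(hits, records))
# {('David Roodman', 'XTABOND2'): 435.5}
```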
We then collapse to compute the sum of npkghit by(author) to generate
the second listing, e.g.
6. David Roodman 594.25
which reflects the totals from the several packages you have authored.
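The second collapse simply sums the (author, package) totals over packages. In this Python sketch only the XTABOND2 figure comes from the listing above; OTHERPKG, its co-author and its value are hypothetical, so the result does not reproduce the 594.25 shown.

```python
from collections import defaultdict


def hits_by_author(package_totals):
    """Sum (author, package) totals to one total per author."""
    totals = defaultdict(float)
    for (author, _package), value in package_totals.items():
        totals[author] += value
    return dict(totals)


# Only the XTABOND2 value is real; OTHERPKG and its numbers are hypothetical.
package_totals = {
    ("David Roodman", "XTABOND2"): 435.5,
    ("David Roodman", "OTHERPKG"): 50.0,
    ("A. N. Other", "OTHERPKG"): 50.0,
}
print(hits_by_author(package_totals))
# {'David Roodman': 485.5, 'A. N. Other': 50.0}
```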
In contrast to the citation-count literature fashionable among tenure
and promotion committees, I do not give each author of a package with
N authors 1/Nth of the hits; I give each author all the hits.
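The difference between the two crediting rules is easy to see with a toy Python function. The author names and the 100 hits are hypothetical; the fractional branch is included only for contrast and is not what the SSC statistics do.

```python
def credit(package_hits, authors, fractional=False):
    """Credit each author with a package's hits.

    fractional=False: every author receives all the hits (the rule used
    for the SSC listings). fractional=True: each of N authors receives
    1/N of the hits (the citation-count convention), shown for contrast.
    """
    share = package_hits / len(authors) if fractional else package_hits
    return {author: share for author in authors}


authors = ["Author A", "Author B"]  # hypothetical co-authors
print(credit(100.0, authors))                   # {'Author A': 100.0, 'Author B': 100.0}
print(credit(100.0, authors, fractional=True))  # {'Author A': 50.0, 'Author B': 50.0}
```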
So where are these fractions coming from? Recall that when you type 'ssc
install xtabond2, replace', Stata is smart: it only downloads the
files which have changed. You updated xtabond2.ado but did not update
xtab2_p.ado recently. Those updating their copies of xtabond2
installed only one file, while those installing it for the first time
installed two. That explains why the 'hit counts' are not equal for
all files in a package. (The same would happen if people downloaded files
with a web browser, even just to view them on screen: they might look at
the main .ado file and not be interested in the other files in the package.)
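A toy Python model of this mechanism, under simplifying assumptions: first-time installers fetch every file, updaters fetch only the changed file, and browser views are ignored. The split of 393 first-time installs and 85 updates is hypothetical. It is one split consistent with last month's counts under this model, but no such breakdown is actually recorded.

```python
def simulated_hits(first_installs, updates, changed_files, all_files):
    """Hits per file when first-time installers fetch every file in the
    package and updaters ('ssc install ..., replace') fetch only the
    files that have changed."""
    return {
        f: first_installs + (updates if f in changed_files else 0)
        for f in all_files
    }


files = ["xtabond2.ado", "xtab2_p.ado"]
# Hypothetical split: 393 new installs plus 85 updates of xtabond2.ado only.
h = simulated_hits(393, 85, {"xtabond2.ado"}, files)
print(h)  # {'xtabond2.ado': 478, 'xtab2_p.ado': 393}
print(sum(n / len(files) for n in h.values()))  # 435.5
```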
Historical note: the one-page Stata program that does this
manipulation is named 'forthedean.do', written to satisfy a UK
economist who thought these stats would be appreciated by the Dean.
Now that I have explained this, I should be able to find the
explanation in the Statalist archives for the next five years or so!
* Note to Bill Gould: this perl program predated Stata's -file-
command. If I wrote it today, I'd do it in Stata. It couldn't readily
be done at the time I started crunching these numbers.
Kit Baum, Boston College Economics
On Oct 18, 2005, at 5:45 PM, David Roodman wrote:
Kit, is there documentation somewhere for how you massage the SSC
download stats? How do the fractions come about in the number of
hits? Thanks much.
Center for Global Development
1776 Massachusetts Ave. NW
Washington, DC 20036
+1 (202) 416-0723
fax: +1 (202) 416-0750