Statalist The Stata Listserver



st: RE: Re: SSC archive stats


From   "Schaffer, Mark E" <[email protected]>
To   <[email protected]>
Subject   st: RE: Re: SSC archive stats
Date   Fri, 1 Dec 2006 15:28:26 -0000

Kit,

Could you explain the relationship between the SSC Archive activity
statistics that you circulate monthly, and the access statistics for
software packages on RePEc/LogEc?

For example, in your October 2006 SSC Archive activity email, the top
entry is

         npkghit                       author             package
    1.    956.00             John Luke Gallup              OUTREG

The RePEc download statistics for -outreg- show the following:

Access Statistics for the software item
 Month	 Abstract Views 	 Downloads
2006-10            224            121

What is the source of the large difference between the SSC and RePEc
downloads?

Cheers,
Mark

NB: For everybody's convenience, Kit's posting to Statalist in which he
describes how the npkghit figure is calculated is reproduced below.

> -----Original Message-----
> From: Kit Baum [mailto:[email protected]] 
> Sent: Tuesday, October 18, 2005 11:23 PM
> To: David Roodman
> Cc: statalist
> Subject: st: Re: SSC archive stats
> 
> David,
> 
> I'm afraid that my prior explanations of this issue have been 
> lost in the mists of time; my earliest post on Statalist @ 
> HSPH (Jan 2001) says 'please see prior month's explanation', 
> but the archives do not go back to 2000, and not even Google 
> can find them. So I will try to reconstruct the logic.
> 
> The web server records every "hit" on an ado-file. Ado-files 
> are part of a package, which may contain a single ado- (and 
> hlp-) file or may contain several, or indeed many (e.g. 
> egenmore). A 'score' should not be inflated by an author's 
> inclusion of many files in a package, but we don't track 
> packages; we track hits on individual ado files. Many ado 
> files are multiply authored, and we want each author to 
> receive credit for his or her work. So we have one file--an 
> extract from the web server log 
> http://fmwww.bc.edu/fmrc/reports/report.ssc.html
> which says that, e.g., the single file xtabond2.ado was 
> requested 478 times last month.
> 
> From the RePEc templates that define the SSC Archive, we 
> generate another file (with perl)* in which each record 
> contains the name of one ado-file, the SSC package of which 
> it is a part, the number of ado-files in that package and an 
> author's name. There is a record for each author|ado 
> combination. So for instance your xtabond2 records in this 
> file look like
> XTABOND2      /repec/bocode/x/xtabond2.ado      2      David Roodman
> XTABOND2      /repec/bocode/x/xtab2_p.ado      2      David Roodman
> defining a package which has two components and one author.
> 
> Stata then reads the first file (containing the web server log
> excerpt) and merges the second file on the URL field shared 
> by both files (the URL from which that ado may be downloaded 
> at SSC). For each file, npkghit is generated as nhit/nmods -- so if 
> xtabond2.ado and xtab2_p.ado were both downloaded 478 times, each 
> would contribute 239 and the package's summed npkghit would be 478. 
> But last month, xtab2_p.ado was downloaded only 393 times, so the sum 
> is now fractional (478/2 + 393/2 = 435.5 for xtabond2). We then collapse this 
> file to compute the sum of npkghit by(author package). This 
> gives us the first listing distributed in my monthly emails, 
> which shows e.g.
>     2.    435.50              David Roodman            XTABOND2
> 
> We then collapse to compute the sum of npkghit by(author) to 
> generate the second listing, e.g.
>     6.             David Roodman     594.25
> which reflects the totals from the several packages you have 
> authored on SSC.
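
The calculation described above can be sketched as follows. This is a minimal illustration in Python, not the Stata do-file Kit refers to (which is not reproduced in this thread). The two hit counts and the two records are the ones quoted in the message; the function and variable names are hypothetical, and treating a file with no logged hits as zero is an assumption.

```python
from collections import defaultdict

# Web-server log extract: URL -> hits last month (figures quoted in the message).
hits = {
    "/repec/bocode/x/xtabond2.ado": 478,
    "/repec/bocode/x/xtab2_p.ado": 393,
}

# One record per author|ado combination:
# (package, URL, number of ado-files in the package, author).
records = [
    ("XTABOND2", "/repec/bocode/x/xtabond2.ado", 2, "David Roodman"),
    ("XTABOND2", "/repec/bocode/x/xtab2_p.ado", 2, "David Roodman"),
]

def package_scores(hits, records):
    """Merge on URL, then sum nhit/nmods by (author, package)."""
    totals = defaultdict(float)
    for package, url, nmods, author in records:
        nhit = hits.get(url, 0)  # assumption: no logged hits counts as zero
        totals[(author, package)] += nhit / nmods
    return dict(totals)

def author_scores(by_package):
    """Sum the package scores by author."""
    totals = defaultdict(float)
    for (author, _package), score in by_package.items():
        totals[author] += score
    return dict(totals)

by_package = package_scores(hits, records)
by_author = author_scores(by_package)
print(by_package[("David Roodman", "XTABOND2")])  # 435.5
```

Because there is one record per author|ado combination, a co-authored package would simply add its full score to each author's total. Only xtabond2 is included here, so `by_author` gives 435.5 rather than the 594.25 in the monthly listing, which also covers the author's other packages.
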
> 
> In contrast to the citation-count literature fashionable 
> among tenure and promotion committees, I do not give each 
> author of a package with N authors 1/Nth of the hits; I give 
> each author all the hits.
> 
> So where are these fractions coming from? Recall that when 
> you 'ssc install xtabond2, replace' Stata is smart--it only 
> downloads the files which have changed. You updated 
> xtabond2.ado but did not update xtab2_p.ado recently. Those 
> updating their copies of xtabond2 installed only one file, 
> while those installing it for the first time installed two. 
> That explains why the 'hit counts' are not equal for all 
> files in a package. (If people are downloading files from a 
> web browser--even to just look at them on the screen--this 
> would also be the case; they might look at the main .ado and 
> not be interested in ancillary files).
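
As a rough worked example of this explanation, the two hit counts can be split into fresh installs and updates. The sketch assumes, purely for illustration, that every hit came from -ssc install-, that a fresh install fetches both files, and that an update fetches only xtabond2.ado. Browser views, mentioned above, would break that assumption. The function name is hypothetical.

```python
def split_installs(main_hits, ancillary_hits):
    """Split hits into (fresh installs, updates) under the stated assumptions."""
    fresh = ancillary_hits               # only fresh installs fetch the ancillary file
    updates = main_hits - ancillary_hits # remaining hits on the main file
    return fresh, updates

fresh, updates = split_installs(478, 393)
print(fresh, updates)  # 393 85

# Each fresh install contributes 2 file hits and each update 1; dividing by
# the 2 files in the package reproduces the package score.
print((2 * fresh + updates) / 2)  # 435.5
```
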
> 
> Historical note: the one-page Stata program that does this 
> manipulation is named 'forthedean.do', written to satisfy a 
> UK economist who thought these stats would be appreciated by the Dean.
> 
> Now that I have explained this, I should be able to find the 
> explanation in the Statalist archives for the next five years or so!
> 
> Cheers
> Kit
> 
> * Note to Bill Gould: this perl program predated Stata's 
> -file- command. If I wrote it today, I'd do it in Stata. It 
> couldn't readily be done at the time I started crunching 
> these numbers.
> 
> Kit Baum, Boston College Economics
> http://ideas.repec.org/e/pba1.html
> 
> 
> On Oct 18, 2005, at 5:45 PM, David Roodman wrote:
> 
> > Kit, is there documentation somewhere for how you massage the SSC
> > download stats? How do the fractions come about in the number of hits?
> > Thanks much.
> >
> > --David
> >
> >
> >
> > David Roodman
> >
> > Research Fellow
> >
> > Center for Global Development
> >
> > 1776 Massachusetts Ave. NW
> >
> > Washington, DC 20036
> >
> > [email protected]
> >
> > +1 (202) 416-0723
> >
> > fax: +1 (202) 416-0750
> >
> >
> >
> >
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
> 



