Statalist



RE: st: re: SSC Activity, November 2009


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: re: SSC Activity, November 2009
Date   Mon, 7 Dec 2009 14:26:07 -0000

I don't think you can fairly infer from collective silence anything
except that other people are unable (e.g. through not looking at
work-related email over a weekend) or unwilling to contribute to this
thread. There could be several reasons for the latter, which are equally
a matter for speculation, but nevertheless members of Statalist will be
able to make their own guesses. 

I personally find this entire line of enquiry very puzzling at various
levels, and ultimately deplorable. Having looked with some considerable
interest at Kit's download statistics, I too have occasionally been
surprised by various spikes in downloads. But why not? Only an entity
omniscient of all downloaders and all their intentions can be _certain_
of not only who is downloading what and when and where but also _why_
they are downloading packages, which is what you are concerned about.
You are not that entity, I am not, and even collectively on this list we
don't approximate it. 

I cannot rule out, as an impossibility, the speculation that there may
be someone, somewhere who is artificially boosting downloads either on
their own behalf or on behalf of others. But equally I think it in poor
taste to post even speculations of this kind without any hard evidence.
We are not talking of bug reports here. Claims in a public forum that
even unnamed individuals are guilty of systematic dishonesty are a big
deal. 

Even your analysis of download data is seriously flawed: you slide down
a slippery slope of interpreting whatever is (a) puzzling or implausible
to you in the time series as being (b) self-evidently strange to all and
thus being (c) evidence ipso facto of manipulation. If there is anyone
who is concerned to boost their ratings, and I doubt there is, they
could easily be clever and cunning enough to do so in a way that would
not be obvious, e.g. without producing strange spikes in the series. More importantly,
there could be any number of other interpretations for surprising
changes in the series, as for example, that someone is for a short
period repeatedly running a do file that installs the latest version of
one or more packages. It is shocking that you pay almost no attention to
the possibility of other quite different interpretations of the data and
that you airily dismiss Kit's hard evidence of IP addresses as being
easily faked. Moreover, you regard your own mention of a program at a
meeting as fair comment in interpreting its downloads, but you do not
extend the same courtesy to other programs, which may have been
mentioned explicitly in any number of places that might have boosted
downloads (papers, talks, postings; also mentions in teaching, which
are difficult to document).

As against all that I place my own experience that all the
user-programmers I know personally are people of high standards. You
don't get to write high-quality software that people want to install
without having the kind of integrity that rules out faking download
statistics. I think it immensely more plausible that fluctuations in
downloads reflect collective minor moods or dopeyness among thousands
of downloaders than that some dark programmer stalks among us. Besides,
no one really much cares about these download data anyway!

I don't know, Roy, why in this thread and in many others you show such
deep-seated distrust for, and even contempt for, the Stata user
community in making these accusations, to the extent of implying that
the community is knowingly tolerating bad behaviour. This thread started
with an insinuation that the data for your -outreg2- (among others) were
faked. I and many others trust you and imagine that kind of claim to be
absurd. You don't need to defend yourself against contemptible
insinuations made by others on the basis of no evidence. Why not trust
others as they trust you? 

Nick 
n.j.cox@durham.ac.uk 

Roy Wada

> I am going to add the following because I don't like the idea of
> someone being given a license to keep on doing this:

As I said before I am making this post because this has gone on long
enough and I do not want to see Kit giving an official validation or
imprimatur to download numbers which do not look right. This will only
encourage more manipulations. Judging by the collective silence on
this popular topic, I suspect many people also have uneasy feelings
that there is something amiss about these download numbers. Kit has
spent tremendous time and effort developing ssc and it's very
unfortunate that this is being done through ssc.

Disclaimer: these are numbers obtained and crunched by me. If I made a
gross error, contact me to have it fixed or issue an addendum. Note
that I am only presenting numbers with possible explanations. I do not
say or imply who did the downloading. I do not say or imply, for
example, that Edwin Leuven and Barbara Sianesi manipulated psmatch2
numbers. That would be silly. psmatch2 is included here because it had
a download count of 500 or more at some time since Jan 2005 and is still
in circulation. Richard Williams will be happy to know that mfx2 is
included in the list on pure merit. I did not discriminate.

There are several sources for ssc programs. One is the -ssc install-
command. This method is virtually costless, meaning a package can
easily be downloaded by thousands. The other method is through the
RePEc websites. This is a manual method; the only people who go
through it are the ones who really want the package.

I have downloaded the ssc statistics (Kit's monthly figures plus
whatshot; I used -logout- for this) and matched them to the RePEc
download history. To make the two comparable, the RePEc series was
converted into a 3-month moving average multiplied by 10. The ssc
statistics prior to Nov 2007 were also converted into a 3-month average.
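A minimal sketch of the rescaling described above, assuming the monthly counts are held in plain lists. The post does not say whether the moving average was trailing or centred; a trailing 3-month window is assumed here, and the function names and sample numbers are hypothetical, for illustration only.

```python
def moving_average(values, window=3):
    """Trailing moving average; months without a full window get None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough earlier months yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def rescale_repec(repec_monthly, window=3, factor=10):
    """RePEc monthly downloads as a 3-month moving average times 10."""
    return [None if v is None else v * factor
            for v in moving_average(repec_monthly, window)]


if __name__ == "__main__":
    # Hypothetical monthly RePEc counts, not real data.
    repec = [12, 15, 9, 18, 21]
    print(rescale_repec(repec))
```

The same moving_average call with no rescaling would cover the pre-Nov 2007 ssc figures, which the post says were also turned into a 3-month average.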

The basic premise is that the two series must track each other. The
Excel graphs are uploaded here. They have sufficient resolution; you
only need to zoom in. If they get knocked off (go offline or do not
work), I will upload the raw numbers.

http://profile.imageshack.us/user/roywada/

The graphs are numbered from right to left. Graph 1 is on the RIGHT.

In graph 1 (on the right side), mat2txt and psmatch2 are clearly
"manipulated". The pink RePEc line and the blue ssc line diverge for
several months and then reconverge. outreg shows up as it should: the
two lines track each other closely (the pink RePEc line lags the blue
ssc line).

In graph 2 (the one next to the right-most one), both outreg2 and
tabout are moving up and look as they should. Could I have known
this and manipulated outreg2 precisely? If I had done that, I would
have had to download each of the ado files (4 of them) and maybe 3
help files 40 times each (which is about 20%), which would mean 280
manual downloads per month for several years. Should I have cleared
the cookies after each visit? I don't know. Even then it would have
been a risky operation, because the two lines could easily diverge and
that would show up. As some people have found out, outreg2 is not even
fully documented, and I don't add functionality even where it could
easily be added. The best explanation for outreg2 is that it doesn't
need to be manipulated and I have no interest in doing so.
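The download count in the paragraph above, written out using only the post's own figures (4 ado files, 3 help files, 40 downloads of each):

```latex
\[
(4 + 3)\ \text{files} \times 40\ \text{downloads per file} = 280\ \text{manual downloads per month}
\]
```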

The problem is estout. The blue ssc line for estout breaks the trend
around Sep 2008, and breaks again around Jan 2009. Note that the blue
line keeps going up while the pink line is trending down. There is no
good reason for this divergence considering that the two lines have
previously tracked each other. estout was updated around Jan 2009 but
the functionality added at that time overlapped other existing
programs and should not have had that much impact. If you believe what
the pink line is saying, the download numbers for estout have
basically moved sideways since the middle of 2007 and are possibly
trending downward.

If anyone is keeping track of the calendar, the massive manipulation
began around Sep 2008 with mat2txt. It then briefly moved on to
psmatch2. mat2txt subsided around the end of spring. By that time
estout was in full swing. I find the timing to be very interesting. I
also find the choice of programs very interesting.

An outside possibility is that the manipulation was done for the
purpose of casting suspicion in that direction but this seems too much
work just for that.

Roy

Graphs 3, 4, 5, and 6 are discussed at the end of this post (mfx2
is among them). They are included for completeness' sake:

In graph 3, ivreg2 looks as it should. xml_tab has a surge,
courtesy of its introduction at a Stata conference (by me). A similar
introduction must have happened for mfx2, but I don't know who did
that one.

In graph 4, overid looks okay. gllamm and ranktest have a tendency to
diverge, but they always diverge, which means the relationship between
the pink and the blue lines holds.

In graph 5, whitetst and xtabond2 look okay. xcollapse has a
single peak, but that sometimes happens to a program that has not
been updated in a while.

In graph 6, xtivreg2 looks okay.



