The Stata listserver

Re: st: RE: news modules on SSC

From   Jean-Benoit Hardouin
Subject   Re: st: RE: news modules on SSC
Date   Fri, 30 Sep 2005 20:55:07 +0200

Dear Nick,
On your 1., you're right: -genscore- is very sensitive to missing values, and this is the main reason why I wrote this command. I must improve the help file to explain that.
On your 2., you're right, BUT the *missing* option considers only the system missing value "." by default, so without this option you obtain what you want, and a user who forgets the *missing* option cannot run into any problem. I don't think this is a dangerous option, but I am interested in the advice of other users!
On your 3., that's right!

I am modifying the help file following these remarks, and the ado file to support -if- and -in-. Thank you, Nick, for your mail!

On the end of your mail (my poor English can produce bad interpretations): my idea was to write a user-written Stata package accessible from -egen- (I believe this is possible), not to propose that StataCorp adopt it! I am sorry if I explained my idea badly.

Nevertheless, I learned a lot from your mail about how something gets adopted into official Stata. What is a certification script?


Nick Cox wrote:

-genscore- appears to be a variation on the existing
official -egen- functions -rowmean()- and -rowtotal()-
(-rmean()- and -rsum()- in Stata < 9).
So that anyone interested can see what I am talking about, here is a simplified version of the program. (I take responsibility for any bugs introduced into this slimmed-down version.)
program genscore_simplified
    version 9.0
    syntax varlist [, SCore(namelist min=1 max=1) MEan MIssing(string)]

    * defaults: new variable is called score; only "." counts as missing
    if "`score'" == "" local score score
    if "`missing'" == "" local missing .

    * refuse to overwrite an existing variable
    capture confirm new variable `score'
    if _rc {
        di as err "The variable {hi:`score'} already defined"
        exit 198
    }

    quietly {
        gen `score' = 0
        * one missing item is enough to make the whole score missing
        foreach v of local varlist {
            replace `score' = ///
                cond(`v' == `missing', ., `score' + `v')
        }
        if "`mean'" != "" {
            replace `score' = `score' / `: word count `varlist''
        }
    }
end


There are three main differences that I can see, apart from cosmetic syntax details, compared with the official -egen- functions:
1. -genscore- is ultra-sensitive to missing values. A single missing value in one of the variables processed is enough to produce a missing result. In contrast, -egen, rowtotal()- and -egen, rowmean()- are ultra-indulgent: -rowmean()- returns missing if and only if all arguments are missing, and -rowtotal()- treats missing values as zero.
This difference can clearly be important in terms of what you want. It is -genscore-'s main feature, yet the fact is not documented at all in the help; nor is there a cross-reference to -egen-. In the next revision, I suggest that this be made clear.
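To make the contrast in point 1 concrete, here is a minimal sketch on made-up data. The variable names x1-x3 and the toy values are assumptions for illustration only, and the "strict" versions use plain arithmetic to mimic the behaviour described for -genscore- rather than calling -genscore- itself.

```stata
* Hypothetical toy data: three items, some of them missing
clear
input x1 x2 x3
1 2 3
1 . 3
. . .
end

* Official -egen- functions: missing values within a row are skipped
egen total_lenient = rowtotal(x1 x2 x3)
egen mean_lenient  = rowmean(x1 x2 x3)

* Strict versions: arithmetic on a missing value yields missing,
* so a single missing item makes the whole result missing
generate total_strict = x1 + x2 + x3
generate mean_strict  = (x1 + x2 + x3) / 3

* Row 2 has one missing item: lenient results use the two observed values
assert total_lenient == 4 in 2
assert mean_lenient  == 2 in 2
assert missing(total_strict) in 2
assert missing(mean_strict)  in 2

* Row 3 is all missing: rowtotal() treats missing as zero, rowmean() gives missing
assert total_lenient == 0 in 3
assert missing(mean_lenient) in 3

list
```

Which behaviour is preferable depends on whether a score built from incomplete items is considered meaningful.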
2. There is an option -missing()- which allows you to declare that in your data a particular value has the meaning of missing.
Again in practice, there could be all sorts of reasons why you import data which contain idiosyncratic codings
for missing data. By and large, it is best to map those to missing using -mvdecode- as soon as possible. Maintaining a particular coding which you and only you know means missing is very dangerous. Forget that once
and you produce garbage results.
So, from one point of view, this option supports dangerous Stata practices.
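As a sketch of the -mvdecode- route recommended in point 2, suppose a data set arrives with 99 used as a code for missing. The variable names and the value 99 are assumptions for illustration.

```stata
* Hypothetical toy data in which 99 was used as a code for "missing"
clear
input item1 item2
1 2
99 3
4 99
end

* Map the idiosyncratic code to Stata's system missing value once, up front
mvdecode item1 item2, mv(99)

* From here on, ordinary missing-value handling applies
generate score = item1 + item2
assert score == 3 in 1
assert missing(score) in 2/3
```

Once the recoding is done at import time, no later command needs to be told what 99 means.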
3. There is no support for -if- or -in-.
In addition, some help file examples are in terms of a -genscore()- option, but the option is -score()-. That's a typo for the next fix.
Jean-Benoit offers his program for adoption in -egen-. That is a StataCorp decision, but I can offer another user perspective on the much more general question. These comments go far beyond the immediate detail of this program.
StataCorp are increasingly going to get much, much pickier about adoption of user-written software. There are two main reasons for this:
* The existence of -net- (and features parasitic on it like -ssc-) much reduces the need for official adoption
of user-written stuff. The whole point is that if it's good and you like it, you can have it, and it should work seamlessly with official Stata.
* Some users vastly underestimate how big a deal it is to adopt something in official Stata. Suppose you wrote a program and you did a good job. What next?
1. The code may be good, but is it up to StataCorp standards? Unlikely!
2. The help may be good, but is it up to StataCorp standards?
3. You did write a dialog, didn't you? (Most user-programmers, me included, stop short of writing the dialog too.)
4. You did write a certification script, didn't you? (Same story, more or less.)
5. Somebody has to write a manual entry. Perhaps just the help file, rejigged, but often a much bigger deal. (You just added some pages to a very fat series of manuals.)
6. Once this is in official Stata, and visible, it is something
else on which technical support may be sought.
7. Once this is in official Stata, it is something else that must be maintained as the rest of Stata changes.
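For readers wondering what the certification script in item 4 might look like: it is essentially a do-file that runs the command on known inputs and uses -assert- to check the results, so that it stops with an error if behaviour ever changes. The sketch below is a hypothetical example written against the simplified program shown earlier in this message (it assumes genscore_simplified has already been defined in the session); it is not StataCorp's actual practice or any script shipped with -genscore-.

```stata
* Hypothetical certification do-file for genscore_simplified
clear
input x1 x2
1 2
3 .
end

* Complete rows should give the plain sum
genscore_simplified x1 x2, score(s)
assert s == 3 in 1

* A single missing item should give a missing score
assert missing(s) in 2

* With the mean option, the sum is divided by the number of variables
genscore_simplified x1 x2, score(m) mean
assert m == 1.5 in 1

* The command should refuse to overwrite an existing variable
capture genscore_simplified x1 x2, score(s)
assert _rc == 198
```

If every -assert- passes, the do-file runs to the end silently; any failure stops it at the offending line.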
Also, from the total perspective, there are let's say 1000-odd user-written Stata packages in the public domain. (That's an order of magnitude figure. It's at least several hundred, but not I think yet approaching 10,000.) Of course, no one, I presume, wants
StataCorp to adopt all of them. (Your wish is much more reasonable: you just want StataCorp to adopt all of those interesting and useful
to you, but so does everybody else!)
(On -egen- functions alone, the number in the public domain
is I guess of the order of 100.)
Let's say StataCorp should be real picky and choose the best 100. What would that mean? Probably setting aside all other work for 2 years and a few more manual volumes...
Jean-Benoit Hardouin wrote:

Thanks to Kit Baum, two new modules are available on SSC:
- -biplotvlab- is an improvement on Ken Higbee's code, presented yesterday on Statalist, for drawing a biplot with variable labels. The improvements are a gap between the text and the ends of the arrows, and the ability to set characteristics of the text (color, size, ...). Variable labels are displayed and, if one or more variables have no label, the names of those variables are displayed instead. For example, this module is a nice way to produce biplots with temporary variables.
- -genscore- is a small module to easily create a new variable containing a score computed as the sum or the mean of several variables. It is possible to define a given value as a missing value. I think that this module could be improved by integrating it into -egen- (for the next version of this module?).
