
st: RE: news modules on SSC

From   "Nick Cox" <>
To   <>
Subject   st: RE: news modules on SSC
Date   Fri, 30 Sep 2005 18:54:41 +0100

-genscore- appears to be a variation on the existing
official -egen- functions -rowmean()- and -rowtotal()-
(-rmean()- and -rsum()- in Stata < 9). 

So that anyone interested can see what I am talking 
about, here is a simplified version of the program. 
(I take responsibility for any bugs introduced into this 
slimmed-down version.) 

program genscore_simplified 
	version 9.0
	syntax varlist [, SCore(namelist min=1 max=1) MEan MIssing(string)]

	* defaults: new variable is called score; only . counts as missing
	if "`score'" == "" local score score
	if "`missing'" == "" local missing .

	capture confirm new variable `score'
	if _rc {
		di as err "The variable {hi:`score'} already defined"
		exit 198
	}

	quietly {
		gen `score' = 0
		* once any item is missing, the running sum stays missing
		foreach v of local varlist {
			replace `score' = ///
			cond(`v' == `missing', ., `score' + `v') 
		}
		if "`mean'" != "" {
			replace `score' = `score' / `: word count `varlist'' 
		}
	}
end

There are three main differences that I can see, apart
from cosmetic syntax details, compared with the official -egen-
functions: 
1. -genscore- is ultra-sensitive to missing values. 
A single missing value in one of the variables 
processed is enough to produce a missing result. 
In contrast, -egen, rowtotal()- and -egen, rowmean()- 
are ultra-indulgent: -rowmean()- returns missing if and 
only if all arguments are missing, and -rowtotal()- 
treats missing values as 0. 

This difference can clearly be important in terms 
of what you want. However, it is -genscore-'s main 
feature, yet the fact is not documented at all
in the help; nor is there a cross-reference to -egen-. 
In the next revision, I suggest that this be made clear. 
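
To see the contrast on a toy dataset, something like the 
following sketch should do. The variables q1-q3 and the names 
of the new variables are made up for illustration; the strict 
sum is written directly with -generate- rather than by calling 
-genscore- itself. 

```stata
* Toy data: three items, some values missing
clear
input q1 q2 q3
1 2 3
1 . 3
. . .
end

* Official -egen- functions: missing values are ignored
egen tot = rowtotal(q1 q2 q3)
egen avg = rowmean(q1 q2 q3)

* Strict sum in the spirit of -genscore-:
* any missing item makes the whole result missing
gen strict = q1 + q2 + q3

* Count of nonmissing items, handy for imposing a rule of your own
egen nobs = rownonmiss(q1 q2 q3)

list
```

Comparing -tot-, -avg- and -strict- observation by observation 
shows where the two approaches part company, and -nobs- lets you 
pick a middle course, such as requiring at least two nonmissing 
items. 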

2. There is an option -missing()- which allows you 
to declare that in your data a particular value has 
the meaning of missing. 

Again in practice, there could be all sorts of reasons 
why you import data which contain idiosyncratic codings
for missing data. By and large, it is best to map 
those to missing using -mvdecode- as soon as possible. 
Maintaining a particular coding which you and only you 
know means missing is very dangerous. Forget that once
and you produce garbage results. 

So, from one point of view, this option supports 
dangerous Stata practices. 
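
A minimal sketch of the mapping recommended above, assuming 
purely for illustration that 9 was the code used for missing 
in made-up variables q1-q3: 

```stata
* Toy data in which 9 means "missing"
clear
input q1 q2 q3
1 2 9
9 0 1
2 2 2
end

* Map the idiosyncratic code to Stata's system missing value
mvdecode q1 q2 q3, mv(9)

* From here on, standard tools treat those values as missing
egen avg = rowmean(q1 q2 q3)
list
```

Once that is done, the coding no longer has to be remembered 
and passed to every later command. 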

3. There is no support for -if- or -in-. 
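
Adding that support is mostly a matter of -syntax- and 
-marksample-. What follows is only a sketch under my own 
assumptions, not the author's code: the program name 
-genscore_ifin- and the -generate()- option are hypothetical, 
and the -missing()- option is left out for brevity. 

```stata
program genscore_ifin
	version 9.0
	syntax varlist(numeric) [if] [in] , Generate(name) [MEan]

	confirm new variable `generate'

	* novarlist: observations with missing items stay in the sample,
	* so they get a missing score through the arithmetic below
	marksample touse, novarlist

	quietly {
		gen `generate' = 0 if `touse'
		foreach v of local varlist {
			replace `generate' = `generate' + `v' if `touse'
		}
		if "`mean'" != "" {
			replace `generate' = ///
				`generate' / `: word count `varlist'' if `touse'
		}
	}
end
```

A call might then look like 
-genscore_ifin q1 q2 q3 if group == 1, generate(score1)-, 
where -group- is again a made-up variable. Observations not 
selected by -if- or -in- are left missing in the new variable. 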

In addition, some help file examples are in terms of 
a -genscore()- option, but the option is -score()-. 
That's a typo for the next fix. 

Jean-Benoit offers his program for adoption in -egen-. That 
is a StataCorp decision, but I can offer another user 
perspective on the much more general question. These 
comments go far beyond the immediate detail of this program. 

StataCorp are increasingly going to get much, much pickier 
about adoption of user-written software. There are two 
main reasons for this: 

* The existence of -net- (and features parasitic on it 
like -ssc-) much reduces the need for official adoption
of user-written stuff. The whole point is that if it's 
good and you like it, you can have it, and it should work 
seamlessly with official Stata. 

* Some users vastly underestimate how big a deal it is to 
adopt something in official Stata. Suppose you wrote 
a program and you did a good job. What next? 

1. The code may be good, but is it up to StataCorp standards? 

2. The help may be good, but is it up to StataCorp standards?

3. You did write a dialog, didn't you? (Most user-programmers, 
me included, stop short of writing the dialog too.) 

4. You did write a certification script, didn't you? (Same 
story, more or less.) 

5. Somebody has to write a manual entry. Perhaps just the 
help file, rejigged, but often a much bigger deal. (You just 
added some pages to a very fat series of manuals.) 

6. Once this is in official Stata, and visible, it is something
else on which technical support may be sought. 

7. Once this is in official Stata, it is something else that must 
be maintained as the rest of Stata changes. 

Also, from the total perspective, there are let's say 1000-odd 
user-written Stata packages in the public domain. (That's an order of 
magnitude figure. It's at least several hundred, but not I 
think yet approaching 10,000.) Of course, no one, I presume, wants
StataCorp to adopt all of them. (Your wish is much more reasonable: 
you just want StataCorp to adopt all of those interesting and useful
to you, but so does everybody else!) 

(On -egen- functions alone, the number in the public domain
is I guess of the order of 100.) 

Let's say StataCorp should be real picky and choose the best 100. 
What would that mean? Probably setting aside all other work for 2 
years and a few more manual volumes... 


Jean-Benoit Hardouin
> Thanks to Kit Baum two new modules are available on SSC :
>        - -biplotvlab- is an improvement of the Ken Higbee's code
> presented yesterday on the Statalist to draw a biplot graph with the
> label of the variables. The improvements concern a gap between the
> text and the ends of the arrows, and the possibility to give
> characteristics to the texts (color, size...). The labels of the
> variables are displayed and, if one or several variables have not a
> label, the name of these variables are displayed. For example this
> module is a nice way to produce biplots with temporary variables.
>       - -genscore- is a small module to easily create a new variable
> containing the score computed as the sum or the mean of several
> variables. It is possible to define a given modality as a missing
> value. I think that this module could be improved by integrating it
> in -egen- (for the next version of this module ?)
