Statalist The Stata Listserver


st: RE: Combining forvalues and by - simple programming question that somehow eludes me


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Combining forvalues and by - simple programming question that somehow eludes me
Date   Sun, 17 Jun 2007 17:10:34 +0100

I recall some fuss about this in _Nature_ and _Science_ 
a couple of years ago. I wrote a couple of programs 
and then got bored, or distracted, before I wrote the help. 
Anyway, both programs support -by:-. 

------------------------------------ hindex.ado 
*! NJC 1.0.0 21 Sept 2005 
program hindex, byable(recall) rclass sort 
	version 8 
	syntax varname(numeric) [if] [in] 

	quietly { 
		marksample touse 
		count if `touse' 
		if r(N) == 0 error 2000 

		tempvar negvar rank 
		gen `negvar' = -`varlist' 
		* unique ranks, largest value first; tied values must keep distinct
		* ranks (four values of 3 need ranks 1-4 to give h = 3)
		bysort `touse' (`negvar'): gen `rank' = _n

		su `rank' if (`rank' <= `varlist') & `touse', meanonly 
	}
	
	* no qualifying rank (e.g. all values zero) means an h-index of 0
	local h = cond(r(N) > 0, r(max), 0) 
	di _n as txt "h-index " as res %3.0f `h' 
	return scalar hindex = `h' 
end 
---------------------------------- 
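A minimal usage sketch, assuming hindex.ado above has been saved somewhere on the adopath. The variable names follow Pierre's example; the two scientists "A" and "B" and their citation counts are made up for illustration and contain no ties.

```stata
* made-up toy data; variable names follow Pierre's example
clear
input str1 scientist_id nbcites
"A" 10
"A" 8
"A" 5
"A" 4
"A" 3
"B" 25
"B" 2
"B" 1
end

* one h-index per scientist, displayed group by group
bysort scientist_id: hindex nbcites

* for a single scientist, the result is also left in r(hindex)
hindex nbcites if scientist_id == "B"
display r(hindex)
```

Working by hand, A has four papers with at least 4 citations each (h = 4) and B has two papers with at least 2 citations each (h = 2), so those are the values to expect.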

---------------------------------- _ghindex.ado
*! 1.0.1 NJC 17 Oct 2005
*! 1.0.0 NJC 21 Sept 2005
program _ghindex 
	version 8
	syntax newvarname =/exp [if] [in] [, BY(varlist) ]
	marksample touse, novarlist 
	tempvar GRV
	quietly {
		gen double `GRV' = -(`exp') if `touse'
		markout `touse' `GRV' 
		* unique ranks within group, largest value first; tied values
		* must keep distinct ranks or the h-index can be missed
		bysort `touse' `by' (`GRV'): gen `typlist' `varlist' = _n 
		replace `varlist' = 0 if (`exp') < `varlist' 	
		bysort `touse' `by' (`varlist'): ///
			replace `varlist' = `varlist'[_N]
		if "`by'" != "" local by " by `by'" 
		label var `varlist' "h-index of `exp'`by'"
	} 	
end
--------------------------------- 
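A sketch of how the egen function might be called, assuming _ghindex.ado above is on the adopath so that -egen- can find hindex(). The data are made up (no ties), and the new variable name h is arbitrary.

```stata
* made-up toy data; variable names follow Pierre's example
clear
input str1 scientist_id nbcites
"A" 10
"A" 8
"A" 5
"A" 4
"A" 3
"B" 25
"B" 2
"B" 1
end

* egen passes by() through to _ghindex
egen h = hindex(nbcites), by(scientist_id)

* h is constant within scientist, so tabdisp shows one value per row
tabdisp scientist_id, cell(h)
```

Unlike -hindex-, which displays and returns one result per group, this leaves the h-index in the data as a variable, which is probably more convenient with a very large number of scientists.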

and I worked out this recipe for h-index of -response-, by -byvar-:
 
* unique ranks, largest response first (tied values keep distinct ranks)
bysort byvar : egen rank = rank(-response), unique
bysort byvar : egen hindex = max(rank) if response >= rank 
tabdisp byvar if response >= rank, cell(hindex)
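As a cross-check that needs no extra programs, here is a sketch using only -by:- and _n. The names byvar and response follow the recipe; the values are invented, and group "a" consists entirely of ties. Observations that do not qualify contribute 0, so a group with no qualifying rank would get an h-index of 0 rather than missing.

```stata
* made-up data: group a is all ties, group b is strictly decreasing
clear
input str1 byvar response
"a" 3
"a" 3
"a" 3
"a" 3
"b" 10
"b" 8
"b" 5
"b" 4
"b" 3
end

* unique descending ranks within group: largest response gets rank 1
bysort byvar (response) : gen rank = _N - _n + 1

* h-index: largest rank that does not exceed its own response
by byvar : egen hindex = max(cond(response >= rank, rank, 0))

tabdisp byvar, cell(hindex)
```

By hand, group a has four values of 3, so three of them are at least 3 but there are not four values of at least 4 (h = 3); group b has four values of at least 4 (h = 4).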

Also in 2005, for what it's worth, I sent this to Nature, but 
it was rejected (meaning, it was not published, and no-one
replied). 

------------------------------------------------------- 
The h-index of Jorge Hirsch (http://xxx.arxiv.org/abs/physics/0508025;
Nature 436, 900; 2005) is the highest number of papers a scientist has
that have each received at least that number of citations. As a measure
of both productivity and impact, it is offered as an objective criterion
for decisions on, say, tenure, promotion and elections to distinguished
societies. 

The index is so simple and elegant that one wonders why it has
apparently not been suggested before as a descriptive statistic.
However, application to other ranked counts, such as species abundance
data from ecology, suggests that its bibliometric success depends partly
on the coincidence that numbers of publications and of citations per
paper are often close to each other.  This reflects both the sizes of
specialisms and various publication conventions.

On the evidence so far, mostly from physics, the h-index works rather
well, at least for comparing people in the same field who work in the
same way. But that is the nub of the matter. Can it acceptably rank
people in very different fields, even with a range of benchmarks? Like
any one-dimensional summary based on publications, it ducks the question
of how to assess outputs in other forms (patents, software, etc.).
Sciences vary considerably in how far publications are single-authored
or multi-authored or very short or much longer, so number of papers is a
dubious metric on those grounds alone.  The fraction of past or even
current literature covered in major databases also should not be assumed
either high or constant across disciplines.  Some scientists publish
sparsely but durably in small and unspectacular but nevertheless
fundamental fields (e.g. specialisms in systematic biology). The
h-index also undervalues the contribution of those who publish a few
outstandingly important papers (deep theorems such as that of
Fermat-Wiles, fundamental new discoveries, very widely used methods). If
scientists were to plan with the h-index in mind, the incentive to write
books or review papers would decrease markedly, as each could have only
a marginal effect compared with writing conventional papers. Science
could only suffer as a result. 
-------------------------------------------

Nick 
[email protected] 

Pierre Azoulay
 
> I am trying to calculate the so-called h index for a large number of
> scientists. The h index of a scientist is the highest integer h such
> that the scientist has h papers each cited at least h times.
> 
> For example, for the scientist below, the h index is 19.
> 
> scientist_id  article_id  nbcites
> GEORGE          10101157        8
> GEORGE          12242494       10
> GEORGE          11156976       12
> GEORGE           9409826       19
> GEORGE           7635312       23
> GEORGE           7799970       23
> GEORGE          11290701       28
> GEORGE           8034742       42
> GEORGE           8334302       43
> GEORGE           2656402       74
> GEORGE           2005819       79
> GEORGE           2643162      111
> GEORGE           8943317      127
> GEORGE           1956405      146
> GEORGE           9314530      153
> GEORGE           2404021      204
> GEORGE           3049620      302
> GEORGE           2195038      373
> GEORGE           2476649      393
> GEORGE           2005809      527
> GEORGE           6365931      614
> GEORGE           6365930      670
> 
> 
> I have written a program that calculates this for one scientist (see
> below). The problem is that I have a very large number of scientists,
> and so would like to combine the program below with "by scientist_id:"
> 
> I am not sure exactly how to do that in Stata. Could anyone help?
> 
> Thanks,
> 
> Pierre
> 
> 
> gen h_index=.;
> local N = _N;
> forvalues i = 1(1)`N'
> 		{;
> 		display `i';
> 		replace h_index=`N'-`i'+1 if (nbcites[`i']>=`N'-`i'+1 & h_index==.);
> 		replace h_index=`N'-`i'+1 if (nbcites[`i']>=`N'-`i'+1 & h_index<`N'-`i'+1 & h_index!=.);
> 		};
