Statalist



st: RE: data management


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: data management
Date   Mon, 7 Dec 2009 15:06:27 -0000

There is no inherent virtue in "quick" questions!

"rowwise" operations were discussed fairly systematically in the Stata
Journal earlier this year: 

SJ-9-1  pr0046  . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

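This -egen- function should help. Save the code as _grownvals.ado
somewhere on your adopath and call as -egen <whatever> =
rownvals(<varlist>)-. By default missing values are ignored; with the
-missing- option they are counted as values too.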
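
A minimal sketch of a call, on the assumption that the program and Mata
function given in this post sit together in _grownvals.ado on the
adopath. The data and the names nvals and nvalsm are purely
illustrative; the first row is the example from the original question.

```stata
clear
input var1 var2 var3 var4 var5 var6 var7
1 2 3 9 20 1 1
4 4 . 4 4  4 4
end

* default: missing values are ignored
egen nvals = rownvals(var1-var7)

* with -missing-, a missing value counts as one more distinct value
egen nvalsm = rownvals(var1-var7), missing

list
* row 1: nvals = 5, nvalsm = 5
* row 2: nvals = 1, nvalsm = 2
```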


* NJC 1.0.1 28 Jan 2009
* NJC 1.0.0 7 Jan 2009
program _grownvals 
	version 9
	gettoken type 0 : 0
	gettoken h    0 : 0 
	gettoken eqs  0 : 0

	syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
	if `"`by'"' != "" {
		_egennoby rownvals() `"`by'"'
		/* NOTREACHED */
	}

	marksample touse, novarlist 
	local miss = "`missing'" != "" 
	quietly { 
		mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
	}
end

mata : 

void row_nvals(string scalar varnames, 
		string scalar tousename,
		string scalar nvalsname,
		string scalar type, 
		real scalar miss)
{ 
	real matrix y
	real colvector nvals, row
	real scalar i

	// view onto the selected observations of the variables
	st_view(y, ., tokens(varnames), tousename)
	nvals = J(rows(y), 1, .)

	if (miss) {
		// count missing values as values too
		for(i = 1; i <= rows(y); i++) {
			row = y[i,]'
			nvals[i] = length(uniqrows(row))
		}
	}
	else {
		// count nonmissing values only
		for(i = 1; i <= rows(y); i++) {
			row = y[i,]'
			nvals[i] = length(uniqrows(select(row, (row :< .))))
		}
	}

	st_addvar(type, nvalsname)
	st_store(., nvalsname, tousename, nvals) 
}	

end
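
The core of the Mata function is the pair uniqrows() and select(). As a
rough illustration only, the same two expressions can be tried
interactively on a made-up row (the values are the example from the
question with one missing value appended):

```stata
mata:
// one row of data, transposed to a column vector
row = (1, 2, 3, 9, 20, 1, 1, .)'

// missing counted as a value: 1 2 3 9 20 .
length(uniqrows(row))                        // 6

// missings dropped first: 1 2 3 9 20
length(uniqrows(select(row, row :< .)))      // 5
end
```

The test row :< . is true for every nonmissing value, since all
missing values, including .a to .z, sort at or above the system missing
value.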

Nick 
n.j.cox@durham.ac.uk 

Sim.Oertel@t-online.de

I have a quick data management question. I would like to count the
number of different values over several variables within a row and save
the result in a new variable.

For example: 
Var1  Var2  Var3  Var4  Var5  Var6  Var7  Varx
1     2     3     9     20    1     1     ...

In the example above, the new variable would hold the value 5, since
Var1, Var6 and Var7 all hold the same value (put differently, 5
different values can be counted over var1 - var7). 

I tried the following command: egen byte countdif = diff(var1 var2 var3
var4 var5 var6 var7 varx). However, the new variable is only a dummy
(0/1). 




