Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: specifying all values different condition
Date: Thu, 13 Feb 2014 12:05:24 +0000
Not so with your syntax, I think. If all possible pairs are unequal
then no pair of values can be equal. With three variables,

... if (var1 == var2) | (var1 == var3) | (var2 == var3)

is a way of checking for any matches.

The more general (and interesting!) question remains.

First off, presumably either all the variables are numeric or all are
string; otherwise comparison makes no sense, to Stata at least.
(Exceptions surely require -destring- or -tostring- to put variables
into comparable storage types.)

In my files I find programs defining two -egen- functions, one for
numeric and one for string variables. Here is _grownvals.ado for
finding the number of distinct values in each observation of a bunch of
numeric variables. If the number of distinct values is fewer than the
number of variables, matches have been found. I have hacked at the
indentation to reduce the chance of misreading this.

* program begins
* NJC 1.0.1 28 Jan 2009
* NJC 1.0.0 7 Jan 2009
program _grownvals
    version 9
    // -egen- passes: storage type, new variable name, "=", then the rest
    gettoken type 0 : 0
    gettoken h    0 : 0
    gettoken eqs  0 : 0

    syntax varlist(numeric) [if] [in] [, BY(string) MISSing]

    if `"`by'"' != "" {
        _egennoby rownvals() `"`by'"'
        /* NOTREACHED */
    }

    marksample touse, novarlist
    local miss = "`missing'" != ""

    quietly {
        mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
    }
end

mata :

void row_nvals(string scalar varnames,
    string scalar tousename,
    string scalar nvalsname,
    string scalar type,
    real scalar miss)
{
    real matrix y
    real colvector nvals, row
    real scalar i

    st_view(y, ., tokens(varnames), tousename)
    nvals = J(rows(y), 1, .)

    if (miss) {
        // count missings as values in their own right
        for(i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(row))
        }
    }
    else {
        // drop missings from each row before counting
        for(i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(select(row, (row :< .))))
        }
    }

    st_addvar(type, nvalsname)
    st_store(., nvalsname, tousename, nvals)
}

end
* program ends

You would need to put this in a file _grownvals.ado along your
-adopath- and call by

egen nrowvals = rownvals(<varlist>)

and then compare the number of distinct values with the number of
variables.

The algorithm here is pretty lousy and I would be very happy to learn
of a smarter one.

Numeric missings are ignored by default. The option -missing-
overrides that.

This function was mentioned in a review of technique in this territory,
which should be of interest to anyone interested in this question.

SJ-9-1  pr0046  . . . . . . . . . . . . . . . Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

That is easy to access at

http://www.stata-journal.com/article.html?article=pr0046

(except that at the moment I write I find no connection).

It's stated in that paper that the functions are in -egenmore- (SSC).
That was an intention never carried out, but no one has asked for this
kind of thing before now.

If you want the function for string variables, please ask.

Nick
njcoxstata@gmail.com

On 13 February 2014 11:12, Viktor Emonds <Viktor.Emonds@soc.kuleuven.be> wrote:
> I've been breaking my head over something that can probably be very
> easily done. I want to evaluate whether the value of any variable in a
> varlist matches the value of any other variable in the list.
> If it were only three variables, this would amount to something like:
>
> if var1!=var2 & var1!=var3 & var2!=var3
>
> How could I do this in a more elegant way when the varlist becomes much
> longer and the number of these 1-on-1 dissimilarity conditions would
> exponentially increase?
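
For what it's worth, the three-variable condition in the quoted question generalises to a loop over all pairs of variables. What follows is a minimal sketch, not code from this thread: the toy data and the names var1-var4 and anymatch are hypothetical. With k variables there are k(k-1)/2 pairs, so the number of comparisons grows quadratically rather than exponentially.

```stata
* Sketch only: the toy data and all variable names here are hypothetical.
clear
input var1 var2 var3 var4
1 2 3 4
1 2 1 4
5 5 6 7
. 2 3 .
end

local vars var1 var2 var3 var4
local k : word count `vars'

* anymatch will be 1 if any two variables are equal in an observation
generate byte anymatch = 0

* visit each unordered pair (i, j) with i < j exactly once
forvalues i = 1/`= `k' - 1' {
    local vi : word `i' of `vars'
    forvalues j = `= `i' + 1'/`k' {
        local vj : word `j' of `vars'
        quietly replace anymatch = 1 if `vi' == `vj'
    }
}

list
```

Note that Stata treats two system missings as equal, so the last toy observation is flagged as a match; adding & !missing(`vi') to the condition would skip such pairs. The -egen- function in the reply, by contrast, ignores numeric missings unless -missing- is specified, so the two approaches can disagree on observations with missing values.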