Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: specifying all values different condition |
Date | Thu, 13 Feb 2014 12:11:01 +0000 |
-reshape-ing and then looking for -duplicates- is another strategy.
That has extra value if you want to do other kinds of row comparisons.

Nick
njcoxstata@gmail.com

On 13 February 2014 12:05, Nick Cox <njcoxstata@gmail.com> wrote:
> Not so with your syntax, I think. If all possible pairs are unequal
> then no pair of values can be equal.
>
> With three variables,
>
> ... if (var1 == var2) | (var1 == var3) | (var2 == var3)
>
> is a way of checking for any matches.
>
> The more general (and interesting!) question remains. First off,
> presumably either all the variables are numeric or all are string;
> otherwise comparison makes no sense, to Stata at least. (Exceptions
> surely require -destring- or -tostring- to put variables into
> comparable storage types.)
>
> In my files I find programs defining two -egen- functions, one for
> numeric and one for string variables.
>
> Here is _grownvals.ado for finding the number of distinct values in
> each observation of a bunch of numeric variables. If the number of
> distinct values is fewer than the number of variables, matches have
> been found. I have hacked at the indentation to reduce the chance of
> misreading this.
>
> * program begins
>
> * NJC 1.0.1 28 Jan 2009
> * NJC 1.0.0 7 Jan 2009
> program _grownvals
>     version 9
>     gettoken type 0 : 0
>     gettoken h 0 : 0
>     gettoken eqs 0 : 0
>
>     syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
>     if `"`by'"' != "" {
>         _egennoby rownvals() `"`by'"' /* NOTREACHED */
>     }
>
>     marksample touse, novarlist
>     local miss = "`missing'" != ""
>     quietly {
>         mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
>     }
> end
>
> mata :
>
> void row_nvals(string scalar varnames,
>                string scalar tousename,
>                string scalar nvalsname,
>                string scalar type,
>                real scalar miss)
> {
>     real matrix y
>     real colvector nvals, row
>
>     st_view(y, ., tokens(varnames), tousename)
>     nvals = J(rows(y), 1, .)
>
>     if (miss) {
>         // count distinct values, missings included
>         for(i = 1; i <= rows(y); i++) {
>             row = y[i,]'
>             nvals[i] = length(uniqrows(row))
>         }
>     }
>     else {
>         // count distinct nonmissing values only
>         for(i = 1; i <= rows(y); i++) {
>             row = y[i,]'
>             nvals[i] = length(uniqrows(select(row, (row :< .))))
>         }
>     }
>
>     st_addvar(type, nvalsname)
>     st_store(., nvalsname, tousename, nvals)
> }
>
> end
>
> * program ends
>
> You would need to put this in a file _grownvals.ado along your
> -adopath- and call by
>
> egen nrowvals = rownvals(<varlist>)
>
> and then compare the number of distinct values with the number of
> variables. The algorithm here is pretty lousy and I would be very
> happy to learn of a smarter one.
>
> Numeric missings are ignored by default. The option -missing-
> overrides that.
>
> This function was mentioned in a review of technique in this
> territory, which should be of interest to anyone interested in this
> question.
>
> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
>         (help rowsort, rowranks if installed) . . . . . . . . N. J. Cox
>         Q1/09   SJ 9(1):137--157
>         shows how to exploit functions, egen functions, and Mata
>         for working rowwise; rowsort and rowranks are introduced
>
> That is easy to access at
> http://www.stata-journal.com/article.html?article=pr0046 (except at
> the moment I write I find no connection).
>
> It's stated in that paper that the functions are in -egenmore- (SSC).
> That was an intention never carried out, but no one has asked for this
> kind of thing before now.
>
> If you want the function for string variables, please ask.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 13 February 2014 11:12, Viktor Emonds <Viktor.Emonds@soc.kuleuven.be> wrote:
>
>> I've been breaking my head over something that can probably be very
>> easily done. I want to evaluate whether the value of any variable in
>> a varlist matches the value of any other variable in the list.
>> If it were only three variables, this would amount to something like:
>>
>> if var1!=var2 & var1!=var3 & var2!=var3
>>
>> How could I do this in a more elegant way when the varlist becomes
>> much longer and the number of these 1-on-1 dissimilarity conditions
>> would exponentially increase?
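
The -reshape- and -duplicates- strategy suggested in the most recent reply is not spelled out in the thread. What follows is only a minimal sketch of one way it might look, not the poster's own code. It assumes all variables are numeric and share the stub var (var1-var3, as in the question). The names id, which, dup and anymatch are hypothetical and chosen purely for illustration, and the toy data are made up.

```stata
* Sketch only: toy data using the variable names from the question.
* id, which, dup, and anymatch are hypothetical names, not from the thread.
clear
input var1 var2 var3
1 2 3
1 2 1
4 4 4
5 6 7
end

* one row per (original observation, variable)
gen long id = _n
reshape long var, i(id) j(which)

* dup = number of other rows with the same value within the same id
duplicates tag id var, generate(dup)

* flag original observations in which at least two variables agree
bysort id: egen byte anymatch = max(dup > 0)

* dup varies within id, so drop it before going back to wide layout
drop dup
reshape wide var, i(id) j(which)
list
```

Note that, as far as I know, -duplicates- treats missing values like any other value, so two missings in the same observation would count as a match here. That differs from the default behaviour of the rownvals() function quoted above, which ignores numeric missings unless -missing- is specified.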