
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: specifying all values different condition


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: specifying all values different condition
Date   Thu, 13 Feb 2014 12:05:24 +0000

Not so with your syntax, I think. If all possible pairs are unequal
then no pair of values can be equal.

With three variables,

... if (var1 == var2) | (var1 == var3) | (var2 == var3)

is a way of checking for any matches.
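The same check extends to any number of variables by looping over all distinct pairs. There are k(k - 1)/2 of them for k variables, so the count grows quadratically, not exponentially. Below is a minimal sketch that assumes numeric variables with the hypothetical names x1-x4, toy data, and a made-up indicator called -anymatch-. Stata treats two system missing values as equal, so a pair of missings counts as a match here.

```stata
* Sketch: flag observations in which any two of the listed variables are equal.
* Assumes numeric variables x1-x4 (hypothetical names); toy data for illustration.
clear
input x1 x2 x3 x4
1 2 3 4
1 2 1 5
7 8 9 8
end

unab vars : x1-x4
local k : word count `vars'
local last = `k' - 1

generate byte anymatch = 0
forvalues i = 1/`last' {
    local vi : word `i' of `vars'
    local start = `i' + 1
    forvalues j = `start'/`k' {
        local vj : word `j' of `vars'
        * one comparison per distinct pair; missing == missing counts as a match
        quietly replace anymatch = 1 if `vi' == `vj'
    }
}
list
```

Negating -anymatch- gives the "all values different" condition from the original question.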

The more general (and interesting!) question remains. First off,
presumably either all the variables are numeric or all are string;
otherwise comparison makes no sense, to Stata at least. (Exceptions
surely require -destring- or -tostring- to put variables into
comparable storage types.)
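For example, if one variable holds numbers stored as strings, -destring- can create a numeric copy that is then comparable with the others. The variable names s1, n1, n2 and same in this sketch are hypothetical.

```stata
* Sketch: make a string variable comparable with a numeric one.
* s1 (string) and n2 (numeric) are hypothetical example variables.
clear
input str3 s1 n2
"10" 10
"7"  8
end

* -destring- creates a numeric copy n1; the original string s1 is kept
destring s1, generate(n1)
generate byte same = n1 == n2
list
```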

In my files I find programs defining two -egen- functions, one for
numeric and one for string variables.

Here is _grownvals.ado for finding the number of distinct values in
each observation of a bunch of numeric variables. If the number of
distinct values is fewer than the number of variables, matches have
been found. I have hacked at the indentation to reduce the chance of
misreading this.

* program begins

* NJC 1.0.1 28 Jan 2009
* NJC 1.0.0 7 Jan 2009
program _grownvals
    version 9
    * egen passes the storage type, the new variable name and "=" first
    gettoken type 0 : 0
    gettoken h    0 : 0
    gettoken eqs  0 : 0

    syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
    if `"`by'"' != "" {
        _egennoby rownvals() `"`by'"'   /* NOTREACHED */
    }

    * novarlist: observations with missing values are not excluded
    marksample touse, novarlist
    local miss = "`missing'" != ""
    quietly {
        mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
    }
end

mata :

// number of distinct values in each selected row of the named variables
void row_nvals(string scalar varnames,
               string scalar tousename,
               string scalar nvalsname,
               string scalar type,
               real scalar miss)
{
    real matrix y
    real colvector nvals, row
    real scalar i

    st_view(y, ., tokens(varnames), tousename)
    nvals = J(rows(y), 1, .)

    if (miss) {
        // -missing- specified: missing values count as values
        for (i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(row))
        }
    }
    else {
        // default: drop missings first (row :< . is false for . and .a-.z)
        for (i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(select(row, (row :< .))))
        }
    }

    st_addvar(type, nvalsname)
    st_store(., nvalsname, tousename, nvals)
}

end

* program ends

You would need to put this in a file _grownvals.ado somewhere along
your -adopath- and call it with

egen nrowvals = rownvals(<varlist>)

and then compare the number of distinct values with the number of
variables. The algorithm here, a loop over observations calling
-uniqrows()- on each row, is pretty lousy, and I would be very
happy to learn of a smarter one.
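One alternative that avoids Mata, offered as a sketch rather than as the smarter algorithm asked for, is to -reshape- to long form and count distinct values within each observation. It assumes an identifier (the hypothetical -id-) and numeric variables sharing a common stub (the hypothetical x1-x3); missing values are dropped, matching the default of rownvals(). Whether it beats the Mata loop will depend on the data.

```stata
* Sketch: count distinct nonmissing values per observation via -reshape long-.
* Assumes an identifier id and numeric variables x1-x3 (hypothetical names).
clear
input id x1 x2 x3
1 1 2 3
2 4 4 5
3 6 . .
4 . . .
end

tempfile counts
preserve
quietly reshape long x, i(id) j(which)
quietly drop if missing(x)                    // ignore missings by default
bysort id x : generate byte first = _n == 1   // tag first occurrence of each value
collapse (sum) nvals = first, by(id)
quietly save `counts'
restore

quietly merge 1:1 id using `counts', nogenerate
replace nvals = 0 if missing(nvals)   // observations with no nonmissing values

* a match exists if there are fewer distinct values than nonmissing values
egen nnonmiss = rownonmiss(x1 x2 x3)
generate byte anymatch = nvals < nnonmiss
list
```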

Numeric missings are ignored by default. The option -missing- overrides that.
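To see what the -missing- option changes, here are the two branches of row_nvals() applied to a single made-up row. By default the selection row :< . drops every kind of missing value before uniqrows() counts distinct values. The names row, n_with and n_without are illustrative only.

```stata
* Illustration of the two branches inside row_nvals(), on one made-up row.
mata:
row = (1, ., 1, 2)'
// with -missing-: the distinct values are 1, 2 and ., so 3
n_with = length(uniqrows(row))
// default: missings dropped first, leaving 1 and 2, so 2
n_without = length(uniqrows(select(row, row :< .)))
n_with, n_without
end
```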

This function was mentioned in a review of technique in this
territory, which should be of interest to anyone following this
question.

Cox, N. J. 2009. Speaking Stata: Rowwise. Stata Journal 9(1): 137–157
(SJ-9-1, pr0046; help rowsort, rowranks if installed).
Shows how to exploit functions, egen functions, and Mata for working
rowwise; rowsort and rowranks are introduced.

That is easy to access at
http://www.stata-journal.com/article.html?article=pr0046 (except that,
as I write, I cannot connect to it).

It's stated in that paper that the functions are in -egenmore- (SSC).
That was an intention never carried out, but no one has asked for this
kind of thing before now.

If you want the function for string variables, please ask.

Nick
njcoxstata@gmail.com


On 13 February 2014 11:12, Viktor Emonds <Viktor.Emonds@soc.kuleuven.be> wrote:

> I've been breaking my head over something that can probably be done very easily. I want to evaluate whether the value of any variable in a varlist matches the value of any other variable in the list. If it were only three variables, this would amount to something like:
>
> if var1!=var2 & var1!=var3 & var2!=var3
>
> How could I do this in a more elegant way when the varlist becomes much longer and the number of these one-on-one dissimilarity conditions increases rapidly?


