Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: specifying all values different condition

From	Nick Cox
To	Statalist
Subject	Re: st: specifying all values different condition
Date	Thu, 13 Feb 2014 12:05:24 +0000

Not so with your syntax, I think. If all possible pairs are unequal
then no pair of values can be equal.

With three variables,

... if (var1 == var2) | (var1 == var3) | (var2 == var3)

is a way of checking for any matches.
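To generalize that check to any number of variables, one option is to loop over every distinct pair; with n variables there are n(n - 1)/2 pairs, so the count grows quadratically. The following is a minimal sketch, not code from this thread; the toy data and the names var1-var4 and anymatch are illustrative assumptions.

```stata
* Sketch: flag observations in which any two variables in a varlist
* are equal, by looping over every distinct pair of variables.
* Toy data and the names var1-var4 and anymatch are illustrative.
clear
input var1 var2 var3 var4
1 2 3 4
1 2 1 4
5 6 7 5
end

unab vars : var1-var4
local nvars : word count `vars'

generate byte anymatch = 0
forvalues i = 1/`=`nvars'-1' {
    local a : word `i' of `vars'
    forvalues j = `=`i'+1'/`nvars' {
        local b : word `j' of `vars'
        * in Stata . == . is true, so two system missings count as a match
        quietly replace anymatch = 1 if `a' == `b'
    }
}

* observations in which all values are different
list if !anymatch
```

The same loop works for all-string varlists, as == compares strings too. If equal missing values should not count as a match, the condition could become `` if `a' == `b' & !missing(`a') ``.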

The more general (and interesting!) question remains. First off,
presumably either all the variables are numeric or all are string;
otherwise comparison makes no sense, to Stata at least. (Exceptions
surely require -destring- or -tostring- to put variables into
comparable storage types.)

In my files I find programs defining two -egen- functions, one for
numeric and one for string variables.

Here is _grownvals.ado for finding the number of distinct values in
each observation of a bunch of numeric variables. If the number of
distinct values is fewer than the number of variables, matches have
been found. I have hacked at the indentation to reduce the chance of
misreading this.

* program begins

* NJC 1.0.1 28 Jan 2009
* NJC 1.0.0 7 Jan 2009
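program _grownvals
    version 9
    // -egen- passes: <type> <newvarname> = (<varlist>) [if] [in] [, options]
    gettoken type 0 : 0
    gettoken h    0 : 0
    gettoken eqs  0 : 0     // the "=" sign, discarded

    syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
    if `"`by'"' != "" {
        // by() is not allowed; -_egennoby- issues an error and exits
        _egennoby rownvals() `"`by'"'   /* NOTREACHED */
    }

    // novarlist: do not exclude observations with missing values
    marksample touse, novarlist
    local miss = "`missing'" != ""
    quietly {
        mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
    }
end

mata :

// for each selected observation, count the distinct values
// across the variables in varnames and store them in nvalsname
void row_nvals(string scalar varnames,
               string scalar tousename,
               string scalar nvalsname,
               string scalar type,
               real scalar miss)
{
    real matrix    y
    real colvector nvals, row
    real scalar    i

    // view onto the selected observations (one row per observation)
    st_view(y, ., tokens(varnames), tousename)
    nvals = J(rows(y), 1, .)

    if (miss) {
        // missing values are counted as values
        for (i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(row))
        }
    }
    else {
        // keep only nonmissing values (x < . excludes . and .a-.z)
        for (i = 1; i <= rows(y); i++) {
            row = y[i,]'
            nvals[i] = length(uniqrows(select(row, (row :< .))))
        }
    }

    // create the new variable and fill it for the selected observations
    st_addvar(type, nvalsname)
    st_store(., nvalsname, tousename, nvals)
}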

end

* program ends

You would need to save this as _grownvals.ado in a directory on your
-adopath- and then call it with

egen nrowvals = rownvals(<varlist>)

and then compare the number of distinct values with the number of
variables. The algorithm here is pretty lousy and I would be very
happy to learn of a smarter one.

Numeric missings are ignored by default. The option -missing- overrides that.
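A usage sketch, assuming _grownvals.ado as listed above has been saved on the -adopath-; the toy data and the names nvals, nvals_m and alldiff are illustrative, not part of the original post.

```stata
* Sketch: assumes _grownvals.ado (listed above) is on the adopath.
* Toy data and new variable names are illustrative.
clear
input var1 var2 var3 var4
1 2 3 4
1 2 1 4
1 2 3 .
. . 3 4
end

unab vars : var1-var4
local nvars : word count `vars'

* default: missing values are ignored when counting
egen nvals = rownvals(`vars')
* option missing: missing values count as values too
egen nvals_m = rownvals(`vars'), missing

* all values different exactly when the count equals the number of variables
generate byte alldiff = nvals_m == `nvars'
list
```

With the default count, any observation containing a missing value has fewer distinct values than variables and so never qualifies as all different; the `missing` version instead treats two equal missing values as a match, in line with how == behaves.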

This function was mentioned in a review of technique in this
territory, which should be of interest to anyone pursuing this
question.

SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
        Q1/09   SJ 9(1):137--157
        shows how to exploit functions, egen functions, and Mata
        for working rowwise; rowsort and rowranks are introduced

That is easy to access at
http://www.stata-journal.com/article.html?article=pr0046 (although,
as I write, I find no connection).

It's stated in that paper that the functions are in -egenmore- (SSC).
That was an intention never carried out, but no one has asked for this
kind of thing before now.

If you want the function for string variables, please ask.

Nick

On 13 February 2014 11:12, Viktor Emonds wrote:

> I've been breaking my head over something that can probably be done very easily. I want to evaluate whether the value of any variable in a varlist matches the value of any other variable in the list. If it were only three variables, this would amount to something like:
>
> if var1!=var2 & var1!=var3 & var2!=var3
>
> How could I do this in a more elegant way when the varlist becomes much longer and the number of these 1-on-1 dissimilarity conditions would exponentially increase?
