Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: specifying all values different condition


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: specifying all values different condition
Date   Thu, 13 Feb 2014 13:57:21 +0000

I see that the title and the content clash slightly, but reconciliation is easy.

"all values different" corresponds to e.g.

if (var1 != var2) & (var1 != var3) & (var2 != var3)

"any values match" to e.g.

if (var1 == var2) | (var1 == var3) | (var2 == var3)
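
A quick check on toy data may make the relationship concrete. The sketch below uses three made-up observations (the values are illustrative, not the original poster's) and confirms that "any values match" and "all values different" are exact complements in every observation.

```stata
clear
input var1 var2 var3
1 2 3
1 1 3
2 2 2
end

* at least one pair of values is equal
generate byte anymatch = (var1 == var2) | (var1 == var3) | (var2 == var3)

* every pair of values is unequal
generate byte alldiff = (var1 != var2) & (var1 != var3) & (var2 != var3)

* the two indicators are complements in every observation
assert alldiff == !anymatch
list
```

Note that Stata treats two numeric missings as equal, so a pair of missings counts as a match under either condition.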

Nick
[email protected]


On 13 February 2014 12:11, Nick Cox <[email protected]> wrote:
> -reshape-ing and then looking for -duplicates- is another strategy.
> That has extra value if you want to do other kinds of row comparisons.
> Nick
> [email protected]
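
The -reshape- and -duplicates- route might look something like the following sketch. The toy values, the identifier id, and the names which, dup and anymatch are all made up for illustration. Note that -duplicates- counts two missing values as a match, which may or may not be what is wanted.

```stata
clear
input var1 var2 var3
1 2 3
1 1 3
2 2 2
end

* one row per original observation and variable
generate long id = _n
reshape long var, i(id) j(which)

* dup = number of other rows with the same id and the same value
duplicates tag id var, generate(dup)

* flag original observations in which any value is repeated
egen byte anymatch = max(dup > 0), by(id)

* back to the original layout
drop dup
reshape wide var, i(id) j(which)
list
```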
>
>
> On 13 February 2014 12:05, Nick Cox <[email protected]> wrote:
>> Not so with your syntax, I think. If all possible pairs are unequal
>> then no pair of values can be equal.
>>
>> With three variables,
>>
>> ... if (var1 == var2) | (var1 == var3) | (var2 == var3)
>>
>> is a way of checking for any matches.
>>
>> The more general (and interesting!) question remains. First off,
>> presumably either all the variables are numeric or all are string;
>> otherwise comparison makes no sense, to Stata at least. (Exceptions
>> surely require -destring- or -tostring- to put variables into
>> comparable storage types.)
>>
>> In my files I find programs defining two -egen- functions, one for
>> numeric and one for string variables.
>>
>> Here is _grownvals.ado for finding the number of distinct values in
>> each observation of a bunch of numeric variables. If the number of
>> distinct values is fewer than the number of variables, matches have
>> been found. I have hacked at the indentation to reduce the chance of
>> misreading this.
>>
>> * program begins
>>
>> * NJC 1.0.1 28 Jan 2009
>> * NJC 1.0.0 7 Jan 2009
>> program _grownvals
>>     version 9
>>     gettoken type 0 : 0
>>     gettoken h    0 : 0
>>     gettoken eqs  0 : 0
>>
>>     syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
>>     if `"`by'"' != "" {
>>         _egennoby rownvals() `"`by'"'   /* NOTREACHED */
>>     }
>>
>>     marksample touse, novarlist
>>     local miss = "`missing'" != ""
>>     quietly {
>>         mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
>>     }
>> end
>>
>> mata :
>>
>> void row_nvals(string scalar varnames,
>>                string scalar tousename,
>>                string scalar nvalsname,
>>                string scalar type,
>>                real scalar miss)
>> {
>>     real matrix y
>>     real colvector nvals, row
>>
>>     st_view(y, ., tokens(varnames), tousename)
>>     nvals = J(rows(y), 1, .)
>>
>>     if (miss) {
>>         for(i = 1; i <= rows(y); i++) {
>>             row = y[i,]'
>>             nvals[i] = length(uniqrows(row))
>>         }
>>     }
>>     else {
>>         for(i = 1; i <= rows(y); i++) {
>>             row = y[i,]'
>>             nvals[i] = length(uniqrows(select(row, (row :< .))))
>>         }
>>     }
>>
>>     st_addvar(type, nvalsname)
>>     st_store(., nvalsname, tousename, nvals)
>> }
>>
>> end
>>
>> * program ends
>>
>> You would need to put this in a file _grownvals.ado along your
>> -adopath- and call by
>>
>> egen nrowvals = rownvals(<varlist>)
>>
>> and then compare the number of distinct values with the number of
>> variables. The algorithm here is pretty lousy and I would be very
>> happy to learn of a smarter one.
>>
>> Numeric missings are ignored by default. The option -missing- overrides that.
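
As a hedged illustration of that comparison, the sketch below assumes _grownvals.ado above has been saved along the -adopath-; the variables var1-var3 and their values are made up. Because missings are ignored by default, it compares the number of distinct values with the number of nonmissing values, obtained from the official -egen- function rownonmiss(), rather than with the fixed number of variables. That choice is an assumption of this example, not part of the original program.

```stata
* assumes _grownvals.ado (above) is installed along the adopath
clear
input var1 var2 var3
1 2 3
1 1 3
2 2 2
1 . 1
end

* number of distinct nonmissing values in each observation
egen nrowvals = rownvals(var1 var2 var3)

* number of nonmissing values in each observation
egen nonmiss = rownonmiss(var1 var2 var3)

* fewer distinct values than nonmissing values means at least one match
generate byte anymatch = nrowvals < nonmiss
list
```

If missings should count as values in their own right, specifying the -missing- option and comparing with the number of variables would be the natural alternative.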
>>
>> This function was mentioned in a review of technique in this
>> territory, which should be of interest to anyone interested in this
>> question.
>>
>> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
>>         (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
>>         Q1/09   SJ 9(1):137--157
>>         shows how to exploit functions, egen functions, and Mata
>>         for working rowwise; rowsort and rowranks are introduced
>>
>> That is easy to access at
>> http://www.stata-journal.com/article.html?article=pr0046 (although as
>> I write I cannot get a connection).
>>
>> It's stated in that paper that the functions are in -egenmore- (SSC).
>> That was an intention never carried out, but no one has asked for this
>> kind of thing before now.
>>
>> If you want the function for string variables, please ask.
>>
>> Nick
>> [email protected]
>>
>>
>> On 13 February 2014 11:12, Viktor Emonds <[email protected]> wrote:
>>
>>> I've been breaking my head over something that can probably be very easily done. I want to evaluate whether the value of any variable in a varlist matches the value of any other variable in the list. If it were only three variables, this would amount to something like:
>>>
>>> if var1!=var2 & var1!=var3 & var2!=var3
>>>
>>> How could I do this in a more elegant way when the varlist becomes much longer and the number of these 1-on-1 dissimilarity conditions would exponentially increase?
