
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: specifying all values different condition

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: specifying all values different condition
Date	Thu, 13 Feb 2014 12:11:01 +0000

-reshape-ing and then looking for -duplicates- is another strategy.
That has extra value if you want to do other kinds of row comparisons.
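
A minimal sketch of that strategy, on made-up data. The names id, j, val, ndup and anymatch are illustrative only, not from the original question. Note that -duplicates- treats missing values as equal to each other, so two missings in the same observation would count as a match here.

```stata
* Sketch only: toy data; id, j, val, ndup, anymatch are illustrative names
clear
input var1 var2 var3
1 2 3
4 4 5
6 7 6
end

generate long id = _n                 // observation identifier needed by -reshape-
reshape long var, i(id) j(j)          // one row per (observation, variable)
rename var val

* ndup = number of other rows with the same value in the same observation
duplicates tag id val, generate(ndup)

* an observation has a match if any of its values is duplicated
egen byte anymatch = max(ndup > 0), by(id)

* back to the original layout
drop ndup
rename val var
reshape wide var, i(id) j(j)
list
```

Once the data are in long form, other rowwise comparisons (counts, ranks, and so on) can be done with by(id) in the same way.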
Nick
[email protected]


On 13 February 2014 12:05, Nick Cox <[email protected]> wrote:
> Not so with your syntax, I think. If all possible pairs are unequal
> then no pair of values can be equal.
>
> With three variables,
>
> ... if (var1 == var2) | (var1 == var3) | (var2 == var3)
>
> is a way of checking for any matches.
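
For a longer varlist the same check can be sketched as a loop over all pairs. This is only an illustration on toy data: the variable names and the indicator anymatch are assumptions. With k variables there are k(k-1)/2 comparisons, so the work grows quadratically with k. Note that in Stata missing == missing is true, so two numeric missings count as a match here.

```stata
* Sketch only: toy data; anymatch is an illustrative name
clear
input var1 var2 var3 var4
1 2 3 4
1 2 3 1
5 5 5 5
end

local vars var1 var2 var3 var4
local k : word count `vars'

* anymatch = 1 if any two variables in the list are equal in that observation
generate byte anymatch = 0
forvalues i = 1/`= `k' - 1' {
    local a : word `i' of `vars'
    forvalues j = `= `i' + 1'/`k' {
        local b : word `j' of `vars'
        quietly replace anymatch = 1 if `a' == `b'
    }
}
list
```

The condition "all values different" is then just anymatch == 0.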
>
> The more general (and interesting!) question remains. First off,
> presumably either all the variables are numeric or all are string;
> otherwise comparison makes no sense, to Stata at least. (Exceptions
> surely require -destring- or -tostring- to put variables into
> comparable storage types.)
>
> In my files I find programs defining two -egen- functions, one for
> numeric and one for string variables.
>
> Here is _grownvals.ado for finding the number of distinct values in
> each observation of a bunch of numeric variables. If the number of
> distinct values is fewer than the number of variables, matches have
> been found. I have hacked at the indentation to reduce the chance of
> misreading this.
>
> * program begins
>
> * NJC 1.0.1 28 Jan 2009
> * NJC 1.0.0 7 Jan 2009
> program _grownvals
> version 9
> gettoken type 0 : 0
> gettoken h    0 : 0
> gettoken eqs  0 : 0
>
> syntax varlist(numeric) [if] [in] [, BY(string) MISSing]
> if `"`by'"' != "" {
> _egennoby rownvals() `"`by'"'   /* NOTREACHED */
> }
>
> marksample touse, novarlist
> local miss = "`missing'" != ""
> quietly {
> mata : row_nvals("`varlist'", "`touse'", "`h'", "`type'", `miss')
> }
> end
>
> mata :
>
> void row_nvals(string scalar varnames,
> string scalar tousename,
> string scalar nvalsname,
> string scalar type,
> real scalar miss)
> {
> real matrix y
> real colvector nvals, row
> real scalar i
>
> st_view(y, ., tokens(varnames), tousename)
> nvals = J(rows(y), 1, .)
>
> if (miss) {
> for(i = 1; i <= rows(y); i++) {
> row = y[i,]'
> nvals[i] = length(uniqrows(row))
> }
> }
> else {
> for(i = 1; i <= rows(y); i++) {
> row = y[i,]'
> nvals[i] = length(uniqrows(select(row, (row :< .))))
> }
> }
>
> st_addvar(type, nvalsname)
> st_store(., nvalsname, tousename, nvals)
> }
>
> end
>
> * program ends
>
> You would need to put this in a file _grownvals.ado along your
> -adopath- and call by
>
> egen nrowvals = rownvals(<varlist>)
>
> and then compare the number of distinct values with the number of
> variables. The algorithm here is pretty lousy and I would be very
> happy to learn of a smarter one.
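
A sketch of that comparison, assuming _grownvals.ado as posted above has been saved along the -adopath-. The data and the names nrowvals, nonmiss and anymatch are illustrative. Because the function leaves out numeric missings unless -missing- is specified, the count of distinct values is compared here with the number of nonmissing values (from the official -egen- function rownonmiss()) rather than with the number of variables.

```stata
* Assumes _grownvals.ado (posted above) is installed on the -adopath-
clear
input var1 var2 var3
1 2 3
4 4 5
6 7 6
end

egen nrowvals = rownvals(var1 var2 var3)     // distinct nonmissing values per observation
egen nonmiss  = rownonmiss(var1 var2 var3)   // nonmissing values per observation

* fewer distinct values than nonmissing values means at least one match
generate byte anymatch = nrowvals < nonmiss
list
```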
>
> Numeric missings are ignored by default. The option -missing- overrides that.
>
> This function was mentioned in a review of technique in this
> territory, which should be of interest to anyone interested in this
> question.
>
> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
>         (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
>         Q1/09   SJ 9(1):137--157
>         shows how to exploit functions, egen functions, and Mata
>         for working rowwise; rowsort and rowranks are introduced
>
> That is easy to access at
> http://www.stata-journal.com/article.html?article=pr0046 (although,
> as I write, I cannot connect to it).
>
> It's stated in that paper that the functions are in -egenmore- (SSC).
> That was an intention never carried out, but no one has asked for this
> kind of thing before now.
>
> If you want the function for string variables, please ask.
>
> Nick
> [email protected]
>
>
> On 13 February 2014 11:12, Viktor Emonds <[email protected]> wrote:
>
>> I've been breaking my head over something that can probably be done very easily. I want to evaluate whether the value of any variable in a varlist matches the value of any other variable in the list. If it were only three variables, this would amount to something like:
>>
>> if var1!=var2 & var1!=var3 & var2!=var3
>>
>> How could I do this in a more elegant way when the varlist becomes much longer and the number of these 1-on-1 dissimilarity conditions grows rapidly?

