The Stata listserver

Re: st: create a variable based on a recurring value in a varlist

From   Derek Darves
Subject   Re: st: create a variable based on a recurring value in a varlist
Date   Thu, 13 Oct 2005 10:32:46 -0700

Thanks all for the comments.

I had to rewrite Nick's suggestion (see the original message below) to get this to work. In Nick's original formulation every case was rendered a "1". I think the problem is that some of the PID variables were missing for nearly every case, so I added a little bit of code to handle the missing values.

I did some error checking and, for the cases that it did mark greater than 1, the data are correct. This does not mean, of course, that I did not miss cases. Since my goal is to find a repeated (non-missing) value in a varlist, will this code do the trick? In other words, does anyone see a way that the code below could have missed a repeated value in the varlist? This is the code:
set mem 1000m
use data, clear
keep pid* index
save safecopy, replace

// Preparations for easy reshape: rename pid* to pid1, pid2, ...
local i 1
foreach var of varlist pid* {
    ren `var' pid`i++'
}

// Solution for Problem
reshape long pid, i(index) j(var)
// Running count of non-missing values equal to the previous sorted value
by index (pid), sort: gen same = sum(pid==pid[_n-1]) if pid!=.
replace same = 0 if same ==.
gen same1=0
// Copy the largest count within each case to every row of that case
bysort index (same) : replace same1 = same[_N]
drop same
reshape wide

save shareddirector, replace
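
One way to look for missed repeats is to compute the flag by an independent route and compare the two results. The following is only a minimal sketch on made-up data: the toy values and the names copies and check are assumptions for illustration, not part of the original code. It counts how often each non-missing value occurs within a case instead of comparing neighbouring rows.

```stata
clear
input index pid1 pid2 pid3 pid4
1 705 .   705 .
2 12  .   .   .
3 8   9   8   9
4 3   4   5   .
end

reshape long pid, i(index) j(var)

* Number of rows in the case that share this non-missing value
bysort index pid: gen long copies = _N if !missing(pid)

* Flag the whole case if any value occurs more than once
* (the !missing() guard matters because missing > 1 is true in Stata)
egen byte check = max(copies > 1 & !missing(copies)), by(index)

drop copies
reshape wide pid, i(index) j(var)
list
```

If both variables are present in the real data, something like assert check == (same1 > 0) should stop with an error on any case where the two approaches disagree.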

On Oct 13, 2005, at 4:43 AM, Nick Cox wrote:

This is easier done long.

save safecopy

gen long id = _n
reshape long PID, i(id)
bysort id (PID) : gen same = PID == PID[_n-1]
bysort id (same) : replace same = same[_N]
reshape wide
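
Because Stata evaluates . == . as true, the comparison above also counts two missing values in the same case as a match. A minimal sketch of one possible guard follows, on made-up data; the toy values and the j() name which are assumptions for illustration.

```stata
clear
input id PID1 PID2 PID3
1 705 .   705
2 12  .   .
3 .   .   .
4 3   4   5
end

reshape long PID, i(id) j(which)

* Equal to the previous sorted value AND not missing
bysort id (PID): gen byte same = PID == PID[_n-1] & !missing(PID)

* Spread the largest value (0 or 1) to every row of the case
bysort id (same): replace same = same[_N]

reshape wide PID, i(id) j(which)
list
```

Within each by-group PID[_n-1] is missing for the first row, so a non-missing first value is never flagged on its own.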


Seb Buechte wrote:

You could take a "brute force" approach by comparing each variable with all
the other variables using two loops:

gen interlock=0
foreach var1 of varlist PID1 PID2 .... {
    foreach var2 of varlist PID2 PID3.... {
        // make sure you do not compare the variable with itself
        if "`var1'"!="`var2'" {
            // skip missing values, which would otherwise match each other
            replace interlock=1 if `var1' == `var2' & !missing(`var1')
        }
    }
}

I am not too sure how long it will take to run through these loops.

Derek Darves wrote:

I have a group of variables:

PID1 - PID15

PID* takes on values from 1 to 8000, and many are missing.

Basically, I would like to make a new variable, called interlock,
that is equal to 1 if any of the variables in the list are equal to
any other variable in the list (not including itself, of course).
For example, if PID5==705 and PID14==705, I would like interlock==1.

Likewise, if none of the variables in PID* take on the value of
any of the other variables in PID*, I would like interlock==0.

I tried this:
egen interlock = group(pid1_a  pid1_b  pid2_a  pid2_b  pid3_a ///
    pid3_b  pid4_a  pid4_b  pid5_a  pid5_b  pid6_a  pid6_b  pid7_a ///
    pid7_b  pid8_a  pid8_b  pid9_a  pid9_b  pid10_a  pid10_b  pid11_a ///
    pid11_b  pid12_a  pid12_b  pid13_a  pid13_b  pid14_a ///
    pid14_b  pid15_a)

but it returned all missing values when I know that some share a
common value in two of the PID* fields.

Lastly, not that it should matter, but the above is a simplifying
example. In my actual dataset I have about 130 PID* variables. I just
mention this in case I am hitting some kind of memory limitation (I
am not receiving any errors when I run the command, though; it just
doesn't work).
