The Stata listserver

RE: st: create a variable based on a recurring value in a varlist


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: create a variable based on a recurring value in a varlist
Date   Thu, 13 Oct 2005 18:42:52 +0100

My code solved your problem as stated! But 
I appreciate that missings should be ignored. 
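
Why the first version flagged nearly every case: in Stata a system missing value compares equal to another system missing value (. == . is true). After sorting, missings go last, so as soon as one id has two missing PIDs, PID == PID[_n-1] is 1 for them. Here is a toy check in long form. The data are made up, and the names id and PID simply follow the code in this thread:

```stata
* Toy long-form data (values made up): id 1 has one real PID and
* two missings; id 2 has two distinct real PIDs
clear
input id PID
1 5
1 .
1 .
2 5
2 6
end

display . == .    // shows 1: missing equals missing

* First version of the code: no special handling of missings
bysort id (PID) : gen same = PID == PID[_n-1]
bysort id (same) : replace same = same[_N]
list, sepby(id)
* id 1 ends up with same = 1 only because of its two missings
```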

Try this. Here -1 will be returned in -same- 
if and only if all PIDs are missing. 

gen long id = _n
reshape long PID, i(id)
bysort id (PID) : gen same = cond(mi(PID), -1, PID == PID[_n-1]) 
bysort id (same) : replace same = same[_N]
reshape wide
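
Here is a toy run of the revised code. It assumes made-up wide data with three variables, PID1-PID3. Row 1 repeats 705, row 2 has no repeat, row 3 is all missing, and row 4 has one real value plus missings. So -same- should come out 1, 0, -1 and 0:

```stata
* Made-up wide data with three PID variables
clear
input PID1 PID2 PID3
705  12 705
  1   2   3
  .   .   .
  4   .   .
end

gen long id = _n
reshape long PID, i(id)
* missing PIDs get -1; a real PID gets 1 if it equals the one sorted before it
bysort id (PID) : gen same = cond(mi(PID), -1, PID == PID[_n-1])
* the largest value within each id is the answer for that id
bysort id (same) : replace same = same[_N]
reshape wide
sort id
list
```

Sorting on PID puts missings last, so a real PID is never compared with a missing one. Because -1 sorts below both 0 and 1, the maximum within an id is -1 only when every PID is missing.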

Nick 
n.j.cox@durham.ac.uk 

Derek Darves
 
> Thanks all for the comments.
> 
> I had to rewrite Nick's suggestion (see original message below) to
> get this to work. In Nick's original formulation every case was
> rendered a "1". I think the problem is that some of the PID
> variables were missing for nearly every case. So, I added a little
> bit of code. I did some error checking and, for the cases that it did
> mark greater than 1, the data are correct. This does not mean, of
> course, that I did not miss cases. Since my goal is to find a
> repeated (non-missing) value in a varlist, will this code do the
> trick? In other words, does anyone see a way that the code below
> could have missed a repeated value in varlist? This is the code:
> *Start
> clear
> set mem 1000m
> use data, clear
> keep pid* index
> save safecopy, replace
> // Preparations for easy reshape
> local i 1
> foreach var of varlist pid* {
>      ren `var' pid`i++'
> }
> 
> // Solution for Problem
> reshape long pid, i(index) j(var)
> by index (pid), sort: gen same = sum(pid==pid[_n-1]) if pid!=.
> replace same = 0 if same ==.
> gen same1=0
> bysort index (same) : replace same1 = same[_N]
> drop same
> reshape wide
> 
> save shareddirector, replace
> *end
> 
> 
> On Oct 13, 2005, at 4:43 AM, Nick Cox wrote:
> 
> > This is easier done long.
> >
> > save safecopy
> >
> > gen long id = _n
> > reshape long PID, i(id)
> > bysort id (PID) : gen same = PID == PID[_n-1]
> > bysort id (same) : replace same = same[_N]
> > reshape wide
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > Seb Buechte
> >
> >
> >> you could take a "brute force" approach by comparing each var with
> >> all the other vars using two loops:
> >>
> >> gen interlock=0
> >> foreach var1 of varlist PID1 PID2 .... {
> >>     foreach var2 of varlist PID2 PID3.... {
> >>         if "`var1'"!="`var2'" { // making sure you do not compare the var with itself
> >>            replace interlock=1 if `var1' == `var2'
> >>         }
> >>     }
> >> }
> >>
> >> I am not too sure how long it will take to run through these loops.
> >>
> >
> > Derek Darves
> >
> >
> >>> I have a group of variables:
> >>>
> >>> PID1 - PID15
> >>>
> >>> PID* takes on values from 1 to 8000, and many are missing.
> >>>
> >>> Basically, I would like to make a new variable, called interlock,
> >>> that is equal to 1 if any of the variables in the list is equal to
> >>> any other variable in the list (not including itself, of course).
> >>> For example, if PID5==705 and PID14==705 I would like
> >>> interlock==1.
> >>>
> >>> Likewise, if none of the variables in PID* takes on the value of
> >>> any of the other variables in PID*, I would like interlock==0.
> >>>
> >>> I tried this:
> >>> egen interlock = group(pid1_a pid1_b pid2_a pid2_b pid3_a
> >>>     pid3_b pid4_a pid4_b pid5_a pid5_b pid6_a pid6_b pid7_a
> >>>     pid7_b pid8_a pid8_b pid9_a pid9_b pid10_a pid10_b pid11_a
> >>>     pid11_b pid12_a pid12_b pid13_a pid13_b pid14_a pid14_b pid15_a)
> >>>
> >>> but it returned all missing values when I know that some share a
> >>> common value in two of the PID* fields.
> >>>
> >>> Lastly, not that it should matter, but the above is a simplified
> >>> example. In my actual dataset I have about 130 PID* variables. I just
> >>> mention this in case I am hitting some kind of memory limitation (I
> >>> am not receiving any errors when I run the command, though; it just
> >>> doesn't work).
> >>>
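
The brute-force idea can also be kept in wide form, with no reshape. It needs two changes: compare each pair of variables only once, and exclude missing values, since missing equals missing in Stata. The sketch below is one way to do that. It assumes the variables are named PID1, PID2, ... and uses made-up data. With about 130 variables it runs roughly 8,400 -replace-s (130 x 129 / 2), so expect it to be slower than the reshape approach.

```stata
* Made-up wide data; rows 1 and 4 contain a repeated real value
clear
input PID1 PID2 PID3
705  12 705
  1   2   3
  .   .   .
  4   .   4
end

gen byte interlock = 0
unab pids : PID*
local n : word count `pids'
* compare each pair (i, j) with i < j exactly once
forvalues i = 1/`=`n' - 1' {
    local a : word `i' of `pids'
    forvalues j = `=`i' + 1'/`n' {
        local b : word `j' of `pids'
        * !missing() stops two missing values from counting as a match
        replace interlock = 1 if `a' == `b' & !missing(`a')
    }
}
list
```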

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP