Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Replacing grouped string values with longest string value in the group


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: Replacing grouped string values with longest string value in the group
Date   Thu, 19 Aug 2010 18:36:16 +0100

No loop is needed. 

gen length = length(Symptom)
bysort slike (length) : replace Symptom = Symptom[_N]
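
As a minimal sketch of the two lines above in action, the following rebuilds a toy dataset from the values quoted in the question below (only Symptom and slike are needed; the str27 and str4 storage types are assumptions chosen to fit those values) and then applies the same commands.

```stata
* Toy data copied from the question; storage types are assumptions
clear
input str27 Symptom str4 slike
"ALTEREDLEVELOFCONSCIOUSNESS" "A436"
"ALTERED CONSCIOUSNESS"       "A436"
"ALTERED CONSCIOUSNESS"       "A436"
"BLURREDVISION"               "B463"
"CONVULSIONS"                 "C514"
"DIZZY"                       "D200"
"DIZZINESS/VERTIGO"           "D252"
"DIZZY/VERTIGO"               "D252"
end

* Length of each string, in characters
gen length = length(Symptom)

* Within each slike group, sort by length so the longest string is last,
* then copy that last value to every observation in the group
bysort slike (length) : replace Symptom = Symptom[_N]

list Symptom slike, sepby(slike)
```

If two different strings in a group tie for the longest length, which one ends up last is not determined by this sort. Adding the string itself as a further sort key, as in -bysort slike (length Symptom)-, is one way to make the choice reproducible.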

For a tutorial on -by:- see 

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

Your loop is failing because of a confusion between the -if- command and the -if- qualifier. The -if- command, as you are using it, looks only at the first observation and will in no sense loop over observations as you would wish.

See

FAQ     . . . . . . . . . . . . . . . . . . . . .  if command vs. if qualifier
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  J. Wernow
        6/00    I have an if or while command in my program that
                only seems to evaluate the first observation,
                what's going on?
                http://www.stata.com/support/faqs/lang/ifqualifier.html
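
The distinction can be sketched with a two-observation toy dataset. The data here are made up for illustration, reusing two values from the question; the point is only that the -if- command evaluates its expression once, with a bare variable name taken to mean its value in the first observation, while the -if- qualifier is evaluated for every observation.

```stata
* Hypothetical two-observation dataset for illustration
clear
input str27 Symptom str4 slike
"BLURREDVISION" "B463"
"CONVULSIONS"   "C514"
end

* -if- command: evaluated once; slike here means slike[1], which is "B463"
if slike == "C514" {
    display "not displayed, because slike[1] is not C514"
}

* -if- qualifier: evaluated separately for each observation,
* so the second observation is found
count if slike == "C514"
```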

In addition, your -foreach- loop is just a loop over one value. 

But there is no need to fix your loop as the code above can be used. 


Nick 
[email protected] 

McDermaid, Cameron
Stata 11 SE and a new STATA user

I want to replace string values in a group with the longest string value
of the same variable within the group.

I have a dataset with a number of symptoms that come from different
hospitals. Because of differences in coding practices and errors, there
can be differences in values (e.g. spelling).  I will not know how many
symptom values will be present in a given data set and the symptom
values are always strings. I've grouped the variables by soundex code
and want to use the longest Symptom value to replace the shorter ones in
the group. 

The intent is to write an ado file that anyone can run with minimum
interfacing to generate frequencies of Symptoms after they've been
processed as above.

I've processed the data such that the longest value by slike(soundex)
group is flagged:

Symptom                     | slike | slength | slongest
ALTEREDLEVELOFCONSCIOUSNESS | A436  | 27      | 1
ALTERED CONSCIOUSNESS       | A436  | 20      | 0
ALTERED CONSCIOUSNESS       | A436  | 20      | 0
BLURREDVISION               | B463  | 13      | 1
CONVULSIONS                 | C514  | 11      | 1
DIZZY                       | D200  | 5       | 1
DIZZINESS/VERTIGO           | D252  | 17      | 1
DIZZY/VERTIGO               | D252  | 13      | 0

In this instance ALTEREDLEVELOFCONSCIOUSNESS should replace ALTERED
CONSCIOUSNESS and DIZZINESS/VERTIGO should replace DIZZY/VERTIGO.

My approach has been:

.levelsof slike, local(sound)   
.foreach 1 of local sound {      
.if slike=="`1'" & slongest==1 {    
.local name=Symptom		 
.}
.replace Symptom="`name'" if slike=="`1'" & slongest==0   
.}

What appears to be happening is that the local macro `name' does not get
re-assigned with the Symptom value corresponding to the next level of
slike in the foreach loop.  All values of Symptom with an slongest value
of 0 are recoded to the first symptom: ALTEREDLEVELOFCONSCIOUSNESS

I suspect my approach is close, but the loop is failing to function as I
expected.  Any suggestions or alternate approaches would be greatly
appreciated.



