dckersh

statalist@hsphsun2.harvard.edu

st: loops and strpos to test if an observation's string variable value is also in another string variable

Thu, 24 Jun 2010 21:42:04 -0400

I could use some help on using loops and strpos() to see whether an observation’s string variable value is present in another string variable.

I have statewide roster data for a number of different years. Within each year, schools track students in classes in different ways. Some schools are very detailed and keep track of classrooms at numerous times throughout the year. So you will have different records for a child that contain the same teacher, course title, and course code, but different sections, semesters, meeting times, etc. The result is a situation where the same "class" has different students (some students move, etc.). I am trying to find a way to identify which students (and the overall proportion of students that) were in a class in each instance.

To do this, I want to be able to flag the first semester students who were also present in the second semester. A fairly simple way to do this is to select the second semester classes, reshape the data to one record per class, capture the names of students in the class in one variable, merge those names (that variable) onto the first semester version of that class, and then search for the last name of each first semester student within the variable that holds the last names of the students in the class during the second semester.

Skipping the reshaping and merging, which I've figured out, I theoretically get what I want with the following code on data similar to that below:

    gen in_sem_b = 0
    replace in_sem_b = 1 if Child=="<NAME>" & strpos(class_sem2_names,"<NAME>")>0
    replace in_sem_b = 1 if Child=="Smith" & strpos(class_sem2_names,"Smith")>0

*using data structured like this*

    Class  Teacher   Subject  Semester  Child  in_sem_b  class_sem2_names
    1      Mrs. Fox  Math     a         Smith  1         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Jones  1         SmithJonesFoxTolmieKershawBarker

I have seen some code in older listserv posts that substitutes a local macro `X' within strpos [strpos(class_sem2_names,`X')], but I am at a loss as to what code I can use to take each observation's name from the Child variable and search for it in the class_sem2_names variable (which will vary by class). Note that the same child may be in other classes, so the child names are not unique.

I am open to all suggestions. Thanks for any assistance.

Warm Regards,
Dave Kershaw

HERE’S MORE DETAILED DATA TO HIGHLIGHT WHAT I’M DOING

*TRUNCATED RAW DATA - one class, two semesters

    Class  Teacher   Subject  Semester  Child
    1      Mrs. Fox  Math     a         Smith
    1      Mrs. Fox  Math     a         Jones
    1      Mrs. Fox  Math     a         Barker
    1      Mrs. Fox  Math     a         Kershaw
    1      Mrs. Fox  Math     a         Tanner
    1      Mrs. Fox  Math     a         Tolmie
    2      Mrs. Fox  Math     b         Smith
    2      Mrs. Fox  Math     b         Jones
    2      Mrs. Fox  Math     b         Fox
    2      Mrs. Fox  Math     b         Tolmie
    2      Mrs. Fox  Math     b         Kershaw
    2      Mrs. Fox  Math     b         Barker
    . . .

*Data for the first semester of a class only

    Class  Teacher   Subject  Semester  Child
    1      Mrs. Fox  Math     a         Smith
    1      Mrs. Fox  Math     a         Jones
    1      Mrs. Fox  Math     a         Barker
    1      Mrs. Fox  Math     a         Kershaw
    1      Mrs. Fox  Math     a         Tanner
    1      Mrs. Fox  Math     a         Tolmie

*Data for the first semester of a class with names of 2nd merged, kids flagged.

    Class  Teacher   Subject  Semester  Child    in_sem_b  class_sem2_names
    1      Mrs. Fox  Math     a         Smith    1         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Jones    1         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Barker   1         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Kershaw  1         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Tanner   0         SmithJonesFoxTolmieKershawBarker
    1      Mrs. Fox  Math     a         Tolmie   1         SmithJonesFoxTolmieKershawBarker

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
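
One possible way to approach the question above, offered as a sketch rather than a solution tested on the real roster data: strpos() is evaluated observation by observation, so its second argument can be the Child variable itself instead of a quoted literal, which would make both the loop and the `X' macro unnecessary. The toy data below are hypothetical and only mirror the merged first-semester layout shown in the post.

```stata
* Hypothetical toy data mirroring the merged first-semester layout in the post
clear
input str10 Child str40 class_sem2_names
"Smith"   "SmithJonesFoxTolmieKershawBarker"
"Jones"   "SmithJonesFoxTolmieKershawBarker"
"Barker"  "SmithJonesFoxTolmieKershawBarker"
"Kershaw" "SmithJonesFoxTolmieKershawBarker"
"Tanner"  "SmithJonesFoxTolmieKershawBarker"
"Tolmie"  "SmithJonesFoxTolmieKershawBarker"
end

* Both arguments of strpos() are string variables here, so each row's
* Child value is searched for in that same row's class_sem2_names.
generate byte in_sem_b = strpos(class_sem2_names, Child) > 0

* The mean of the 0/1 flag is the proportion of first-semester
* students who also appear in the second-semester list.
summarize in_sem_b
list Child in_sem_b, noobs
```

Because the names are concatenated without a separator, a short name could match inside a longer one (for example, Fox inside a hypothetical Foxworth). If that is a concern, one option is to build class_sem2_names with a delimiter between names and search for the delimited name instead.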

