Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Buzz Burhans" <buzzb3@earthlink.net> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | st: shifting existing values to subsequent variables in a row |
Date | Sat, 7 May 2011 13:14:40 -0600 |
I have been given a several datasets with ~50K observations and 197 variables. Of the 197 variables, 180 are obtained from 20 measurement occasions where 9 measurements were made. After the initial 17 variables describing the subject (in my case individual cows), there are a series of 9 measurements from the first measurement occasion, repeated in the same order for each subsequent measurement occasion Inspecting the data, it is clear that in a small but substantial proportion of the observations/cases the last datum is not present for a given measurement occasion, with the consequence that the value provided for all the subsequent variables is shifted left by one variable. It is quite clear from the content of the data in the remaining variables in a row ( a mixture of text and numbers) that the values are offset one column to the left, i.e. one variable to the left. I can fairly easily identify the "bad" observations with an -if- statement. Such observations occur randomly and not usually consecutively. The problem can occur in multiple subsets of the data, as it appears to have occurred with the final datum on different measurement occasions (but not every one, only in some instances), so whatever procedure I come up with will need to be reusable for application to data that is shifted after a later measurement occasion. I am trying to come up with an easy way to shift all the remaining values back to the correct variable. The 9 variables on each occasion all have different names, though within a measurement occasion the prefix is the same for all 9 measurements (variables) attached to that occasion (i.e. day1this day1that day1other day1more...) The data look something like this for 2 observations where the first is ok, the second with a absent missing value and subsequent values shifted right:: var1 - var25 day1No8 day1last day2first day2second day2third..... ___________|________|________|_______|___________|___________.... Ok values....| 100 Hot 25 120 UP Bad values..| 100 25 120 UP What I need is: var1 - var25 day1No8 day1last day2first day2second day2third..... ___________|________|________|_______|___________|___________.... Ok values....| 100 Hot 25 120 UP shift values.| 100 . 25 120 UP I would appreciate any ideas on how to accomplish this efficiently when applied to multiple observations. Buzz Buzz Burhans, Ph.D. Dairy-Tech Group So. Albany, VT / Twin Falls ID Cell: 208-320-0829 ID Fax: 208-735-1289 VT Fax: 802-755-6842 Email: buzzb3@earthlink.net * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/