Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Buzz Burhans" <buzzb3@earthlink.net> |
To | <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: shifting existing values to subsequent variables in a row |
Date | Sun, 8 May 2011 10:00:49 -0600 |
Thanks Nick. Reshape is one of the strategies I was thinking of. I thought though perhaps there might be an easier or faster way involving exporting the ID variable and the block of affected variables to a Mata matrix, then renaming the columns of the matrix such that the new col name is the previous column name that was one column offset to the right, then saving the Mata matrix with the renamed column names as a .dta file, and then merging it back in to the original file. Or possibly this could be done by interacting Mata and Stata without the need to create the .dta file to be merged back in. I have not previously used Mata though, and am not sure how to accomplish either of these in Mata, at least without spending what seems to require a lot of time to learn enough Mata, which I can't see happening soon because of other time commitments. Ultimately, I will need to reshape this data anyway so your suggestion is quite appropriate, but it seems easier to do some of the data inspection and cleaning (such as dropping some observations that are not usable for other reasons) before I did the reshape. Thanks for the suggestion. Buzz Buzz Burhans, Ph.D. Dairy-Tech Group So. Albany, VT / Twin Falls ID Cell: 208-320-0829 ID Fax: 208-735-1289 VT Fax: 802-755-6842 Email: buzzb3@earthlink.net From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox Sent: Sunday, May 08, 2011 1:31 AM To: statalist@hsphsun2.harvard.edu Subject: Re: st: shifting existing values to subsequent variables in a row I'd -reshape- such data to long. Then use -by:- and subscripting. Nick On Sat, May 7, 2011 at 8:14 PM, Buzz Burhans <buzzb3@earthlink.net> wrote: > I have been given a several datasets with ~50K observations and 197 > variables. > > Of the 197 variables, 180 are obtained from 20 measurement occasions where 9 > measurements were made. After the initial 17 variables describing the > subject (in my case individual cows), there are a series of 9 measurements > from the first measurement occasion, repeated in the same order for each > subsequent measurement occasion > > Inspecting the data, it is clear that in a small but substantial proportion > of the observations/cases the last datum is not present for a given > measurement occasion, with the consequence that the value provided for all > the subsequent variables is shifted left by one variable. It is quite clear > from the content of the data in the remaining variables in a row ( a > mixture of text and > numbers) that the values are offset one column to the left, i.e. one > variable to the left. > > I can fairly easily identify the "bad" observations with an -if- statement. > Such observations occur randomly and not usually consecutively. The > problem can occur in > multiple subsets of the data, as it appears to have occurred with the final > datum on different measurement occasions (but not every one, only in some > instances), so whatever procedure I come up with will need to be reusable > for application to data that is shifted after a later measurement occasion. > I am trying to come up with an easy way to shift all the remaining > values back to the correct variable. The 9 variables on each occasion all > have different names, though within a > measurement occasion the prefix is the same for all 9 measurements > (variables) attached to that occasion (i.e. day1this day1that day1other > day1more...) > > The data look something like this for 2 observations where the first is > ok, the second with a absent missing value and subsequent values shifted > right:: > > var1 - var25 day1No8 day1last day2first day2second > day2third..... > ___________|________|________|_______|___________|___________.... > > Ok values....| 100 Hot 25 > 120 UP > Bad values..| 100 25 120 > UP > > What I need is: > > var1 - var25 day1No8 day1last day2first day2second > day2third..... > ___________|________|________|_______|___________|___________.... > > Ok values....| 100 Hot 25 > 120 UP > shift values.| 100 . 25 > 120 UP > > I would appreciate any ideas on how to accomplish this efficiently when > applied to multiple observations. > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ ________________________________________ No virus found in this message. Checked by AVG - www.avg.com Version: 10.0.1325 / Virus Database: 1500/3623 - Release Date: 05/07/11 * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/