Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Sergiy Radyakin <serjradyakin@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Using infile with varying lines() per observation |
Date | Fri, 26 Jul 2013 17:10:23 -0400 |
Kevin,

I really want to help, but I still don't understand your description of the data. Your file declares variables C1, C2, C3, C4; where is C5 coming from (see your output)? I assume that C1, C2, etc. are really some meaningful names, like age, education, gender. The program would not be able to 'invent' C5 even if it looks like an account number or a set of GPS coordinates.

Don't use the same letters for variables and values. If you use C in column names C1, C2,..., then use x, y, z for values.

Values C, D and E were all in the C3 column; how would they end up in different columns in the output? Where is C in your final output? Is it part of the ID, or a separate variable? Do you want to discard it?

And can I be sure that every subject will have at least two lines in the dataset? Are they always the same, with A,B,C \ A,B,D, or can that vary? What are A, B, C, D? Numbers? Strings? If strings, is each one a letter, a word, a sentence?

I am also confused by the blank lines following each subject. Are they part of the file? (This is important.)

It would be better if you could take 5-10 representative subjects from your actual dataset, change the names to Cameron Diaz, Angelina Jolie, Bill Gould and other legendary people, divide their incomes by 3.14, and then post the resulting file somewhere from where it can be web-read into Stata. That would take care of most of the questions.

Sergiy

On Fri, Jul 26, 2013 at 4:18 PM, Kevin McConeghy <kevinmcconeghy@gmail.com> wrote:
> Thank you for your help Sergiy, however I did a bad job describing my
> data. I am having trouble adapting your code. The columns are
> fixed-format.
>
>
> The id var=D is on the second line.
>
> C1 C2 C3 C4
>
> A B C
> A B D
> B E F
> B
> B
> B
>
> A B C
> A B D
> B E F
>
> A B C
> A B D
> B E F
> B
> B
>
> I need to convert so it is:
>
> C1 C2 id C4 C5
>
> AA BBBBBB D E F
>
> AA BBB D E F
>
> AA BBBBB D E F
>
>
> I apologize for being vague before.
>
> Kevin
>
> ------------------------------
>
> Date: Thu, 25 Jul 2013 20:28:15 -0400
> From: Sergiy Radyakin <serjradyakin@gmail.com>
> Subject: Re: st: Using infile with varying lines() per observation
>
> Kevin, considering your described setup the following should work:
>
> type http://radyakin.org/statalist/2013072501/testdata.txt
> do http://radyakin.org/statalist/2013072501/readflex.do
>
> Here is the output:
>
> id  col1  col2  col3  col4
> 1   A     B     C     7
>           B
> 2   A     B     C     1
> 3   A     B     C     90
>           B
>           B
>
>
> id  col1  col2  col3  col4
> 1   A     BB    C     7
> 2   A     B     C     1
> 3   A     BBB   C     90
>
>
> It's up to you to make sure that 244 chars is enough for the whole BBB
> value and that the numbers are completely located in the first line of
> each subject. Id is assumed to be a string.
>
> Hope this helps, Sergiy Radyakin
>
> --
> Kevin McConeghy, PharmD, BCPS
> 833 S Wood St, Chicago, IL 60612
> College of Pharmacy, Dept. of Pharmacy Practice
> University of Illinois at Chicago
> (312)-413-1422, kwm@uic.edu
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
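
For readers following the thread: the collapse shown in the quoted output of 25 Jul (continuation lines that carry only more of col2 are folded into the first line of each subject) might be sketched roughly as below. This is not the contents of readflex.do, which is not reproduced in the thread. The inline data typed with -input-, the helper variables seq and rec, and the str20 width for col2 are all assumptions made for illustration; a real file would first have to be read with -infile- or -infix- using its actual column positions.

```stata
* Hypothetical stand-in for the test data: continuation lines have an
* empty id and carry only an additional piece of col2.
clear
input str5 id str5 col1 str20 col2 str5 col3 col4
"1" "A" "B" "C" 7
""  ""  "B" ""  .
"2" "A" "B" "C" 1
"3" "A" "B" "C" 90
""  ""  "B" ""  .
""  ""  "B" ""  .
end

* Remember the original line order (seq is an assumed helper name).
generate long seq = _n

* Running subject counter: goes up by one on every line that has an id.
generate long rec = sum(id != "")

* Within each subject, append each line's piece of col2 to what has been
* accumulated so far; -replace- works through the observations in order.
bysort rec (seq): replace col2 = col2[_n-1] + col2 if _n > 1

* Copy the complete string to every line of the subject, then keep only
* the first line, which holds id, col1, col3 and col4.
by rec: replace col2 = col2[_N]
by rec: keep if _n == 1

drop seq rec
list, noobs
```

The sketch relies on the same conditions stated in the quoted message: the id and the numeric value must sit entirely on the first line of each subject, and the string width must be large enough for the concatenated value.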