
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: How to loop a <clean> script over multiple files?


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: How to loop a <clean> script over multiple files?
Date   Mon, 23 Jan 2012 20:17:32 +0000

Pleased you got there, and I understand what you mean, but note in passing how the word "format" is overloaded. 

-tostring- is not about changing formats in the sense of -format-: it is about forcing a change of variable type from numeric to strings. 

-reshape-, although sometimes described as being about a change of dataset format, describes itself as being about long and wide form. Elsewhere many Stata people talk about long or wide shape or structure. 

That's three senses of the word "format": 

1. Display formats for numbers and strings. 

2. How data are held as specific numeric or string variable types. 

3. How data are organised large scale: what defines an observation, and so forth. 
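
The three senses can be seen side by side in a short sketch. It assumes the auto dataset shipped with Stata; the names id, val and which are made up for illustration and are not from the thread.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* 1. Display format: changes how price is shown, not how it is stored
format price %9.2fc
describe price

* 2. Storage type: -tostring- turns the numeric rep78 into a string variable
tostring rep78, replace
describe rep78

* 3. Shape: go from one observation per car to one per car and measure
*    (id, val and which are hypothetical names chosen for this example)
keep make price mpg
generate id = _n
rename price val1
rename mpg val2
reshape long val, i(id) j(which)
list in 1/4
```

After step 1 the storage type of price is unchanged; only step 2 changes a variable type, and only step 3 changes what an observation is.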

Nick 
n.j.cox@durham.ac.uk 

Brandon Olszewski

Dmitry - the code worked great. Thanks. I used -tostring- upfront for
some vars since they were in different formats in different files,
then cleaned them up all together later. And, you're quite right about
-append- vs. -merge- + -reshape- since I wanted long format all along.
Thanks to you and Nick. Solved!

On Mon, Jan 23, 2012 at 1:14 AM, Nick Cox <njcoxstata@gmail.com> wrote:

> Note the convention made explicit in the FAQ is that you are presumed
> to be using the current version of Stata unless you specify otherwise.
>
> Nick
>
> On Mon, Jan 23, 2012 at 1:20 AM, Brandon Olszewski
> <olszewski.brandon@gmail.com> wrote:
>> Thanks, Dmitry & Nick. I will tinker with it tomorrow when I'm back in
>> the office.
>> I've used -erase- for this kind of thing before. My office machine
>> runs Windows 7, but my home machine is Ubuntu/Linux, and I'm used to
>> the command terminal and, thus, the forward slashes.
>> We're updating to Stata 12 soon, and I'll look for the -rename- options there.
>> Thanks again - hopefully I can get it to fly.
>>
>> On Sat, Jan 21, 2012 at 1:48 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> If the OS were Windows, !rm has its equivalent !del, but regardless of
>>> OS you could use the Stata command -erase-. From a tell-tale forward
>>> slash, I guess Brandon is using some flavour of Unix, but he could be
>>> in Windows and be being smart about using forward slashes in Stata.
>>>
>>> -renvars- (mentioned but not explained by Brandon) is from SJ. The
>>> new -rename- in Stata 12 is at least as good and has a nicer syntax.
>>>
>>> -fs- (mentioned and explained by Dimitriy) is fine with files in two
>>> or more directories. You just need to say what they are.
>>>
>>> Nick
>>>
>>> On Sat, Jan 21, 2012 at 2:21 AM, Dimitriy V. Masterov
>>> <dvmaster@gmail.com> wrote:
>>>
>>>> Here's some toy code that seems to work as long as the files are all
>>>> in the same directory. You should be able to modify it pretty easily.
>>>>
>>>> I chose a slightly different strategy than you because I find it
>>>> easier to append multiple files than to merge them (though there are
>>>> ssc commands like nmerge and mergeall that you may want to look into).
>>>> You probably want the data in long format anyway.
>>>>
>>>> Let me and the list know if you have any problems. I think regular
>>>> expression/subinstr combo could be improved upon, but I am too
>>>> lazy.
>>>>
>>>> DVM
>>>>
>>>> *********************************************************
>>>> #delimit;
>>>> clear all;
>>>>
>>>> cd "$lapdesktop"; // Change this!
>>>>
>>>> /* Create Fake Data */
>>>> forvalues v=1/24 {;
>>>>        sysuse auto, clear;
>>>>        outsheet using "data_file_`v'.txt", replace;
>>>> };
>>>>
>>>>
>>>> /* Import the txt data and do some renaming */
>>>> ssc install fs; // don't need to do it each time
>>>> fs data_file*.txt;
>>>>
>>>> foreach txtfile in `r(files)' {;
>>>>        insheet using "`txtfile'", clear;
>>>>        local date "`=regexr("`=subinstr("`txtfile'",".txt","",1)'","[^0-9]+","")'";
>>>> // keep only the numeric part of filename
>>>>        rename make id;
>>>>        gen date=`date';
>>>>        save sd_`date'.dta, replace;
>>>> };
>>>>
>>>>
>>>> /* Combine the Stata Files Into One And Reshape */
>>>> fs sd*.dta;
>>>>
>>>> clear;
>>>>
>>>> append using `r(files)';
>>>>
>>>> sort id date;
>>>>
>>>> reshape wide price-foreign, i(id) j(date);
>>>>
>>>>
>>>> /* Erase both the txt and individual Stata file */
>>>> !rm data_file*.txt;
>>>> !rm sd_*.dta;
>>>> *********************************************************
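
As noted in the thread, -erase- works regardless of operating system, whereas !rm and !del are OS-specific. A minimal sketch of the same cleanup step using -erase- follows. It assumes the files sit in the current working directory and match the patterns used in the code above; the local macro names are made up for illustration.

```stata
* Sketch only: OS-independent cleanup with -erase- instead of !rm or !del
* Assumes the files are in the current working directory

* Collect the matching file names with the -dir- extended macro function
local txtfiles : dir "." files "data_file*.txt"
foreach f of local txtfiles {
    erase "`f'"
}

local dtafiles : dir "." files "sd_*.dta"
foreach f of local dtafiles {
    erase "`f'"
}
```

This version uses the default line delimiter, so it would need to be placed after a -#delimit cr- if appended to the code above.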


