
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: A problem while dealing with massive amount of data


From   Neil Shephard <[email protected]>
To   [email protected]
Subject   Re: st: A problem while dealing with massive amount of data
Date   Tue, 28 Jun 2011 09:45:26 +0100

On 28 June 2011 09:19, Mayank Mishra <[email protected]> wrote:
> Hello all,
>
> I have around two thousand .csv files in a folder which I need to clean
> and save as Stata .dta files. For this I am running a loop in which the
> -insheet- command reads in a file, which is then cleaned and saved.
> There is a variable named "option_typ" which is used twice in the loop
> while cleaning. The problem is that in some files this variable is named
> "optiontype". For those files, the do-file gives an error and the loop
> stops because it cannot find a variable named "option_typ". What makes it
> worse is that I don't know which files have a different variable name
> from the one used in the do-file. So, please tell me what I can do in this
> situation.

You don't state which operating system you're working on, but if you're
on a *NIX based system you could easily use 'grep' to search all your
files and list just the files that match (using the '-l' switch) or
those that don't match (using the '-L' switch), for example...

$ grep -l 'option_typ' *.csv > files_matching_option_typ.txt
$ grep -L 'option_typ' *.csv > files_not_matching_option_typ.txt

...will give you two files, whose names should be self-explanatory.
You can then use these lists to loop over specific files appropriately
depending on their contents.
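
One caveat: the grep commands above search the whole of each file, so a
file would also be listed as matching if the string happened to appear in
a data row. If that is a concern, a minimal sketch that checks only the
header row could look like the following. The function name
classify_by_header is made up for illustration, and the sketch assumes
the variable names sit on the first line of each .csv file.

```shell
#!/bin/sh
# Sketch: sort CSV files by whether their header row contains a column name.
# classify_by_header is a hypothetical helper name, not part of grep or Stata.
classify_by_header() {
    dir=$1    # folder holding the .csv files
    name=$2   # column name to look for, e.g. option_typ
    # Start with empty list files (same naming as the grep example).
    : > "$dir/files_matching_$name.txt"
    : > "$dir/files_not_matching_$name.txt"
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue   # skip the literal pattern if no .csv files exist
        # Assumption: line 1 is the header. -F treats the name as a fixed
        # string, -w requires a whole-word match.
        if head -n 1 "$f" | grep -q -F -w -- "$name"; then
            printf '%s\n' "$f" >> "$dir/files_matching_$name.txt"
        else
            printf '%s\n' "$f" >> "$dir/files_not_matching_$name.txt"
        fi
    done
}
```

Calling it as `classify_by_header . option_typ` from the folder holding
the files should write the same two list files as the grep example, with
one file name per line.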

If you're not on a *NIX system, you could achieve this under MS Windows
by installing the UNIX-like shell Cygwin (see http://x.cygwin.com/).

Neil

-- 
“Truth in science can be defined as the working hypothesis best suited
to open the way to the next better one.” - Konrad Lorenz

Email - [email protected]
Website - http://kimura.no-ip.org/
Photos - http://www.flickr.com/photos/slackline/
