



Re: st: Singeling out datasets containing variable X in folder with many stata files


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Singeling out datasets containing variable X in folder with many stata files
Date   Tue, 14 Jun 2011 15:16:20 -0500

On Tue, Jun 14, 2011 at 6:14 AM, Lukas Maximilian Rudolph
<Lukas.Rudolph@campus.lmu.de> wrote:
> Dear Statalisters,
>
> I have a folder with many Stata files that I am about to merge. Some of these contain information at the household level, some at the individual level. Of these, some are in wide form and some are in long form.
>
> I now want to identify all file names that contain a certain variable:
> I want to separate all files with the variable "pidlink", the individual identifier.
> Within these, I want to identify all files that contain a variable ending with "*type" as just these are in long form.
>
> Then I would be able to construct one loop that reshapes all datasets in long form and then another loop that merges all files on individual and household level automatically without going through every single file.
>
> My thought would have been to try to save the files in different folders conditional on whether the respective variable is contained - but save is not combinable with "if".

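* List every .dta file in the current directory
local allfiles : dir . files "*.dta"
tokenize `allfiles'
local idlist
local longlist
while `"`1'"' != `""' {
   * Load a single observation: only the variable names are needed
   use in 1 using `"`1'"', clear
   unab allvars : *
   * strpos() is a substring match, so e.g. "pidlink2" would also count
   if strpos("`allvars'", "pidlink") {
      local idlist `"`idlist' `1'"'
      if strpos("`allvars'", "type") {
          local longlist `"`longlist' `1'"'
      }
   }
   macro shift
}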

After that, the local `idlist' should contain all the files that have
the -pidlink- variable, and the local `longlist' the subset of these
that have variables with names containing "type". You will probably
want to parse `allvars' for the second filter with a regular expression
(e.g. -regexm()-) so that it only fires when "type" is at the end of a
variable name. But this should give you a starting point :)
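
A minimal sketch of that stricter filter, under a few assumptions that are not in the original question: the files sit in the current directory, file names contain no spaces, and "ends with type" means the variable name matches the pattern "type$". The local names `haspid' and `haslong' are made up for illustration. It also swaps the substring test on -pidlink- for an exact word match via the macro list function -posof-.

```stata
* Sketch only: exact match on pidlink, "type" anchored at the end of a name
local allfiles : dir . files "*.dta"
local idlist
local longlist
foreach f of local allfiles {
    * One observation is enough to get at the variable names
    use in 1 using `"`f'"', clear
    unab allvars : *

    * posof returns 0 unless pidlink is a full word in the list
    local haspid : list posof "pidlink" in allvars
    if `haspid' {
        local idlist `"`idlist' `f'"'

        * Flag the file if any variable name ends in "type"
        local haslong 0
        foreach v of local allvars {
            if regexm("`v'", "type$") local haslong 1
        }
        if `haslong' {
            local longlist `"`longlist' `f'"'
        }
    }
}
display `"with pidlink: `idlist'"'
display `"long form:    `longlist'"'
```

Looping over the names one at a time keeps each string passed to -regexm()- short, which sidesteps any limit on string-expression length when a file has many variables.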

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.