Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: import excel


From   Eric Booth <eric.a.booth@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: import excel
Date   Thu, 12 Apr 2012 08:16:57 -0500

<>



On Apr 12, 2012, at 5:22 AM, Jenniffer Solorzano Mosquera wrote:

> Hi,
> 
> Is there any way to import several files from excel with a loop in such a =
> way I preserve the original format of the data?
> 

Yes, I would use the techniques discussed in these threads for importing a large # of files using a loop and then converting to .dta format (and possibly combining/appending the data at the same time):


http://www.stata.com/statalist/archive/2012-04/msg00317.html "st: Re: Concatenate files"
http://www.stata.com/statalist/archive/2010-01/msg00239.html "Re: global macro for system directory?"
http://www.stata.com/statalist/archive/2010-11/msg00111.html "Re: st: repeat same commands over hundreds of files"


> I have 100 excel reports of 100 firms each one. The problem is that every =
> single row is a firm but the firm has many observations by date in the =
> same cell. In excel you can see that with format the information is =
> possible to read because they are separated in terms of format but they =
> are in the same cell.=20
> For example:
> This list is information of firm 1, and it is only in one cell:
> 09/01/2010
> 09/01/2009
> 09/01/2008
> 12/31/2007
> 12/31/2006
> This list is for firm 2, in the next row and so on:
> 04/01/2010
> 04/01/2009
> 04/01/2008
> 12/31/2007
> 12/31/2006
> When I import everything appears in only one cell in the dta file. The =
> problem is that there is no regular pattern of the variables. Some columns =
> are value of investments (not only dates) so you can not separate by =
> programming a regular loop to identify the information imported. I can =
> identify the dates because they are made from 10 digits always, but the =
> same doesn't happen with other type of information.
> Any recommendation?


You first need to inspect whether all your data in those excel long cells are making it into Stata when you -import- the file -- Stata has a string length lim. of 244 chars, so if any of those combined cells are too long, you'll have truncated data and need to consider pre-processing the file.  You can do that in Excel or possibly use -filefilter- (-help filefilter-).

If all the data in these long cells are making it into Stata, then your goal is to -split- the cells using either -split- or some combination of string functions (-help string_functions-).  
If all your cells looked like the data example you gave, then you could run:  

 split v1

and it would create separate vars from this space-delimited string.  However, where you have  other delimiters in a cell/string, you'd need to adjust -split-'s parse() option (or some other string_function) accordingly.  Finally, if you have different numbers of elements within a string, you'll get different numbers of variables created by -split-, so you'll have to do some work to organize your new variables, but without seeing that data I couldn't guess about the strategy.

- Eric



__
Eric A. Booth
Public Policy Research Institute 
Texas A&M University
ebooth@ppri.tamu.edu
+979.845.6754





*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index