Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: import excel

From	Eric Booth <[email protected]>
To	[email protected]
Subject	Re: st: import excel
Date	Thu, 12 Apr 2012 10:52:38 -0500

<>

Try saving your excel file as tab-delimited text.  Then -insheet- the data with the no-names option and Stata sees those line breaks within cells from Excel (which are ascii char(10)) as spaces and then you can -split- as I described.  E.g., 

***********
**1. using tab-delimited version **
insheet using "test.txt", clear no names  //text.txt contains just one cell with the example you provided
list // reads in data as space delimited
split v1
rename v12 agent //and so on...
***********



-import excel- doesn't read these line breaks within Excel cells as spaces and does format them strangely when displayed/listed in the Result window (the text is overlapping).  Using -subinstr- string function, you can take out the line breaks and then use -split- as I described.

***********
clear
** using excel version **
import excel using "test.xlsx", clear

list
**Produces strange overlapping text in Result Window 
di as smcl `"See screenshot of overlapping text here:  {browse `"http://goo.gl/AE43R";'}"'



**examine with -charlist-
cap which charlist
if _rc ssc install charlist
charlist A
di `"`r(ascii)'"'


g B = subinstr(A, char(10), " | ", .)

split B, parse("|")


 l B1-B10

     +---------------------------------------------------------------------------------------+
     |                     B1      B2     B3     B4     B5     B6     B7     B8     B9   B10 |
     |---------------------------------------------------------------------------------------|
  1. | AgeatFinancingMonths1     116     73     69     57     41     34     17     14      0 |
     +---------------------------------------------------------------------------------------+


***********

Again, if there are other issues/chars that are causing issues, -filefilter- might be a good resource for pre-processing your file.


- Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754


On Apr 12, 2012, at 10:10 AM, Jenniffer Solorzano Mosquera wrote:

> The specific problem is that after importing, the data fits into STATA without any problem (it doesn't exceed the 244 charterers restriction). The problem is that everything is compressed in the same cell without any spaces to split for. Example:
> 
> AgeatFinancingMonths1
> 116
> 73
> 69
> 57
> 41
> 34
> 17
> 14
> 0
> 
> 
> In STATA dta type stays like this...
> 
> AgeatFinancingMonths1
> 116736957413417140
> ( tel:116736957413417140) 
> There is no regular pattern to count the months or the continuous variables. I tried pasting the excel data first in a text editor (actually, pasting the data in the Eviews command window) first and then copy it from that editor and finally paste it into STATA editor. It worked but doing it column by column.. Could you guys have any thoughts to do this with a text editor or something? They preserve the formats...
> 
>>>> Eric Booth <[email protected]> 2012-04-12 15:49 >>>
> <>
> 
> No. There isn't any part of -import excel- that tells Stata to split those cells while importing.	 However, the second response in my last message does tell you some strategies & warning about how to split those cells after importing (or that you can try to pre-process the file). 
> 
> You have to tell Stata _how_ to split those cells/variables and that may vary variable by variable.  If you have a specific question about how to split those cells after importing or how to pre-process the file let us know.
> 
> - Eric
> 
> __
> Eric A. Booth
> Public Policy Research Institute 
> Texas A&M University
> [email protected] 
> +979.845.6754
> 
> 
> On Apr 12, 2012, at 8:41 AM, Jenniffer Solorzano Mosquera wrote:
> 
>> Thanks, but actually the problem is not how to do the loop. The problem is that several lines of different information are contained in the same cell in excel. Is there any way to make STATA understand please keep it as different cells, even if they aren't in excel but are tabulated one below the other one and so on?
>> 
>>>>> Eric Booth <[email protected]> 2012-04-12 15:16 >>>
>> 
>> <>
>> 
>> 
>> 
>> On Apr 12, 2012, at 5:22 AM, Jenniffer Solorzano Mosquera wrote:
>> 
>>> Hi,
>>> 
>>> Is there any way to import several files from excel with a loop in such a =
>>> way I preserve the original format of the data?
>>> 
>> 
>> Yes, I would use the techniques discussed in these threads for importing a large # of files using a loop and then converting to .dta format (and possibly combining/appending the data at the same time):
>> 
>> 
>> http://www.stata.com/statalist/archive/2012-04/msg00317.html "st: Re: Concatenate files"
>> http://www.stata.com/statalist/archive/2010-01/msg00239.html "Re: global macro for system directory?"
>> http://www.stata.com/statalist/archive/2010-11/msg00111.html "Re: st: repeat same commands over hundreds of files"
>> 
>> 
>>> I have 100 excel reports of 100 firms each one. The problem is that every =
>>> single row is a firm but the firm has many observations by date in the =
>>> same cell. In excel you can see that with format the information is =
>>> possible to read because they are separated in terms of format but they =
>>> are in the same cell.=20
>>> For example:
>>> This list is information of firm 1, and it is only in one cell:
>>> 09/01/2010
>>> 09/01/2009
>>> 09/01/2008
>>> 12/31/2007
>>> 12/31/2006
>>> This list is for firm 2, in the next row and so on:
>>> 04/01/2010
>>> 04/01/2009
>>> 04/01/2008
>>> 12/31/2007
>>> 12/31/2006
>>> When I import everything appears in only one cell in the dta file. The =
>>> problem is that there is no regular pattern of the variables. Some columns =
>>> are value of investments (not only dates) so you can not separate by =
>>> programming a regular loop to identify the information imported. I can =
>>> identify the dates because they are made from 10 digits always, but the =
>>> same doesn't happen with other type of information.
>>> Any recommendation?
>> 
>> 
>> You first need to inspect whether all your data in those excel long cells are making it into Stata when you -import- the file -- Stata has a string length lim. of 244 chars, so if any of those combined cells are too long, you'll have truncated data and need to consider pre-processing the file.  You can do that in Excel or possibly use -filefilter- (-help filefilter-).
>> 
>> If all the data in these long cells are making it into Stata, then your goal is to -split- the cells using either -split- or some combination of string functions (-help string_functions-).  
>> If all your cells looked like the data example you gave, then you could run:  
>> 
>> split v1
>> 
>> and it would create separate vars from this space-delimited string.  However, where you have  other delimiters in a cell/string, you'd need to adjust -split-'s parse() option (or some other string_function) accordingly.  Finally, if you have different numbers of elements within a string, you'll get different numbers of variables created by -split-, so you'll have to do some work to organize your new variables, but without seeing that data I couldn't guess about the strategy.
>> 
>> - Eric
>> 
>> 
>> 
>> __
>> Eric A. Booth
>> Public Policy Research Institute 
>> Texas A&M University
>> [email protected] 
>> +979.845.6754
>> 
>> 
>> 
>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search 
>> *   http://www.stata.com/support/statalist/faq 
>> *   http://www.ats.ucla.edu/stat/stata/ 
>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search 
>> *   http://www.stata.com/support/statalist/faq 
>> *   http://www.ats.ucla.edu/stat/stata/ 
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search 
> *   http://www.stata.com/support/statalist/faq 
> *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- Re: st: import excel
  - From: "Jenniffer Solorzano Mosquera" <[email protected]>

References:
- st: import excel
  - From: "Jenniffer Solorzano Mosquera" <[email protected]>
- Re: st: import excel
  - From: Eric Booth <[email protected]>
- Re: st: import excel
  - From: "Jenniffer Solorzano Mosquera" <[email protected]>
- Re: st: import excel
  - From: Eric Booth <[email protected]>
- Re: st: import excel
  - From: "Jenniffer Solorzano Mosquera" <[email protected]>

Prev by Date: Re: st: Bug with date(.,.) function?
Next by Date: st: Interpretation of oprobit coefficients
Previous by thread: Re: st: import excel
Next by thread: Re: st: import excel
Index(es):
- Date
- Thread