Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Strings dropped when importing Excel file
From
Sergiy Radyakin <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: Strings dropped when importing Excel file
Date
Fri, 8 Nov 2013 16:15:53 -0500
Friedrich,
even if the file contains no formulas, it might still contain errors.
I know it sounds strange, but this is true. There is a content-type
flag that indicates a cell as an error, and it might be set regardless
of what content the cell has. Consider the following example:
http://radyakin.org/statalist/2013110801/errs.xlsx
All values (in column A) were typed directly from the keyboard. No
tricks. Note formatting differs (alignment) for values RECOGNIZED as
errors by Excel as errors. I didn't adjust that at all.
Now if we look at how the file is saved, you will see here (extract):
http://radyakin.org/statalist/2013110801/errs.png
that some cells have t="e" and some cells have t="s"
If it is an "e" - the cell is erroneous. Stata imports it as a missing
(blank for strings).
If it is an "s" - it is a string, Stata pulls the corresponding string
from the strings repository (separate file, not shown).
Stata itself doesn't enumerate particular error messages. And there
are more, not only the ones you have mentioned so far, see here:
http://www.techonthenet.com/excel/formulas/error_type.php
Instead Stata seems to be looking at the flag of cell type. But it
doesn't have to. Instead StataCorp could have decided to import the
value as an extended missing. Wait! You can say there are no extended
missing values in strings! Well in Stata 13 you can come up with a way
to deal with it. Especially now that you have binary strings. Even
without serious file format modifications, the values could have been
preserved as e.g. "char(13)+value", since we know char(13) can't be
part of the string in Excel. Later your code would simply truncate the
marker (since you don't need it). Alternatively, StataCorp could have
imported all values as valid, and let the user recode all #something!
values into the missings in strings if needed. Or... I'll stop here.
Again, there are multiple ways to resolve the problem in your current
situation without waiting for Stata to evolve. What I suggested before
is still valid - just create, say, another column and recompute the
values there making sure the errors (and they are really errors, the
ISERROR() function returns true for such cells) are replaced with
something else.
Largely the choice of method depends of whether you have one such
file? or 100? or they are coming every day?
There is no key or option in Stata's -import excel- facility that
would help you with that. StataCorp itself might extend the
functionality in the future, but as far as version 13.1 stands now, I
see no other option but to manipulate the file externally, either with
Excel itself or with another tool available. For example, if your file
contains only strings, as you describe, save it from Excel as
tab-separated txt file. Then insheet it into Stata.
Best, Sergiy Radyakin
On Fri, Nov 8, 2013 at 2:13 PM, Friedrich Huebler <[email protected]> wrote:
> Thank you, Sergiy.
>
> The Excel file that I am trying to import contains only text, no
> formulas. The entries #DIV/0! (which is also replaced by an empty cell
> as I discovered), #N/A and #VALUE! are stored as text in their
> respective cells, together with other text in the same columns. The
> Excel file that you created contains a formula that yields a #DIV/0!
> error and this formula (=7/A2) can be seen when one clicks on the
> cell. My file is different, there are no formulas, and therefore I
> would expect all variables to be imported without changes.
>
> Friedrich
>
> On Fri, Nov 8, 2013 at 1:52 PM, Sergiy Radyakin <[email protected]> wrote:
>> Friedrich,
>>
>> to the best of my knowledge the importer would take the
>> 'cached'/'computed' value for each cell. These are stored in the excel
>> file by Excel itself. Sometimes, if you open really old file, Excel
>> would tell you, that these values were saved after computing the
>> formulas using an older version of Excel, and it needs to recompute
>> them in the newer version (this warning has many reasons to be
>> crucial, as recomputing simply may not succeed at all).
>>
>> Anyways, both values that you mention seem to be not 'values' but
>> representation - something that Excel shows you in the cell, based not
>> only on the genuine cell value, but also the formula, and other cell
>> attributes (note that you have TEXT separate from VALUE separate from
>> FORMULA for each cell). In this case these are representations of the
>> errors.
>>
>> Depending on the complexity of the Excel file different suggestions
>> might be suitable. My first suggestion would be to use Excel's
>> functions ISERR() and IFERROR() to obtain values, that you can later
>> manage in Stata.
>> See the example here:
>> http://radyakin.org/statalist/2013110801/func.xlsx
>> (I expect you would have the problem you described with col B but not col C)
>>
>> If you have errors of different types, use the ERROR.TYPE() function
>> to distinguish between them. Not sure which Excel versions have this
>> function, it might be fairly recent. Based on this reference, after
>> 2000:
>> http://www.techonthenet.com/excel/formulas/error_type.php
>>
>> Hope this helps. Don't hesitate to ask if you need more information.
>>
>> Best, Sergiy Radyakin
>>
>>
>>
>>
>> On Fri, Nov 8, 2013 at 1:23 PM, Friedrich Huebler <[email protected]> wrote:
>>> I use Stata 12.1. When importing an Excel file with string variables
>>> (cell format "Text"), the variables are correctly imported as strings
>>> but the following entries are replaced by empty cells in Stata:
>>>
>>> #N/A
>>> #VALUE!
>>>
>>> Is it possible to keep these strings? I found no information on this
>>> in the documentation for -import excel-. The -allstring- option makes
>>> no difference.
>>>
>>> Thanks,
>>>
>>> Friedrich
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/