Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: Stata bug? - Mixture of European/US dates using import excel


From   "Phillips, Patrick" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Stata bug? - Mixture of European/US dates using import excel
Date   Fri, 7 Feb 2014 23:16:51 +0000

Thanks, Phil, for that helpful and thoughtful response.

I guess that problem I have with Stata is your comment: ' For dates, it does this by converting the date into the human readable format %tdnn/dd/CCYY.'

I am not in the US and my computer (windows, MS officem, etc.) is set to show dates as %tddd/nn/CCYY. It would be good if Stata could know this rather than assuming I want mm/dd/CCYY. [Why is there not a c(dateformat)?]

The work-around is saving the Excel spreadsheet as CSV (Excel saves 'dates' as dd/mm/ccyy as it knows I am European) and then read that into Stata - as I used to do before Stata 12!

The moral is that an Excel spreadsheet is a poor way of transferring data.

Thanks for your help,

Patrick



_____________________________
Patrick PJ Phillips, PhD
Senior Statistician
MRC Clinical Trials Unit at UCL
Institute of Clinical Trials & Methodology
Aviation House
125 Kingsway
London  WC2B 6NH

Tel: +44 (0)20 7670 4637
Fax: +44 (0)20 7670 4815
e-mail: [email protected]
Skype: patrick.pj.phillips
Website: http://www.ctu.mrc.ac.uk/
Twitter: @MRCCTU
_____________________________

Please note my new email address: [email protected] 
[email protected] and [email protected] will continue to work for a short period

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Phil Schumm
Sent: 07 February 2014 22:45
To: Statalist Statalist
Subject: Re: st: Stata bug? - Mixture of European/US dates using import excel 

On Feb 7, 2014, at 3:52 PM, Phillips, Patrick <[email protected]> wrote:
> If a spreadsheet has a column with a mixture of dates and strings, Stata will read this column in as a string when using the import excel command.


Correct -- what else could it do?  If it reads the column as numeric, then it cannot preserve the string values.


> If a date is stored in excel as 'text' it is read into Stata directly as string. 


Again, that makes sense.  If the column is stored in Excel as text, Stata takes it at its word.  I wouldn't want Stata looking at the data and trying to guess what the string values represent (Excel does this, and it can lead to serious errors, e.g., http://www.theregister.co.uk/2004/07/16/excel_vanishing_dna/).


> However, if a date is stored in excel as a 'date' it is converted to mm/dd/yyyy format and read into Stata as a string. 


Yes, if Stata determines that a column contains text values, then it must read that column as text, translating any numeric values it finds into a reasonable text representation.  For dates, it does this by converting the date into the human readable format %tdnn/dd/CCYY.


> You can see that if dates are formatted in excel as dd/mm/yyyy, they will then be read into Stata in a mixture of mm/dd/yyyy and dd/mm/yyyy formats depending on whether excel 'thinks' it is text or a date! 
> 
> Try copying the following three lines into excel (the white space at the front of the second row is important) and saving as an excel spreadsheet.
> 
> 20/02/2013
> 20/02/2013
> Text
> 
> When I read this into stata it looks like the following:
> 
>      +-------------+
>      |           A |
>      |-------------|
>   1. |   2/20/2013 |
>   2. |  20/02/2013 |
>   3. |        Text |
>      +-------------+
> 
> Very worrying as it is impossible to tease out which might be dd/mm/yyyy and which mm/dd/yyyy! Has anyone else had this same problem? Does anyone have any solutions?


This is exactly as expected.  In your example, the first value is stored in Excel as a date, so Stata converts it to a string representation of that date.  The next two values are strings (the space causes " 20/02/2013" to be interpreted by Excel as a string), and are read by Stata exactly as-is.

In sum, this isn't a bug, but rather merely a function of the fact that Stata treats an entire column/variable as a uniform data type (i.e., numeric or string) whereas in Excel, the cells within a column are unrestricted.  If you are receiving Excel files in which string and numeric (including date) values are mixed within a single column, then you need to be *very* careful when importing them into Stata (the same would be true of R, Python, etc.).  Best to see if you can change the way in which the original (raw) Excel files are being generated, if possible.


-- Phil


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index