
Title | How Excel writes empty cells | |
Author | Paul Lin, StataCorp |
This problem has to do with how Excel writes empty cells into files. Pretend that a row of your spreadsheet reads
(1) (2) (3) (4) (5) (6) (7) (8) (9) +-----------------------------------------------------+ | | | | | | | | | | | 136 | 9.8 | | 64 | 108 | | | 7.8 | | | | | | | | | | | | +-----------------------------------------------------+
Excel writes this row in a file as
136,9.8,,64,108,,,7.8,
Stata can read this line without difficulty. Note the comma at the end of the line. It is because of this comma that Stata knows there is a ninth value; it is merely that the ninth value is missing in this observation.
Sometimes, however, Excel writes these same data as
136,9.8,,64,108,,,7.8
without the trailing comma. Thus it appears to Stata’s insheet command that the line contains eight rather than nine values, so insheet complains.
The conditions under which Excel does this are complicated. Basically, Excel reads a chunk of the spreadsheet and then looks at the chunk. If the last column in the chunk contains empty cells, it omits that column. In the resulting file, there may be some records showing nine columns, whereas others show eight, and even others show seven or fewer. Each will occur in a group to be determined by Excel’s buffer size.
If you run into this problem, here is the solution:
Adding a column of 1s will avoid the problem of empty cells in the last column, causing Excel to write the correct number of separators on each line.