papandreou
Contributor

Hi!

I ran into a problem where a trailing carriage return in a cell (shared string) would come out as _x000D_ in the decoded worksheet. It turns out that OOXML has a weird escaping scheme intended for characters that cannot be represented in XML: https://www.robweir.com/blog/2008/03/ooxmls-out-of-control-characters.html
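To illustrate the reading side, here is a minimal sketch of decoding the `_xHHHH_` escapes. The function name `decodeOoxmlEscapes` is made up for this example and this is not necessarily the code in the PR; accepting lowercase hex digits is also an assumption.

```javascript
// Hypothetical helper for illustration, not the implementation merged here.
// Replaces every _xHHHH_ escape with the UTF-16 code unit it names.
// Assumption: hex digits may be upper or lower case.
function decodeOoxmlEscapes(text) {
  return text.replace(/_x([0-9A-Fa-f]{4})_/g, (match, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// A trailing carriage return stored as an escape becomes a real "\r".
console.log(JSON.stringify(decodeOoxmlEscapes('line_x000D_')));
```

Because the replacement scans left to right and never re-reads its own output, an escaped underscore such as `_x005F_` decodes to a literal `_` without the following characters being treated as a second escape.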

This PR makes shared strings with escapes come out right when reading .xlsx files. For completeness, we should also make sure to emit escape sequences when writing characters disallowed by XML (e.g. U+0004 END OF TRANSMISSION, U+0006 ACKNOWLEDGE, U+0007 BELL, U+0008 BACKSPACE, U+0016 SYNCHRONOUS IDLE). Also, we should encode underscores as _x005F_ when they are part of literal text that could otherwise be interpreted as an escape.
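The writing side could look roughly like the sketch below. It is only an illustration of the idea: `encodeOoxmlEscapes` is a hypothetical name, and the set of escaped characters (all C0 controls except tab, line feed and carriage return, which XML 1.0 allows) is an assumption about what a writer would want to cover.

```javascript
// Hypothetical helper for illustration, not part of this PR.
function encodeOoxmlEscapes(text) {
  return text
    // Protect an underscore that starts something that looks like an escape,
    // so literal text such as "_x000D_" survives a round trip.
    .replace(/_(?=x[0-9A-Fa-f]{4}_)/g, '_x005F_')
    // Escape control characters that XML 1.0 does not allow.
    // Tab (09), LF (0A) and CR (0D) are legal in XML and left alone here.
    .replace(/[\x00-\x08\x0B\x0C\x0E-\x1F]/g, (ch) =>
      '_x' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0') + '_'
    );
}

console.log(encodeOoxmlEscapes('_x000D_'));
console.log(encodeOoxmlEscapes('a\u0008b'));
```

The underscore pass runs first so that the escapes produced by the second pass are not themselves re-escaped. Carriage returns are not handled in this sketch, even though the file that prompted this PR stored one as `_x000D_`; a real writer would have to decide how to treat them.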

For the record, the enclosed test case was created in Excel by entering _x000D_ into a cell. In the shared strings table that comes out as:

<si><t>_x005F_x000D_</t></si>

Similar libraries in other languages have dealt with this as well, both when reading and when writing .xlsx files.

guyonroche merged commit ec4cd23 into exceljs:master on Jun 26, 2018