Fix UnicodeDecodeError in some VBA file handling #2380

LeonKohli · 2024-01-15T09:28:09Z

Summary

This pull request addresses the issue where xlwings encounters a UnicodeDecodeError when handling VBA files with encodings other than UTF-8. The proposed changes add a multi-encoding read strategy to the export_vba_modules, vba_edit, and vba_import functions in cli.py, allowing xlwings to handle files with different encodings such as 'ISO-8859-1' and 'cp1252', alongside the default 'utf-8'.

Changes

Modified export_vba_modules, vba_edit, and vba_import functions to try reading files with 'utf-8', 'ISO-8859-1', and 'cp1252' encodings.
Added error handling to raise UnicodeDecodeError if all encodings fail, providing clear feedback.
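
For readers following along, the read strategy described above might look roughly like the sketch below. It is not the code from this PR: the helper name `read_text_with_fallback` and the constant `CANDIDATE_ENCODINGS` are hypothetical, and only the list of encodings is taken from the description.

```python
from pathlib import Path

# Candidate list as named in the PR description; the names below are hypothetical.
CANDIDATE_ENCODINGS = ("utf-8", "ISO-8859-1", "cp1252")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Return (text, encoding) for the first encoding that decodes the file."""
    data = Path(path).read_bytes()
    last_error = None
    for encoding in encodings:
        try:
            # Decoding raw bytes keeps line endings as they are on disk.
            return data.decode(encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    if last_error is None:
        raise ValueError("no encodings were given")
    # Every candidate failed: surface the last decoding error to the caller.
    raise last_error
```

One thing to keep in mind with this order: ISO-8859-1 maps every possible byte to a character, so it never raises and 'cp1252' is never reached once 'utf-8' has failed. The final error branch can only trigger if the candidate list contains nothing but strict encodings.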

Justification

These changes enhance the robustness of xlwings in handling VBA files with various encodings, which is a common scenario in diverse environments. It ensures smoother functionality and reduces the likelihood of runtime errors due to encoding issues.

Testing

The modifications have been tested in various scenarios to ensure compatibility and functionality across different file encodings.

fzumstein · 2024-01-15T10:10:15Z

Thanks! Can you also provide instructions on how to reproduce this issue? See also #2335

LeonKohli · 2024-01-15T10:19:48Z

Hey @fzumstein

Thank you for following up on the pull request. Regarding the UnicodeDecodeError issue with xlwings, I encountered it while working on a large VBA project in my professional environment. Initially, I focused on resolving the issue directly as it was impacting our workflow, rather than setting up a detailed reproduction scenario.

However, I believe the problem arises when handling VBA files that contain non-ASCII characters saved in an encoding other than UTF-8, especially in large projects where various encodings may have been used over time.

fzumstein · 2024-01-15T10:26:00Z

Were you using files that were exported outside of xlwings?

LeonKohli · 2024-01-15T10:29:17Z

To clarify, the files I encountered the issue with were indeed part of a larger VBA project, and some of these files may have been exported or edited outside of xlwings before being reintegrated into the project. This mixed handling could be a contributing factor to the encoding discrepancies leading to the UnicodeDecodeError.

fzumstein · 2024-01-15T14:00:03Z

Right, in this case I think it's fair to expect that the user has to use xlwings to do the initial export though. The way you have it now is just covering your specific use case, it wouldn't work for this case: #2335
I'd rather allow users to specify a non-utf-8 encoding via command line switch, something like:
xlwings vba edit --encoding=cp932
I think it could be ok to loop through utf-8 and locale.getpreferredencoding() by default, as this is still generic.
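
A minimal sketch of that idea, assuming a `--encoding` option that does not exist in xlwings at the time of this comment. The function name `candidate_encodings` and the standalone parser are hypothetical and only illustrate the suggested default order (utf-8 first, then the locale's preferred encoding):

```python
import argparse
import codecs
import locale


def candidate_encodings(explicit=None):
    """Return the encodings to try, in order, without duplicates."""
    if explicit:
        names = [explicit]
    else:
        names = ["utf-8", locale.getpreferredencoding(False)]
    seen, ordered = set(), []
    for name in names:
        # codecs.lookup normalizes aliases and raises LookupError for unknown names.
        canonical = codecs.lookup(name).name
        if canonical not in seen:
            seen.add(canonical)
            ordered.append(canonical)
    return ordered


# Hypothetical stand-in for the CLI; this is not the actual xlwings parser.
parser = argparse.ArgumentParser(prog="vba-edit-sketch")
parser.add_argument("--encoding", default=None,
                    help="encoding of the exported VBA files (assumed option)")
args = parser.parse_args(["--encoding=cp932"])
print(candidate_encodings(args.encoding))
```

On a machine whose locale already uses UTF-8, the default list collapses to a single entry, so the loop costs nothing extra there.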

LeonKohli · 2024-01-15T14:51:51Z

A command line switch for specifying the encoding offers greater flexibility for individual files.
However, I'd like to raise a concern regarding projects where multiple contributors have worked on different modules, potentially using varying encodings. In such scenarios, a single encoding specified via the command line might not be sufficient to handle all modules correctly.

In my project, some modules use UTF-8 and others use cp932 or Latin-1.

I understand this introduces additional complexity but it might be necessary to ensure robust handling of diverse and collaborative VBA projects.

zwackelfuss · 2024-03-28T08:59:22Z

Hello,
I am new to xlwings and just installed the latest version (0.31.0). Opening a file the way described here #2335 (comment) (Terminal: vba edit --file), the xlsm opens, but this error occurs:

  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\site-packages\xlwings\cli.py", line 757, in export_vba_modules
    exported_code = f.readlines()
  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 354: invalid start byte
PS C:\Users\Heiko\Documents\testxl>

Is there anything I can do to resolve this problem?
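
For context on the traceback: 0xfc is the byte that both cp1252 and ISO-8859-1 use for 'ü', and it can never start a valid UTF-8 sequence. The snippet below reproduces the same kind of failure in isolation. It is only an illustration and does not involve xlwings or the workbook in question.

```python
# 'ü' saved in cp1252 is the single byte 0xfc.
raw = "ü".encode("cp1252")

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same class of error as in the traceback above.
    error = exc

# Decoding with the encoding the byte was written in works.
decoded = raw.decode("cp1252")
print(repr(raw), type(error).__name__, decoded)
```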

fzumstein · 2024-03-28T14:56:17Z

@zwackelfuss if you can attach a sample workbook to replicate the issue, that would help. Also, if you can report your machine's locale:

zwackelfuss · 2024-03-28T18:50:12Z

Thank you Felix. See my language settings (German).

Odd: I opened the xlsm, deleted some simple cell values (names of persons), saved the file, and opened it in xlwings without error. So it seems some of the names are the reason for the error. As the file contains names, I would prefer sending you the file via mail instead of posting it here. I hope this is OK for you.

EDIT: it's even odder: the content is not important. Opening the file for the first time causes the error. After closing it and opening it again, the error is gone and all is fine.

PedroWitzel · 2024-04-03T20:17:21Z

Hello, first time here 👋
I had a similar issue with a Brazilian Portuguese workbook, and grabbing this modification resolved the issue.

I'm running xlwings vba edit to do so.
I can do further testing if you go further with this, or if you require a 'broken' workbook.

ZwilleSmutje · 2024-04-17T10:39:31Z

I was able to get the export working when setting the reading part of cli.py>vba_edit to

Line 867: with open(path, "r", encoding="ISO 8859-1") as f:

JumperJu · 2025-04-01T01:29:15Z

Hey, thanks for the code!
I was having trouble with some Spanish characters, and your fix did the trick. 👍

berryloop · 2025-04-12T11:38:08Z

Hey, I've also stumbled upon this problem working with legacy code and looked into the code.
As far as I can tell, the import routine called in vba_import

xlwings/xlwings/cli.py

Line 812 in beb6853

book.api.VBProject.VBComponents.Import(path)

apparently expects ISO-8859-1 or Windows-1252 encoding, while vba edit uses utf-8 encoding to read changes

xlwings/xlwings/cli.py

Line 871 in beb6853

with open(path, "r", encoding="utf-8") as f:

This inconsistent behaviour might lead to corrupted files, especially in projects featuring different encodings due to their history.

Could you maybe take a new look at harmonizing vba import to feature importing utf-8 encoded files directly? @fzumstein
That would be great!
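
Until the two code paths agree, one possible workaround is to transcode a UTF-8 module into a temporary cp1252 copy before handing it to the import call. The sketch below rests on the assumption made in this comment, namely that the import side expects Windows-1252; that has not been verified here. The helper name `transcode_for_import` is hypothetical.

```python
def transcode_for_import(src, dst, source_encoding="utf-8", target_encoding="cp1252"):
    """Write a copy of src to dst, re-encoded for the import step."""
    # newline="" disables newline translation, so CRLF line endings are preserved.
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding=target_encoding, newline="") as f:
        # Raises UnicodeEncodeError if a character has no mapping in the target encoding.
        f.write(text)
    return dst
```

A character that cp1252 cannot represent makes the write fail loudly rather than silently corrupting the module, which seems preferable for source code.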

LeonKohli added 2 commits January 15, 2024 10:14

feat: new Encoding handling in cli.py

29ea4ec

Update cli.py

cfdca34

LeonKohli changed the title ~~"Fix UnicodeDecodeError in some VBA file handling~~ Fix UnicodeDecodeError in some VBA file handling Jan 15, 2024


7 participants
