Conversation

LeonKohli

Summary

This pull request addresses the issue where xlwings encounters a UnicodeDecodeError when handling VBA files with encodings other than UTF-8. The proposed changes add a multi-encoding read strategy to the export_vba_modules, vba_edit, and vba_import functions in cli.py, allowing xlwings to handle files with different encodings such as 'ISO-8859-1' and 'cp1252', alongside the default 'utf-8'.

Changes

  • Modified export_vba_modules, vba_edit, and vba_import functions to try reading files with 'utf-8', 'ISO-8859-1', and 'cp1252' encodings.
  • Added error handling to raise UnicodeDecodeError if all encodings fail, providing clear feedback.
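
A minimal sketch of the fallback read described in the list above. It is not the actual diff from this pull request: `read_text_with_fallback` is a hypothetical helper name. One deliberate deviation: cp1252 is tried before ISO-8859-1, because ISO-8859-1 maps every byte value in Python and therefore never fails, so any encoding listed after it would never be reached.

```python
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "cp1252", "ISO-8859-1")):
    """Return (text, encoding) for the first encoding that decodes the file.

    Hypothetical helper for illustration; not part of xlwings.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    data = Path(path).read_bytes()
    last_error = None
    for encoding in encodings:
        try:
            text = data.decode(encoding)
        except UnicodeDecodeError as exc:
            # Remember the failure and try the next candidate.
            last_error = exc
            continue
        return text, encoding
    # Every candidate failed: surface the last decoding error.
    raise last_error
```

A successful decode only means the bytes are valid in that encoding, not that the result is the text the author intended, so the order of candidates matters.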

Justification

These changes enhance the robustness of xlwings in handling VBA files with various encodings, which is a common scenario in diverse environments. It ensures smoother functionality and reduces the likelihood of runtime errors due to encoding issues.

Testing

The modifications have been tested in various scenarios to ensure compatibility and functionality across different file encodings.

@LeonKohli LeonKohli changed the title from "Fix UnicodeDecodeError in some VBA file handling to Fix UnicodeDecodeError in some VBA file handling on Jan 15, 2024
@fzumstein
Member

Thanks! Can you also provide instructions on how to reproduce this issue? See also #2335

@LeonKohli
Author

Hey @fzumstein

Thank you for following up on the pull request. Regarding the UnicodeDecodeError issue with xlwings, I encountered it while working on a large VBA project in my professional environment. Initially, I focused on resolving the issue directly as it was impacting our workflow, rather than setting up a detailed reproduction scenario.

However, I believe the problem arises when handling VBA files that contain characters not encoded in UTF-8, especially in large projects where various encoding standards might have been used over time.

@fzumstein
Member

Were you using files that were exported outside of xlwings?

@LeonKohli
Author

To clarify, the files I encountered the issue with were indeed part of a larger VBA project, and some of these files may have been exported or edited outside of xlwings before being reintegrated into the project. This mixed handling could be a contributing factor to the encoding discrepancies leading to the UnicodeDecodeError.

@fzumstein
Member

Right, in this case I think it's fair to expect that the user has to use xlwings to do the initial export though. The way you have it now is just covering your specific use case, it wouldn't work for this case: #2335
I'd rather allow users to specify a non-utf-8 encoding via command line switch, something like:
xlwings vba edit --encoding=cp932
I think it could be ok to loop through utf-8 and locale.getpreferredencoding() by default, as this is still generic.
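
A rough sketch of how that default could be resolved, assuming a hypothetical `--encoding` value is passed in as `cli_encoding`. The function name `default_encodings` is made up for illustration and is not xlwings code.

```python
import codecs
import locale


def default_encodings(cli_encoding=None):
    """Return the list of encodings to try, in order.

    An explicit (hypothetical) --encoding value wins; otherwise fall back to
    utf-8 followed by the machine's preferred locale encoding.
    """
    if cli_encoding:
        return [cli_encoding]
    candidates = ["utf-8", locale.getpreferredencoding(False)]
    result = []
    seen = set()
    for name in candidates:
        try:
            # Normalize aliases such as "UTF-8" and "utf8" to one name.
            canonical = codecs.lookup(name).name
        except LookupError:
            continue  # skip names Python does not know
        if canonical not in seen:
            seen.add(canonical)
            result.append(canonical)
    return result
```

On a machine whose locale encoding is already UTF-8, this returns a single entry, so nothing is tried twice.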

@LeonKohli
Author

A command line switch for specifying the encoding offers greater flexibility for individual files.
However, I'd like to raise a concern regarding projects where multiple contributors have worked on different modules, potentially using varying encodings. In such scenarios, a single encoding specified via the command line might not be sufficient to handle all modules correctly.

Based on my project structure, some modules use UTF-8 while others use cp932 or Latin-1.

I understand this introduces additional complexity but it might be necessary to ensure robust handling of diverse and collaborative VBA projects.

@zwackelfuss

Hello,
I am new to xlwings and just installed the latest version (0.31.0). When I open a file the way described in #2335 (comment) (Terminal: vba edit --file), the xlsm opens, but this error occurs:

  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\site-packages\xlwings\cli.py", line 757, in export_vba_modules
    exported_code = f.readlines()
  File "C:\Users\Heiko\AppData\Local\Programs\Python\Python39\lib\codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 354: invalid start byte
PS C:\Users\Heiko\Documents\testxl> 

Is there anything I can do to resolve this problem?
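
For context, byte 0xfc is the letter "ü" in cp1252 and ISO-8859-1, and it is never valid as the first byte of a UTF-8 sequence. The snippet below reproduces the same kind of error in isolation; the sample text "Müller" is an assumption for illustration and is not taken from the workbook.

```python
# Sample text is an assumption; any string containing "ü" behaves the same way.
raw = "Müller".encode("cp1252")  # the "ü" becomes the single byte 0xfc

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # Same class of error as in the traceback above.
    print(exc)

# Decoding with the encoding the bytes were written in works.
print(raw.decode("cp1252"))
```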

@fzumstein
Member

@zwackelfuss if you can attach a sample workbook to replicate the issue, that would help. Also, if you can report your machine's locale:

image

@zwackelfuss

zwackelfuss commented Mar 28, 2024

Thank you Felix. See my language settings (German).
image
Odd: I opened the xlsm, deleted some simple cell values (names of persons), saved the file and opened it in xlwings without error. So it seems some of the names are the reason for the error. As the file contains names, I prefer sending it to you via mail instead of posting it here. I hope this is OK for you.

EDIT: it's even odder: the content is not important. Opening the file for the first time causes the error. After closing it and opening it again, the error is gone and all is fine.

@PedroWitzel

Hello, first time here 👋
I had a similar issue with a Brazilian Portuguese workbook, and grabbing this modification resolved the issue.

I'm running xlwings vba edit to do so.
I can do further testing if you go further with this, or if you require a 'broken' workbook.

@ZwilleSmutje

I was able to get the export working when setting the reading part of cli.py>vba_edit to

Line 867: with open(path, "r", encoding="ISO 8859-1") as f:

@JumperJu

JumperJu commented Apr 1, 2025

Hey, thanks for the code!
I was having trouble with some Spanish characters, and your fix did the trick. 👍

@berryloop

berryloop commented Apr 12, 2025

Hey, I've also stumbled upon this problem while working with legacy code, so I looked into the code.
As far as I can tell, the import routine called in vba_import

book.api.VBProject.VBComponents.Import(path)

apparently expects ISO-8859-1 or Windows-1252 encoding, while vba edit uses utf-8 encoding to read changes:
with open(path, "r", encoding="utf-8") as f:

This inconsistent behaviour might lead to corrupted files, especially in projects featuring different encodings due to their history.

Could you maybe take a new look at harmonizing vba import so that it can import utf-8 encoded files directly? @fzumstein
That would be great!
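
One possible workaround, sketched under the assumption stated above that the import routine expects Windows-1252: transcode the UTF-8 file to a temporary copy before importing it. `transcode_for_import` is a hypothetical helper, not an xlwings function.

```python
import tempfile
from pathlib import Path


def transcode_for_import(path, source_encoding="utf-8", target_encoding="cp1252"):
    """Write a re-encoded copy of `path` to a temp directory and return its path.

    Hypothetical helper. Raises UnicodeEncodeError if the text contains
    characters that the target encoding cannot represent.
    """
    src = Path(path)
    # Work on bytes so that line endings are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    tmp_dir = Path(tempfile.mkdtemp())
    target = tmp_dir / src.name  # keep the file name and extension unchanged
    target.write_bytes(text.encode(target_encoding))
    return target
```

The returned path could then be passed on, for example as `book.api.VBProject.VBComponents.Import(str(transcode_for_import(path)))`. Characters outside the target encoding would fail loudly at the encode step instead of being silently corrupted.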


7 participants