test_tkinter leaks files in the C locale #107705

Closed
serhiy-storchaka opened this issue Aug 7, 2023 · 10 comments
Labels
topic-tkinter topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Aug 7, 2023

$ LC_ALL=C ./python -m test -vuall test_tkinter -m 'test_write'
...
Warning -- files was modified by test_tkinter
Warning --   Before: []
Warning --   After:  ['@test_1061191_tmp\udce6'] 
Warning -- files was modified by test_tkinter
Warning --   Before: []
Warning --   After:  ['@test_1061191_tmp\udce6'] 
test_tkinter failed (env changed)
...

Since it is only a warning and occurs only in the C locale, it went unnoticed.

In the C locale, Tcl uses Latin1 to encode filenames, while Python uses UTF-8.
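
The lone surrogate in the leaked name follows directly from that mismatch. A minimal sketch, assuming the name ends in 'æ' (U+00E6) as in the warning; the name used here is shortened for illustration and the variable names are made up:

```python
# Illustrative name; the real TESTFN also embeds a process ID.
name = "@test_tmp\u00e6"

# What Tcl writes to disk in the C locale: one byte per character (Latin-1).
on_disk = name.encode("latin-1")
print(on_disk)                       # b'@test_tmp\xe6'

# What Python reports when it lists the directory: it decodes file names
# as UTF-8 with the surrogateescape handler, so the byte 0xE6 (not valid
# UTF-8 by itself) becomes the lone surrogate U+DCE6.
as_seen_by_python = on_disk.decode("utf-8", "surrogateescape")
print(ascii(as_seen_by_python))      # '@test_tmp\udce6'

# The bytes Python itself would use for the same name differ from Tcl's,
# so cleanup by the original name does not find the file Tcl created.
created_by_python = name.encode("utf-8")
print(created_by_python)             # b'@test_tmp\xc3\xa6'
```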

@terryjreedy
Member

terryjreedy commented Aug 7, 2023

Also seems to be *nix specific. On Windows, that test invocation, with -m 'test_write', runs no tests. Omitting that runs 741 tests, with no warning after executing set LC_ALL=C.

@serhiy-storchaka
Member Author

On Windows you should not use single quotes (actually, they are not needed on POSIX either; they are a remnant of other filters).

@terryjreedy
Member

terryjreedy commented Aug 7, 2023

Duh. With double quotes or none, test_write runs and passes without warning. But the debug build says it is running with "locale=cp1252". It seems that the LC_ALL setting is ignored on Windows.
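
To check which encodings a given interpreter actually ends up with, regardless of what LC_ALL was set to, something like the following can be run on either platform. This is only a sketch: the helper name `encoding_report` is made up for illustration, and the values printed depend on the platform and environment.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is using in this process."""
    return {
        # Encoding derived from the current locale settings.
        "locale_encoding": locale.getpreferredencoding(False),
        # Encoding and error handler used for file names.
        "filesystem_encoding": sys.getfilesystemencoding(),
        "filesystem_errors": sys.getfilesystemencodeerrors(),
    }


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```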

@vstinner
Member

AsObj() has 3 code paths to encode a Python str object to Tcl/Tk:

#if USE_TCL_UNICODE
        if (sizeof(Tcl_UniChar) == 2)
            encoded = _PyUnicode_EncodeUTF16(value,
                    "surrogatepass", NATIVE_BYTEORDER);
        else if (sizeof(Tcl_UniChar) == 4)
            encoded = _PyUnicode_EncodeUTF32(value,
                    "surrogatepass", NATIVE_BYTEORDER);
        else
            Py_UNREACHABLE();
#else
        encoded = _PyUnicode_AsUTF8String(value, "surrogateescape");
#endif

From Python, I don't see any way to detect which code path is taken: USE_TCL_UNICODE and sizeof(Tcl_UniChar) are not exposed in the Python API, and they are not even logged by test.pythoninfo. Well, USE_TCL_UNICODE is not hard to guess :-)

#ifdef MS_WINDOWS
#define USE_TCL_UNICODE 1
#else
#define USE_TCL_UNICODE 0
#endif

Maybe sizeof(Tcl_UniChar) should be exposed in the _tkinter extension.
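
For readers who prefer Python, the three branches can be mirrored roughly as follows. This is a hypothetical sketch, not code from `_tkinter`: the function `encode_like_asobj` and its parameters are invented here, and the two flags are passed in explicitly precisely because Python cannot query them.

```python
import sys

# Codec suffix for the machine's byte order, standing in for NATIVE_BYTEORDER.
_NATIVE = "le" if sys.byteorder == "little" else "be"


def encode_like_asobj(value, use_tcl_unicode, unichar_size=2):
    """Hypothetical Python mirror of the three AsObj() branches quoted above."""
    if use_tcl_unicode:
        if unichar_size == 2:
            return value.encode("utf-16-" + _NATIVE, "surrogatepass")
        if unichar_size == 4:
            return value.encode("utf-32-" + _NATIVE, "surrogatepass")
        raise AssertionError("unreachable: unexpected Tcl_UniChar size")
    # Non-Windows branch: UTF-8, with escaped bytes restored by surrogateescape.
    return value.encode("utf-8", "surrogateescape")


print(encode_like_asobj("\u00e6", use_tcl_unicode=False))  # b'\xc3\xa6'
```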


TESTFN is the string '@test_452347_tmpæ' in my case. This string can be encoded to UTF-8:

>>> '@test_452347_tmpæ'.encode('utf8')
b'@test_452347_tmp\xc3\xa6'

I checked that FromObj(AsObj()) gives back the same string.

No, the problem is that the Tcl/Tk call ('::img::test', 'write', '@test_452819_tmpæ') doesn't handle non-ASCII characters and escapes them using surrogateescape for an unknown reason :-(

For now, honestly, I would suggest simply using an ASCII filename.

@serhiy-storchaka
Member Author

No, it has no relation to how Tkinter converts strings between Python and Tcl.

Tcl uses the Latin1 encoding in the C locale, while Python uses UTF-8. The Tcl command

open \u00e6 w

creates a file whose name consists of the single byte 0xe6 in the C locale.

It would be better to document this, but there is currently no place in the documentation where it would be appropriate.

And we need to add a workaround in the test.
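
One possible shape for such a workaround, sketched under assumptions: the helpers `candidate_disk_names` and `unlink_all_spellings` are hypothetical, this is not necessarily what the committed fix does, and it assumes a POSIX file system that accepts arbitrary byte names. The idea is to clean up both the UTF-8 spelling Python expects and the Latin-1 spelling Tcl may have written.

```python
import os


def candidate_disk_names(filename):
    """Byte names under which Python or Tcl may have created `filename`."""
    names = [filename.encode("utf-8", "surrogateescape")]
    try:
        latin1 = filename.encode("latin-1")
    except UnicodeEncodeError:
        # Not representable in Latin-1, so only the UTF-8 spelling applies.
        return names
    if latin1 not in names:
        names.append(latin1)
    return names


def unlink_all_spellings(filename, directory="."):
    """Remove every spelling that exists; return the byte names removed."""
    removed = []
    for name in candidate_disk_names(filename):
        path = os.path.join(os.fsencode(directory), name)
        try:
            os.unlink(path)
        except FileNotFoundError:
            continue
        removed.append(name)
    return removed


print(candidate_disk_names("tmp\u00e6"))  # [b'tmp\xc3\xa6', b'tmp\xe6']
```

Using an ASCII-only filename in the test, as suggested earlier in the thread, avoids the need for any of this.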

@vstinner
Member

vstinner commented Oct 7, 2023

> It would be better to document this, but there is currently no place in the documentation where it would be appropriate.

Would it be possible to update the Python tkinter module to use the expected encoding, rather than documenting the issue?

@serhiy-storchaka
Member Author

Possible, maybe, but with great difficulty. You would need to add a list of Tcl commands that take or return a path, a list of options that correspond to a path, and a list of variables that store paths, and on every conversion between Python and Tcl look up these tables and re-encode the string from UTF-8 to Latin1 and back. It may be even more difficult if there are other conditions or if the required information is not available at the point of conversion.
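
To make the cost concrete, here is a rough sketch of what the table lookup and re-coding could look like. Everything in it is hypothetical: the `PATH_COMMANDS` table is a tiny made-up sample, and none of these helpers exist in tkinter.

```python
# Hypothetical and far from complete: Tcl commands whose arguments are paths.
PATH_COMMANDS = frozenset({"open", "file"})


def to_tcl_path(path):
    """Re-code so that Tcl's Latin-1 encoding yields Python's UTF-8 bytes."""
    return path.encode("utf-8", "surrogateescape").decode("latin-1")


def from_tcl_path(path):
    """Inverse of to_tcl_path(); fails if Tcl returns chars above U+00FF."""
    return path.encode("latin-1").decode("utf-8", "surrogateescape")


def convert_args(command, args):
    """Re-code the arguments of a known path command, leave others alone.

    A real table would also have to say WHICH arguments are paths; this
    sketch re-codes all of them, which is only harmless for ASCII ones.
    """
    if command in PATH_COMMANDS:
        return tuple(to_tcl_path(a) for a in args)
    return tuple(args)


print(convert_args("open", ("\u00e6", "w")))  # ('Ã¦', 'w')
```

Even this toy version shows the problem: the table has to know every command, option, and variable that carries a path, and a wrong guess silently corrupts ordinary text.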

@vstinner
Member

vstinner commented Oct 7, 2023

Can we configure Tcl to use an encoding other than Latin1?

@serhiy-storchaka
Member Author

Setting the locale to C.UTF-8 helps.

But this is an application-level change.
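
At the application level, that could look like the sketch below, which prepares an environment for a child process. The helper `with_utf8_locale` is made up for illustration, and the C.UTF-8 locale may not exist on every system.

```python
import os


def with_utf8_locale(base_env=None):
    """Return a copy of the environment with LC_ALL set to C.UTF-8."""
    env = dict(os.environ if base_env is None else base_env)
    env["LC_ALL"] = "C.UTF-8"
    return env


env = with_utf8_locale({"LC_ALL": "C", "HOME": "/tmp"})
print(env["LC_ALL"])  # C.UTF-8
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run()` when launching the program that uses Tcl.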

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 14, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Oct 14, 2023
serhiy-storchaka added a commit that referenced this issue Oct 14, 2023
serhiy-storchaka added a commit that referenced this issue Oct 14, 2023
@vstinner
Member

Thanks for the fix.
