Make execpath() work correctly in Unicode build. #7251

dEajL3kA · 2021-06-14T09:48:52Z

Note: GetModuleFileNameA() won't work if the path of 'curl.exe' contains any characters that cannot be represented in the system's ANSI code page. So, instead of GetModuleFileNameA(), we should use GetModuleFileName(), which expands to GetModuleFileNameW() in Unicode builds. The code was also adapted to use TCHAR and the appropriate generic-text routine mappings.

Note: The ANSI version of GetModuleFileName() won't work if the path of 'curl.exe' contains any characters that cannot be represented in the system's ANSI code page.

bagder · 2021-06-14T14:19:09Z

Build problems

x86_64-w64-mingw32-gcc -DHAVE_CONFIG_H   -I../include -I../lib -I../src -I../lib -I../src -DCURL_STATICLIB -isystem C:/msys64/mingw64/include -isystem C:/msys64/mingw64/include -DWINVER=0x0600  -Werror-implicit-function-declaration -O2 -Wno-system-headers -Wenum-conversion -Werror -pedantic-errors -MT tool_parsecfg.o -MD -MP -MF $depbase.Tpo -c -o tool_parsecfg.o tool_parsecfg.c &&\
mv -f $depbase.Tpo $depbase.Po
In file included from C:/msys64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windef.h:9,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winsock2.h:23,
                 from ../include/curl/system.h:422,
                 from ../include/curl/curl.h:38,
                 from ../lib/curl_setup.h:162,
                 from tool_setup.h:36,
                 from tool_parsecfg.c:22:
tool_parsecfg.c: In function 'execpath':
../lib/curl_setup.h:488:25: error: passing argument 2 of 'strrchr' makes integer from pointer without a cast [-Wint-conversion]
  488 | #  define DIR_CHAR      "\\"
      |                         ^~~~
      |                         |
      |                         char *
tool_parsecfg.c:57:52: note: in expansion of macro 'DIR_CHAR'
   57 |     TCHAR *lastdirchar = _tcsrchr(filebuffer, TEXT(DIR_CHAR));
      |                                                    ^~~~~~~~
In file included from C:/msys64/mingw64/x86_64-w64-mingw32/include/guiddef.h:154,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winnt.h:635,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windef.h:9,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winsock2.h:23,
                 from ../include/curl/system.h:422,
                 from ../include/curl/curl.h:38,
                 from ../lib/curl_setup.h:162,
                 from tool_setup.h:36,
                 from tool_parsecfg.c:22:
C:/msys64/mingw64/x86_64-w64-mingw32/include/string.h:93:60: note: expected 'int' but argument is of type 'char *'
   93 |   _CONST_RETURN char *__cdecl strrchr(const char *_Str,int _Ch);
      |                                                        ~~~~^~~
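
The first error in the log above arises because DIR_CHAR is defined as a string literal ("\\"), while strrchr() (which is what the compiler reports _tcsrchr resolving to in this non-Unicode build) expects a single character as its second argument. A minimal sketch of the distinction, in plain C with char strings; the helper name last_separator is hypothetical and not part of curl:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not curl code: returns a pointer to the last
   backslash in 'path', or NULL if the path contains none. */
static const char *last_separator(const char *path)
{
  assert(path != NULL);
  /* strrchr() takes the character to search for as an int, so pass the
     character constant '\\' rather than the string literal "\\", which
     would decay to a char pointer and trigger -Wint-conversion. */
  return strrchr(path, '\\');
}
```

Calling last_separator("C:\\tools\\curl.exe") yields a pointer to the final "\\curl.exe" portion. In a generic-text build the equivalent would presumably be a TEXT('\\') character constant passed to _tcsrchr, but that is an assumption and not something the thread confirms.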

Code style issues

./tool_parsecfg.c:56:5: warning: if with space (SPACEBEFOREPAREN)
   if ((len > 0) && (len < MAX_MODULE_FILE_LENGTH)) {
     ^
./tool_parsecfg.c:61:81: warning: Longer than 79 columns (LONGLINE)
     if (_tcslen(filebuffer) + _tcslen(filename) + 1U < MAX_MODULE_FILE_LENGTH) {
./tool_parsecfg.c:61:7: warning: if with space (SPACEBEFOREPAREN)
     if (_tcslen(filebuffer) + _tcslen(filename) + 1U < MAX_MODULE_FILE_LENGTH) {
       ^
./tool_parsecfg.c:62:7: warning: use of _tcscat is banned (BANNEDFUNC)
       _tcscat(filebuffer, TEXT(DIR_CHAR));
       ^
./tool_parsecfg.c:63:7: warning: use of _tcscat is banned (BANNEDFUNC)
       _tcscat(filebuffer, filename);
       ^

dEajL3kA · 2021-06-14T14:47:51Z

I don't understand why _tcscat is "banned". It is a generic-text routine mapping that expands to strcat or wcscat as needed, so that both Unicode and ANSI builds can be supported with the same code. What should I use instead? ❓

bagder · 2021-06-14T15:05:13Z

strcat() is similarly also banned. They're prohibited simply because they're too easy to misuse due to them not having any buffer size argument/boundary. Alternative functions to use are those that make sure that the target buffer is not overflowed. Like msnprintf() or plain old strncpy() etc.
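
To illustrate the kind of bounded alternative described above, here is a minimal sketch using the standard snprintf() as a stand-in for curl's msnprintf(). The helper name join_path and its return convention are assumptions made for this example only:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not curl code: writes "<dir>\<name>" into 'out'.
   Returns 0 on success, -1 if the result does not fit in 'outsize'. */
static int join_path(char *out, size_t outsize,
                     const char *dir, const char *name)
{
  int n;
  assert(out && dir && name);
  /* snprintf() writes at most 'outsize' bytes including the terminator
     and returns the length the complete result would have had, so a
     truncated result can be detected and rejected. */
  n = snprintf(out, outsize, "%s\\%s", dir, name);
  if(n < 0 || (size_t)n >= outsize)
    return -1;
  return 0;
}
```

Unlike strcat(), the destination size is part of the call, so an oversized input turns into an error return instead of a buffer overflow.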

dEajL3kA · 2021-06-14T15:11:32Z

I see. Does msnprintf() have a generic TCHAR version? Is _tcsncat() allowed?

ghost · 2021-06-14T16:29:32Z

Congratulations 🎉. DeepCode analyzed your code in 2.209 seconds and we found no issues. Enjoy a moment of no bugs ☀️.

vszakats · 2021-06-15T09:11:22Z

According to the CHECKSRC doc, *ncat() functions are not allowed and msnprintf() does not have a Windows-specific variant either.

A workable approach might be to convert the Win32 API result to UTF-8 right away and continue using the existing char string manipulation logic.

dEajL3kA · 2021-06-15T09:30:18Z

> According to the CHECKSRC doc, *ncat() functions are not allowed and msnprintf() does not have a Windows-specific variant either.

I see how _tcsncat() can cause trouble if the num parameter is not determined carefully. But I really wish we had a _tsomething() macro that does concatenation "safely". There are probably many use cases for that, and it would result in simpler code.

> A workable approach might be to convert the Win32 API result to UTF-8 right away and continue using the existing char string manipulation logic.

I really wanted to avoid converting from UTF-16 to UTF-8 just for the string concatenation; we'd have to convert back to UTF-16 for the fopen (effectively wfopen) right away. Not only would this be inefficient, it would also be no better, except that it satisfies the code checker (cargo cult). The code would also be more bloated and less readable with the additional (unnecessary) conversions.

So, the latest version of the patch now uses _tcsncpy, which I hope is okay.
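
As a rough illustration of a bounded-copy approach like the one mentioned above, the following sketch uses wchar_t and the standard wcsncpy(), which is what _tcsncpy maps to in a Unicode build. It is not the code from the patch; the helper name replace_filename, the error convention, and the example file names are assumptions:

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not curl code: replaces the file name part of the
   null-terminated 'path' (capacity 'cap' wide characters) with 'name'.
   Returns 0 on success, -1 if there is no separator or no room. */
static int replace_filename(wchar_t *path, size_t cap, const wchar_t *name)
{
  wchar_t *sep;
  size_t dirlen, namelen;
  assert(path && name);
  sep = wcsrchr(path, L'\\');
  if(!sep)
    return -1;
  dirlen = (size_t)(sep - path) + 1; /* keep the trailing separator */
  namelen = wcslen(name);
  /* Check the size up front so the buffer is left untouched on failure. */
  if(dirlen + namelen + 1 > cap)
    return -1;
  /* The check above guarantees namelen < cap - dirlen, so wcsncpy() also
     writes the terminating null within the buffer. */
  wcsncpy(path + dirlen, name, cap - dirlen);
  return 0;
}
```

Because the result stays in wide characters, it could be handed to a wide-character file API without a round trip through UTF-8, which is the point made in the comment above.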

vszakats · 2021-06-15T09:36:50Z

@dEajL3kA: Fair point, I totally missed fopen() lurking there. Current version LGTM.

On closer inspection, the state of Unicode support in libcurl does not seem to be ready for production. Existing support extended certain Windows interfaces to use the Unicode flavour of the Windows API, but that also meant that the expected encoding/codepage of strings (e.g. local filenames, URLs) exchanged via the libcurl API became ambiguous and undefined.

Previously all strings had to be passed in the active Windows locale, using an 8-bit codepage. In Unicode libcurl builds, the expected string encoding became an undocumented mixture of UTF-8 and 8-bit locale, depending on the actual API/option, certain dynamic and static "fallback" logic inside libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit strings internally.

From the user's perspective this poses an unreasonably difficult task in finding out how to pass a certain non-ASCII string to a specific API without unwanted or accidental (possibly lossy) conversions or other side-effects. Missing the correct encoding may result in unexpected behaviour, e.g. in some cases not finding files, finding different files, accessing the wrong URL or passing a corrupt username or password. Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl worked for most of its existence and only re-enable Unicode once the remaining parts of Windows Unicode support are well-understood, ironed out and documented. Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5

On closer inspection, the state of Windows Unicode support in libcurl does not seem to be ready for production. Existing support extended certain Windows interfaces to use the Unicode flavour of the Windows API, but that also meant that the expected encoding/codepage of strings (e.g. local filenames, URLs) exchanged via the libcurl API became ambiguous and undefined.

Previously all strings had to be passed in the active Windows locale, using an 8-bit codepage. In Unicode libcurl builds, the expected string encoding became an undocumented mixture of UTF-8 and 8-bit locale, depending on the actual API, build options/dependencies, internal fallback logic based on runtime auto-detection of the passed string, and the result of file operations (scheduled for removal in 7.78.0), while some parts of libcurl kept using 8-bit strings internally, e.g. when reading the environment.

From the user's perspective this poses an unreasonably complex task in finding out how to pass (or read) a certain non-ASCII string to (from) a specific API without unwanted or accidental conversions or other side-effects. Missing the correct encoding may result in unexpected behaviour, e.g. in some cases not finding files, reading/writing a different file, accessing the wrong URL or passing a corrupt username or password. Note that these issues may only affect strings with _non-7-bit-ASCII_ content.

For now the least bad solution seems to be to revert back to how libcurl/curl worked for most of its existence and only re-enable Unicode once the remaining parts of Windows Unicode support are well-understood, ironed out and documented. Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5

bagder · 2023-08-06T21:07:59Z

I don't see any action on this.

Make execpath() work correctly in Unicode build.

facc1d8

Note: The ANSI version of GetModuleFileName() won't work if the path of 'curl.exe' contains any characters that cannot be represented in the system's ANSI code page.

bagder added the Windows Windows-specific label Jun 14, 2021

dEajL3kA force-pushed the master branch from c2b9de4 to facc1d8 Compare June 14, 2021 13:00

Small fix.

a346792

Tiny fix.

d05fb79

Workaround for _tcscat() and also _tcsncat() not being allowed.

327062e

dEajL3kA force-pushed the master branch from bba79d6 to 327062e Compare June 14, 2021 15:52

Workaround for _tcscat() and also _tcsncat() not being allowed.

324fe4a

vszakats mentioned this pull request Jul 20, 2021

curl: revert to non-Unicode builds [ci skip] curl/curl-for-win#20

Merged

bagder requested a review from jay September 29, 2022 11:58

vszakats added the Unicode Unicode, code page, character encoding label Feb 16, 2023

bagder closed this Aug 6, 2023
