
Conversation

@dEajL3kA

Note: GetModuleFileNameA() won't work if the path of 'curl.exe' contains any characters that cannot be represented in the system's ANSI code page. So, instead of GetModuleFileNameA(), we should use GetModuleFileName(), which expands to GetModuleFileNameW() in Unicode builds. The code was also adapted to use TCHAR and the appropriate generic-text routine mappings.
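
The underlying problem, a wide-character path that has no representation in a narrow encoding, can be reproduced without the Win32 API. The following is only a portable analogue, not the code from this patch: it assumes the POSIX behaviour of wcstombs() with a NULL destination, and the helper name fits_narrow_encoding() is made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: return 1 if `wide` can be converted to a narrow
   string in the current locale's encoding, 0 if it contains a character
   that has no representation there. */
int fits_narrow_encoding(const wchar_t *wide)
{
  /* With a NULL destination, wcstombs() only computes the required length
     (POSIX behaviour) and returns (size_t)-1 when it hits a character it
     cannot convert. */
  return wcstombs(NULL, wide, 0) != (size_t)-1;
}
```

In the default "C" locale an ASCII-only path passes this check, while a path containing, say, a CJK character does not. That is the same kind of failure the ANSI variant of GetModuleFileName() runs into on Windows.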

Note: The ANSI version of GetModuleFileName() won't work if the path of 'curl.exe' contains any characters that cannot be represented in the system's ANSI code page.
@bagder bagder added the Windows Windows-specific label Jun 14, 2021
@bagder
Member

bagder commented Jun 14, 2021

Build problems

x86_64-w64-mingw32-gcc -DHAVE_CONFIG_H   -I../include -I../lib -I../src -I../lib -I../src -DCURL_STATICLIB -isystem C:/msys64/mingw64/include -isystem C:/msys64/mingw64/include -DWINVER=0x0600  -Werror-implicit-function-declaration -O2 -Wno-system-headers -Wenum-conversion -Werror -pedantic-errors -MT tool_parsecfg.o -MD -MP -MF $depbase.Tpo -c -o tool_parsecfg.o tool_parsecfg.c &&\
mv -f $depbase.Tpo $depbase.Po
In file included from C:/msys64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windef.h:9,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winsock2.h:23,
                 from ../include/curl/system.h:422,
                 from ../include/curl/curl.h:38,
                 from ../lib/curl_setup.h:162,
                 from tool_setup.h:36,
                 from tool_parsecfg.c:22:
tool_parsecfg.c: In function 'execpath':
../lib/curl_setup.h:488:25: error: passing argument 2 of 'strrchr' makes integer from pointer without a cast [-Wint-conversion]
  488 | #  define DIR_CHAR      "\\"
      |                         ^~~~
      |                         |
      |                         char *
tool_parsecfg.c:57:52: note: in expansion of macro 'DIR_CHAR'
   57 |     TCHAR *lastdirchar = _tcsrchr(filebuffer, TEXT(DIR_CHAR));
      |                                                    ^~~~~~~~
In file included from C:/msys64/mingw64/x86_64-w64-mingw32/include/guiddef.h:154,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winnt.h:635,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/minwindef.h:163,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windef.h:9,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/windows.h:69,
                 from C:/msys64/mingw64/x86_64-w64-mingw32/include/winsock2.h:23,
                 from ../include/curl/system.h:422,
                 from ../include/curl/curl.h:38,
                 from ../lib/curl_setup.h:162,
                 from tool_setup.h:36,
                 from tool_parsecfg.c:22:
C:/msys64/mingw64/x86_64-w64-mingw32/include/string.h:93:60: note: expected 'int' but argument is of type 'char *'
   93 |   _CONST_RETURN char *__cdecl strrchr(const char *_Str,int _Ch);
      |                                                        ~~~~^~~
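
The error above arises because DIR_CHAR expands to a string literal, while strrchr() (which _tcsrchr() maps to in a non-Unicode build) takes the character to search for as an int. A minimal portable sketch of the distinction, using plain char strings; the macro DIR_CHAR_C and the function dir_length() are hypothetical names for illustration and are not part of curl:

```c
#include <stddef.h>
#include <string.h>

#define DIR_CHAR   "\\"  /* string literal, as defined in lib/curl_setup.h */
#define DIR_CHAR_C '\\'  /* hypothetical character constant, illustration only */

/* Return the length of the directory part of `path`, that is everything
   before the last backslash, or 0 if `path` contains no backslash. */
size_t dir_length(const char *path)
{
  /* strrchr() expects the character as an int, so a character constant
     is passed here rather than the string literal DIR_CHAR */
  const char *last = strrchr(path, DIR_CHAR_C);
  return last ? (size_t)(last - path) : 0;
}
```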

Code style issues

./tool_parsecfg.c:56:5: warning: if with space (SPACEBEFOREPAREN)
   if ((len > 0) && (len < MAX_MODULE_FILE_LENGTH)) {
     ^
./tool_parsecfg.c:61:81: warning: Longer than 79 columns (LONGLINE)
     if (_tcslen(filebuffer) + _tcslen(filename) + 1U < MAX_MODULE_FILE_LENGTH) {
./tool_parsecfg.c:61:7: warning: if with space (SPACEBEFOREPAREN)
     if (_tcslen(filebuffer) + _tcslen(filename) + 1U < MAX_MODULE_FILE_LENGTH) {
       ^
./tool_parsecfg.c:62:7: warning: use of _tcscat is banned (BANNEDFUNC)
       _tcscat(filebuffer, TEXT(DIR_CHAR));
       ^
./tool_parsecfg.c:63:7: warning: use of _tcscat is banned (BANNEDFUNC)
       _tcscat(filebuffer, filename);
       ^

@dEajL3kA
Author

dEajL3kA commented Jun 14, 2021

Don't understand why _tcscat is "banned". It is a generic-text routine mapping that will expand to strcat or wcscat as needed, so that Unicode and ANSI build can both be supported with the same code. What should I use instead ❓

@bagder
Member

bagder commented Jun 14, 2021

strcat() is similarly also banned. They're prohibited simply because they're too easy to misuse due to them not having any buffer size argument/boundary. Alternative functions to use are those that make sure that the target buffer is not overflowed. Like msnprintf() or plain old strncpy() etc.
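
As a rough illustration of the bounded alternative, here is a sketch that builds the path in one call and reports truncation. It uses the standard snprintf() as a stand-in for curl's msnprintf(), works on char strings only, and the function name join_path() is made up for this example:

```c
#include <stddef.h>
#include <stdio.h>

/* Write "<dir>\<file>" into dst, which has room for dstsize bytes
   including the terminating NUL.
   Returns 0 on success, -1 if the result would not fit. */
int join_path(char *dst, size_t dstsize, const char *dir, const char *file)
{
  int n = snprintf(dst, dstsize, "%s\\%s", dir, file);

  /* snprintf() returns the length it wanted to write; a negative value
     means an error and a value >= dstsize means the output was truncated */
  if(n < 0 || (size_t)n >= dstsize)
    return -1;
  return 0;
}
```

Unlike strcat(), the destination size is part of every call, so an oversized input is rejected instead of overflowing the buffer.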

@dEajL3kA
Author

I see. Does msnprintf() have a generic TCHAR version? Is _tcsncat() allowed?

@vszakats
Member

According to the CHECKSRC doc, *ncat() functions are not allowed and msnprintf() does not have a Windows-specific variant either.

A workable approach might be to convert the Win32 API result to UTF-8 right away and continue using the existing char string manipulation logic.

@dEajL3kA
Author

dEajL3kA commented Jun 15, 2021

> According to the CHECKSRC doc, *ncat() functions are not allowed and msnprintf() does not have a Windows-specific variant either.

I see how _tcsncat() can cause troubles, if the num parameter is not determined carefully. But I really wish we had a _tsomething() macro that does concatenation "safely". Probably many use cases for that, resulting in simpler code.

> A workable approach might be to convert the Win32 API result to UTF-8 right away and continue using the existing char string manipulation logic.

I really wanted to avoid converting from UTF-16 to UTF-8 just for the string concatenation; we'd have to convert back to UTF-16 for the fopen (effectively wfopen) right away. Not only would this be inefficient, it would also be no better – except that it satisfies the code checker (cargo cult). Also the code would be more bloated/less readable with the additional (unnecessary) conversions.

So, latest version of the patch now uses _tcsncpy, which I hope is okay.
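
For readers unfamiliar with the mapping: in a Unicode build _tcsncpy() expands to wcsncpy(). The sketch below shows one way a length-checked append with wcsncpy() could look. It is not the code from the patch; the function name append_file_w() is hypothetical, and it uses portable wchar_t (32-bit on most non-Windows systems, 16-bit on Windows).

```c
#include <stddef.h>
#include <wchar.h>

/* Append a backslash and `file` to the wide string already stored in
   `buf`, which has room for `cap` elements including the terminator.
   Returns 0 on success, -1 (leaving buf untouched) if it would not fit. */
int append_file_w(wchar_t *buf, size_t cap, const wchar_t *file)
{
  size_t dirlen = wcslen(buf);
  size_t filelen = wcslen(file);

  /* directory + separator + file name + terminator must fit */
  if(dirlen + 1 + filelen + 1 > cap)
    return -1;

  buf[dirlen] = L'\\';
  /* copy exactly filelen characters and terminate explicitly, because
     wcsncpy() adds no terminator when the count is exhausted */
  wcsncpy(buf + dirlen + 1, file, filelen);
  buf[dirlen + 1 + filelen] = L'\0';
  return 0;
}
```

The size check happens before anything is written, which is the part that has to be "determined carefully" for any of the counted copy functions to be safe.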

@vszakats
Member

@dEajL3kA: Fair point, I totally missed fopen() lurking there. Current version LGTM.

vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Unicode support in libcurl does not
seem to be ready for production. Existing support extended certain Windows
interfaces to use the Unicode flavour of the Windows API, but that also
meant that the expected encoding/codepage of strings (e.g. local filenames,
URLs) exchanged via the libcurl API became ambiguous and undefined.
Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API/option, certain dynamic and static "fallback" logic inside
libcurl and even in OpenSSL, while some parts of libcurl kept using 8-bit
strings internally. From the user's perspective this poses an unreasonably
difficult task in finding out how to pass a certain non-ASCII string to a
specific API without unwanted or accidental (possibly lossy) conversions or
other side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, finding different files,
accessing the wrong URL or passing a corrupt username or password.

Note that these issues may _only_ affect strings with _non-ASCII_ content.

For now the best solution seems to be to revert back to how libcurl/curl
worked for most of its existence and only re-enable Unicode once the
remaining parts of Windows Unicode support are well-understood, ironed out
and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
vszakats added a commit to curl/curl-for-win that referenced this pull request Jul 20, 2021
On closer inspection, the state of Windows Unicode support in libcurl does
not seem to be ready for production. Existing support extended certain
Windows interfaces to use the Unicode flavour of the Windows API, but that
also meant that the expected encoding/codepage of strings (e.g. local
filenames, URLs) exchanged via the libcurl API became ambiguous and
undefined.

Previously all strings had to be passed in the active Windows locale, using
an 8-bit codepage. In Unicode libcurl builds, the expected string encoding
became an undocumented mixture of UTF-8 and 8-bit locale, depending on the
actual API, build options/dependencies, internal fallback logic based on
runtime auto-detection of passed string, and the result of file operations
(scheduled for removal in 7.78.0), while some parts of libcurl kept using
8-bit strings internally, e.g. when reading the environment.

From the user's perspective this poses an unreasonably complex task in
finding out how to pass (or read) a certain non-ASCII string to (from) a
specific API without unwanted or accidental conversions or other
side-effects. Missing the correct encoding may result in unexpected
behaviour, e.g. in some cases not finding files, reading/writing a
different file, accessing the wrong URL or passing a corrupt username or
password.

Note that these issues may only affect strings with _non-7-bit-ASCII_
content.

For now the least bad solution seems to be to revert back to how
libcurl/curl worked for most of its existence and only re-enable Unicode
once the remaining parts of Windows Unicode support are well-understood,
ironed out and documented.

Unicode was enabled in curl-for-win about a year ago with 7.71.0. Hopefully
this period had the benefit to have surfaced some of these issues.

Ref: curl/curl#6089
Ref: curl/curl#7246
Ref: curl/curl#7251
Ref: curl/curl#7252
Ref: curl/curl#7257
Ref: curl/curl#7281
Ref: curl/curl#7421
Ref: https://github.com/curl/curl/wiki/libcurl-and-expected-string-encodings
Ref: 8023ee5
@bagder bagder requested a review from jay September 29, 2022 11:58
@vszakats vszakats added the Unicode Unicode, code page, character encoding label Feb 16, 2023
@bagder
Member

bagder commented Aug 6, 2023

I don't see any action on this.

@bagder bagder closed this Aug 6, 2023
Labels

Unicode Unicode, code page, character encoding Windows Windows-specific

4 participants