Conversation

@ismail
Contributor

@ismail ismail commented Feb 23, 2017

Tests 1035, 2046 and 2047 need LANG set to a UTF-8 compatible value to test IDN features correctly.

@mention-bot

@ismail, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dfandrich, @bagder and @mkauf to be potential reviewers.

@bagder bagder closed this in 2bfe550 Feb 23, 2017
@bagder
Member

bagder commented Feb 23, 2017

thanks!

@ismail
Contributor Author

ismail commented Feb 23, 2017

Ah, I was just about to write that this doesn't seem to work: I still get the same error. It looks like setenv blocks are not setting environment variables, only HTTP variables. Sorry, my test literally just finished.

@ismail
Contributor Author

ismail commented Feb 23, 2017

Ah no, never mind, it does work. I was setting LC_ALL=POSIX, which was overriding LANG. So it's good to go.
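
The mix-up above comes down to lookup order: under POSIX, a non-empty LC_ALL takes precedence over every LC_* category, and each category in turn takes precedence over LANG. Here is a minimal sketch of that order for LC_CTYPE. The function name `resolve_ctype` and the `POSIX` fallback are illustrative assumptions, not part of curl's test suite.

```shell
# Print the locale name that would govern LC_CTYPE, given the values of
# LC_ALL, LC_CTYPE and LANG (in that order). Empty values count as unset.
# The final "POSIX" fallback is an assumption: the real default is
# implementation-defined, though it is typically the C/POSIX locale.
resolve_ctype() {
    lc_all=${1-}
    lc_ctype=${2-}
    lang=${3-}
    printf '%s\n' "${lc_all:-${lc_ctype:-${lang:-POSIX}}}"
}

# LC_ALL wins even though LANG names a UTF-8 locale.
resolve_ctype "POSIX" "" "en_US.UTF-8"
# With LC_ALL and LC_CTYPE empty, LANG decides.
resolve_ctype "" "" "en_US.UTF-8"
```

This is why a `<setenv>` block that only sets LANG can appear to have no effect when LC_ALL is already exported in the calling shell.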

@ismail
Contributor Author

ismail commented Feb 23, 2017

And here is the log:

havana ~/t/c/curl/tests > echo $LANG
POSIX
havana ~/t/c/curl/tests > ./runtests.pl 1035 2046 2047
********* System characteristics ********

  • curl 7.53.0-DEV (x86_64-unknown-linux-gnu)
  • libcurl/7.53.0-DEV OpenSSL/1.0.2k zlib/1.2.8 libidn2/0.16
  • Features: IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy
  • Host: havana
  • System: Linux havana 4.9.10-1-default #1 SMP PREEMPT Thu Feb 16 08:36:29 UTC 2017 (ffeeef5) x86_64 x86_64 x86_64 GNU/Linux
  • Servers: HTTP-IPv6 HTTP-unix FTP-IPv6
  • Env: Valgrind

test 1035...[HTTP over proxy with too long IDN host name]
-pd---e-v- OK (1 out of 3 , remaining: 00:09)
test 2046...[Connection re-use with IDN host name]
sp----e-v- OK (2 out of 3 , remaining: 00:03)
test 2047...[Connection re-use with IDN host name over HTTP proxy]
sp----e-v- OK (3 out of 3 , remaining: 00:00)
TESTDONE: 3 tests out of 3 reported OK: 100%
TESTDONE: 3 tests were considered during 10 seconds.

@dfandrich
Contributor

I don't agree with the change to test1035. The URL is NOT in UTF-8 but rather ISO-8859-1. The test succeeds because the URL fails to be decoded, despite the incorrect character set. Does the test work for you with this in the <setenv> section:

CHARSET=ISO-8859-1
LANG=ISO-8859-1
LC_CTYPE=ISO-8859-1

I believe that last one is the most standard one, and probably all the tests should be changed this way. I'm not sure how standard ISO-8859-1 is as a locale specifier, but it works for glibc.

@ismail
Contributor Author

ismail commented Feb 23, 2017

There is no ISO-8859-1 locale though:


havana ~ > LC_CTYPE=ISO-8859-1 perl -V
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "ISO-8859-1",
        LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

@ismail
Contributor Author

ismail commented Feb 23, 2017

Also just tested, adding

CHARSET=ISO-8859-1
LANG=ISO-8859-1
LC_CTYPE=ISO-8859-1

results in a test failure.

@dfandrich
Contributor

It could be tricky to find a cross-platform way of specifying an ISO 8859/1 locale. This works fine for glibc, but I don't know if there's one that works everywhere. What is the output of:

CHARSET=ISO-8859-1 LC_CTYPE=ISO-8859-1 LANG=ISO-8859-1 locale charmap

Does this command show anything that looks sufficiently generic:

locale -a | grep 8859

@ismail
Contributor Author

ismail commented Feb 23, 2017

on openSUSE

havana ~ > locale -a | grep 8859
en_GB.iso885915
en_US.iso885915
et_EE.iso885915

on Ubuntu 16.10:

ubuntu-ams3 ~ > locale -a | grep 8859
ubuntu-ams3 ~ >

@dfandrich
Contributor

It looks like you don't have any ISO 8859/1 locales on those machines, which technically means that you shouldn't be able to run test 1035. This looks like it's going to be tricky to solve consistently in a cross-platform manner. Setting a UTF-8 locale is a hack that works on your machines but will fail on other people's machines (those that don't have a UTF-8 locale). I think the answer is probably going to be a <precheck> that ensures that the necessary locale is available on the target machine.
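
One way such an availability check could look, as a rough sketch: compare the wanted name against the output of `locale -a`, ignoring case and hyphens so that `en_US.UTF-8` also matches the `en_US.utf8` spelling that glibc prints. The helper names `normalize_locale` and `has_locale` are hypothetical, and the normalization is a deliberate simplification, not what any particular libc does internally.

```shell
# Lowercase a locale name and drop hyphens, so that "en_US.UTF-8" and
# "en_US.utf8" compare equal. Simplified on purpose (an assumption).
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# Read a list of locale names on stdin (one per line, as printed by
# "locale -a") and succeed if the wanted name is among them.
has_locale() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r candidate; do
        if [ "$(normalize_locale "$candidate")" = "$wanted" ]; then
            return 0
        fi
    done
    return 1
}

# Typical use (depends on the machine, so it is left commented out):
# locale -a | has_locale en_US.UTF-8 || echo "en_US.UTF-8 is not installed"
```

Reading the list from stdin keeps the check independent of the host, which makes it easy to try against the `locale -a` output of another system.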

@dfandrich
Contributor

dfandrich commented Feb 23, 2017

Wait a moment—there's no actual reason test 1035 needs to be in ISO 8859/1 (it's not relevant to the test). If it's converted to UTF-8, then they'll all be consistent. Does it work for you with the following in <setenv>?

CHARSET=UTF-8
LC_ALL=
LC_CTYPE=UTF-8

@dfandrich
Contributor

I've pushed a change that I hope will fix this everywhere. I'm hoping the autobuilds will show these tests working (or being skipped) on Solaris and Windows instead of unconditionally failing as they do now.

@bagder
Member

bagder commented Feb 23, 2017

I'm afraid it broke already for me, in at least two different ways! Problem 1:

********* System characteristics ******** 
* curl 7.53.1-DEV (x86_64-pc-linux-gnu) 
* libcurl/7.53.1-DEV OpenSSL/1.1.0e zlib/1.2.8 c-ares/1.12.0 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) nghttp2/1.20.0-DEV librtmp/2.3
* Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy Metalink PSL 
* Host: storebror.haxx.se
* System: Linux storebror.haxx.se 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 GNU/Linux
* Servers: SSL HTTP-IPv6 HTTP-unix FTP-IPv6 
* Env: Valgrind 
***************************************** 
test 0165...[HTTP over proxy with IDN host name]

 165: protocol FAILED:
--- log/check-expected  2017-02-23 23:27:05.542013862 +0100
+++ log/check-generated 2017-02-23 23:27:05.542013862 +0100
@@ -1,10 +1,10 @@
-GET http://www.xn--4cab6c.se/page/165 HTTP/1.1[CR][LF]
-Host: www.xn--4cab6c.se[CR][LF]
+GET http://www..se/page/165 HTTP/1.1[CR][LF]
+Host: www..se[CR][LF]
 Accept: */*[CR][LF]
 Proxy-Connection: Keep-Alive[CR][LF]
 [CR][LF]
-GET http://www.xn--groe-xna.de/page/165 HTTP/1.1[CR][LF]
-Host: www.xn--groe-xna.de[CR][LF]
+GET http://www.groe.de/page/165 HTTP/1.1[CR][LF]
+Host: www.groe.de[CR][LF]
 Accept: */*[CR][LF]
 Proxy-Connection: Keep-Alive[CR][LF]
 [CR][LF]

 - abort tests
TESTDONE: 0 tests out of 1 reported OK: 0%
TESTFAIL: These test cases failed: 165 
TESTDONE: 1 tests were considered during 3 seconds.

It works again with this patch:

diff --git a/tests/data/test165 b/tests/data/test165
index 4d48c0c65..c1b70f70c 100644
--- a/tests/data/test165
+++ b/tests/data/test165
@@ -31,11 +31,11 @@ http
 idn
 </features>
 <setenv>
 CHARSET=UTF-8
 LC_ALL=
-LC_CTYPE=UTF-8
+LC_CTYPE=en_US.UTF-8
 </setenv>
 <precheck>
 perl -MI18N::Langinfo=langinfo,CODESET -e 'die "Needs a UTF-8 locale" if (lc(langinfo(CODESET())) ne "utf-8");'
 </precheck>
  <name>

@bagder
Member

bagder commented Feb 23, 2017

Problem 2, more spectacular and not necessarily a problem in our code:

./runtests.pl 1034 
********* System characteristics ******** 
* curl 7.53.1-DEV (x86_64-pc-linux-gnu) 
* libcurl/7.53.1-DEV OpenSSL/1.1.0e zlib/1.2.8 c-ares/1.12.0 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) nghttp2/1.20.0-DEV librtmp/2.3
* Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy Metalink PSL 
* Host: storebror.haxx.se
* System: Linux storebror.haxx.se 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 GNU/Linux
* Servers: SSL HTTP-IPv6 HTTP-unix FTP-IPv6 
* Env: Valgrind 
***************************************** 
test 1034...[HTTP over proxy with malformatted IDN host name]
 valgrind ERROR ==19038== Conditional jump or move depends on uninitialised value(s)
==19038==    at 0x797DABC: libunistring_freea (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x7980356: libunistring_mem_iconveha (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x798716C: u8_conv_from_encoding (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x79874F7: u8_strconv_from_encoding (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x56AB222: idn2_lookup_ul (in /usr/lib/x86_64-linux-gnu/libidn2.so.0.1.4)
==19038==    by 0x16ABD8: fix_hostname (url.c:4054)
==19038==    by 0x16FB00: create_conn (url.c:6444)
==19038==    by 0x170A09: Curl_connect (url.c:6876)
==19038==    by 0x1422E7: multi_runsingle (multi.c:1432)
==19038==    by 0x143AC2: curl_multi_perform (multi.c:2164)
==19038==    by 0x13C498: easy_transfer (easy.c:700)
==19038==    by 0x13C675: easy_perform (easy.c:787)
==19038==    by 0x13C6BF: curl_easy_perform (easy.c:806)
==19038==    by 0x12C4DF: operate_do (tool_operate.c:1474)
==19038==    by 0x12D93E: operate (tool_operate.c:2023)
==19038==    by 0x125670: main (tool_main.c:252)
==19038== 

 - abort tests
TESTDONE: 0 tests out of 1 reported OK: 0%
TESTFAIL: These test cases failed: 1034 
TESTDONE: 1 tests were considered during 3 seconds.

@dfandrich
Contributor

W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the <precheck>, but probably en_US.UTF-8 is going to be more standard than plain UTF-8, and possibly even more commonly-installed, so switching might be a good idea.

W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed.

@bagder
Member

bagder commented Feb 23, 2017

I'm rocking libidn2/0.16 as well (from Debian), so something is clearly odd in there...

As I'm about to ship a 7.53.1 release in 9-10 hours we at least need a short term decision on what to include there. Right now, just reverting ecd1d02 for that release is one option.

@ismail
Contributor Author

ismail commented Feb 24, 2017

There is no standalone UTF-8 locale:

havana ~ > LC_CTYPE=UTF-8 perl -V
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Summary of my perl5 (revision 5 version 24 subversion 0) configuration:

So just replace those UTF-8 with en_US.UTF-8 and we would be fine imho.

@bagder
Member

bagder commented Feb 24, 2017

What do you say @dfandrich?

@ismail
Contributor Author

ismail commented Feb 24, 2017

Btw, in a perfect world we would have a C.UTF-8 locale everywhere, but unfortunately glibc does not have it by default (Fedora and openSUSE do): https://sourceware.org/glibc/wiki/Proposals/C.UTF-8
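
Since C.UTF-8 cannot be relied on, a harness could instead probe for the first UTF-8 locale that is actually installed. A rough sketch, assuming the candidate list below (an illustrative choice, not something curl's tests use) and exact matching against `locale -a` style output; the function name `pick_utf8_locale` is hypothetical.

```shell
# Read a list of installed locale names on stdin (one per line) and print
# the first candidate that appears in it. Fails if none is found.
# The candidate list and its order are assumptions for illustration.
pick_utf8_locale() {
    available=$(cat)
    for candidate in C.UTF-8 C.utf8 en_US.UTF-8 en_US.utf8; do
        if printf '%s\n' "$available" | grep -qxF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Typical use (depends on the machine, so it is left commented out):
# LC_CTYPE=$(locale -a | pick_utf8_locale) && export LC_CTYPE
```

Listing both spellings of each name avoids any guessing about how a given system prints its codesets.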

@dfandrich
Contributor

W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the <precheck>, but probably en_US.UTF-8 is going to be more standard than plain UTF-8, and possibly even more commonly-installed, so switching might be a good idea.

W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed.

I've verified that the locale en_US.utf8 exists on 4 different Linux distributions, so that might be the one to use. I was pleased to see that the precheck at least made some of the Solaris builds green (because the tests were skipped, but still…)

@dfandrich
Contributor

Sorry, my last comment was a bit stale. I'll try a few more distros/OSes and see if there's something better than en_US.utf8, but so far it looks to me like the best bet. And even if it doesn't exist somewhere, the precheck should cause the test to be skipped there (which didn't work for the "UTF-8" locale because it must be weird in other ways).

@jay
Member

jay commented Feb 24, 2017

I haven't been following this but I tried the precheck in mingw msys and it fails. I don't know if that is expected.

perl -MI18N::Langinfo=langinfo,CODESET -e 'die "Needs a UTF-8 locale" if (lc(langinfo(CODESET())) ne "utf-8");'
Can't locate I18N/Langinfo.pm in @INC

@ismail
Contributor Author

ismail commented Feb 24, 2017

@jay there seems to be a problem with your Perl installation because on Cygwin that file comes from perl_base:

latte ~ > cygcheck -f /usr/lib/perl5/5.22/x86_64-cygwin-threads/I18N/Langinfo.pm
perl_base-5.22.3-1

@jay
Member

jay commented Feb 24, 2017

I am using mingw, which just does not have that. Its msys environment, which allows running autotools, comes with an old perl, 5.8.8. Again, I don't know how relevant this is for what you're doing; I didn't read this thread, just checking in.

@ismail
Contributor Author

ismail commented Feb 24, 2017

It's only relevant if you are going to run the curl tests. However, msys2 already has the latest stable Perl, in case you need it: https://github.com/Alexpux/MSYS2-packages/tree/master/perl

@dfandrich
Contributor

Jay, is the test at least skipped in your case? If the perl precheck fails, that's what should happen. Ideally, we'd find a detection algorithm that works everywhere, but it's probably better to skip the test than risk a false positive.
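
For environments where the Perl module is missing, the same codeset test can be sketched in plain shell on top of `locale charmap`, the command used earlier in this thread. This is only an illustration under assumptions: the function name `require_utf8_codeset` is hypothetical, and unlike the Perl one-liner it also accepts the `utf8` spelling without a hyphen.

```shell
# Succeed only when the given codeset name means UTF-8, compared
# case-insensitively. Otherwise print a one-line reason and fail, so a
# caller can treat the failure as "skip this test".
require_utf8_codeset() {
    codeset=$(printf '%s' "${1-}" | tr '[:upper:]' '[:lower:]')
    case $codeset in
        utf-8|utf8) return 0 ;;
        *) echo "Needs a UTF-8 locale"; return 1 ;;
    esac
}

# Typical use (depends on the machine, so it is left commented out):
# require_utf8_codeset "$(locale charmap 2>/dev/null)"
```

If `locale` itself is unavailable, the command substitution yields an empty string and the check fails, which errs on the side of skipping the test.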

@dfandrich
Contributor

I've confirmed that en_US.UTF-8 works on 7 different Linux & BSD distributions. It only failed in a couple of cases when locales were not even installed (and in which IDN wouldn't really be usable, anyway). I'll push that change unless there are any objections.

@bagder
Member

bagder commented Feb 25, 2017

Nice work Dan, go for it! 👍

@ismail
Contributor Author

ismail commented Feb 25, 2017

Works for me, thanks!

@dfandrich
Contributor

Pushed as c6ddb60.

@lock lock bot locked as resolved and limited conversation to collaborators May 24, 2018