Set CHARSET & LANG to UTF-8 compatible values for idn tests #1283

ismail · 2017-02-23T09:30:22Z

Tests 1035, 2046 and 2047 need LANG set to a UTF-8 compatible value to test IDN features correctly.

mention-bot · 2017-02-23T09:30:24Z

@ismail, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dfandrich, @bagder and @mkauf to be potential reviewers.

bagder · 2017-02-23T10:12:17Z

thanks!

ismail · 2017-02-23T10:13:35Z

Ah, I was just about to write that this doesn't seem to work: I still get the same error. It looks like the setenv blocks are not setting environment variables, only HTTP variables. Sorry, my test literally just finished.

ismail · 2017-02-23T10:16:21Z

Ah no, never mind, it does work. I was setting LC_ALL=POSIX, which was overriding LANG. So it's good to go.
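
For reference, the override described above matches the POSIX lookup order for a locale category: a non-empty LC_ALL wins, then the category variable (LC_CTYPE here), then LANG. A minimal sketch of that order follows; `effective_ctype` is a hypothetical helper, and the final fallback to "C" is an assumption, since POSIX leaves the default implementation-defined.

```shell
# effective_ctype LC_ALL LC_CTYPE LANG
# Hypothetical helper: prints the value that would decide LC_CTYPE,
# following the POSIX precedence LC_ALL > LC_CTYPE > LANG.
effective_ctype() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"
    else
        # Assumption: treat "nothing set" as the C locale.
        printf '%s\n' "C"
    fi
}

# Show what the current shell environment resolves to.
effective_ctype "${LC_ALL:-}" "${LC_CTYPE:-}" "${LANG:-}"
```

With LC_ALL=POSIX exported, the first branch is taken no matter what LANG says, which is why the `<setenv>` value appeared to have no effect.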

ismail · 2017-02-23T10:18:47Z

And here is the log:

havana ~/t/c/curl/tests > echo $LANG
POSIX
havana ~/t/c/curl/tests > ./runtests.pl 1035 2046 2047
********* System characteristics ********

curl 7.53.0-DEV (x86_64-unknown-linux-gnu)

libcurl/7.53.0-DEV OpenSSL/1.0.2k zlib/1.2.8 libidn2/0.16

Features: IDN IPv6 Largefile NTLM NTLM_WB SSL libz TLS-SRP UnixSockets HTTPS-proxy

Host: havana

System: Linux havana 4.9.10-1-default #1 SMP PREEMPT Thu Feb 16 08:36:29 UTC 2017 (ffeeef5) x86_64 x86_64 x86_64 GNU/Linux

Servers: HTTP-IPv6 HTTP-unix FTP-IPv6

Env: Valgrind

test 1035...[HTTP over proxy with too long IDN host name]
-pd---e-v- OK (1 out of 3 , remaining: 00:09)
test 2046...[Connection re-use with IDN host name]
sp----e-v- OK (2 out of 3 , remaining: 00:03)
test 2047...[Connection re-use with IDN host name over HTTP proxy]
sp----e-v- OK (3 out of 3 , remaining: 00:00)
TESTDONE: 3 tests out of 3 reported OK: 100%
TESTDONE: 3 tests were considered during 10 seconds.

dfandrich · 2017-02-23T20:43:10Z

I don't agree with the change to test1035. The URL is NOT in UTF-8 but rather ISO-8859-1. The test succeeds because the URL fails to be decoded, despite the incorrect character set. Does the test work for you with this in the <setenv> section:

CHARSET=ISO-8859-1
LANG=ISO-8859-1
LC_CTYPE=ISO-8859-1

I believe that last one is the most standard one, and probably all the tests should be changed this way. I'm not sure how standard ISO-8859-1 is as a locale specifier, but it works for glibc.

ismail · 2017-02-23T20:54:13Z

There is no ISO-8859-1 locale though:


havana ~ > LC_CTYPE=ISO-8859-1 perl -V
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "ISO-8859-1",
        LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

ismail · 2017-02-23T20:57:30Z

Also just tested, adding

CHARSET=ISO-8859-1
LANG=ISO-8859-1
LC_CTYPE=ISO-8859-1

results in a test failure.

dfandrich · 2017-02-23T21:10:20Z

It could be tricky to find a cross-platform way of specifying an ISO 8859/1 locale. This works fine for glibc, but I don't know if there's one that works everywhere. What is the output of:

CHARSET=ISO-8859-1 LC_CTYPE=ISO-8859-1 LANG=ISO-8859-1 locale charmap

Does this command show anything that looks sufficiently generic:

locale -a | grep 8859

ismail · 2017-02-23T21:15:25Z

on openSUSE

havana ~ > locale -a | grep 8859
en_GB.iso885915
en_US.iso885915
et_EE.iso885915

on Ubuntu 16.10:

ubuntu-ams3 ~ > locale -a | grep 8859
ubuntu-ams3 ~ >

dfandrich · 2017-02-23T21:24:35Z

It looks like you don't have any ISO 8859/1 locales on those machines, which technically means that you shouldn't be able to run test 1035. This looks like it's going to be tricky to solve consistently in a cross-platform manner. Setting a UTF-8 locale is a hack that works on your machines but will fail on other people's machines (those that don't have a UTF-8 locale). I think the answer is probably going to be a <precheck> that ensures that the necessary locale is available on the target machine.

dfandrich · 2017-02-23T21:53:03Z

Wait a moment—there's no actual reason test 1035 needs to be in ISO 8859/1 (it's not relevant to the test). If it's converted to UTF-8, then they'll all be consistent. Does it work for you with the following in <setenv>?

CHARSET=UTF-8
LC_ALL=
LC_CTYPE=UTF-8

dfandrich · 2017-02-23T22:21:10Z

I've pushed a change that I hope will fix this everywhere. I'm hoping the autobuilds will show these tests working (or being skipped) on Solaris and Windows instead of unconditionally failing now.

bagder · 2017-02-23T22:28:17Z

I'm afraid it broke already for me, in at least two different ways! Problem 1:

********* System characteristics ******** 
* curl 7.53.1-DEV (x86_64-pc-linux-gnu) 
* libcurl/7.53.1-DEV OpenSSL/1.1.0e zlib/1.2.8 c-ares/1.12.0 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) nghttp2/1.20.0-DEV librtmp/2.3
* Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy Metalink PSL 
* Host: storebror.haxx.se
* System: Linux storebror.haxx.se 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 GNU/Linux
* Servers: SSL HTTP-IPv6 HTTP-unix FTP-IPv6 
* Env: Valgrind 
***************************************** 
test 0165...[HTTP over proxy with IDN host name]

 165: protocol FAILED:
--- log/check-expected  2017-02-23 23:27:05.542013862 +0100
+++ log/check-generated 2017-02-23 23:27:05.542013862 +0100
@@ -1,10 +1,10 @@
-GET http://www.xn--4cab6c.se/page/165 HTTP/1.1[CR][LF]
-Host: www.xn--4cab6c.se[CR][LF]
+GET http://www..se/page/165 HTTP/1.1[CR][LF]
+Host: www..se[CR][LF]
 Accept: */*[CR][LF]
 Proxy-Connection: Keep-Alive[CR][LF]
 [CR][LF]
-GET http://www.xn--groe-xna.de/page/165 HTTP/1.1[CR][LF]
-Host: www.xn--groe-xna.de[CR][LF]
+GET http://www.groe.de/page/165 HTTP/1.1[CR][LF]
+Host: www.groe.de[CR][LF]
 Accept: */*[CR][LF]
 Proxy-Connection: Keep-Alive[CR][LF]
 [CR][LF]

 - abort tests
TESTDONE: 0 tests out of 1 reported OK: 0%
TESTFAIL: These test cases failed: 165 
TESTDONE: 1 tests were considered during 3 seconds.

It works again with this patch:

diff --git a/tests/data/test165 b/tests/data/test165
index 4d48c0c65..c1b70f70c 100644
--- a/tests/data/test165
+++ b/tests/data/test165
@@ -31,11 +31,11 @@ http
 idn
 </features>
 <setenv>
 CHARSET=UTF-8
 LC_ALL=
-LC_CTYPE=UTF-8
+LC_CTYPE=en_US.UTF-8
 </setenv>
 <precheck>
 perl -MI18N::Langinfo=langinfo,CODESET -e 'die "Needs a UTF-8 locale" if (lc(langinfo(CODESET())) ne "utf-8");'
 </precheck>
  <name>

bagder · 2017-02-23T22:29:22Z

Problem 2, more spectacular and not necessarily a problem in our code:

./runtests.pl 1034 
********* System characteristics ******** 
* curl 7.53.1-DEV (x86_64-pc-linux-gnu) 
* libcurl/7.53.1-DEV OpenSSL/1.1.0e zlib/1.2.8 c-ares/1.12.0 libidn2/0.16 libpsl/0.17.0 (+libidn2/0.16) nghttp2/1.20.0-DEV librtmp/2.3
* Features: AsynchDNS Debug TrackMemory IDN IPv6 Largefile GSS-API Kerberos SPNEGO NTLM NTLM_WB SSL libz TLS-SRP HTTP2 UnixSockets HTTPS-proxy Metalink PSL 
* Host: storebror.haxx.se
* System: Linux storebror.haxx.se 4.9.0-1-amd64 #1 SMP Debian 4.9.6-3 (2017-01-28) x86_64 GNU/Linux
* Servers: SSL HTTP-IPv6 HTTP-unix FTP-IPv6 
* Env: Valgrind 
***************************************** 
test 1034...[HTTP over proxy with malformatted IDN host name]
 valgrind ERROR ==19038== Conditional jump or move depends on uninitialised value(s)
==19038==    at 0x797DABC: libunistring_freea (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x7980356: libunistring_mem_iconveha (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x798716C: u8_conv_from_encoding (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x79874F7: u8_strconv_from_encoding (in /usr/lib/x86_64-linux-gnu/libunistring.so.0.1.2)
==19038==    by 0x56AB222: idn2_lookup_ul (in /usr/lib/x86_64-linux-gnu/libidn2.so.0.1.4)
==19038==    by 0x16ABD8: fix_hostname (url.c:4054)
==19038==    by 0x16FB00: create_conn (url.c:6444)
==19038==    by 0x170A09: Curl_connect (url.c:6876)
==19038==    by 0x1422E7: multi_runsingle (multi.c:1432)
==19038==    by 0x143AC2: curl_multi_perform (multi.c:2164)
==19038==    by 0x13C498: easy_transfer (easy.c:700)
==19038==    by 0x13C675: easy_perform (easy.c:787)
==19038==    by 0x13C6BF: curl_easy_perform (easy.c:806)
==19038==    by 0x12C4DF: operate_do (tool_operate.c:1474)
==19038==    by 0x12D93E: operate (tool_operate.c:2023)
==19038==    by 0x125670: main (tool_main.c:252)
==19038== 

 - abort tests
TESTDONE: 0 tests out of 1 reported OK: 0%
TESTFAIL: These test cases failed: 1034 
TESTDONE: 1 tests were considered during 3 seconds.

dfandrich · 2017-02-23T23:07:18Z

W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the <precheck>, but probably en_US.UTF-8 is going to be more standard than plain UTF-8, and possibly even more commonly-installed, so switching might be a good idea.

W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed.

bagder · 2017-02-23T23:13:10Z

I'm rocking libidn2/0.16 as well (from Debian), so something is clearly odd in there...

As I'm about to ship a 7.53.1 release in 9-10 hours we at least need a short term decision on what to include there. Right now, just reverting ecd1d02 for that release is one option.

ismail · 2017-02-24T08:36:33Z

There is no standalone UTF-8 locale:

havana ~ > LC_CTYPE=UTF-8 perl -V
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "UTF-8",
	LANG = (unset)
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
Summary of my perl5 (revision 5 version 24 subversion 0) configuration:

So just replace those UTF-8 with en_US.UTF-8 and we would be fine imho.

bagder · 2017-02-24T14:29:09Z

What do you say @dfandrich?

ismail · 2017-02-24T14:32:33Z

Btw, in a perfect world we would have a C.UTF-8 locale everywhere, but unfortunately glibc does not have it by default (Fedora and openSUSE do): https://sourceware.org/glibc/wiki/Proposals/C.UTF-8

dfandrich · 2017-02-24T18:35:18Z

W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the <precheck>, but probably en_US.UTF-8 is going to be more standard than plain UTF-8, and possibly even more commonly-installed, so switching might be a good idea.

W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed.

I've verified that the locale en_US.utf8 exists on 4 different Linux distributions, so that might be the one to use. I was pleased to see that the precheck at least made some of the Solaris builds green (because the tests were skipped, but still…)

dfandrich · 2017-02-24T18:39:15Z

Sorry, my last comment was a bit stale. I'll try a few more distros/OSes and see if there's something better than en_US.utf8, but so far it looks to me like the best bet. And even if it doesn't exist somewhere, the precheck should cause the test to be skipped there (which didn't work for the "UTF-8" locale because it must be weird in other ways).
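
The same kind of check can be sketched in shell. This is only an illustration, not what the test suite runs: `is_utf8_codeset` is a hypothetical helper, the codeset is read with the POSIX `locale charmap` command, and unlike the Perl precheck this version also accepts the spelling `utf8`.

```shell
# is_utf8_codeset NAME
# Hypothetical helper: succeeds if NAME is a spelling of UTF-8,
# compared case-insensitively.
is_utf8_codeset() {
    case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
        utf-8|utf8) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the C library which codeset the requested locale really yields.
# If the locale is not installed, the reported codeset is whatever
# the system falls back to, so the check fails and the test is skipped.
codeset=$(LC_ALL= LC_CTYPE=en_US.UTF-8 locale charmap 2>/dev/null) || codeset=""

if is_utf8_codeset "$codeset"; then
    echo "UTF-8 locale usable (codeset: $codeset)"
else
    echo "Needs a UTF-8 locale (got: ${codeset:-none}); skipping"
fi
```

Checking the resulting codeset rather than the variable's value is the point: setting LC_CTYPE to a name the system does not know succeeds silently at the shell level.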

jay · 2017-02-24T19:14:47Z

I haven't been following this but I tried the precheck in mingw msys and it fails. I don't know if that is expected.

perl -MI18N::Langinfo=langinfo,CODESET -e 'die "Needs a UTF-8 locale" if (lc(langinfo(CODESET())) ne "utf-8");'
Can't locate I18N/Langinfo.pm in @INC

ismail · 2017-02-24T19:25:04Z

@jay there seems to be a problem with your Perl installation because on Cygwin that file comes from perl_base:

latte ~ > cygcheck -f /usr/lib/perl5/5.22/x86_64-cygwin-threads/I18N/Langinfo.pm
perl_base-5.22.3-1

jay · 2017-02-24T19:34:03Z

I am using mingw, which just does not have that. Its msys environment, which allows running autotools, comes with an old perl, 5.8.8. Again, I don't know how relevant this is for what you're doing; I didn't read this thread, I'm just checking in.

ismail · 2017-02-24T20:05:46Z

It's only relevant if you are going to run curl tests. However msys2 already has latest stable Perl: https://github.com/Alexpux/MSYS2-packages/tree/master/perl in case you need it.

dfandrich · 2017-02-25T11:13:19Z

Jay, is the test at least skipped in your case? If the perl precheck fails, that's what should happen. Ideally, we'd find a detection algorithm that works everywhere, but it's probably better to skip the test than risk a false positive.

dfandrich · 2017-02-25T12:37:21Z

I've confirmed that en_US.UTF-8 works on 7 different Linux & BSD distributions. It only failed in a couple of cases when locales were not even installed (and in which IDN wouldn't really be usable, anyway). I'll push that change unless there are any objections.
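
For anyone who wants to repeat this on another system, here is a rough sketch of how availability could be checked against `locale -a`. The helper names `normalize_locale` and `locale_in_list` are made up for illustration. The normalization (lowercase, hyphens dropped) is an assumption based on the spellings seen in this thread, where `locale -a` prints names like en_US.utf8 and en_US.iso885915.

```shell
# normalize_locale NAME
# Hypothetical helper: lowercase the name and drop hyphens, so that
# "en_US.UTF-8" and "en_US.utf8" compare equal.
normalize_locale() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

# locale_in_list NAME
# Reads candidate locale names on stdin (one per line) and succeeds
# if any of them matches NAME after normalization.
locale_in_list() {
    want=$(normalize_locale "$1")
    while IFS= read -r name; do
        if [ "$(normalize_locale "$name")" = "$want" ]; then
            return 0
        fi
    done
    return 1
}

if locale -a 2>/dev/null | locale_in_list en_US.UTF-8; then
    echo "en_US.UTF-8 is installed"
else
    echo "en_US.UTF-8 not found; the IDN tests would be skipped here"
fi
```

Reading the list from stdin keeps the matching logic separate from the `locale -a` call, so it can be tried against saved output from other machines.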

bagder · 2017-02-25T12:38:13Z

Nice work Dan, go for it! 👍

ismail · 2017-02-25T14:11:29Z

Works for me, thanks!

dfandrich · 2017-02-25T14:36:15Z

Pushed as c6ddb60.

Set CHARSET & LANG to UTF-8 compatible values for idn tests 1035, 2046 and 2047

aee082a

bagder approved these changes Feb 23, 2017

bagder closed this in 2bfe550 Feb 23, 2017

lock bot locked as resolved and limited conversation to collaborators May 24, 2018
