Set CHARSET & LANG to UTF-8 compatible values for idn tests #1283
|
@ismail, thanks for your PR! By analyzing the history of the files in this pull request, we identified @dfandrich, @bagder and @mkauf to be potential reviewers. |
|
thanks! |
|
Ah, I just wanted to write that this doesn't seem to work. I still get the same error. It looks like setenv blocks are not setting the environment variables but HTTP variables only. Sorry, my test literally just finished. |
|
Ah no never mind, it does work. I was setting LC_ALL=POSIX which was overwriting LANG. So it's good to go. |
|
And here is the log:
|
|
I don't agree with the change to test1035. The URL is NOT in UTF-8 but rather ISO-8859-1. The test succeeds because the URL fails to be decoded, despite the incorrect character set. Does the test work for you with this in the section: CHARSET=ISO-8859-1 I believe that last one is the most standard one, and probably all the tests should be changed this way. I'm not sure how standard ISO-8859-1 is as a locale specifier, though, but it works for glibc. |
|
There is no ISO-8859-1 locale though: |
|
Also just tested, adding results in a test failure. |
|
Finding a cross-platform way of specifying an ISO 8859/1 locale could be tricky. This works fine for glibc, but I don't know if there's one that works everywhere. What is the output of:
CHARSET=ISO-8859-1 LC_CTYPE=ISO-8859-1 LANG=ISO-8859-1 locale charmap
Does this command show anything that looks sufficiently generic:
locale -a | grep 8859 |
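For anyone who wants to script the `locale -a | grep 8859` check above, here is a minimal Python sketch. The helper names `filter_locales` and `installed_locales` are made up for illustration and are not part of the curl test suite. It assumes a `locale` command is on the PATH and returns nothing if it is not.

```python
import subprocess


def filter_locales(locale_a_output, needle):
    """Return locale names containing `needle`, ignoring case and hyphens."""
    def norm(text):
        return text.lower().replace("-", "")

    wanted = norm(needle)
    return [
        line.strip()
        for line in locale_a_output.splitlines()
        if line.strip() and wanted in norm(line)
    ]


def installed_locales():
    """Run `locale -a` and return its output, or '' if the command is missing."""
    try:
        result = subprocess.run(
            ["locale", "-a"],
            capture_output=True,
            text=True,
            errors="replace",  # some locale names are not valid UTF-8
            check=False,
        )
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    # Rough equivalent of: locale -a | grep 8859
    print(filter_locales(installed_locales(), "8859"))
```

Ignoring hyphens matters because the same codeset can be listed as `UTF-8` on one system and `utf8` on another.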
|
on openSUSE on Ubuntu 16.10: |
|
It looks like you don't have any ISO 8859/1 locales on those machines, which technically means that you shouldn't be able to run test 1035. This looks like it's going to be tricky to solve consistently in a cross-platform manner. Setting a UTF-8 locale is a hack that works on your machines but will fail on other people's machines (that don't have a UTF-8 locale). I think the answer is probably going to be a precheck that ensures that the necessary locale is available on the target machine. |
|
Wait a moment—there's no actual reason test 1035 needs to be in ISO 8859/1 (it's not relevant to the test). If it's converted to UTF-8, then they'll all be consistent. Does it work for you with the following in CHARSET=UTF-8 |
|
I've pushed a change that I hope will fix this everywhere. I'm hoping the autobuilds will show these tests working (or being skipped) on Solaris and Windows instead of unconditionally failing now. |
|
I'm afraid it broke already for me, in at least two different ways! Problem 1: It works again with this patch:
diff --git a/tests/data/test165 b/tests/data/test165
index 4d48c0c65..c1b70f70c 100644
--- a/tests/data/test165
+++ b/tests/data/test165
@@ -31,11 +31,11 @@ http
idn
</features>
<setenv>
CHARSET=UTF-8
LC_ALL=
-LC_CTYPE=UTF-8
+LC_CTYPE=en_US.UTF-8
</setenv>
<precheck>
perl -MI18N::Langinfo=langinfo,CODESET -e 'die "Needs a UTF-8 locale" if (lc(langinfo(CODESET())) ne "utf-8");'
</precheck>
<name> |
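The Perl precheck in the patch above asks the C library which codeset the active locale uses and bails out unless it is UTF-8. A rough Python sketch of the same idea follows. The function names are hypothetical, and this version is deliberately a bit more lenient than the Perl one-liner: it also accepts spellings such as `utf8`. It assumes a platform where `locale.nl_langinfo` exists (it does not on Windows).

```python
import locale


def is_utf8_codeset(name):
    """True if a codeset name denotes UTF-8, ignoring case, '-' and '_'."""
    return name.lower().replace("-", "").replace("_", "") == "utf8"


def current_codeset():
    """Return the codeset of the locale selected by the environment.

    Returns '' when the environment names a locale that is not installed
    or when nl_langinfo is unavailable on this platform.
    """
    if not hasattr(locale, "nl_langinfo"):
        return ""
    try:
        # Adopt LC_ALL / LC_CTYPE / LANG from the environment.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        return ""
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    if not is_utf8_codeset(current_codeset()):
        raise SystemExit("Needs a UTF-8 locale")
```

As with the Perl version, a failed check is meant to make the test get skipped rather than reported as a failure.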
|
Problem 2, more spectacular and not necessarily a problem in our code: |
|
W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the precheck. W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed. |
|
I'm rocking libidn2/0.16 as well (from Debian), so something is clearly odd in there... As I'm about to ship a 7.53.1 release in 9-10 hours we at least need a short term decision on what to include there. Right now, just reverting ecd1d02 for that release is one option. |
|
There is no standalone UTF-8 charset: So just replace those UTF-8 with en_US.UTF-8 and we would be fine imho. |
|
What do you say @dfandrich? |
|
Btw, in a perfect world we would have a C.UTF-8 locale everywhere, but unfortunately glibc does not have it by default (Fedora and openSUSE do): https://sourceware.org/glibc/wiki/Proposals/C.UTF-8 |
|
W.r.t. test 165, I guess there's something different in ways other than the character encoding between the en_US.UTF-8 locale and UTF-8 on your machine. Ideally, we'd add a check for the important bits in the precheck. W.r.t. test 1034, that looks pretty strongly like a bug in libidn2. I don't see that problem using libidn2 v0.16, so maybe it's already been fixed. I've verified that the locale en_US.utf8 exists on 4 different Linux distributions, so that might be the one to use. I was pleased to see that the precheck at least made some of the Solaris builds green (because the tests were skipped, but still…) |
|
Sorry, my last comment was a bit stale. I'll try a few more distros/OSes and see if there's something better than en_US.utf8, but so far it looks to me like the best bet. And even if it doesn't exist somewhere, the precheck should cause the test to be skipped there (which didn't work for the "UTF-8" locale because it must be weird in other ways). |
|
I haven't been following this but I tried the precheck in mingw msys and it fails. I don't know if that is expected. |
|
@jay there seems to be a problem with your Perl installation because on Cygwin that file comes from perl_base: |
|
I am using mingw, which just does not have that. Its msys environment, which allows running autotools, comes with an old Perl, 5.8.8. Again, I don't know how relevant it is for what you're doing; I didn't read this thread, just checking in. |
|
It's only relevant if you are going to run curl tests. However msys2 already has latest stable Perl: https://github.com/Alexpux/MSYS2-packages/tree/master/perl in case you need it. |
|
Jay, is the test at least skipped in your case? If the perl precheck fails, that's what should happen. Ideally, we'd find a detection algorithm that works everywhere, but it's probably better to skip the test than risk a false positive. |
|
I've confirmed that en_US.UTF-8 works on 7 different Linux & BSD distributions. It only failed in a couple of cases when locales were not even installed (and in which IDN wouldn't really be usable, anyway). I'll push that change unless there are any objections. |
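Checking by hand whether a locale name such as en_US.UTF-8 is usable on a given machine can also be done programmatically. The sketch below is an illustration only, with a hypothetical helper name, and is not what the test suite does. It tries each candidate with `setlocale` and reports the first one the C library accepts. Note that some C libraries accept almost any locale name, so a success here is a hint rather than proof.

```python
import locale


def first_available(candidates):
    """Return the first locale name that setlocale() accepts, else None.

    The process's LC_CTYPE setting is restored before returning.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_CTYPE, name)
            except locale.Error:
                continue  # not installed here, try the next candidate
            return name
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # Candidate names taken from this thread.
    print(first_available(["en_US.UTF-8", "en_US.utf8", "C.UTF-8"]))
```

A result of `None` corresponds to the case described above where no suitable locale is installed and the test should be skipped.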
|
Nice work Dan, go for it! 👍 |
|
Works for me, thanks! |
|
Pushed as c6ddb60. |
Tests 1035, 2046 and 2047 need LANG set to a UTF-8 compatible value to test IDN features correctly.