-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Make sure we use the C
locale for regcomp
on macOS.
#3953
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure we use the C
locale for regcomp
on macOS.
#3953
Conversation
@@ -507,6 +507,11 @@ ELSE () | |||
ENDIF () | |||
ENDIF() | |||
|
|||
CHECK_FUNCTION_EXISTS(regcomp_l HAVE_REGCOMP_L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this use CHECK_SYMBOL_EXISTS(regcomp_l "xlocale.h" HAVE_REGCOMP_L)
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's what it looks like from its documentation. I wonder if CHECK_FUNCTION_EXISTS
might work here as well, since it's system-provided.
c4fedd9
to
74d243e
Compare
#include <xlocale.h> | ||
GIT_INLINE(int) p_regcomp(regex_t *preg, const char *pattern, int cflags) | ||
{ | ||
return regcomp_l(preg, pattern, cflags, (locale_t) 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If (locale_t) 0
defined somewhere to be special? I've found LC_GLOBAL_LOCALE
in the headers which is defined as (locale_t) -1
but not for 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, I can't find anything either.
Initially, I created a new locale through newlocale(LC_ALL_MASK, "C", (locale_t) 0);
, but I tested using (locale_t) 0
directly and that seems to work as well.
We could change it back to using newlocale
and storing that locale somewhere if you think that'd be safer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From Apple's macOS source code release:
int
tre_regncomp_l(regex_t *preg, const char *regex, size_t n, int cflags, locale_t loc)
{
...
NORMALIZE_LOCALE(loc);
...
}
#define NORMALIZE_LOCALE(x) if ((x) == NULL) { \
(x) = _c_locale; \
} else if ((x) == LC_GLOBAL_LOCALE) { \
(x) = &__global_locale; \
}
So we should be fine passing (locale_t) 0
(or even just NULL
) here. Do you want me to change anything here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
newlocale
does mention LC_GLOBAL_LOCALE
as not being a valid input as the base, but (locale_t) 0
is, so I suppose it's fine to leave it with 0.
It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet. |
What |
Yes, that should be enough. The main thing would be to test out few patterns which we know should fail if we don't do this fix, to make sure this does do something. |
dbc9985
to
371acaa
Compare
I added some tests, I hope they cover the basic functionality here. |
Those tests should be enough to notice things going awry. |
0812acc
to
ab96ca5
Compare
On macOS (and probably on other systems as well),
regcomp
behaves differently based on the currentlocale
.On macOS, this was causing some issues when enabling diff drivers: The
[\xc0-\xff][\x80-\xbf]
part of the regular expression defined at https://github.com/libgit2/libgit2/blob/master/src/userdiff.h#L23 was incorrectly parsed as multibyte if the locale character type was set toUTF-8
(the default on macOS), and maderegcomp
fail with aREG_ILLSEQ
error.Fortunately, macOS ships with
regcomp_l
, which allows specifying a fixed locale instead of using whatever locale was set on the system or via environment variables./cc @carlosmn @ethomson