Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Make sure we use the C locale for regcomp on macOS. #3953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

arthurschreiber
Copy link
Member

@arthurschreiber arthurschreiber commented Oct 5, 2016

On macOS (and probably on other systems as well), regcomp behaves differently based on the current locale.

On macOS, this was causing some issues when enabling diff drivers: The [\xc0-\xff][\x80-\xbf] part of the regular expression defined at https://github.com/libgit2/libgit2/blob/master/src/userdiff.h#L23 was incorrectly parsed as multibyte if the locale character type was set to UTF-8 (the default on macOS), and made regcomp fail with a REG_ILLSEQ error.

Fortunately, macOS ships with regcomp_l, which allows specifying a fixed locale instead of using whatever locale was set on the system or via environment variables.

/cc @carlosmn @ethomson

@@ -507,6 +507,11 @@ ELSE ()
ENDIF ()
ENDIF()

CHECK_FUNCTION_EXISTS(regcomp_l HAVE_REGCOMP_L)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this use CHECK_SYMBOL_EXISTS(regcomp_l "xlocale.h" HAVE_REGCOMP_L) instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's what it looks like from its documentation. I wonder if CHECK_FUNCTION_EXISTS might work here as well, since it's system-provided.

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch 2 times, most recently from c4fedd9 to 74d243e Compare October 5, 2016 13:11
#include <xlocale.h>
GIT_INLINE(int) p_regcomp(regex_t *preg, const char *pattern, int cflags)
{
return regcomp_l(preg, pattern, cflags, (locale_t) 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If (locale_t) 0 defined somewhere to be special? I've found LC_GLOBAL_LOCALE in the headers which is defined as (locale_t) -1 but not for 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I can't find anything either.

Initially, I created a new locale through newlocale(LC_ALL_MASK, "C", (locale_t) 0);, but I tested using (locale_t) 0 directly and that seems to work as well.

We could change it back to using newlocale and storing that locale somewhere if you think that'd be safer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From Apple's macOS source code release:

int
tre_regncomp_l(regex_t *preg, const char *regex, size_t n, int cflags, locale_t loc)
{
  ...

  NORMALIZE_LOCALE(loc);

  ...
}
#define NORMALIZE_LOCALE(x) if ((x) == NULL) { \
    (x) = _c_locale; \
} else if ((x) == LC_GLOBAL_LOCALE) { \
    (x) = &__global_locale; \
}

So we should be fine passing (locale_t) 0 (or even just NULL) here. Do you want me to change anything here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

newlocale does mention LC_GLOBAL_LOCALE as not being a valid input as the base, but (locale_t) 0 is, so I suppose it's fine to leave it with 0.

@carlosmn
Copy link
Member

carlosmn commented Oct 5, 2016

It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet.

@arthurschreiber
Copy link
Member Author

It'd be great to have a test here so we don't break this in the future. It should also let us know whether this kind of thing would fail in systems we're not aware of yet.

What git does for testing the userdiff patterns is just looping over every pattern and trying to regcomp each one. Would that be good enough as a test?

@carlosmn
Copy link
Member

carlosmn commented Oct 5, 2016

Yes, that should be enough. The main thing would be to test out few patterns which we know should fail if we don't do this fix, to make sure this does do something.

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch from dbc9985 to 371acaa Compare October 5, 2016 16:02
@arthurschreiber
Copy link
Member Author

I added some tests, I hope they cover the basic functionality here.

@carlosmn
Copy link
Member

carlosmn commented Oct 5, 2016

Those tests should be enough to notice things going awry.

@arthurschreiber arthurschreiber force-pushed the arthur/fix-regcomp-locale-issues branch from 0812acc to ab96ca5 Compare October 6, 2016 11:17
@carlosmn carlosmn merged commit d11fcf8 into libgit2:master Oct 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants