<regex> mishandles locale-based character classes outside of the char range #992

@AlexGuteniev

Description

Describe the bug
std::wregex fails to match characters outside the char range (code units above 255) against negated character classes such as \S.

@BillyONeal comments:

This is a longstanding bug in our regex engine -- when we form negated character classes (like \S), we negate the bitmap used for encoding units in the range [0-255], but don't have correct handling for encoding units outside that. We've known about this problem since at least October 5, 2016, but it's ABI breaking to fix :(.

Command-line test case

d:\Temp2>type repro.cpp
#include <regex>
#include <iostream>

bool test(std::wstring line, std::wstring query)
{
    std::wregex regex(query);
    std::wsmatch res;
    return std::regex_search(line, res, regex);
}

int main()
{
    std::cout << test(L"xxxYxx\x0078xxxZxxx", L"Y\\S*Z") << std::endl; // 0078 is small Latin X
    std::cout << test(L"xxxYxx\xCF87xxxZxxx", L"Y\\S*Z") << std::endl; // U+CF87 is a Hangul syllable above 255 (CF 87 is the UTF-8 encoding of Greek small chi, U+03C7)
}

d:\Temp2>cl /EHsc /W4 /WX .\repro.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.27.29009.1 for x86
Copyright (C) Microsoft Corporation.  All rights reserved.

repro.cpp
Microsoft (R) Incremental Linker Version 14.27.29009.1
Copyright (C) Microsoft Corporation.  All rights reserved.

/out:repro.exe
repro.obj

d:\Temp2>.\repro.exe
1
0

Expected behavior
Both searches should find a match, so the correct output is:

1
1

STL version

Microsoft Visual Studio Professional 2019 Preview
Version 16.7.0 Preview 3.1

Additional context
Original repro:

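#include <iostream>
#include <regex>
#include <string>

int main()
{
    // Assumes the file is saved as UTF-8; with MSVC, compile with /utf-8.
    // Roughly: "scheduling of sales, manufacturing, and purchase orders, and automatic replenishment production".
    std::wstring line = L"受注、製造、購買オーダのスケジューリングと自動補充生産機能";
    // Every character between 造 and オ (、購買) is above 255, so \S* must match code units outside the char range.
    std::wstring query = L"造\\S*オ";
    std::wregex regex(query);
    std::wsmatch res;
    bool found = std::regex_search(line, res, regex);

    std::cout << found << std::endl; // expected: 1
}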

This item is also tracked on Developer Community as DevCom-984204 and by Microsoft-internal VSO-273702 / AB#273702.

See also #405

vNext note: Resolving this issue will require breaking binary compatibility. We won't be able to accept pull requests for this issue until the vNext branch is available. See #169 for more information.

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), fixed (Something works now, yay!), regex (meow is a substring of homeowner)

Type: No type

Projects: No projects

Milestone: No milestone

Relationships: None yet

Development: No branches or pull requests
