Merged
46 commits
164bb0a
Implemented IndexOf and LastIndexOf functions
mkhamoyan May 30, 2023
1e4e9a5
Updated test cases
mkhamoyan May 31, 2023
292b915
Remove not needed parts
mkhamoyan May 31, 2023
9d23f7b
Implemented IsPrefix, IsSuffix functions
mkhamoyan Jun 1, 2023
1f8b5d5
Remove logs
mkhamoyan Jun 1, 2023
7b91b6c
Fix CI build failures
mkhamoyan Jun 2, 2023
c4a2d3c
Refactored
mkhamoyan Jun 2, 2023
001c366
Fixed some test cases for OSX
mkhamoyan Jun 6, 2023
0ec8d79
Minor changes in test cases
mkhamoyan Jun 6, 2023
fa7322c
test case minor refactoring
mkhamoyan Jun 6, 2023
fc2837b
Merge branch 'main' into hybrid_collation_functions
mkhamoyan Jun 6, 2023
a7ab572
Merge branch 'main' into hybrid_collation_functions
mkhamoyan Jun 7, 2023
36de1a9
Merge branch 'main' into hybrid_collation_functions
mkhamoyan Jun 14, 2023
139a6ba
Changed IndexOf functions implementation
mkhamoyan Jun 15, 2023
f3da1e5
Fix build failue
mkhamoyan Jun 15, 2023
5c3f172
Minor fixes
mkhamoyan Jun 15, 2023
ab317f7
Minor fix
mkhamoyan Jun 15, 2023
24094fe
Refactor as per review comments
mkhamoyan Jun 16, 2023
c425d38
Refactored Indexing functions calls
mkhamoyan Jun 16, 2023
a3f44b4
Updated doc and added comments
mkhamoyan Jun 16, 2023
cbaaf80
Applied changes suggested by @jkotas
mkhamoyan Jun 16, 2023
caebcbe
Refactored some files
mkhamoyan Jun 19, 2023
71b026e
Make the doc more readable
mkhamoyan Jun 19, 2023
9dda83a
Refactored IndexOf function
mkhamoyan Jun 19, 2023
88c6861
Add more comments in IndexOF function
mkhamoyan Jun 19, 2023
460aba0
remove localizedStandardRangeOfString
mkhamoyan Jun 20, 2023
db0a8f8
Initial changes for casing functions
mkhamoyan Jun 20, 2023
3d5a195
Added exception in case mixed compositions
mkhamoyan Jun 21, 2023
006bdb7
Merge branch 'hybrid_collation_functions' into hybrid_casing_functions
mkhamoyan Jun 21, 2023
24136fc
Merge branch 'main' into hybrid_casing_functions
mkhamoyan Jun 21, 2023
d9fa03b
Update test cases
mkhamoyan Jun 21, 2023
356b250
Refactor casing functions
mkhamoyan Jun 22, 2023
040f214
Updated doc and did refactoing
mkhamoyan Jun 22, 2023
583f7bb
align code lines
mkhamoyan Jun 22, 2023
a244198
Update test comment
mkhamoyan Jun 22, 2023
f7823a3
Merge branch 'main' into hybrid_casing_functions
mkhamoyan Jun 23, 2023
8c2efb4
Order alphabetically function declarations
mkhamoyan Jun 23, 2023
ba2f1b3
Done minor refactoring
mkhamoyan Jun 23, 2023
ddd17c4
Refactor as requested by review
mkhamoyan Jun 26, 2023
f39dd83
Minor refactoring
mkhamoyan Jun 26, 2023
f46a9ba
Fix casing function implementation
mkhamoyan Jun 27, 2023
6661e4f
Update doc and test cases
mkhamoyan Jun 27, 2023
60a945a
Fix index in Append function
mkhamoyan Jun 27, 2023
2257cbe
minor refactoring
mkhamoyan Jun 27, 2023
3ff5de7
Update method comment and remove GetCurrentLocale
mkhamoyan Jun 27, 2023
bb6fbf5
Use Interop.GlobalizationInterop.ResultCode
mkhamoyan Jun 28, 2023
remove localizedStandardRangeOfString
mkhamoyan committed Jun 20, 2023
commit 460aba0101bc6c48a6f7304d72c07cdd666c7a2f
8 changes: 4 additions & 4 deletions docs/design/features/globalization-hybrid-mode.md
@@ -361,13 +361,13 @@ Affected public APIs:
Mapped to Apple Native API `rangeOfString:options:range:locale:`(https://developer.apple.com/documentation/foundation/nsstring/1417348-rangeofstring?language=objc)

In `rangeOfString:options:range:locale:` objects are compared by checking the Unicode canonical equivalence of their code point sequences.
-In cases where search string contains diaeresis and has different normalization form than in source string result can be incorrect.
+In cases where search string contains diacritics and has different normalization form than in source string result can be incorrect.

Characters in general are represented by unicode code points, and some characters can be represented in a single code point or by combining multiple characters (like diacritics/diaeresis). Normalization Form C will look to compress characters to their single code point format if they were originally represented as a sequence of multiple code points. Normalization Form D does the opposite and expands characters into their multiple code point formats if possible.

`NSString` `rangeOfString:options:range:locale:` uses canonical equivalence to find the position of the `searchString` within the `sourceString`, however, it does not automatically handle comparison of precomposed (single code point representation) or decomposed (most code points representation). Because the `searchString` and `sourceString` can be of differing formats, to properly find the index, we need to ensure that the searchString is in the same form as the sourceString by checking the `rangeOfString:options:range:locale:` using every single normalization form.

-Here are the covered cases with diaeresis:
+Here are the covered cases with diacritics:
1. Search string contains diaeresis and has same normalization form as in source string.
2. Search string contains diaeresis but with source string they have same letters with different char lengths but substring is normalized in source.

@@ -377,8 +377,8 @@ Here are the covered cases with diaeresis:

Not covered case:

-Search string contains diaeresis and with source string they have same letters with different char lengths but substring is not
-normalized in source. example: search string: `U\u0308 and \u00FC` source string: `Source is a\u0308\u0308a and \u0075\u0308`
+Search string contains diacritics and with source string they have same letters with different char lengths but substring is not
+normalized in source. example: search string: `U\u0308 and \u00FC` (Ü and ü) source string: `Source is \u00DC and \u0075\u0308` (Source is Ü and ü)
as it is visible from example normalizaing search string to form C or D will not help to find substring in source string.
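
To make the "not covered" case concrete, here is a minimal C sketch over UTF-16 code units. It is not how `rangeOfString:options:range:locale:` is implemented. `FindCodeUnits` is a hypothetical exact-match search that stands in for any search that cannot bridge precomposed and decomposed forms. The Form C and Form D versions of the search string are written out by hand rather than computed, and all names in the block are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNIT_COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* "Source is Ü and ü": precomposed U+00DC, then decomposed u + U+0308. */
static const uint16_t kSource[] = {
    'S', 'o', 'u', 'r', 'c', 'e', ' ', 'i', 's', ' ',
    0x00DC, ' ', 'a', 'n', 'd', ' ', 0x0075, 0x0308
};

/* Search string as given in the doc: decomposed U + U+0308, then precomposed U+00FC. */
static const uint16_t kSearchAsIs[] = { 0x0055, 0x0308, ' ', 'a', 'n', 'd', ' ', 0x00FC };

/* The same search string hand-normalized to Form C (everything precomposed). */
static const uint16_t kSearchFormC[] = { 0x00DC, ' ', 'a', 'n', 'd', ' ', 0x00FC };

/* The same search string hand-normalized to Form D (everything decomposed). */
static const uint16_t kSearchFormD[] = { 0x0055, 0x0308, ' ', 'a', 'n', 'd', ' ', 0x0075, 0x0308 };

/* A search string that uses exactly the same mixed composition as the source. */
static const uint16_t kSearchSameMix[] = { 0x00DC, ' ', 'a', 'n', 'd', ' ', 0x0075, 0x0308 };

/* Hypothetical helper: index of the first exact code-unit match, or -1 if there is none. */
static int32_t FindCodeUnits(const uint16_t* src, size_t srcLen, const uint16_t* pat, size_t patLen)
{
    assert(src != NULL && pat != NULL);
    if (patLen == 0 || patLen > srcLen)
        return -1;

    for (size_t i = 0; i + patLen <= srcLen; i++)
    {
        size_t j = 0;
        while (j < patLen && src[i + j] == pat[j])
            j++;
        if (j == patLen)
            return (int32_t)i;
    }
    return -1;
}
```

Under these assumptions, the as-is, Form C, and Form D search strings all miss, because the source mixes both compositions inside the matching substring. Only `kSearchSameMix` is found.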

- `IgnoreSymbols`
66 changes: 32 additions & 34 deletions src/native/libs/System.Globalization.Native/pal_collation.m
@@ -95,6 +95,17 @@ int32_t GlobalizationNative_CompareStringNative(const uint16_t* localeName, int3
return modifiedString;
}

static int32_t IsIndexFound(int32_t fromBeginning, int32_t foundLocation, int32_t newLocation)
{
// last index
if (!fromBeginning && foundLocation > newLocation)
return 1;
// first index
if (fromBeginning && foundLocation != -2 && foundLocation < newLocation)
return 1;
return 0;
}

/*
Function: IndexOf
Find detailed explanation how this function works in https://github.com/dotnet/runtime/blob/main/docs/design/features/globalization-hybrid-mode.md
@@ -138,22 +149,10 @@ Range GlobalizationNative_IndexOfNative(const uint16_t* localeName, int32_t lNam
locale:currentLocale];

if (containsRange.location == NSNotFound)
{
return result;
}

// localizedStandardRangeOfString is performing a case and diacritic insensitive, locale-aware search and finding first occurance
if ((comparisonOptions & IgnoreCase) && lNameLength == 0 && fromBeginning)
{
NSRange localizedStandardRange = [sourceStrCleaned localizedStandardRangeOfString:searchStrCleaned];
if (localizedStandardRange.location != NSNotFound)
{
result.location = localizedStandardRange.location;
result.length = localizedStandardRange.length;
return result;
}
}

// in case search string is inside source string but we can't find the index return -2
result.location = -2;
// sourceString and searchString possibly have the same composition of characters
rangeOfReceiverToSearch = NSMakeRange(0, sourceStrCleaned.length);
NSRange nsRange = [sourceStrCleaned rangeOfString:searchStrCleaned
@@ -165,35 +164,34 @@ Range GlobalizationNative_IndexOfNative(const uint16_t* localeName, int32_t lNam
{
result.location = nsRange.location;
result.length = nsRange.length;
// in case of last index and CompareOptions.IgnoreCase
// if letters have different representations in source and search strings
// and case insensitive search appears more than one time in source string take last index
// in case of CompareOptions.IgnoreCase if letters have different representations in source and search strings
// and case insensitive search appears more than one time in source string take last index for LastIndexOf and first index for IndexOf
// e.g. new CultureInfo().CompareInfo.LastIndexOf("Is \u0055\u0308 or \u0075\u0308 the same as \u00DC or \u00FC?", "U\u0308", 25,18, CompareOptions.IgnoreCase);
// should return 24 but here it will be 9
if (fromBeginning || !(comparisonOptions & IgnoreCase))
if (!(comparisonOptions & IgnoreCase))
return result;
}

rangeOfReceiverToSearch = NSMakeRange(0, sourceStrCleaned.length);

// check if sourceString has precomposed form of characters and searchString has decomposed form of characters
// convert searchString to a precomposed form
NSRange precomposedRange = [sourceStrCleaned rangeOfString:searchStrPrecomposed
options:options
range:rangeOfReceiverToSearch
locale:currentLocale];
options:options
range:rangeOfReceiverToSearch
locale:currentLocale];

if (precomposedRange.location != NSNotFound)
{
// in case of last index and CompareOptions.IgnoreCase
// if letters have different representations in source and search strings
// and search appears more than one time in source string take last index
// in case of CompareOptions.IgnoreCase if letters have different representations in source and search strings
// and search appears more than one time in source string take last index for LastIndexOf and first index for IndexOf
// e.g. new CultureInfo().CompareInfo.LastIndexOf("Is \u0055\u0308 or \u0075\u0308 the same as \u00DC or \u00FC?", "U\u0308", 25,18, CompareOptions.IgnoreCase);
// this will return 24
if ((int32_t)result.location > (int32_t)precomposedRange.location && !fromBeginning && (comparisonOptions & IgnoreCase))
// this will return 24
if ((comparisonOptions & IgnoreCase) && IsIndexFound(fromBeginning, (int32_t)result.location, (int32_t)precomposedRange.location))
return result;

result.location = precomposedRange.location;
result.length = precomposedRange.length;
return result;
if (!(comparisonOptions & IgnoreCase))
return result;
}

// check if sourceString has decomposed form of characters and searchString has precomposed form of characters
@@ -206,21 +204,22 @@ Range GlobalizationNative_IndexOfNative(const uint16_t* localeName, int32_t lNam

if (decomposedRange.location != NSNotFound)
{
if ((comparisonOptions & IgnoreCase) && IsIndexFound(fromBeginning, (int32_t)result.location, (int32_t)decomposedRange.location))
return result;

result.location = decomposedRange.location;
result.length = decomposedRange.length;
return result;
}

result.location = -2;
return result;
}

/*
Return value is a "Win32 BOOL" (1 = true, 0 = false)
*/
int32_t GlobalizationNative_StartsWithNative(const uint16_t* localeName, int32_t lNameLength, const uint16_t* lpPrefix, int32_t cwPrefixLength,
const uint16_t* lpSource, int32_t cwSourceLength, int32_t comparisonOptions)

const uint16_t* lpSource, int32_t cwSourceLength, int32_t comparisonOptions)
{
NSStringCompareOptions options = ConvertFromCompareOptionsToNSStringCompareOptions(comparisonOptions);

@@ -247,8 +246,7 @@ int32_t GlobalizationNative_StartsWithNative(const uint16_t* localeName, int32_t
Return value is a "Win32 BOOL" (1 = true, 0 = false)
*/
int32_t GlobalizationNative_EndsWithNative(const uint16_t* localeName, int32_t lNameLength, const uint16_t* lpSuffix, int32_t cwSuffixLength,
const uint16_t* lpSource, int32_t cwSourceLength, int32_t comparisonOptions)

const uint16_t* lpSource, int32_t cwSourceLength, int32_t comparisonOptions)
{
NSStringCompareOptions options = ConvertFromCompareOptionsToNSStringCompareOptions(comparisonOptions);
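
For readers following the `GlobalizationNative_IndexOfNative` hunks above, the sketch below models in plain C how the new side of the diff appears to choose between the three candidate matches (search string as-is, precomposed, decomposed). `IsIndexFound` is copied from the diff. `SelectLocation`, `NOT_FOUND`, and `UNRESOLVED` are hypothetical names introduced only for illustration: the real function works on `NSRange` values and compares against `NSNotFound`, and this reading of the control flow is reconstructed from a diff view without +/- markers, so treat it as an approximation.

```c
#include <assert.h>
#include <stdint.h>

#define NOT_FOUND  (-1) /* hypothetical stand-in for NSNotFound */
#define UNRESOLVED (-2) /* sentinel used in the diff: the search string is contained but no index was determined */

/* Copied from the diff: returns 1 when the location already found should be kept over the new one. */
static int32_t IsIndexFound(int32_t fromBeginning, int32_t foundLocation, int32_t newLocation)
{
    // last index
    if (!fromBeginning && foundLocation > newLocation)
        return 1;
    // first index
    if (fromBeginning && foundLocation != -2 && foundLocation < newLocation)
        return 1;
    return 0;
}

/* Hypothetical model: each argument is the location returned by one search attempt, or NOT_FOUND. */
static int32_t SelectLocation(int32_t fromBeginning, int32_t ignoreCase,
                              int32_t asIs, int32_t precomposed, int32_t decomposed)
{
    assert(asIs >= NOT_FOUND && precomposed >= NOT_FOUND && decomposed >= NOT_FOUND);
    int32_t result = UNRESOLVED;

    if (asIs != NOT_FOUND)
    {
        result = asIs;
        if (!ignoreCase)
            return result;
    }

    if (precomposed != NOT_FOUND)
    {
        if (ignoreCase && IsIndexFound(fromBeginning, result, precomposed))
            return result;
        result = precomposed;
        if (!ignoreCase)
            return result;
    }

    if (decomposed != NOT_FOUND)
    {
        if (ignoreCase && IsIndexFound(fromBeginning, result, decomposed))
            return result;
        result = decomposed;
    }

    return result;
}
```

With the positions quoted in the code comment (9 for the as-is match, 24 for the precomposed match), `SelectLocation(0, 1, 9, 24, 9)` models `LastIndexOf` and yields 24, while `SelectLocation(1, 1, 9, 24, 9)` models `IndexOf` and yields 9. When no attempt finds a match, the result stays at the `-2` sentinel.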
