array_unique: add way to compare items with an identity check (===) #10526

Closed
wernerm opened this issue Feb 7, 2023 · 10 comments

@wernerm

wernerm commented Feb 7, 2023

Description

The following code:

<?php
var_dump(array_unique([1, "1"], SORT_REGULAR));

Resulted in this output:

array(1) {
  [0]=>
  int(1)
}

But I expected this output instead:

array(2) {
  [0]=>
  int(1)
  [1]=>
  string(1) "1"
}

Adding the SORT_REGULAR flag should result in items being compared normally (not changing item types).

PHP Version

PHP 8.0.20

Operating System

No response

@Girgias
Member

Girgias commented Feb 7, 2023

This is the expected behaviour, as SORT_REGULAR uses == under the hood.

From the sort() page (which has a better description than array_unique()):

> SORT_REGULAR - compare items normally; the details are described in the comparison operators section

Therefore, changing this to a feature request, by possibly introducing a new flag.
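
To make the difference concrete, here is a minimal sketch using the values from the report. The variable names are illustrative only and do not come from the thread.

```php
<?php
// Loose comparison (==) juggles types: an int and a numeric string compare by value.
$loose = (1 == "1");    // true

// Identity comparison (===) additionally requires the types to match.
$strict = (1 === "1");  // false

// array_unique() with SORT_REGULAR compares loosely, so only the first value survives.
$unique = array_unique([1, "1"], SORT_REGULAR);

var_dump($loose, $strict, $unique);
```

This reproduces the output shown in the report: the string "1" is dropped because it is loosely equal to the int 1.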

Girgias changed the title from "array_unique: Implicit type-casting is performed in spite of the SORT_REGULAR flag" to "array_unique: add way to compare items with an identity check (===)" on Feb 7, 2023

@nielsdos
Member

nielsdos commented Feb 7, 2023

I just wonder what the behaviour should be in case someone decides to use this new flag in sorting and the values aren't comparable because they are of different types. An exception maybe? But then again that's probably a bit strange because array_unique wouldn't have this issue.

@blar
Contributor

blar commented Feb 7, 2023

Should the default behaviour of array_unique depend on declare(strict_types = 1)?

@damianwadley
Member

> Should the default behaviour of array_unique depend on declare(strict_types = 1)?

strict_types is about function arguments at call time. It shouldn't control runtime behavior this way.
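
A short sketch of that distinction, assuming PHP 7.0 or later. The function takesInt() is a hypothetical helper that exists only to show where strict_types does apply.

```php
<?php
declare(strict_types=1);

// strict_types only disables scalar coercion for arguments and return values
// of calls made from this file; comparison operators are unaffected.
$looseStillJuggles = (1 == "1");   // still true under strict_types

// Hypothetical helper with a declared int parameter.
function takesInt(int $n): int
{
    return $n;
}

try {
    takesInt("1");                 // a string is no longer coerced to int
    $rejected = false;
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($looseStillJuggles, $rejected);
```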

@KapitanOczywisty

> I just wonder what the behaviour should be in case someone decides to use this new flag in sorting and the values aren't comparable because they are of different types. An exception maybe? But then again that's probably a bit strange because array_unique wouldn't have this issue.

A new sorting method would have to sort by type first, because array_unique relies on consistent comparisons for its optimization. If used in sort, it would sort by type and then by value, which is probably useless but would work fine.

Thus the new method would look something like this:

function cmp_strict(mixed $a, mixed $b): int
{
	if ($a === $b) return 0;

	$typeA = gettype($a);
	$typeB = gettype($b);
	if ($typeA === $typeB) {
		// abstract function to compare values without type juggling
		return cmp_values_strict($a, $b);
	} else {
		return $typeA <=> $typeB;
	}
}
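
For reference, here is one way the snippet above could be made runnable. The body of cmp_values_strict() is an assumption made for illustration, not the author's implementation: it compares strings byte-wise with strcmp() and leaves everything else to the spaceship operator. Because it uses the mixed type, it assumes PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for the "abstract" helper in the snippet above.
// Both arguments are assumed to already have the same gettype().
function cmp_values_strict(mixed $a, mixed $b): int
{
    if (is_string($a)) {
        // Byte-wise comparison, so "1" and "01" are not treated as equal numbers.
        return strcmp($a, $b) <=> 0;
    }
    // int, float and bool values of the same type compare without juggling.
    // Arrays and objects also end up here; their elements may still be
    // compared loosely, which this sketch does not try to solve.
    return $a <=> $b;
}

function cmp_strict(mixed $a, mixed $b): int
{
    if ($a === $b) {
        return 0;
    }
    $typeA = gettype($a);
    $typeB = gettype($b);

    return $typeA === $typeB
        ? cmp_values_strict($a, $b)
        : $typeA <=> $typeB; // type names are plain strings such as "integer"
}

$values = ["1", 1, "01", 1.0, 1, true];
usort($values, 'cmp_strict');
var_dump($values);
```

With this comparator, values are grouped by type name first and ordered within each group, so only identical values ever compare as equal.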

> Therefore, changing this to a feature request, by possibly introducing a new flag.

As with the sort functions, array_unique could allow the use of a custom comparison function to cover more complex cases (e.g. when multiple values have the same meaning), though a flag would be more convenient in almost every case.
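
Until such a flag or callback exists, a userland workaround is possible. The following is a minimal sketch, assuming a hypothetical helper named array_unique_strict() (not a PHP built-in). It relies on the strict mode of in_array() and runs in quadratic time, so it is only meant for small arrays.

```php
<?php
// Hypothetical helper: keeps the first occurrence of each value,
// comparing with === and preserving keys like array_unique() does.
function array_unique_strict(array $input): array
{
    $result = [];
    foreach ($input as $key => $value) {
        // The third argument makes in_array() use identity comparison.
        if (!in_array($value, $result, true)) {
            $result[$key] = $value;
        }
    }
    return $result;
}

var_dump(array_unique_strict([1, "1", 1, "a", "1"]));
```

Applied to the array from the report, this keeps both the int 1 and the string "1".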

@nielsdos
Member

nielsdos commented Feb 8, 2023

Thanks for pointing that out; it does indeed look like a logical comparison function. I can spend some time this week to try and implement a PoC.

@nielsdos
Member

Err wait... The example code of a comparator posted here wouldn't work either... For example cmp_strict(0, '0') would cause the branch with the spaceship operator to be executed, which results in a return value of 0. This means that array_unique would remove one of the two, which is the exact problem in this issue report.

The main problem is that there seems to be no total order relation on elements of all sorts of different type combinations. For example: is '0'<0, or is 0<'0'? If both answers are yes then we have an inconsistency in sorting. If both answers are no then they must be equal, which causes OP's problem. In other cases the ordering is arbitrary.
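
The inconsistency described here can be observed with PHP's loose comparison operators. This is a small sketch with illustrative variable names; it shows the 0 versus '0' case from the paragraph above and an example of loose equality not being transitive.

```php
<?php
// Neither value is smaller than the other, so loosely they are equal.
$intLess    = (0 < '0');   // false
$stringLess = ('0' < 0);   // false
$equal      = (0 == '0');  // true

// Loose equality is also not transitive across types.
$a = null;
$b = false;
$c = "0";

$ab = ($a == $b);  // true: both convert to false
$bc = ($b == $c);  // true: "0" converts to false
$ac = ($a == $c);  // false: null is compared as the empty string ""

var_dump($intLess, $stringLess, $equal, $ab, $bc, $ac);
```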

So although this issue proposes to add a new flag, we cannot use that flag in a consistent way for sorting. The flag would be unique to array_unique and must be handled in a special way AFAICT. Or we have to come up with an ordering scheme that works for all type combinations.

@KapitanOczywisty

> For example cmp_strict(0, '0') would cause the branch with the spaceship operator to be executed, which results in a return value of 0.

@nielsdos The branch with the spaceship operator only compares the type names, which are always strings and always start with a letter, so it's a safe comparison. https://3v4l.org/jnLno

> For example: is '0'<0, or is 0<'0'?

"integer" < "string" thus 0<'0'.
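
As a quick check of that claim, the sketch below collects the type names returned by gettype() for a few sample values and sorts them as plain strings. The sample values and variable names are illustrative only.

```php
<?php
// Comparing the type names of 0 and '0' as strings.
$order = gettype(0) <=> gettype('0');   // "integer" <=> "string"

// Type names reported by gettype() for a handful of sample values.
$names = array_map('gettype', [true, 1, 1.5, "", [], null, new stdClass()]);

// Byte-wise string sort; upper-case letters sort before lower-case ones.
sort($names, SORT_STRING);

var_dump($order, $names);
```

Note that gettype() reports null as "NULL" in upper case, so under a byte-wise comparison it sorts before all the lower-case type names.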

> So although this issue proposes to add a new flag, we cannot use that flag in a consistent way for sorting.

Sorting by type first is a consistent way of sorting. I'm not sure whether it should be allowed in sort, but it won't hurt anything.

> Or we have to come up with an ordering scheme that works for all type combinations.

Comparing types first should be faster for mixed arrays, which is ideal for array_unique.

This is a bit closer to a proper implementation: https://3v4l.org/2oRQt

@nielsdos
Member

@KapitanOczywisty Sorry, my bad, I misread the code. So the solution here is the "arbitrary order" option I listed which seems not super arbitrary after all since it seems to be consistent with zend_compare. I'll try this tomorrow.

nielsdos added a commit to nielsdos/php-src that referenced this issue Feb 11, 2023
… identity check (===)

Implements phpGH-10526.

This adds a SORT_STRICT flag to use in sorting and in array_unique,
although it is most useful in the latter case.

If the types are equal we will use the identity check first before using
zend_compare to avoid possibly inconsistent results (e.g. null and false
are equal according to zend_compare). If the types are equal, but the
values are not identical we can rely on zend_compare to perform the
comparison without introducing inconsistencies.

If the types are not equal, we will sort in a way that groups the
values of the same type together, and within those groups the values
are sorted. We determine the order of the types based on the
alphabetical order (as exposed to userland!), which is consistent with
the inequality comparisons of `zend_compare`
(e.g. false < object < true etc.)
Note that double comes before long because the name exposed to userland
is float which comes alphabetically before long.
@iluuu1994
Member

This is essentially the same as #9775, so I'm closing this as a duplicate. Please continue discussion there if you have anything to add.

iluuu1994 closed this as not planned on Mar 23, 2023