[rc2] SQL Server: Don't transform equality to bitwise operations in predicate contexts #36809
This was broken on EFCore 8 and works as intended in EFCore 9.
Thanks for submitting this @ranma42... A few comments:
- The approach here, which attempts to more narrowly identify the cases where the bitwise transformation is problematic rather than removing it whenever we're in a predicate, seems close to what I described in the OP of the other PR.
  - I can see the sense of doing that, but I'm wary of making assumptions about what the SQL Server query planner does and doesn't do (after all, this whole fix is needed because we were assuming it wouldn't see into CASE, which was my own assumption as well, and I think it was reasonable even if incorrect...).
  - For example, IIUC you pass `allowOptimizedExpansion: false` to the operands of SqlBinaryExpression, meaning that if the CASE is nested inside AND, we'd be doing the bitwise transformation within THEN (hopefully I got that right). I'm not sure I want EF to go too far in trying to model/guess the SQL Server query planner/optimizer.
- Importantly: what's the downside of doing the broader thing as in my PR? IIUC the original bitwise transformation was done to fix bugs when the value is projected out of the database; but in my PR we only prevent the transformation in predicates, which never get projected out. So what do you think we're potentially losing by having more CASE constructs instead of bitwise? If we don't have a clear idea, I'd personally rather default to CASE (in other words, in the absence of specific data, it seems less risky to me to have CASE instead of bitwise than the other way around).
- Regardless of all the above, in this PR I personally find the name `allowOptimizedExpansion` confusing. I get the idea of aligning to SqlNullabilityProcessor, but in that visitor there's a single, clear meaning of "optimized expansion": contexts where false and null are equivalent. Here, on the other hand, `allowOptimizedExpansion` currently seems to mean "don't transform to bitwise", which seems odd (at the very least it seems like it should start with "require" or "disallow", rather than "allow")... At least until we get other "optimizations", my preference would be a name that expresses more closely what that variable actually means/controls (e.g. flip the logic and call it `allowBitwiseEqualityTransformation`; or if you prefer your own approach, maybe at least rename it to `inPossibleIndexUsageContext`).

Let me know what you think of all of the above! We have a couple more days before one of these must be merged for 10.
The idea is basically the same :)
Yes, that was definitely an invalid assumption. I would love to have something like https://www.sqlite.org/optoverview.html for SqlServer, but I am afraid it does not exist; also, I guess SqlServer has changed its planning/query optimization behavior over time, and I believe it might also be very complex and dependent on live data statistics... so it is probably not something that EFCore can readily rely on.
Sorry,
The risk is also around predicates, because it is legitimate to compare the result of a Boolean expression. A test that would emit a different query and a different set of results (for .NET9/this PR vs #36797) is the following one:

```csharp
[ConditionalFact]
public virtual void Where_not_equal_using_relational_null_semantics_complex_in_equals()
{
    using var context = CreateContext(useRelationalNulls: true);
    context.Entities1
        .Where(e => (e.NullableBoolA != e.NullableBoolB) == e.NullableBoolC)
        .Select(e => e.Id).ToList();
}
```
Comparison of the queries (and results) of the current (this PR) behavior vs #36797, also available at https://dbfiddle.uk/DJ2muxpD:

.NET9 / #36809
The name is definitely bad, sorry, I didn't really think much about it yesterday.
With all of these options, I can definitely see that
EDIT: added the queries/results for
Thanks for all the discussion (as always), and absolutely don't worry about the allowOptimizedExpansion naming :) I have to step out, but just to be sure... Are we aware of any specific issues (either bugs or performance issues) that my broader #36797 would have/create, which this PR would fix? In other words, I'm trying to be sure whether we're aware of any specific advantage of one approach over the other (aside from the simpler SQL here). At least in my current state of mind, if, as far as we know, the two are equivalent (bug- and perf-wise), I think I'd still prefer my broader approach, simply because it feels like there's more chance of some construct out there which would use the index when in a predicate with my PR and not with this one (in the same way that occurred in #36291). Otherwise I'll think more about your comments and respond in more detail tomorrow! Thanks again.
When NULLs and FALSEs are treated as equivalent, the result set includes records with some `NULL` fields; when they are interpreted following the usual relational semantics, the result set only includes the records:

| NullableBoolA | NullableBoolB | NullableBoolC |
| :------------ | :------------ | :------------ |
| False         | False         | False         |
| False         | True          | True          |
| True          | False         | True          |
| True          | True          | False         |
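A minimal sketch of why the two result sets differ, assuming the lossy translation folds a `NULL` comparison result into `FALSE` (as a `CASE WHEN ... THEN 1 ELSE 0 END` would), while the relational translation lets `NULL` propagate. This is not EF Core code: `NullSemanticsDemo` and its helpers are hypothetical and only model SQL three-valued logic with C# `bool?`, evaluating the predicate `(NullableBoolA != NullableBoolB) == NullableBoolC` over every combination of column values.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not EF Core code: models SQL three-valued logic with bool?
// (null stands for SQL NULL/UNKNOWN) to compare a lossy and a relational translation.
public static class NullSemanticsDemo
{
    // SQL "x <> y": UNKNOWN (null) if either operand is NULL.
    public static bool? SqlNotEqual(bool? x, bool? y) =>
        x.HasValue && y.HasValue ? (bool?)(x.Value != y.Value) : null;

    // SQL "x = y": UNKNOWN (null) if either operand is NULL.
    public static bool? SqlEqual(bool? x, bool? y) =>
        x.HasValue && y.HasValue ? (bool?)(x.Value == y.Value) : null;

    // Assumed lossy step: NULL is folded into FALSE, like CASE WHEN ... THEN 1 ELSE 0 END.
    public static bool? FoldNullToFalse(bool? x) => x ?? false;

    static readonly bool?[] Values = { null, false, true };

    // Returns the (A, B, C) combinations kept by WHERE (A <> B) = C.
    public static List<(bool? A, bool? B, bool? C)> Matching(bool lossy)
    {
        var rows = new List<(bool? A, bool? B, bool? C)>();
        foreach (var a in Values)
        foreach (var b in Values)
        foreach (var c in Values)
        {
            bool? lhs = SqlNotEqual(a, b);
            if (lossy)
            {
                lhs = FoldNullToFalse(lhs);
            }

            // WHERE keeps a row only if the predicate is TRUE (FALSE and UNKNOWN are filtered out).
            if (SqlEqual(lhs, c) == true)
            {
                rows.Add((a, b, c));
            }
        }
        return rows;
    }

    static string Show(bool? v) => v?.ToString() ?? "NULL";

    public static void Main()
    {
        foreach (var (a, b, c) in Matching(lossy: false))
        {
            Console.WriteLine($"relational: {Show(a)} {Show(b)} {Show(c)}");
        }
        Console.WriteLine($"lossy matches: {Matching(lossy: true).Count}");
    }
}
```

Under these assumptions, the relational variant keeps exactly the four records listed above, while the lossy variant matches 9 of the 27 combinations: those four plus the five where `NullableBoolA` or `NullableBoolB` is `NULL` and `NullableBoolC` is `FALSE`.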
As I was force-updating the PR, I took the chance to also replace the name with
I updated the
I am unsure if it is considered relevant, as that test relies on
Note that the tests I added only check for correctness (which result set is returned), not for performance. I am not completely sure what the right path is here; I definitely believe we can get the best of both worlds (efficient and correct queries), and I hope that this PR achieves that (barring known issues that are currently not being tackled).
Thank you for taking the time to look into this. I am afraid some parts of this are still not 100% clear because of the remaining glitches around nullable Booleans/BITs in SqlServer, but I am very happy to see that this is an area that is being improved upon 🚀
IIUC the deadline is approaching, so I'll try to do a brief recap. The EFCore provider for SqlServer sometimes translates nullable boolean expressions in a "lossy" way, which folds both NULL and FALSE into

The two changesets from #36797 and #36809 are attempts to address the performance regression; they mainly differ in when the (in)equality is transformed into a

#36797 uses the lossy translation whenever the comparison is within a predicate (in the current

#36809 uses the lossy translation whenever the comparison is used in an expression that is known not to distinguish

Assuming no further bug is being introduced in either case, I would consider:
This choice involves a tradeoff for which I will defer to @roji and other EFCore maintainers. I still plan to work more on #34001, and I believe that by providing additional nullability information to this step of the pipeline, a translation that is both correct and as efficient as expected could be implemented... but that will require a few additional intermediate changes, definitely not something for this RC (nor for this major version).
@ranma42 sorry I didn't get around to looking at this more yesterday, and thanks for the recap. I just spent some time thinking about this, and my understanding corresponds to the summary you just posted. Specifically, thanks for putting together a scenario which fails (correctness) with my broader PR, but passes with yours. I agree there's really no 100% satisfactory answer here, and I have a nagging doubt that there may be additional perf issues which would exist with your PR but not with mine... That's especially painful since performance issues like this are pretty hard for users to spot and narrow down (as in #36291). However, we're at a point where my PR still has a known correctness issue (the scenario you added in Where_not_equal_using_relational_null_semantics_complex_in_equals), whereas your PR doesn't have a known problem (either correctness or performance). So I'm going to go ahead and merge your PR for 10 rather than mine, and we can always revisit this again in the future. Thanks again (as always) for your investigation and insights!
Note #23125, which tracks adding a get-the-query-plan feature in EF; once that's done, we could also assert on query plans in tests. As with SQL baselines, that would mainly help us avoid regressing performance in cases where we originally suspected a possible query performance concern and turned on the plan assertion. In other words, I somewhat doubt that this would have helped us catch this particular perf regression (though it might have).
@artl93 @SamMonoRT I am merging this for RC2 in place of #36797, which has already been approved; #36797 and this PR make very similar changes, fix the same bug, and the same servicing template notes apply to both. So to save time I'll go ahead and "transfer" the approval from #36797 to this PR.
This is an alternative implementation of #36797.
I added a test which I think might make sense regardless, as it checks that the handling of nullable bools does not regress, at least in trivial cases (specifically, that test would fail on EFCore 8 and pass on 9).
The main difference in the approach when compared to #36797 is that, instead of a field in the visitor, the `allowOptimizedExpansion` information is passed while visiting, just like the `inSearchConditionContext`.

The `allowOptimizedExpansion` parameter is named like this to match the same value in the `SqlNullabilityProcessor`, as it has the same semantics, namely that "falsy" results (NULLs, FALSEs) can be clumped together.