[API Proposal]: Interface dispatch cache to avoid method resolution at runtime #90592
Tagging subscribers to this area: @dotnet/area-system-runtime

Issue Details

Background and motivation

It's known that a virtual method call is generally faster than an interface method call. Many internal abstractions within .NET and ASP.NET Core are designed using abstract classes. I personally use the same technique in my open-source library. The location of the actual implementation of a virtual method is just an offset within the method table, but an interface requires one more level of indirection and some resolution instructions inserted at the call site. This proposal provides a way to remove this indirection so that an interface method call costs the same as a virtual method call.

PGO may help to devirtualize an interface method call and convert the call site to a monomorphic call if it knows that only one implementation of the interface has been detected at the call site. If not, the call site is converted to the polymorphic and then the megamorphic version (which is the slowest). However, this optimization is applied at the level of the method where the call site is located.

```csharp
public sealed class MyClass
{
    private readonly ISpanFormattable _formattable;

    public MyClass(ISpanFormattable formattable)
    {
        _formattable = formattable;
    }

    // IFormattable (the parent of ISpanFormattable) declares ToString(format, formatProvider)
    public string FormatToString() => _formattable.ToString(format: null, formatProvider: null);
}
```

There is one JITted/compiled version of the `FormatToString` method because an instance method is just a regular function taking `this` implicitly. Currently, the compiler doesn't provide different versions of the same method for each instance of `MyClass`. As a result, PGO doesn't take into account the fact that extra information for call site optimization can be propagated from the instance level.

So the main concept of this proposal is to move knowledge about the actual interface implementation from the type level to the instance level.

API Proposal

A cache can be represented by a special value type:

```csharp
namespace System.Runtime.CompilerServices;

public readonly struct InterfaceDispatchCache<T>
    where T : class
{
    public readonly T Instance;

    // the actual implementation is provided by the runtime and depends on the actual type T
    public InterfaceDispatchCache(T instance);
}
```

This value type is specially treated by JIT/AOT in the following aspects:

The size and layout of the value type depend on the actual generic argument `T`. If `T` is a class, the runtime doesn't apply any special semantics and the size is equal to `native int` (we need to keep the reference only). If `T` is an interface, the size depends on the number of instance methods of the interface and its parents. For instance, `InterfaceDispatchCache<ISpanFormattable>` can have the following layout at runtime:

```csharp
// pseudo code: a runtime-generated specialization, not valid C#
[StructLayout(LayoutKind.Sequential)]
public readonly struct InterfaceDispatchCache<ISpanFormattable>
{
    public readonly ISpanFormattable Instance; // always at offset 0
    private readonly void* method0; // pointer to ISpanFormattable.TryFormat method implementation
    private readonly void* method1; // pointer to IFormattable.ToString method implementation

    public InterfaceDispatchCache(ISpanFormattable instance) // runtime-generated ctor
    {
        Instance = instance ?? throw new ArgumentNullException(nameof(instance));
        method0 = ldvirtftn instance.TryFormat; // pseudo code using the existing ldvirtftn IL opcode
        method1 = ldvirtftn instance.ToString;
    }
}
```

The JIT/AOT compiler specially treats an `a.Instance.MethodCall(args)` call site if `a` is of type `InterfaceDispatchCache<T>`. The compiler knows how to resolve the invocation of the interface method in this case. It's just an indirect call using the pointer stored in the cache that points to the interface method implementation. For instance:

```csharp
InterfaceDispatchCache<ISpanFormattable> cache = ...;
cache.Instance.ToString(format: null, formatProvider: null); // converted to 'call cache.method1'
```

API Usage

```csharp
public sealed class MyClass
{
    private readonly InterfaceDispatchCache<ISpanFormattable> dispatchCache;

    public MyClass(ISpanFormattable formattable)
    {
        dispatchCache = new(formattable);
    }

    public string FormatToString() => dispatchCache.Instance.ToString(format: null, formatProvider: null);
}
```

This technique can be useful in DI as well.

Alternative Designs

Currently, an interface method can be devirtualized and cached using a delegate.

Risks

I don't see any.
|
It's an interesting idea, but what about just allowing a pointer to an instance method to be taken? Then you could take the pointer to the method through the instance and cache it as needed, and do it on an individual basis rather than for every single method in the interface(s). I imagine it would even be able to speed up virtual calls as well as interface calls, since the v-table lookup would happen once and then you would use the pointer directly.
|
@timcassell I thought about that to extend Let's imagine that we have

```csharp
IFormattable fmt = ...;
// hypothetical syntax for a pointer to an instance method
delegate* thiscall<IFormattable, string, IFormatProvider, string> ptrToFormatMethod = &fmt.ToString;
```

The `this` argument is explicit in this approach, so there is no way to check that I am calling the method on exactly the same object that was used for method resolution.
It can be done currently using delegates, but a delegate requires an allocation. Otherwise, we need some version of a delegate in the form of a value type, or something like dotnet/csharplang#3452. I don't want to interfere with other proposals.
|
Why does it have to be verified by the compiler? Function pointers require unsafe code, the onus should be on the programmer to use the same instance that was used to get the pointer. |
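Here you go:

```csharp
interface I
{
    void M();
}

class A : I
{
    public void M() { }
}

class B : I
{
    public void M() { }
}

I a = new A();
delegate* thiscall<I, void> methodPtr = &a.M; // hypothetical syntax: resolved against A, so it points to A.M
a = new B();
methodPtr(a); // invokes A.M with an instance of B passed as 'this'
```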
|
Right, I understand the unsafe-ness of it. But my point is it has to be used in an unsafe context, and there are plenty of ways for programmers to shoot themselves in the foot with unsafe code, so why is this case different? [Edit] Also, your own proposal could have the same issue happen if the struct gets torn. |
This is why I don't propose it for structs.
Correct, but it's not a reason to add a new one |
I'm talking about the
No, the reason is performance, of course. The unsafe-ness is just a reason not to add it. But it doesn't seem more unsafe than other unsafe features to me. |
Not the only one. Convenience too. An individual pointer for each method requires a lot of boilerplate code. Actually, I can do the same right now with delegates or even IL weaving (e.g. the InlineIL weaver). Should the presence of IL weaving prevent the introduction of a new low-level API? I hope not. People use IL weaving as a last resort because they have no choice (me included). |
I actually didn't know that instance method pointers are already supported in IL. I tend to avoid using IL directly since not every runtime supports all instructions (looking at Unity's IL2CPP).
Sure, but I'd rather be able to be fine-grained about it when I'm focused on optimal efficiency. And I think it wouldn't hurt to be able to optimize virtual calls the same way you are proposing to optimize interface calls. |
Yes, they are.
I don't think that it's needed. A virtual method has a known offset in the vtable at compilation time (JIT or AOT), while an interface method has to be resolved dynamically through multiple indirections. So virtual method call is already optimized by runtime as |
Out of curiosity, I went ahead and benchmarked a virtual call against a function pointer call (static instead of instance, since it would be too cumbersome to create a benchmark from IL, but I suspect the performance should be similar). The function pointer call looks to be ~73% faster on my machine, so I think it would be worth it. (For reference, I previously benchmarked interface calls to be ~2x slower than virtual calls.)
Code
```csharp
using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public unsafe class Benchmark
{
    private abstract class Base
    {
        public abstract void CallVirt();
    }

    private class A : Base
    {
        public override void CallVirt() { }
    }

    // Second implementation of Base; never instantiated by the benchmark.
    private class B : Base
    {
        public override void CallVirt()
        {
            throw new NotImplementedException();
        }
    }

    private Base instance;
    private delegate*<Base, void> _fPtr;

    [GlobalSetup]
    public void Setup()
    {
        instance = new A();
        _fPtr = &FuncStatic; // address of a static method, taken once
    }

    // Call through the cached function pointer.
    [Benchmark]
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public void CallFuncPtr()
    {
        _fPtr(instance);
    }

    // Regular virtual call through the abstract base class.
    [Benchmark]
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public void CallVirt()
    {
        instance.CallVirt();
    }

    private static void FuncStatic(Base instance) { }
}
```
|
Hmm, interesting observation. Probably it happens because the object header doesn't store the vtable; instead it stores a pointer to the type information that holds the vtable, so we have another indirection:

```csharp
public class C {
    public virtual void M() {
    }
    public static void S(C obj) => obj.M();
}
```

```
C.S(C)
L0000: push ebp
L0001: mov ebp, esp
L0003: mov eax, [ecx]            ; ecx stores object reference, so [ecx] stores a pointer to type info
L0005: mov eax, [eax+0x28]       ; obtains a pointer to vtable stored in type info
L0008: call dword ptr [eax+0x10] ; offset to appropriate slot in vtable
L000b: pop ebp
L000c: ret
```

3 indirections (at least on x64) are needed to reach the actual implementation of a virtual method. Too many indirections also break data locality and increase the chance of a CPU cache miss. In that case it's reasonable to extend the original proposal. |
Is this addressed by #111771? |
Background and motivation
It's known that a virtual method call is generally faster than an interface method call. Many internal abstractions within .NET and ASP.NET Core are designed using abstract classes. I personally use the same technique in my open-source library. The location of the actual implementation of a virtual method is just an offset within the method table, but an interface requires one more level of indirection and some resolution instructions inserted at the call site. This proposal provides a way to remove this indirection so that an interface method call costs the same as a virtual method call.
PGO may help to devirtualize an interface method call and convert the call site to a monomorphic call if it knows that only one implementation of the interface has been detected at the call site. If not, the call site is converted to the polymorphic and then the megamorphic version (which is the slowest). However, this optimization is applied at the level of the method where the call site is located.
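As a rough illustration of what a guarded, monomorphic call site amounts to, the transformation can be written out by hand in C#. This is only a sketch: the class and method names are made up for illustration, `decimal` stands in for whatever single type the profile happened to observe, and the real transformation happens inside the JIT and is not visible in source.

```csharp
using System;
using System.Globalization;

// Hypothetical names; hand-written analogue of a guarded devirtualized call site.
public static class GuardedDevirtualizationSketch
{
    // What the source code says: a plain interface call, resolved at run time.
    public static string Plain(IFormattable value)
        => value.ToString(null, CultureInfo.InvariantCulture);

    // What a monomorphic call site behaves like, assuming the profile saw only decimal.
    public static string Guarded(IFormattable value)
    {
        if (value is decimal d)
        {
            // Exact type is known here, so the call needs no interface dispatch.
            return d.ToString(null, CultureInfo.InvariantCulture);
        }

        // Any other type falls back to the ordinary interface call.
        return value.ToString(null, CultureInfo.InvariantCulture);
    }
}
```

Both methods return the same result for every input; the guard only changes how the call is dispatched. The guess is baked into the method body, which is why it is shared by every instance that reaches this call site.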
There is one JITted/compiled version of the `FormatToString` method because an instance method is just a regular function taking `this` implicitly. Currently, the compiler doesn't provide different versions of the same method for each instance of `MyClass`. As a result, PGO doesn't take into account the fact that extra information for call site optimization can be propagated from the instance level.
So the main concept of this proposal is to move knowledge about the actual interface implementation from the type level to the instance level.
API Proposal
A cache can be represented by a special value type, `InterfaceDispatchCache<T>`.
This value type is specially treated by JIT/AOT in the following aspects:
The size and layout of the value type depend on the actual generic argument `T`. If `T` is a class, the runtime doesn't apply any special semantics and the size is equal to `native int` (we need to keep the reference only). If `T` is an interface, the size depends on the number of instance methods of the interface and its parents. For instance, `InterfaceDispatchCache<ISpanFormattable>` can be laid out at runtime as the instance reference followed by one pointer for each interface method implementation.
The JIT/AOT compiler specially treats an `a.Instance.MethodCall(args)` call site if `a` is of type `InterfaceDispatchCache<T>`. The compiler knows how to resolve the invocation of the interface method in this case. It's just an indirect call using the pointer stored in the cache that points to the interface method implementation.
API Usage
This technique can be useful in DI as well.
Alternative Designs
Currently, an interface method can be devirtualized and cached using a delegate.
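A minimal sketch of that delegate-based alternative, using only features available today. `FormattableDispatchCache` is a hypothetical name, not an existing or proposed API. It covers a single method of `IFormattable`, and creating the delegate allocates, which is the cost the proposal aims to avoid.

```csharp
#nullable enable
using System;
using System.Globalization;

// Hypothetical stand-in for InterfaceDispatchCache<IFormattable>, built on a delegate.
public readonly struct FormattableDispatchCache
{
    public readonly IFormattable Instance;

    // Closed delegate: the target object and the resolved method are bound together.
    private readonly Func<string?, IFormatProvider?, string> _toString;

    public FormattableDispatchCache(IFormattable instance)
    {
        Instance = instance ?? throw new ArgumentNullException(nameof(instance));
        // The interface method is resolved once, here; this allocates a delegate.
        _toString = instance.ToString;
    }

    // Later calls go through the delegate instead of repeating interface dispatch.
    public string Format(string? format, IFormatProvider? provider)
        => _toString(format, provider);
}

public sealed class MyClass
{
    private readonly FormattableDispatchCache _cache;

    public MyClass(IFormattable formattable)
        => _cache = new FormattableDispatchCache(formattable);

    public string FormatToString()
        => _cache.Format(null, CultureInfo.InvariantCulture);
}
```

For example, `new MyClass(42).FormatToString()` returns `"42"`. Because the delegate carries its own target, the cached method cannot be invoked on a different object than the one it was resolved for.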
Risks
I don't see any.