[API Proposal]: Interface dispatch cache to avoid method resolution at runtime #90592

Open
sakno opened this issue Aug 15, 2023 · 14 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration
@sakno
Contributor

sakno commented Aug 15, 2023

Background and motivation

It's well known that a virtual method call is generally faster than an interface method call. Many internal abstractions within .NET and ASP.NET Core are designed using abstract classes. I personally use the same technique in my open-source library. The location of the actual implementation of a virtual method is just an offset within the method table. An interface call, however, requires one more level of indirection and some extra resolution instructions inserted at the call site. This proposal provides a way to remove this indirection, so that an interface method call costs the same as a virtual method call.

PGO may help to devirtualize an interface method call and convert the call site to a monomorphic call if only one implementation of the interface has been observed at that call site. Otherwise, the call site is converted to a polymorphic and then to a megamorphic version (which is the slowest). However, this optimization is applied at the level of the method that contains the call site.

public sealed class MyClass
{
    private readonly ISpanFormattable _formattable;

    public MyClass(ISpanFormattable formattable)
    {
        _formattable = formattable;
    }

    public string FormatToString() => _formattable.ToString(format: null, formatProvider: null);
}

There is only one JIT-compiled version of the FormatToString method, because an instance method is just a regular function that takes this implicitly. Currently, the compiler doesn't produce a separate version of the method for each instance of MyClass. As a result, PGO cannot take advantage of per-instance information that could be used to optimize the call site.

So the main idea of this proposal is to move the knowledge about the actual interface implementation from the type level to the instance level.

API Proposal

A cache can be represented by a special value type:

namespace System.Runtime.CompilerServices;

public readonly struct InterfaceDispatchCache<T>
    where T : class
{
    public readonly T Instance;

    public InterfaceDispatchCache(T instance); // the actual implementation is provided by the runtime and depends on the actual type T
}

The JIT/AOT compiler treats this value type specially in the following respects:
The size and layout of the value type depend on the actual generic argument T. If T is a class, the runtime doesn't apply any special semantics and the size is equal to native int (only the reference needs to be stored). If T is an interface, the size depends on the number of instance methods of the interface and of its parent interfaces. For instance, InterfaceDispatchCache<ISpanFormattable> can have the following layout at runtime:

[StructLayout(LayoutKind.Sequential)]
public readonly struct InterfaceDispatchCache<ISpanFormattable>
{
    public readonly ISpanFormattable Instance; // always at offset 0
    private readonly void* method0; // pointer to ISpanFormattable.TryFormat method implementation
    private readonly void* method1; // pointer to IFormattable.ToString method implementation

    public InterfaceDispatchCache(ISpanFormattable instance) // runtime-generated ctor
    {
        Instance = instance ?? throw new ArgumentNullException(nameof(instance));
        method0 = ldvirtftn instance.TryFormat; // pseudo code using existing ldvirtftn IL opcode
        method1 = ldvirtftn instance.ToString;
    }
}

The JIT/AOT compiler treats an a.Instance.MethodCall(args) call site specially if a is of type InterfaceDispatchCache<T>. The compiler knows how to resolve the interface method invocation in this case: it's just an indirect call through the pointer stored in the cache, which points to the interface method implementation. For instance:

InterfaceDispatchCache<ISpanFormattable> cache = ...;
cache.Instance.ToString(format: null, formatProvider: null); // converted to 'call cache.method1'

API Usage

public sealed class MyClass
{
    private readonly InterfaceDispatchCache<ISpanFormattable> dispatchCache;

    public MyClass(ISpanFormattable formattable)
    {
        dispatchCache = new(formattable);
    }

    public string FormatToString() => dispatchCache.Instance.ToString(format: null, formatProvider: null);
}

This technique can be useful in DI as well.

Alternative Designs

Currently, an interface method can be devirtualized and cached using a delegate.
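
To make the delegate alternative concrete, here is a minimal sketch of how it might look with today's C#. DelegateCachedFormatter is a hypothetical name used only for illustration, not an existing .NET type. Converting the method group to a delegate resolves the implementation once, in the constructor, at the cost of one delegate allocation per instance.

```csharp
#nullable enable
using System;

// Hypothetical helper (not part of .NET): resolves IFormattable.ToString once
// and keeps the result in a delegate, so later calls go through the delegate
// instead of a fresh interface dispatch.
public sealed class DelegateCachedFormatter
{
    private readonly Func<string?, IFormatProvider?, string> _toString;

    public DelegateCachedFormatter(IFormattable formattable)
    {
        if (formattable is null)
            throw new ArgumentNullException(nameof(formattable));

        // Method group conversion: the delegate is bound to this instance and
        // to the implementation resolved for its concrete type.
        _toString = formattable.ToString;
    }

    public string FormatToString() => _toString(null, null);
}
```

For example, new DelegateCachedFormatter(42).FormatToString() returns "42". Unlike the proposed struct, this approach allocates one delegate object for every cached method.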

Risks

I don't see any.

@sakno sakno added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Aug 15, 2023
@ghost ghost added needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners untriaged New issue has not been triaged by the area owner labels Aug 15, 2023
@SingleAccretion SingleAccretion added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Aug 15, 2023
@ghost

ghost commented Aug 15, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@timcassell

timcassell commented Oct 25, 2023

It's an interesting idea, but what about just allowing a pointer to an instance method to be taken? Then you could take the pointer to the method through the instance and cache it as needed, and do it on an individual basis rather than for every single method in the interface(s). I imagine it would even be able to speed up virtual calls as well as interface calls, since the v-table lookup would happen once and then you would use the pointer directly.

Also, I don't see how this will work with mutable structs, since you have to copy the struct to store in the field. (Looking at generics. Obviously there is no reason to use this on known structs.) [Edit] I see you put a class constraint.

@sakno
Contributor Author

sakno commented Oct 26, 2023

@timcassell I thought about extending the delegate*<> syntax to instance methods. But I found that it is not possible to represent dynamic dispatch of an interface method as a static construct that can be verified and checked by the compiler.

Let's imagine that we have delegate* thiscall<> syntax for instance methods:

IFormattable fmt = ...;
delegate* thiscall<IFormattable, string, IFormatProvider, string> ptrToFormatMethod = &fmt.ToString;

The this argument is explicit in this approach, so there is no way to check that I am calling the method on exactly the same object that was used for method resolution.

Then you could take the pointer to the method through the instance and cache it as needed

It can be done currently using delegates. But a delegate requires an allocation. Otherwise, we need some version of a delegate in the form of a value type, or something like dotnet/csharplang#3452. I don't want to interfere with other proposals.

@timcassell

@timcassell I thought about that to extend delegate*<> syntax for instance methods. But I found that it is not possible to represent dynamic dispatch of interface method as some static construct that can be verified and checked by the compiler.

Why does it have to be verified by the compiler? Function pointers require unsafe code, the onus should be on the programmer to use the same instance that was used to get the pointer.

@sakno
Contributor Author

sakno commented Oct 27, 2023

Why does it have to be verified by the compiler?

Here you go:

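interface I
{
    void M();
}

class A : I
{
    public void M() { }
}

class B : I
{
    public void M() { }
}

I a = new A();
delegate* thiscall<I, void> methodPtr = &a.M; // hypothetical syntax: resolves to A.M
a = new B();
methodPtr(a); // invokes A.M with an instance of B as 'this'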

methodPtr points to A.M(), but it's called with an instance of B. That is completely unsafe, because A.M() expects the memory layout and fields of A, which in real life are not compatible with B. In other words, it's a non-obvious and error-prone way to reinterpret the this argument of an instance method, which makes no sense in most situations.
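
For contrast, a delegate created from the same method group does not have this mismatch problem, because it captures the target object together with the resolved method. The following is a small illustrative sketch using only existing C# features; the type names, the Demo class, and the string return type (used here instead of void only to make the result observable) are assumptions made for the example.

```csharp
using System;

interface I
{
    string M();
}

sealed class A : I
{
    public string M() => "A.M";
}

sealed class B : I
{
    public string M() => "B.M";
}

static class Demo
{
    public static string Run()
    {
        I a = new A();
        // The delegate stores both the resolved method (A.M) and the
        // instance of A it was created from.
        Func<string> bound = a.M;

        // Reassigning the variable does not affect the captured target.
        a = new B();

        return bound(); // still calls A.M on the original A instance
    }
}
```

Demo.Run() returns "A.M": the call can never be paired with the wrong receiver, which is the guarantee a raw instance method pointer would lose.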

@timcassell

timcassell commented Oct 27, 2023

Right, I understand the unsafe-ness of it. But my point is it has to be used in an unsafe context, and there are plenty of ways for programmers to shoot themselves in the foot with unsafe code, so why is this case different?

[Edit] Also, your own proposal could have the same issue happen if the struct gets torn.

@sakno
Contributor Author

sakno commented Oct 27, 2023

Also, your own proposal could have the same issue happen if the struct gets torn.

This is why I don't propose it for structs.

there are plenty of ways for programmers to shoot themselves in the foot

Correct, but it's not a reason to add a new one.

@timcassell

This is why I don't propose it for structs.

I'm talking about the InterfaceDispatchCache struct.

Correct, but it's not a reason to add a new one

No, the reason is performance, of course. The unsafe-ness is just a reason not to add it. But it doesn't seem more unsafe than other unsafe features to me.

@sakno
Contributor Author

sakno commented Oct 27, 2023

The unsafe-ness is just a reason not to add it

Not the only one. Convenience too. An individual pointer for each method requires a lot of boilerplate code. Actually, I can do the same right now with delegates or even IL weaving (e.g. InlineIL weaver). Should the presence of IL weaving prevent the introduction of a new low-level API? I hope not. People use IL weaving as a last resort, because they have no choice (and so do I).

@timcassell

timcassell commented Oct 27, 2023

Actually, I can do the same right now with delegates or even IL weaving (e.g. InlineIL weaver). Does presence of IL weaving should prevent introduction of new low-level API? I hope not. People use IL weaving as a last resort, because they have no choice (and me too).

I actually didn't know that instance method pointers are already supported in IL. I tend to avoid using IL directly since not every runtime supports all instructions (looking at Unity's IL2CPP).

Individual pointer for each method requires a lot of boilerplate code.

Sure, but I'd rather be able to be fine-grained about it when I'm focused on optimal efficiency.

And I think it wouldn't hurt to be able to optimize virtual calls the same way you are proposing to optimize interface calls.

@sakno
Contributor Author

sakno commented Oct 27, 2023

I actually didn't know that instance method pointers are already supported in IL

Yes, they are: method instance void *(object) as a TypeDescr, and calli instance void(object) for the indirect call (CallSiteDescr).

it wouldn't hurt to be able to optimize virtual calls

I don't think that it's needed. A virtual method has a known offset in the vtable at compile time (JIT or AOT), while an interface method has to be resolved dynamically through multiple indirections. So a virtual method call is already optimized by the runtime as call dword ptr [vtable_address+virtual_method_slot_index], with a single level of indirection.

@timcassell

I don't think that it's needed. Virtual method has known offset in vtable at the time of compilation (JIT or AOT) while interface method should be resolved dynamically through multiple indirections. So virtual method call is already optimized by runtime as call dword ptr [vtable_address+virtual_method_slot_index] with single level of indirection.

Out of curiosity, I went ahead and benchmarked a virtual call against a function pointer call (static instead of instance, since it would be too cumbersome to create a benchmark from IL, but I suspect the performance should be similar).

The function pointer call looks to be ~73% faster on my machine. So I think it would be worth it. (For reference, I previously benchmarked interface calls to be ~2x slower than virtual calls.)

|      Method |     Mean |     Error |    StdDev |
|------------ |---------:|----------:|----------:|
| CallFuncPtr | 4.517 ns | 0.0177 ns | 0.0148 ns |
|    CallVirt | 7.829 ns | 0.1306 ns | 0.1019 ns |
Code

using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;

public unsafe class Benchmark
{
    private abstract class Base
    {
        public abstract void CallVirt();
    }

    private class A : Base
    {
        public override void CallVirt() { }
    }

    private class B : Base
    {
        public override void CallVirt()
        {
            throw new NotImplementedException();
        }
    }

    private Base instance;
    private delegate*<Base, void> _fPtr;

    [GlobalSetup]
    public void Setup()
    {
        instance = new A();
        _fPtr = &FuncStatic;
    }

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public void CallFuncPtr()
    {
        _fPtr(instance);
    }

    [Benchmark]
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public void CallVirt()
    {
        instance.CallVirt();
    }

    private static void FuncStatic(Base instance) { }
}

@sakno
Contributor Author

sakno commented Oct 27, 2023

Hmm, interesting observation. It probably happens because the object header doesn't store the vtable itself; instead it stores a pointer to the type information that contains the vtable, so there is another indirection:

public class C {
    public virtual void M() {
    }
    
    public static void S(C obj) => obj.M();
}

C.S(C)
    L0000: push ebp
    L0001: mov ebp, esp
    L0003: mov eax, [ecx] ; ecx stores object reference, so [ecx] stores a pointer to type info
    L0005: mov eax, [eax+0x28] ; obtains a pointer to vtable stored in type info
    L0008: call dword ptr [eax+0x10] ; offset to appropriate slot in vtable
    L000b: pop ebp
    L000c: ret

3 indirections (at least on x64; the listing above is the 32-bit x86 equivalent) are needed to reach the actual implementation of a virtual method. Too many indirections also break data locality and increase the chance of a CPU cache miss. In that case it's reasonable to extend the original proposal.

@tannergooding tannergooding added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jun 24, 2024
@stephentoub stephentoub added this to the Future milestone Jul 19, 2024
@huoyaoyuan
Member

Is this addressed by #111771?

6 participants