Experimental: integrating the V2 serialization POC into Akka.Remote (research / not for merge)#8203

Draft
Aaronontheweb wants to merge 11 commits into akkadotnet:dev from Aaronontheweb:feature/spec4-serializer-v2

Conversation

@Aaronontheweb commented May 10, 2026

⚠️ This PR is now reframed as experimental research work. The goal here is to land enough of the V2 serialization POC inside Akka.NET's real MessageSerializer + AkkaPduCodec path that we can run RemotePingPong against it and learn what end-to-end Akka.Remote sees. This branch will not be merged as-is. Once we've absorbed the learnings, the production V2 work will land in a separate, cleaner PR.

What this PR is for

  • Validate the V2 serialization POC design against the real MessageSerializer + AkkaPduCodec wrap pipeline.
  • Get concrete benchmark numbers comparing V1 (today's path) vs V2 (the new design) for both write and read on AckAndEnvelopeContainer / RemoteEnvelope / Payload wire format.
  • Wire the V2 spike into EndpointWriter so RemotePingPong actually exercises it, and capture aggregate throughput data the per-message benchmark can't reach.

What's on the branch right now

Foundation (will likely survive into the clean PR in some form)

  • SerializerV2 — buffer-aware base class (Serialize(IBufferWriter<byte>, object) → int, Deserialize(ReadOnlySequence<byte>, string) → object, Manifest, Identifier). Virtual byte[] bridges keep legacy callsites working during transition.
  • SerializerV1Adapter — wraps Serializer / SerializerWithStringManifest as SerializerV2. Reproduces V1 manifest dispatch internally; overrides ToBinary / FromBinary to skip the buffer round trip when the inner is byte[]-native. Inner property exposes the wrapped V1.
  • SerializerV2Extensions — AsV1<T>() / TryAsV1<T>() helpers for callsites holding strongly-typed V1 references.
  • Serialization.cs — internal storage migrated to SerializerV2. HOCON + SerializationSetup V1 instances auto-wrap on registration. FindSerializerFor* returns SerializerV2. AddSerializer / AddSerializationMap get V1 + V2 overloads. Deserialize(byte[], int, string) simplified to uniform V2 dispatch.
  • MessageSerializer.cs (Akka.Remote) — single uniform Manifest() dispatch path.
  • ByteArraySerializer (ID 4) and PrimitiveSerializers (ID 17) — ported to V2-native, byte-identical wire format.

The spike (this is the experimental piece)

  • PatchingBufferWriter — IBufferWriter<byte> with PatchSpan(offset, length) for in-place length-prefix patching.
  • ProtoWire — hand-rolled Protobuf wire-format helpers: tag, varint, fixed-width 5-byte varint placeholder + patch, fixed64, length-delimited, string. Mirrors CodedOutputStream internals but operates against IBufferWriter<byte> directly.
  • V2SerializerRegistry — Type → SerializerV2 / ID → SerializerV2 lookup. Static dispatch at serialize time, ID-keyed dispatch at deserialize. No Type.GetType(manifest) reflection.
  • V2RemoteEnvelopeWriter / V2RemoteEnvelopeReader — write and read the AckAndEnvelopeContainer / RemoteEnvelope / Payload wire format directly via IBufferWriter<byte> / ReadOnlyMemory<byte>. Inner V2 serializer invoked inline against the same buffer.
  • V2ProtoBenchmarks — V1 path uses real MessageSerializer.Serialize + AkkaPduProtobuffCodec.ConstructMessage; V2 path uses the spike. Both produce wire-equivalent AckAndEnvelopeContainer bytes (verified at setup via AckAndEnvelopeContainer.Parser.ParseFrom).

Key learning so far

Per-message serialization throughput, single-threaded, hardware: AMD Ryzen 9 9900X:

| Payload     | V1 send msg/s | V2 send msg/s | Improvement |
|-------------|---------------|---------------|-------------|
| StringShort | 263,000       | 421,000       | +60%        |
| BytesSmall  | 658,000       | 2,053,000     | 3.1×        |
| BytesLarge  | 109,000       | 920,000       | 8.5×        |

Send-side allocations drop to zero for byte[] payloads. Receive-side allocations drop by 50–67%. Wire format is byte-equivalent — V1 peers parse V2 output, V2 reader parses V1 output. Full numbers in this comment.

What still needs to land on this branch

  • Move V2 spike helpers (PatchingBufferWriter, ProtoWire, V2SerializerRegistry, V2RemoteEnvelopeWriter, V2RemoteEnvelopeReader) from the benchmark project into Akka.Remote/Serialization/V2 so they can be referenced from production code.
  • Build V2SerializerRegistry from Serialization.Serialization at startup so the registry knows about all V2-native and V1-wrapped serializers.
  • Add ConstructMessageV2 to AkkaPduProtobuffCodec and wire EndpointWriter.WriteSend to call it (skipping the intermediate SerializedMessage proto object).
  • Run RemotePingPong on this hardware with V2 enabled, capture aggregate msg/sec at 1/5/10/15/20/25/30 clients (matches the dev-branch baseline in openspec/IMPLEMENTATION_ORDER.md).
  • Decide whether receive-side V2 is also worth wiring (V1 DecodeMessage parses V2 wire bytes correctly because Google.Protobuf accepts over-long varints).

What this PR will NOT become

  • This branch will not be the production V2 PR. It accumulates experimental code, partial integrations, scope churn (see commit e6b49676 for the spec-narrowing decision), and benchmark scaffolding.
  • The clean PR that follows will: pick a stable V2 surface based on learnings, integrate cleanly without leaving the spike helpers in Akka.Remote/Serialization/V2, ship the necessary subset, and have a proper review cycle.

What this PR is producing

  • Concrete perf data on whether the V2 design pays off for Akka.Remote.
  • Working integration code that proves the wire-compat story end-to-end.
  • A list of design decisions surfaced by integration (e.g. how V2SerializerRegistry initializes, where ConstructMessageV2 lives, how V1 and V2 paths coexist during transition).

Commits (current state, may keep evolving)

  1. e6b49676 — Narrow milestone scope; create serializer-v2-codegen placeholder
  2. 1d5d2a04 — SerializerV2 + SerializerV1Adapter foundation + infrastructure
  3. 1c1c36a7 — Update existing tests for V1Adapter wrapping
  4. cc4bed24 — Port ByteArraySerializer + PrimitiveSerializers to V2
  5. e1f7261f — SerializerV1AdapterSpec unit tests
  6. 6b7d8d02 — Transport-envelope benchmark + API baselines
  7. 92719443 — V2 wrap-pipeline spike (PatchingBufferWriter + AckAndEnvelopeContainer wire format)
  8. 35e109aa — V2 spike read path + benchmarks for both directions

Design docs

  • openspec/changes/serializer-v2/proposal.md
  • openspec/changes/serializer-v2/design.md
  • openspec/changes/serializer-v2-codegen/proposal.md (deferred scope)

Scope split for Milestone 2 of the 1.6 transport epic. Originally the
change carried the full V2 stack: foundation (base class, V1 adapter,
infrastructure) plus the user-facing codec story (MessagePackSerializer,
AkkaWriter/AkkaReader, attributes, the Akka.Serialization.V2 NuGet
package) plus the Roslyn source generator. That's a lot of public API
surface to lock in before the foundation has been validated by anything
downstream.

This change rewrites the openspec docs so Milestone 2 ships only the
foundation and one set of reference serializers, plus a benchmark that
proves the API earns its keep before Spec 3 builds on it:

- SerializerV2 base class (IBufferWriter<byte> / ReadOnlySequence<byte>
  primary API, virtual byte[] bridge)
- SerializerV1Adapter that wraps legacy Serializer/SerializerWithStringManifest
- Serialization.cs and MessageSerializer.cs infrastructure changes
- ByteArraySerializer + PrimitiveSerializers ported to SerializerV2
  (covers all hand-rolled primitive paths: string via UTF-8, int32/int64
  via BinaryPrimitives, byte[] passthrough; same IDs, byte-identical
  wire format)
- Standalone transport-envelope benchmark in src/benchmark/ that simulates
  EndpointWriter's serialize-frame-deserialize chain on the V2 API and
  measures V2-direct vs V1-bridge

Everything else moves to a new placeholder change (serializer-v2-codegen):
MessagePackSerializer, sealed AkkaWriter/AkkaReader, the three attributes,
the Akka.Serialization.V2 package, the Roslyn generator, and the
mechanical port of the remaining Protobuf-based internal serializers
(ClusterMessageSerializer, SystemMessageSerializer, the four
WrappedPayloadSupport serializers). Rationale captured in the new
proposal: the runtime codec API and the codegen that targets it must be
designed together, since the runtime is the generator's emission target.

Files:
- openspec/IMPLEMENTATION_ORDER.md: retitle Milestone 2 "foundation only",
  document the 2026-05-10 scope change, point at serializer-v2-codegen
- openspec/changes/serializer-v2/proposal.md: rewrite around the narrowed
  scope; explicit "Deferred to a future change" section
- openspec/changes/serializer-v2/design.md: revised decisions (single
  layer in core Akka, reference impl scoped to Primitive+ByteArray, new
  decision documenting the benchmark as the validation gate)
- openspec/changes/serializer-v2/tasks.md: restructured into 6 sections
  with explicit string/int32/int64/byte[] coverage, multi-segment input
  tests, and a "Section 6: Out of Scope (Documented Follow-On)" punch
  list for the Protobuf serializer ports
- openspec/changes/serializer-v2/specs/serializer-v2-base/spec.md:
  per-serializer requirements with byte-identical wire format scenarios
  and multi-segment input scenarios; new requirement for the benchmark
- openspec/changes/serializer-v2/specs/messagepack-serializer/spec.md:
  deleted (capability moves to serializer-v2-codegen)
- openspec/changes/serializer-v2-codegen/: new placeholder change with
  .openspec.yaml and proposal.md sketching the deferred scope and arguing
  why we don't ship the runtime layer alone first
Establishes the V2 serialization API as the new internal foundation for
Akka.NET's serialization subsystem. V1 serializers continue to work
unchanged via a transparent adapter.

New types in core Akka:

- SerializerV2 (abstract): Buffer-aware serializer base class with
  Serialize(IBufferWriter<byte>, object), Deserialize(ReadOnlySequence<byte>,
  string), Manifest(object), and Identifier. Virtual byte[] bridges
  (ToBinary/FromBinary) keep V1-style call sites working.
- SerializerV1Adapter: Wraps Serializer/SerializerWithStringManifest as a
  SerializerV2. Reproduces the V1 manifest dispatch (TypeQualifiedName for
  IncludeManifest=true plain serializers, custom manifest for
  SerializerWithStringManifest, empty otherwise) and delegates ToBinary/
  FromBinary/FromBinary(Type) directly to the inner V1 to avoid pointless
  buffer round trips. Inner property exposes the wrapped V1 instance.
- SerializerV2Extensions: AsV1<T>()/TryAsV1<T>() helpers for callers that
  hold strongly-typed references to V1 serializers (e.g. cast sites in
  tests and durable stores).

Serialization.cs changes:

- Internal storage migrated to SerializerV2 (auto-wraps V1 on registration
  from HOCON, SerializationSetup, AddSerializer, AddSerializationMap)
- FindSerializerFor / FindSerializerForType / GetSerializerById /
  GetSerializerByName return SerializerV2
- ManifestFor(SerializerV2, object) overload added (just delegates to
  Manifest()); legacy ManifestFor(Serializer, object) preserved for back
  compat
- AddSerializer / AddSerializationMap each have V1 + V2 overloads (V1
  auto-wraps)
- Deserialize(byte[], int, string) simplified to uniform V2 dispatch —
  the V1 adapter handles the type-vs-manifest dance internally

Akka.Remote.MessageSerializer.cs:

- Single uniform path for manifest dispatch via SerializerV2.Manifest();
  no more `is SerializerWithStringManifest` type check or `IncludeManifest`
  branch

Call site fixes (mechanical):

- ~25 cast sites in tests/benchmarks switched to .AsV1<T>() (covers
  Hyperion, Newtonsoft, Cluster, Sharding, ClusterClient, PubSub,
  Singleton, ReplicatedDataSerializer, custom test serializers, and
  Akka.Remote.Tests primitive/misc)
- Field declarations on Replicator._serializer and
  LocalSnapshotStore._wrapperSerializer changed to SerializerV2
- ActorSystemImpl.WarnIfJsonIsDefaultSerializer uses
  SerializerV1Adapter pattern match
- Akka.Persistence.Custom example simplified to use uniform V2
  Manifest() dispatch
- ClusterMessageSerializer.GetObjectManifest takes SerializerV2

Adapter overrides bridge methods (ToBinary, FromBinary(byte[],string),
FromBinary(byte[],Type)) to delegate to the inner V1 directly. V2-native
Deserialize materializes the ReadOnlySequence<byte> to byte[] before
calling FromBinary, since the wrapped V1 is byte[]-native — no
performance win is possible for V1, only API parity. V2-native
serializers (coming next: ByteArraySerializer + PrimitiveSerializers
ports) get the actual zero-copy benefit.

Build: 0 errors, 0 warnings on full solution.
Existing tests asserted on the V1 serializer type via
.Should().BeOfType<T>() or by direct cast. With V2 dispatch wrapping V1
serializers in SerializerV1Adapter, those assertions now see the adapter
type and fail.

Mechanical fix across the affected test files: replace
.Should().BeOfType<V1>() with .AsV1<V1>() (which throws if the V2
instance isn't a SerializerV1Adapter wrapping V1). The implicit
assertion has the same intent — verify the right V1 serializer is bound
— without requiring the test to know about the adapter wrapping.

Affected test files:
- Akka.Tests: SerializationSpec, SerializationSetupSpec, CustomSerializerSpec
- Akka.Remote.Tests: DaemonMsgCreateSerializerSpec, MessageContainerSerializerSpec, MiscMessageSerializerSpec, ProtobufSerializerSpec, SystemMessageSerializationSpec
- Akka.Cluster.Tests: ReliableDeliverySerializerSpecs
- Akka.Cluster.Tools.Tests: ClusterClientSerializerSpec
- Akka.DistributedData.Tests: ReplicatedDataSerializerSpec
- Akka.Cluster.Sharding.Tests: DDataClusterShardingConfigSpec

`using Akka.Serialization;` added where missing to bring the AsV1 /
TryAsV1 extensions into scope.

Test status:
- Akka.Tests: 1248 passing, 23 skipped, 0 failing (full suite)
- Akka.Remote.Tests serialization: 105 passing, 1 skipped, 0 failing
- Akka.Cluster.Tests serialization: 49 passing, 0 failing
- Akka.Cluster.Tools.Tests serialization: 51 passing, 0 failing
Both serializers now extend SerializerV2 directly instead of the legacy
Serializer base class, while preserving their serializer IDs (4 and 17
respectively) and producing byte-identical wire format. They are the
V2-native reference implementations used to validate the API.

ByteArraySerializer (Akka core, ID 4):

- Identity transform — the byte[] is the wire format
- Serialize(IBufferWriter<byte>, byte[]) copies the input into the writer
  (the writer contract requires it; the V2 path costs one copy)
- Deserialize(ReadOnlySequence<byte>, manifest) materializes a fresh
  byte[] via seq.ToArray() — callers may retain the returned reference,
  so we cannot alias to potentially pooled backing memory
- ToBinary/FromBinary bridges overridden to skip the buffer round trip
  and pass the byte[] through directly (V1's zero-alloc behavior is
  preserved for the bridge path)
- Manifest() returns string.Empty (no manifest needed)
- Null handling drops V1's null passthrough — Serialization.cs routes
  null through NullSerializer, so the path was unreachable

PrimitiveSerializers (Akka.Remote, ID 17):

- Covers string / int32 / int64 with the same six manifest aliases as V1
  (S/I/L plus the long-form .NET Core and .NET Framework type-name
  variants that legacy peers emit)
- String serialize: Encoding.UTF8.GetBytes(string, Span<byte>) into the
  writer's span — no intermediate byte[] allocation
- Int32/Int64 serialize: BinaryPrimitives.WriteInt*LittleEndian into
  fixed-width spans
- String deserialize: Encoding.UTF8.GetString(ReadOnlySequence<byte>) on
  net6+, handles split codepoints across multi-segment input
- Int32/Int64 deserialize: BinaryPrimitives.ReadInt*LittleEndian; falls
  back to a stack copy when the value spans a segment boundary
- ToBinary/FromBinary bridges overridden — strings use
  Encoding.UTF8.GetBytes(string), ints use BitConverter on
  little-endian platforms (matching V1 byte-for-byte) and a manual
  little-endian fallback otherwise
- use-legacy-behavior config flag preserved
- SizeHint(o) returns precise sizes for fixed-width values and
  GetMaxByteCount(string.Length) for strings — gives the
  ArrayBufferWriter inside the bridge a tight initial allocation
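To make the byte layout concrete, here is a language-agnostic sketch (in Python, via `struct`) of the wire format described above: int32/int64 as fixed-width little-endian, strings as raw UTF-8 with no length prefix. The real code uses `BinaryPrimitives` and `Encoding.UTF8` in C#; the function names here are illustrative.

```python
import struct

# Sketch of the PrimitiveSerializers wire format described above:
# int32/int64 as fixed-width little-endian, strings as raw UTF-8
# (framing is external, so no length prefix).

def encode_int32(v: int) -> bytes:
    return struct.pack("<i", v)   # 4 bytes, little-endian

def encode_int64(v: int) -> bytes:
    return struct.pack("<q", v)   # 8 bytes, little-endian

def encode_string(s: str) -> bytes:
    return s.encode("utf-8")      # no length prefix

def decode_int32(b: bytes) -> int:
    return struct.unpack("<i", b)[0]
```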

Test fixes:

- SerializationSpec.cs / SerializationSpec.AllowUnregisteredTypesSpec:
  ByteArraySerializer is V2-native now, so the previous
  AsV1<ByteArraySerializer>() pattern is invalid (the type constraint
  T : Serializer rejects V2 types). Replaced with direct
  Should().BeOfType<ByteArraySerializer>() on the result.
- PrimitiveSerializersSpec.cs: same — switched from .AsV1<PrimitiveSerializers>()
  to .Should().BeOfType<PrimitiveSerializers>().Subject (FluentAssertions
  pattern that returns the typed subject).

Test status:
- Akka.Tests serialization: 65 passing, 0 failing
- Akka.Remote.Tests PrimitiveSerializersSpec: 17 passing, 0 failing
Direct unit tests for SerializerV1Adapter that exercise the wrapping
behavior independently of Serialization's registration plumbing.
Full HOCON / V1 auto-wrap path coverage continues to live in
SerializationSpec and CustomSerializerSpec — this spec is for the
adapter's own contract.

Coverage:
- Round-trip through buffer API (Serialize/Deserialize) for plain V1
  with and without IncludeManifest, and for SerializerWithStringManifest
- Round-trip through byte[] bridge (ToBinary/FromBinary) for both
  string-manifest and Type-typed FromBinary overloads
- Manifest behavior matches V1 dispatch (TypeQualifiedName for
  IncludeManifest=true plain serializers, custom string for
  SerializerWithStringManifest, empty for IncludeManifest=false)
- Identifier preserved from inner V1
- Inner property returns the wrapped instance unchanged
- ToBinary/FromBinary bridge overrides produce byte-identical output to
  the inner V1 (so the bridge skip is correct)
- Multi-segment ReadOnlySequence<byte> input handled correctly
- AsV1<T>/TryAsV1<T> extension methods unwrap correctly and have the
  expected null-vs-throw failure semantics

Three V1 fixture classes cover the three V1 dispatch flavors; a small
ReadOnlySequenceSegment<byte> helper synthesizes multi-segment input
without depending on a Pipe.

15/15 tests pass.
Benchmark (src/benchmark/Akka.Benchmarks/Serialization/SerializerV2EnvelopeBenchmarks.cs):

Simulates what EndpointWriter will do once Spec 3 wires the Streams TCP
transport to call SerializerV2.Serialize(IBufferWriter<byte>) directly:
writes a Remote-shaped envelope to an ArrayBufferWriter<byte>, wraps the
result as a ReadOnlySequence<byte>, reads the header via
SequenceReader<byte>, hands the payload slice to
SerializerV2.Deserialize(ReadOnlySequence<byte>, manifest). Compares
against the V1-bridge path (ToBinary → byte[] → FromBinary) on the same
serializer instance and the same payload.

Envelope shape: [serializerId: int32 LE][manifestLen: int32 LE]
[manifest: utf8][payload: bytes-to-end]. Payload length is implicit —
the outer frame boundary is the boundary, matching how the real Streams
TCP transport will frame messages in Spec 3.
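A minimal sketch of that envelope shape, in Python for illustration (the benchmark itself is C#). The payload carries no length field of its own — the frame boundary delimits it, exactly as described above.

```python
import struct

# Sketch of the benchmark's envelope framing described above:
# [serializerId: int32 LE][manifestLen: int32 LE][manifest: utf8][payload].
# Payload length is implicit -- the outer frame boundary is the boundary.

def write_envelope(serializer_id: int, manifest: str, payload: bytes) -> bytes:
    m = manifest.encode("utf-8")
    return struct.pack("<ii", serializer_id, len(m)) + m + payload

def read_envelope(frame: bytes):
    serializer_id, mlen = struct.unpack_from("<ii", frame, 0)
    manifest = frame[8:8 + mlen].decode("utf-8")
    payload = frame[8 + mlen:]  # everything to the end of the frame
    return serializer_id, manifest, payload
```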

Payload matrix exercises every V2-native primitive path:
- string short (5 chars), medium (256), long (4 KB)
- int32, int64
- byte[] small (16 B), medium (1 KB), large (16 KB)

No Akka.Remote / DotNetty / socket dependencies — pure shape benchmark
to validate the V2 buffer API earns its keep on allocations and
throughput before downstream specs build on it. Reusable for Spec 3
integration: drop in a real FrameBufferWriter in place of
ArrayBufferWriter and re-run to confirm no regression.

Configured with MicroBenchmarkConfig (MemoryDiagnoser + GitHub markdown
exporter); run via the standard BenchmarkDotNet console runner.

API baselines (src/core/Akka.API.Tests/verify/):

- CoreAPISpec.ApproveCore.DotNet.verified.txt: ByteArraySerializer now
  extends SerializerV2; Serialization gets AddSerializer/AddSerializationMap
  V2 overloads, FindSerializerFor[Type] return SerializerV2, ManifestFor
  V2 overload; new public types SerializerV2, SerializerV1Adapter,
  SerializerV2Extensions.
- CoreAPISpec.ApproveRemote.DotNet.verified.txt: PrimitiveSerializers
  now extends SerializerV2 with the buffer Serialize/Deserialize methods.

Test status: 18/18 Akka.API.Tests passing; 0 errors / 0 warnings on full
solution build.
/// </summary>
/// <param name="o">The object whose manifest is requested.</param>
/// <returns>The manifest string, or <see cref="string.Empty"/> if no manifest is needed.</returns>
public abstract string Manifest(object o);

Nitpick: Would be nice to have a ReadOnlyMemory or ReadOnlySpan overload here for other use.

/// </summary>
/// <param name="o">The object whose encoded size is being estimated.</param>
/// <returns>An estimate of the encoded byte length.</returns>
public virtual int SizeHint(object o) => 256;

One minor concern with this is whether it will lead to excessive boxing for consumers who misuse structs...

/// <param name="buffer">The byte sequence containing the serialized object.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public abstract object Deserialize(ReadOnlySequence<byte> buffer, string manifest);

There might come a time where being able to accept ReadOnlyMemory<char> manifest will be useful... so I at least want to risk bringing it up...

/// </summary>
/// <param name="buffer">The buffer to write into.</param>
/// <param name="obj">The object to serialize.</param>
public abstract void Serialize(IBufferWriter<byte> buffer, object obj);

I don't hate this but it would be nice if we could get some form of SerializeEnvelope on here to keep things uniform and avoid branching.... but maybe that's too big...

@Aaronontheweb left a comment

I think we're pretty far off from proving the concept on V2 serialization here

/// </summary>
/// <param name="o">The object whose encoded size is being estimated.</param>
/// <returns>An estimate of the encoded byte length.</returns>
public virtual int SizeHint(object o) => 256;

Need some way of signaling, for backwards compat, "we don't know what the size of this object is - no size hint available"

/// </summary>
/// <param name="buffer">The buffer to write into.</param>
/// <param name="obj">The object to serialize.</param>
public abstract void Serialize(IBufferWriter<byte> buffer, object obj);

some thoughts:

  1. Ensure WriteMessagesAsync/SaveAsync is called asynchronously in Async… #8163 - the correct fix for this type of flow-control problem in Akka.Persistence is for Serialize / Deserialize to be ValueTask-returning async functions. This naturally allows us to kick the serialization work out of band while avoiding some of the flow-control problems #8163 introduced (and had to revert in Revert Task.Yield() from AsyncWriteJournal and SnapshotStore (cherry-pick to dev) #8189).
  2. We should return some type of result here IMHO - either the length of the written bytes, a result object that includes that information, or something else. Previously we could get that information from the length of the returned byte[]; now it's hidden inside the IBufferWriter<byte>.

/// </summary>
/// <param name="obj">The object to serialize.</param>
/// <returns>A byte array containing the serialized object.</returns>
public virtual byte[] ToBinary(object obj)

Remove this - SerializerV2 doesn't need to be backwards compatible. That's a job for the SerializerV1Adapter. We're adapting V1 to V2, not V2 to V1.

/// <param name="bytes">The serialized object's bytes.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public virtual object FromBinary(byte[] bytes, string manifest)

Remove this.

/// <param name="bytes">The serialized object's bytes.</param>
/// <param name="type">The expected runtime type, or <c>null</c> if unspecified.</param>
/// <returns>The deserialized object.</returns>
public virtual object FromBinary(byte[] bytes, Type? type)

Remove this - same comment as above.

/// <param name="buffer">The byte sequence containing the serialized object.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public abstract object Deserialize(ReadOnlySequence<byte> buffer, string manifest);

Should be async.

/// Serializes the object and decorates serialized <see cref="IActorRef"/> instances using
/// the given <paramref name="address"/>.
/// </summary>
public byte[] ToBinaryWithAddress(Address address, object obj)

Where does this method get called usually and is there a better way of doing this than an implicit ThreadStatic variable? I believe this exists primarily for multi-transport Akka.Remote systems. Could we just require this context to be passed in explicitly in those callsites?

Per PR akkadotnet#8203 review feedback. The void return was hiding load-bearing
information — callers (especially wrapped-payload outer serializers
patching length prefixes) need to know how many bytes the Serialize call
wrote to the buffer. They could fish it out of the writer state, but
that's an indirect read that breaks if the writer is shared with other
writes happening on the same call.

This is the only API change being made before benchmarking. Other surface
critiques (async/ValueTask, bridge removal, transport-info threading)
remain held until perf data validates the basic V2 design.

Deserialize signature is unchanged — the read side doesn't have an
analogous patch-after-the-fact concern.

Affected:
- SerializerV2.Serialize: abstract int Serialize(IBufferWriter<byte>, object)
- ByteArraySerializer.Serialize returns byte[].Length
- PrimitiveSerializers.Serialize returns bytes-written per primitive type
- SerializerV1Adapter.Serialize returns inner.ToBinary(obj).Length
- Bridge ToBinary uses the returned count to size the ToArray slice
- Benchmark + V1Adapter tests adjusted (return value discarded where
  callers don't need it)

Tests: Akka.Tests serialization 80/80 passing.
Validates the SerializerV2 design against today's Akka.Remote
MessageSerializer + AkkaPduCodec wrap pipeline. Reuses the existing
Protobuf message types (AckAndEnvelopeContainer / RemoteEnvelope /
Payload) without modification — the V2 path produces byte-equivalent
wire output that Google.Protobuf parses transparently.

What the spike contains

- PatchingBufferWriter: IBufferWriter<byte> with a PatchSpan(offset, len)
  accessor for in-place length-prefix patching.
- ProtoWire: hand-rolled Protobuf wire-format primitives (tag, varint,
  fixed-width varint placeholder + patch, fixed64, string). Mirrors what
  CodedOutputStream does internally, but writes against IBufferWriter
  directly (Google.Protobuf's WriteContext.Initialize(IBufferWriter) is
  internal and not callable from user code).
- V2SerializerRegistry: Type -> SerializerV2 / ID -> SerializerV2
  lookup. Static dispatch, no reflection at serialize time, no
  Type.GetType from manifest strings at deserialize time.
- V2RemoteEnvelopeWriter: writes the full AckAndEnvelopeContainer
  pipeline directly into a PatchingBufferWriter. For each nested
  length-delimited field (envelope, payload, message bytes), reserves a
  fixed-width 5-byte varint placeholder, runs the inner write (using the
  bytes-written int return on SerializerV2.Serialize to know exactly how
  much was written), then patches the length prefix in place.

How the patching technique works around Protobuf's nested-message
length-prefix problem

Protobuf's length-delimited wire format requires the inner's byte
count to be known BEFORE the length prefix is written. Canonical
varints are minimum-width, so a placeholder can't be patched
retroactively without knowing how wide the varint will be.

The trick: write the length prefix as a FIXED-WIDTH 5-byte varint
always. 5 bytes give 35 bits of data — plenty for any uint32. Small
values are encoded as over-long varints (continuation bits set on the
first 4 bytes, value's low bits in byte 0, zeros in the rest).
Google.Protobuf's CodedInputStream accepts up to 5 bytes for a uint32
varint and OR's the data bits regardless of canonicity, so the
over-long form parses to the same value as the minimum-width form.

That lets us reserve 5 bytes, run the inner write, then patch the
varint in place using the int returned by Serialize. One pass, no
scratch buffer, no intermediate byte[].
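The trick is easiest to see in code. Below is a language-agnostic sketch in Python (the spike itself is C#): `encode_fixed5` always emits exactly 5 bytes by setting the continuation bit on the first four; `read_varint32` mirrors a permissive Protobuf reader that ORs data bits regardless of canonicity, so over-long and minimum-width forms decode identically; `demo_reserve_then_patch` shows the reserve/write/patch flow. Function names are illustrative.

```python
# Sketch of the fixed-width 5-byte varint trick described above.

def encode_fixed5(value: int) -> bytes:
    """Encode a uint32 as exactly 5 varint bytes (possibly over-long)."""
    out = bytearray()
    for _ in range(4):
        out.append(0x80 | (value & 0x7F))  # continuation bit always set
        value >>= 7
    out.append(value & 0x7F)               # final byte: no continuation
    return bytes(out)

def read_varint32(buf: bytes, pos: int = 0):
    """Permissive varint reader: accepts canonical and over-long forms."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result & 0xFFFFFFFF, pos
        shift += 7

def demo_reserve_then_patch(inner_bytes: bytes) -> bytes:
    """Reserve a 5-byte placeholder, run the inner write, patch in place."""
    buf = bytearray(b"\x00" * 5)                 # reserve placeholder
    buf += inner_bytes                           # inner write
    buf[0:5] = encode_fixed5(len(inner_bytes))   # patch length prefix
    return bytes(buf)
```

One pass, no scratch buffer — the only cost is up to 4 extra wire bytes per length-delimited field, as noted above.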

Wire-overhead cost: at most 4 extra bytes per length-delimited nested
field. For Akka.Remote's three nesting levels, that's max 12 bytes per
message — noise vs payload size.

Wire compat is verified at benchmark setup: V2 output is parsed via
AckAndEnvelopeContainer.Parser.ParseFrom and compared field-by-field
against the V1 output (recipient path, payload bytes, serializer ID).
Throws at setup if anything diverges.

Benchmark results (V1 = real MessageSerializer + AkkaPduProtobuffCodec,
V2 = the spike, both producing wire-equivalent AckAndEnvelopeContainer
bytes):

| Payload      | V1 time   | V2 time   | Ratio | V1 alloc | V2 alloc | Alloc |
|--------------|-----------|-----------|-------|----------|----------|-------|
| StringShort  |  3,522 ns |  2,375 ns | 0.68x |  1,512 B |    840 B | 0.56x |
| StringMedium |  5,092 ns |  4,007 ns | 0.79x |  2,776 B |  1,600 B | 0.58x |
| StringLong   | 14,142 ns | 11,991 ns | 0.87x | 21,976 B | 13,120 B | 0.60x |
| BytesSmall   |  1,665 ns |    482 ns | 0.30x |    688 B |      0 B | 0.00x |
| BytesLarge   | 10,806 ns |  1,001 ns | 0.13x | 33,432 B |      0 B | 0.00x |

byte[] payloads (the canonical wrapped-payload pattern when the inner
is a binary blob) show the dramatic win: 3-8x faster and ZERO
managed allocations. string payloads show smaller but real
improvements (13-32% faster, 40-44% fewer allocations) with some
residual allocation I haven't profiled yet (probably warmup/buffer
growth artifacts in BDN's measurement; not blocking the design
validation).

The spike is benchmark-project-only — does not touch the running
Akka.Remote infrastructure. Akka.Benchmarks already has
InternalsVisibleTo from Akka.Remote so the spike can invoke real
MessageSerializer.Serialize and AkkaPduProtobuffCodec.ConstructMessage
directly for the V1 baseline.

Files:
- src/benchmark/Akka.Benchmarks/Serialization/V2ProtoSpike.cs
- src/benchmark/Akka.Benchmarks/Serialization/V2ProtoBenchmarks.cs
Extends the V2 wrap-pipeline spike with a hand-rolled receive-side
parser that mirrors AkkaPduProtobuffCodec.DecodeMessage +
MessageSerializer.Deserialize, but without constructing any of the
intermediate Protobuf message objects.

New types

- ProtoWire read helpers: ReadVarint32, ReadTag, ReadFixed64,
  ReadLengthDelimited, ReadString, SkipField. All take
  `ref ReadOnlySpan<byte>` and advance the span past consumed bytes.
  Accept both canonical and over-long varints, so V2 can parse V1's
  wire output as well as its own.
- V2DeserializedEnvelope: result struct (RecipientPath, SenderPath,
  Seq, Payload). Mirrors what V1's AkkaPduCodec.DecodeMessage produces
  but without the IActorRef resolution step (deferred to the dispatcher).
- V2RemoteEnvelopeReader: parses AckAndEnvelopeContainer wire bytes
  directly into the result struct. Has two entry points:
    Read(ReadOnlySpan<byte>)    - for cases without a Memory backing
    Read(ReadOnlyMemory<byte>)  - zero-copy slicing for the inner-payload bytes
  The Memory overload slices the original buffer at the inner-payload
  offset and wraps as a ReadOnlySequence<byte> for the V2 inner
  serializer's Deserialize. No intermediate byte[] for the inner
  payload — V1 allocates two (the ByteString backing plus ToByteArray).
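The over-long-tolerant varint read can be sketched like so (Python for illustration; the actual helpers take `ref ReadOnlySpan<byte>` in C#):

```python
def read_varint32(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Returns (value, new_pos). ORs the data bits of up to 5 bytes
    regardless of canonicity — the same tolerance CodedInputStream has —
    so one reader handles both V1's canonical and V2's over-long prefixes."""
    result = 0
    for i in range(5):
        b = buf[pos + i]
        result |= (b & 0x7F) << (7 * i)
        if not (b & 0x80):              # continuation bit clear: last byte
            return result & 0xFFFFFFFF, pos + i + 1
    raise ValueError("malformed varint32: more than 5 bytes")

# Canonical and over-long encodings of the same value parse identically:
canonical = read_varint32(b"\x05")                  # (5, 1)
over_long = read_varint32(b"\x85\x80\x80\x80\x00")  # (5, 5)
```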

Dispatch is by integer serializer ID (registry.GetById), not by
Type.GetType(manifest) — no reflection on receive side, no
BinaryFormatter-class attack surface.

Benchmark additions

V2ProtoBenchmarks now runs both directions for every payload kind:

- V1 read: AckAndEnvelopeContainer.Parser.ParseFrom + extract proto
  fields + MessageSerializer.Deserialize (the real Akka.Remote receive
  path)
- V2 read: V2RemoteEnvelopeReader.Read using the Memory overload

Setup verifies cross-version compat: the V2 reader correctly parses
V1's canonical-varint wire bytes. Round-trip equality is asserted for
both string and byte[] payloads (byte[] via SequenceEqual, since
default object Equals on byte[] is reference equality).

Results (V1 = real Akka.Remote path, V2 = spike)

Writes:
  StringShort:  3849ns -> 2374ns (0.62x), 1512 B -> 840 B   (0.56x)
  StringMedium: 4855ns -> 3690ns (0.76x), 2776 B -> 1600 B  (0.58x)
  StringLong:  14058ns -> 11182ns (0.81x), 21976 B -> 13120 B (0.60x)
  BytesSmall:   1519ns -> 487ns  (0.32x), 688 B   -> 0 B    (0.00x)
  BytesLarge:   9204ns -> 1087ns (0.16x), 33432 B -> 0 B    (0.00x)

Reads:
  StringShort:  3418ns -> 2688ns (0.79x), 3416 B  -> 2976 B (0.87x)
  StringMedium: 9801ns -> 7787ns (0.79x), 4936 B  -> 4240 B (0.86x)
  StringLong:  18267ns -> 16190ns (0.89x), 56720 B -> 52184 B (0.92x)
  BytesSmall:   1284ns -> 1275ns (0.99x), 664 B   -> 216 B  (0.33x)
  BytesLarge:   9092ns -> 7051ns (0.78x), 33400 B -> 16584 B (0.50x)

Headlines

- byte[] writes: 3-8x faster, ZERO managed allocations
- byte[] reads: up to ~22% faster (BytesLarge; BytesSmall is within
  noise), 50-67% fewer allocations
- string writes: 19-38% faster, 40-44% fewer allocations
- string reads: 11-21% faster, modest allocation reduction (the
  result-string allocation dominates regardless of pipeline)

Both directions show real Akka.Remote-equivalent benefits. The
wrapped-payload pattern is essentially free in V2 for byte[] payloads.
@Aaronontheweb
Member Author

V2 wrap-pipeline spike — throughput numbers

Single-threaded serialization throughput on the real EndpointWriter.WriteSend path (V1 = today's MessageSerializer.Serialize + AkkaPduProtobuffCodec.ConstructMessage; V2 = spike with hand-written wire format + fixed-width varint patching + inline inner write). Wire format unchanged — V2 produces byte-equivalent AckAndEnvelopeContainer output parseable by V1 peers. Benchmark in src/benchmark/Akka.Benchmarks/Serialization/V2ProtoBenchmarks.cs.

Hardware: AMD Ryzen 9 9900X, .NET 10, BenchmarkDotNet MicroBenchmarkConfig. Numbers are per-EndpointWriter-thread; aggregate Akka.Remote throughput scales by concurrent endpoints and TCP pipelining.

Throughput (messages/sec per thread)

Send-side ceiling:

| Payload      |  V1 msg/s |  V2 msg/s | Improvement  |
|--------------|----------:|----------:|--------------|
| StringShort  |   263,000 |   421,000 | +60%         |
| StringMedium |   206,000 |   271,000 | +32%         |
| StringLong   |    71,000 |    89,000 | +26%         |
| BytesSmall   |   658,000 | 2,053,000 | +212% (3.1×) |
| BytesLarge   |   109,000 |   920,000 | +747% (8.5×) |

Receive-side ceiling:

| Payload      | V1 msg/s | V2 msg/s | Improvement |
|--------------|---------:|---------:|-------------|
| StringShort  |  292,000 |  372,000 | +27%        |
| StringMedium |  102,000 |  128,000 | +26%        |
| StringLong   |   55,000 |   62,000 | +13%        |
| BytesSmall   |  779,000 |  784,000 | +1%         |
| BytesLarge   |  110,000 |  142,000 | +29%        |

Round-trip (send + receive on the same thread):

| Payload      | V1 msg/s | V2 msg/s | Improvement  |
|--------------|---------:|---------:|--------------|
| StringShort  |  138,000 |  198,000 | +44%         |
| StringMedium |   68,000 |   87,000 | +28%         |
| StringLong   |   31,000 |   37,000 | +18%         |
| BytesSmall   |  357,000 |  569,000 | +60%         |
| BytesLarge   |   55,000 |  123,000 | +125% (2.2×) |

Why byte[] payloads dominate the win

The wrapped-payload pattern (MiscMessageSerializer, ClusterShardingMessageSerializer, DistributedPubSubMessageSerializer, ReliableDeliverySerializer) wraps an inner user payload as a bytes field inside an outer Protobuf message. V1 today: inner serializer's byte[] → ByteString.CopyFrom → outer proto → .ToByteString(). Two byte[] allocations and one Protobuf serialization layer per wrap. V2: the inner V2 serializer writes its bytes directly into the same buffer the outer is writing to, and the length prefix is patched in place via the fixed-width-varint placeholder. No intermediate byte[]. Per-send allocations drop to zero on the V2 path; receive-side cuts allocations in half because V1's ByteString + payload.Message.ToByteArray() double-allocation is gone.
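The one-pass nested write can be sketched as follows (Python for illustration; the helper name is mine, not the branch's — the real code drives an `IBufferWriter<byte>` in C#):

```python
FIXED_VARINT_PLACEHOLDER = b"\x80\x80\x80\x80\x00"  # 5-byte over-long zero

def write_length_delimited(buf: bytearray, field_number: int, write_inner) -> None:
    """Emit the tag, reserve a fixed-width length slot, write the inner
    payload straight into the same buffer, then patch the slot — no
    intermediate byte[] for the nested message."""
    buf.append((field_number << 3) | 2)   # wire type 2 = length-delimited
    start = len(buf)
    buf.extend(FIXED_VARINT_PLACEHOLDER)
    inner_start = len(buf)
    write_inner(buf)                      # inner serializer writes in place
    length = len(buf) - inner_start
    for i in range(4):                    # patch as over-long varint
        buf[start + i] = ((length >> (7 * i)) & 0x7F) | 0x80
    buf[start + 4] = (length >> 28) & 0x7F

buf = bytearray()
write_length_delimited(buf, 1, lambda b: b.extend(b"hello"))
```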

Allocation deltas (round-trip, send + receive)

| Payload      | V1 B/msg | V2 B/msg | Reduction |
|--------------|---------:|---------:|-----------|
| StringShort  |    4,928 |    3,816 | −23%      |
| StringMedium |    7,712 |    5,840 | −24%      |
| StringLong   |   78,696 |   65,304 | −17%      |
| BytesSmall   |    1,352 |      216 | −84%      |
| BytesLarge   |   66,832 |   16,584 | −75%      |

For a cluster pushing 100K msg/sec on byte[]-heavy traffic, V2 saves ~5 GB/sec of managed allocations — that's Gen-2 / pause-time relief that shows up in latency tails, not in single-thread throughput.
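The ~5 GB/sec figure checks out against the BytesLarge round-trip row above:

```python
# Back-of-envelope from the round-trip allocation table (BytesLarge row).
v1_per_msg = 66_832      # V1 bytes allocated per message, round-trip
v2_per_msg = 16_584      # V2 bytes allocated per message, round-trip
rate = 100_000           # messages per second

saved_gb_per_sec = (v1_per_msg - v2_per_msg) * rate / 1e9
# (66,832 - 16,584) × 100,000 ≈ 5.02 GB/sec of avoided managed allocation
```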

What this does NOT measure

  • End-to-end Akka.Remote throughput (RemotePingPong-style). That requires MessageSerializer integrated into the real EndpointWriter + transport path. The spike numbers tell you the per-thread serialization ceiling; whether serialization is the bottleneck in any specific workload depends on batching, network, and actor dispatch.
  • Latency under load.

Wire format

  • V2 output parses cleanly through AckAndEnvelopeContainer.Parser.ParseFrom (verified at setup).
  • V2 reader parses both canonical-varint (V1) and over-long-varint (V2) wire output (verified at setup).
  • Payload / RemoteEnvelope / AckAndEnvelopeContainer schemas unchanged. V2 nodes interoperate with V1 peers.

Next step

Contained integration pass: introduce MessageSerializerV2 + V2SerializerRegistry alongside V1 (no replacement), wire one wrapped-payload serializer end-to-end (probably MiscMessageSerializer), run RemotePingPong against the V2 path. That gives the aggregate-throughput data point the spike can't measure, while keeping V1 reversible.

Spike:

  • src/benchmark/Akka.Benchmarks/Serialization/V2ProtoSpike.cs
  • src/benchmark/Akka.Benchmarks/Serialization/V2ProtoBenchmarks.cs

@Aaronontheweb Aaronontheweb changed the title Add SerializerV2 foundation (Spec 4, Milestone 2 — foundation only) Experimental: integrating the V2 serialization POC into Akka.Remote (research / not for merge) May 11, 2026
Moves V2 spike code from the benchmark project into Akka.Remote so it can
be referenced from production code, adds ConstructMessageV2 to
AkkaPduProtobuffCodec, wires EndpointWriter.WriteSend to use it, and
validates that RemotePingPong runs through V2 successfully.

Changes

- src/core/Akka.Remote/Serialization/V2/V2Codec.cs: spike code relocated
  from the benchmark project. Types are internal. Adds Ack support to
  V2RemoteEnvelopeWriter so the Akka.Remote ack-piggyback path works.
  V2SerializerRegistry takes a Serialization instance fallback so it can
  resolve any registered serializer (V2-native or V1Adapter-wrapped)
  without explicit Register calls.
- src/core/Akka.Remote/Transport/AkkaPduCodec.cs: new ConstructMessageV2
  method on AkkaPduProtobuffCodec. Skips the V1 SerializedMessage proto
  construction and AckAndEnvelopeContainer.ToByteString() in favor of
  hand-writing the wire format via PatchingBufferWriter. ThreadStatic
  buffer for per-thread pooling — EndpointWriter actors run on dispatcher
  threads, one buffer per thread, reset between calls. Final
  ByteString.CopyFrom is the only remaining unavoidable allocation
  (matches V1's ToByteString cost).
- src/core/Akka.Remote/Endpoint.cs: EndpointWriter.WriteSend now calls
  ((AkkaPduProtobuffCodec)_codec).ConstructMessageV2(...) with the raw
  message object, bypassing SerializeMessage / SerializedMessage entirely.
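The per-thread buffer idea, sketched in Python with `threading.local` (the branch uses a `[ThreadStatic]` field on the codec; names here are mine):

```python
import threading

_local = threading.local()

def rent_thread_buffer() -> bytearray:
    """One reusable write buffer per thread, cleared (not reallocated)
    between calls — safe because each EndpointWriter runs WriteSend on
    one dispatcher thread at a time."""
    buf = getattr(_local, "buf", None)
    if buf is None:
        buf = _local.buf = bytearray()
    else:
        buf.clear()
    return buf

a = rent_thread_buffer()
a.extend(b"wire bytes")
b = rent_thread_buffer()   # same object, reset for the next message
```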

Validation

- dotnet test src/core/Akka.Remote.Tests: 362 passing, 5 skipped, 0 failing.
  V2 send produces wire-compat bytes that V1 receive parses correctly —
  end-to-end Akka.Remote round-trips work with V2 on the send side.

- RemotePingPong on AMD Ryzen 9 9900X (12 physical cores, ServerGC):

  | Clients | V1 msg/s  | V2 msg/s  | Delta |
  |--------:|----------:|----------:|------:|
  |       1 |   299,851 |   291,971 |   -3% |
  |       5 |   361,795 |   415,455 |  +15% |
  |      10 | 1,124,228 | 1,263,424 |  +12% |
  |      15 | 1,348,921 | 1,306,621 |   -3% |
  |      20 | 1,367,054 | 1,350,895 |   -1% |
  |      25 | 1,333,334 | 1,316,830 |   -1% |
  |      30 | 1,334,817 | 1,321,586 |   -1% |

  V2 wins meaningfully (12-15%) at mid client counts where serialization
  dominates EndpointWriter throughput. At high client counts (15+) the
  per-thread plateau (~1.35M msg/s) is bottlenecked by something other
  than serialization — likely GC pressure or dispatcher contention —
  and the V2 win doesn't translate. At 1 client, network round-trip
  dominates so V2's marginal serialization win is invisible.

Known limitations of this integration

- ConstructMessageV2 still calls SerializeActorRef per call, which
  allocates ActorRefData + Path string. Caching these per-association on
  the EndpointWriter would eliminate that allocation for V2 (V1 cannot
  cache because it builds the whole proto graph fresh).
- ByteString.CopyFrom at the end allocates a final byte[] for the wire
  bytes. UnsafeByteOperations.UnsafeWrap could eliminate this but
  requires sole ownership of the underlying byte[] — incompatible with
  the ThreadStatic buffer pool. Switching to ArrayPool rentals + UnsafeWrap
  would solve this but expands scope.
- Receive-side stays on V1's AckAndEnvelopeContainer.Parser.ParseFrom +
  MessageSerializer.Deserialize. V2 receive integration is a follow-on.

This is experimental work for learning purposes. The real V2 production
PR will follow once we've absorbed what this integration teaches.
@Aaronontheweb
Member Author

V2 integrated into EndpointWriter — RemotePingPong results

Wired V2 into the real EndpointWriter.WriteSend path (commit ae54f85ca). Moved spike code from the benchmark project to Akka.Remote/Serialization/V2/, added ConstructMessageV2 to AkkaPduProtobuffCodec, added Ack support to V2RemoteEnvelopeWriter, added a ThreadStatic PatchingBufferWriter pool so the buffer isn't allocated per call.

Validation: Akka.Remote.Tests

362 passing, 5 skipped, 0 failing with V2 wired into WriteSend. Wire compat verified end-to-end — V2 send produces bytes that V1 receive (AckAndEnvelopeContainer.Parser.ParseFrom) parses correctly.

RemotePingPong throughput

Hardware: AMD Ryzen 9 9900X, 12 physical / 24 logical cores, .NET 10, ServerGC.

| Clients |  V1 msg/s |  V2 msg/s | Delta |
|--------:|----------:|----------:|------:|
|       1 |   299,851 |   291,971 |   -3% |
|       5 |   361,795 |   415,455 |  +15% |
|      10 | 1,124,228 | 1,263,424 |  +12% |
|      15 | 1,348,921 | 1,306,621 |   -3% |
|      20 | 1,367,054 | 1,350,895 |   -1% |
|      25 | 1,333,334 | 1,316,830 |   -1% |
|      30 | 1,334,817 | 1,321,586 |   -1% |

(Confirms @Aaronontheweb's 1.3M msg/s figure — V1 peaks at 1.37M msg/s at 20 clients on this hardware. The dev-branch baseline of ~680K msg/s at 30 clients in IMPLEMENTATION_ORDER.md was on different hardware; the 9900X comfortably doubles that.)

What this tells us

  • V2 wins meaningfully (12-15%) at 5-10 clients where the single EndpointWriter on each direction is serialization-bound. The per-message cost reduction translates directly to throughput.
  • V2 is flat (±3%) at 1 client and 15+ clients. At 1 client, network round-trip latency dominates so serialization savings are invisible. At 15+ clients, the system hits a different plateau (~1.35M msg/s) that the per-message benchmark doesn't capture — likely GC pressure from non-serializer allocations across many threads, or actor dispatch contention. Lowering serialization cost only helps until something else takes over as the bottleneck.

Known limitations of this integration

These would all be addressed in a clean production V2 PR — they're capped here because this branch is experimental:

  • ConstructMessageV2 still calls SerializeActorRef per call, allocating ActorRefData + a path string for recipient and sender. The per-association EndpointWriter could cache these (V2 design supports it; V1 cannot because the codec builds the whole proto graph fresh inside ConstructMessage).
  • Final ByteString.CopyFrom(buffer.WrittenSpan) allocates the wire byte[]. UnsafeByteOperations.UnsafeWrap would eliminate this but requires sole ownership of the underlying byte[] — incompatible with the ThreadStatic pool. ArrayPool rentals + UnsafeWrap is the way through.
  • Receive-side stays on V1 (AckAndEnvelopeContainer.Parser.ParseFrom + MessageSerializer.Deserialize). The synthetic benchmark showed receive-side wins are smaller anyway (10-30%, allocation-driven not throughput-driven), so prioritizing send-side here was right.

Implications for the real V2 PR

The experimental integration validates two important things:

  1. V2 is correctness-compatible. Real Akka.Remote round-trips work. No wire-format incompatibility, no proto-parsing issues with over-long varints.
  2. V2's per-message wins translate to real throughput improvements where serialization is the bottleneck. Mid-load (5-10 clients) saw the largest wins; the synthetic benchmark's per-message numbers (3-8× for byte[]) compose into 12-15% aggregate throughput.

To unlock V2's full potential, the production PR would need to:

  • Cache recipient/sender wire bytes on EndpointWriter (per-association)
  • Use ArrayPool<byte>.Shared.Rent + UnsafeByteOperations.UnsafeWrap for the final wire bytes (saves one byte[] per message)
  • Integrate V2 on receive (eliminate the AckAndEnvelopeContainer.Parser proto graph allocation)
  • Tackle whatever's bottlenecking the 15+ client case (likely GC pause + dispatcher) — possibly outside V2's scope

This branch is now in a state where the next step is closing it out and starting the clean PR with the learnings baked in.

Adds DecodeMessageV2 to AkkaPduProtobuffCodec that parses the
AckAndEnvelopeContainer wire bytes directly into an AckAndMessage
(same shape V1 produces) without allocating the proto graph
(AckAndEnvelopeContainer / RemoteEnvelope / Payload / 2x ActorRefData
objects). Inner payload bytes are wrapped zero-copy via
UnsafeByteOperations.UnsafeWrap of a slice of the wire memory.

Downstream pipeline is unchanged: the AckedReceiveBuffer / DeliverAndAck /
Dispatch path operates on the V2-built AckAndMessage exactly the same
way it operates on the V1-built one. Reliable delivery (re-delivery
across reconnects, ordering, dedup) is preserved.

V2 receive parsing is inline in the codec (helpers ParseEnvelopeMetadata,
ParseAckMetadata, ParseEnvelopeFields, ParsePayloadFields,
ExtractActorRefPath) rather than reusing V2RemoteEnvelopeReader.Read,
because the reader's main entry point eagerly deserializes the inner
payload via the V2 serializer registry. Keeping the inline parser
metadata-only lets the SerializedMessage stay as the unit handed to
the dispatcher — MessageSerializer.Deserialize runs in Dispatch the same
way V1 does.

Validation

dotnet test src/core/Akka.Remote.Tests: 362 passing, 5 skipped, 0 failing.
V2 on both send AND receive doesn't break any Akka.Remote behavior,
including the reliable delivery / ack paths.

RemotePingPong on AMD Ryzen 9 9900X, .NET 10, ServerGC:

  Clients   V1 msg/s    V2 send    V2 send+recv
        1    299,851    291,971         303,031
        5    361,795    415,455         390,321
       10  1,124,228  1,263,424       1,204,094
       15  1,348,921  1,306,621       1,353,180
       20  1,367,054  1,350,895       1,359,620
       25  1,333,334  1,316,830       1,308,558
       30  1,334,817  1,321,586       1,300,109

Adding V2 on receive shows small wins at low/mid client counts (8% at 5,
7% at 10) and is within noise of V1 (-3% to +1%) at high client counts.
The plateau at ~1.35M msg/s persists in all configurations — confirming
that something downstream of serialization (DotNetty buffering, GC
pressure from other allocations, dispatcher contention, or actor
scheduling) is the binding constraint at this load level. The V2 design's
per-message savings can't translate to aggregate throughput once
serialization stops being the bottleneck.

This matches the broader Spec 3 hypothesis: realizing V2's full perf
envelope requires the Streams TCP transport rewrite. DotNetty does its
own internal buffering / copying that absorbs V2's zero-copy advantages
before they reach the wire.

Known limitations of this V2 receive integration

- MessageSerializer.Deserialize still runs in Dispatch — calls
  payload.Message.ToByteArray() which materializes the wrapped inner
  bytes. UnsafeWrap is zero-copy on the ByteString construction but the
  later .ToByteArray() copies. Eliminating this would require pushing
  V2 dispatch into Endpoint.Dispatch with a deserialize-at-dispatch path
  that uses ReadOnlySequence directly. Out of scope for this experimental
  branch.
@Aaronontheweb
Member Author

Warmed-up RemotePingPong numbers + allocation profile

Re-ran RemotePingPong with 3 iterations on both V1 and V2 paths to remove JIT-warmup variance from the comparison. Best-of-3 per client count:

| Clients | V1 best (msg/s) | V2 best (msg/s) | Improvement |
|--------:|----------------:|----------------:|------------:|
|       1 |         917,432 |         980,393 |         +7% |
|       5 |       1,253,133 |       1,362,398 |         +9% |
|      10 |       1,317,524 |       1,409,444 |         +7% |
|      15 |       1,385,042 |       1,423,150 |         +3% |
|      20 |       1,384,084 |       1,414,928 |         +2% |
|      25 |       1,328,375 |       1,424,908 |         +7% |
|      30 |       1,300,673 |       1,402,853 |         +8% |

V2 peak: ~1.42M msg/s vs V1 ~1.39M. Consistent +2% to +9% across all client counts, biggest wins at the low/serialization-dominated end (1-10 clients) and again at 25-30 clients where GC-pressure relief gives back headroom.

The single-iteration cold numbers reported earlier in this PR (V1 1.37M / V2 1.35M at 20 clients) were dominated by warmup noise — once both paths are warmed up, V2 is genuinely faster end-to-end.

Allocation profile (V2, 10s capture at 30 clients)

Captured via dotnet-trace collect --profile gc-verbose. Counters showed 6.6 GB/sec allocation rate, 72-93 Gen0 collections/sec, 4.7% of wall time in GC pauses.

Top allocators by bytes:

| %    | Bytes   | Type | Source |
|------|---------|------|--------|
| 40%  | 18.8 GB | System.Byte[] | SerializeActorRef(...).ToByteArray() ×2 per send + ByteString.CopyFrom for wire bytes + payload.Message.ToByteArray() per receive |
| 21%  | 9.7 GB  | System.String | Path strings from actorRef.Path.ToSerializationFormat() (×2 per send), UTF-8 decoded paths on receive |
| 4.4% | 2.1 GB  | Task<int> | DotNetty's async write completions |
| 4.2% | 1.9 GB  | Google.Protobuf.ByteString | Wire ByteString instances |
| 3.8% | 1.8 GB  | CodedOutputStream | Protobuf write internals |
| 3.8% | 1.8 GB  | CodedInputStream | Protobuf read internals |
| 1.5% | 0.7 GB  | DotNetty TaskCompletionSource + continuations | DotNetty send-completion plumbing |
| 1.4% | 0.65 GB | Akka.Remote.RemoteActorRef | Actor-ref allocations on receive |
| 1.4% | 0.63 GB | Akka.Remote.Transport.Message | V2 still builds these for AckedReceiveBuffer |
| 1.3% | 0.60 GB | ActorRefData proto | SerializeActorRef build (V1 inherited per-send) |
| 1.0% | 0.49 GB | Payload proto | V2's SerializedMessage build per receive |

Where V2 tightening has room

V2-fixable (10-15% allocation reduction available):

  • SerializeActorRef(...).ToByteArray() ×2 per send — cache wire bytes per association (recipient is fixed for the lifetime of an EndpointWriter; senders typically come from a small set)
  • ByteString.CopyFrom(buffer.WrittenSpan) final wire bytes — ArrayPool + UnsafeByteOperations.UnsafeWrap
  • payload.Message.ToByteArray() on receive — bypass via direct SerializerV2.Deserialize(ReadOnlySequence, manifest) on the V2 slice
  • Payload proto allocation on receive — go fully V2 on the dispatch path

Not V2-fixable (transport rewrite territory — Spec 3):

  • ~13% of allocations are DotNetty internals (Task<int>, TaskCompletionSource, UnpooledHeapByteBuffer, IOVector[], etc.) — needs the Streams TCP transport to reach.

Verdict on this experimental branch

The V2 design is validated end-to-end:

  • ✅ Wire format compatibility verified (Akka.Remote.Tests 362/362)
  • ✅ Throughput improvement is real and consistent (+2-9% across client counts, +3% peak)
  • ✅ GC pressure relief is real (4.7% pause time today; tightening targets identified to cut another 10-15%)
  • ✅ Architectural ceiling identified — Spec 3 transport rewrite is the load-bearing piece for the next aggregate gain

Ready to land this as the "experimental research artifact" and start the clean V2 PR with these learnings baked in.

/// Tag = (field_number << 3) | wire_type, varint-encoded. For field numbers 1–15
/// the tag fits in one byte.
/// </summary>
public static int WriteTag(IBufferWriter<byte> buffer, int fieldNumber, byte wireType)
Member

FWIW back in COVID I found that it was better, at least for akka PDU hotwiring, to have pre-computed bytes/bytestrings for the PDU bits.

Comment on lines +133 to +145
public static int WriteVarint32(IBufferWriter<byte> buffer, uint value)
{
    var span = buffer.GetSpan(5);
    var written = 0;
    while (value >= 0x80)
    {
        span[written++] = (byte)(value | 0x80);
        value >>= 7;
    }
    span[written++] = (byte)value;
    buffer.Advance(written);
    return written;
}
Member

This is -why- it was better to use precomputed bits for the PDU; VarInt128 does not lend itself well to minimal branching in code, unless you are brave enough to try switch statements based on range and see if that works better...
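For illustration, the range-based size selection the comment alludes to collapses to one expression (Python sketch; a C# switch over magnitude ranges would pick the width the same way before writing bytes unconditionally):

```python
def varint32_size(value: int) -> int:
    """Varint width from magnitude alone: ceil(bit_length / 7), minimum 1.
    Knowing the width up front lets a writer emit all bytes without a
    data-dependent branch per byte."""
    return max(1, (value.bit_length() + 6) // 7)
```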

Comment on lines +169 to +177
public static void PatchFixedWidthVarint(Span<byte> placeholder, uint value)
{
// 5 bytes × 7 data bits = 35 bits — plenty for a uint32.
placeholder[0] = (byte)((value & 0x7F) | 0x80);
placeholder[1] = (byte)(((value >> 7) & 0x7F) | 0x80);
placeholder[2] = (byte)(((value >> 14) & 0x7F) | 0x80);
placeholder[3] = (byte)(((value >> 21) & 0x7F) | 0x80);
placeholder[4] = (byte)((value >> 28) & 0x7F);
}
Member

curious that this logic diverges so much from the bufferwriter write
