Experimental: integrating the V2 serialization POC into Akka.Remote (research / not for merge) #8203
Aaronontheweb wants to merge 11 commits into
Conversation
Scope split for Milestone 2 of the 1.6 transport epic. Originally the change carried the full V2 stack: foundation (base class, V1 adapter, infrastructure) plus the user-facing codec story (MessagePackSerializer, AkkaWriter/AkkaReader, attributes, the Akka.Serialization.V2 NuGet package) plus the Roslyn source generator. That's a lot of public API surface to lock in before the foundation has been validated by anything downstream.

This change rewrites the openspec docs so Milestone 2 ships only the foundation and one set of reference serializers, plus a benchmark that proves the API earns its keep before Spec 3 builds on it:

- SerializerV2 base class (IBufferWriter<byte> / ReadOnlySequence<byte> primary API, virtual byte[] bridge)
- SerializerV1Adapter that wraps legacy Serializer/SerializerWithStringManifest
- Serialization.cs and MessageSerializer.cs infrastructure changes
- ByteArraySerializer + PrimitiveSerializers ported to SerializerV2 (covers all hand-rolled primitive paths: string via UTF-8, int32/int64 via BinaryPrimitives, byte[] passthrough; same IDs, byte-identical wire format)
- Standalone transport-envelope benchmark in src/benchmark/ that simulates EndpointWriter's serialize-frame-deserialize chain on the V2 API and measures V2-direct vs V1-bridge

Everything else moves to a new placeholder change (serializer-v2-codegen): MessagePackSerializer, sealed AkkaWriter/AkkaReader, the three attributes, the Akka.Serialization.V2 package, the Roslyn generator, and the mechanical port of the remaining Protobuf-based internal serializers (ClusterMessageSerializer, SystemMessageSerializer, the four WrappedPayloadSupport serializers). Rationale captured in the new proposal: the runtime codec API and the codegen that targets it must be designed together, since the runtime is the generator's emission target.
Files:

- openspec/IMPLEMENTATION_ORDER.md: retitle Milestone 2 "foundation only", document the 2026-05-10 scope change, point at serializer-v2-codegen
- openspec/changes/serializer-v2/proposal.md: rewrite around the narrowed scope; explicit "Deferred to a future change" section
- openspec/changes/serializer-v2/design.md: revised decisions (single layer in core Akka, reference impl scoped to Primitive+ByteArray, new decision documenting the benchmark as the validation gate)
- openspec/changes/serializer-v2/tasks.md: restructured into 6 sections with explicit string/int32/int64/byte[] coverage, multi-segment input tests, and a "Section 6: Out of Scope (Documented Follow-On)" punch list for the Protobuf serializer ports
- openspec/changes/serializer-v2/specs/serializer-v2-base/spec.md: per-serializer requirements with byte-identical wire format scenarios and multi-segment input scenarios; new requirement for the benchmark
- openspec/changes/serializer-v2/specs/messagepack-serializer/spec.md: deleted (capability moves to serializer-v2-codegen)
- openspec/changes/serializer-v2-codegen/: new placeholder change with .openspec.yaml and proposal.md sketching the deferred scope and arguing why we don't ship the runtime layer alone first
Establishes the V2 serialization API as the new internal foundation for Akka.NET's serialization subsystem. V1 serializers continue to work unchanged via a transparent adapter. New types in core Akka:

- SerializerV2 (abstract): Buffer-aware serializer base class with Serialize(IBufferWriter<byte>, object), Deserialize(ReadOnlySequence<byte>, string), Manifest(object), and Identifier. Virtual byte[] bridges (ToBinary/FromBinary) keep V1-style call sites working.
- SerializerV1Adapter: Wraps Serializer/SerializerWithStringManifest as a SerializerV2. Reproduces the V1 manifest dispatch (TypeQualifiedName for IncludeManifest=true plain serializers, custom manifest for SerializerWithStringManifest, empty otherwise) and delegates ToBinary/FromBinary/FromBinary(Type) directly to the inner V1 to avoid pointless buffer round trips. Inner property exposes the wrapped V1 instance.
- SerializerV2Extensions: AsV1<T>()/TryAsV1<T>() helpers for callers that hold strongly-typed references to V1 serializers (e.g. cast sites in tests and durable stores).
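The three-way manifest dispatch the adapter reproduces can be sketched in a few lines. This is an illustrative Python model, not the C# source; `FakeV1`, `manifest_for`, and `type_qualified_name` are hypothetical names standing in for the V1 serializer surface described above.

```python
from dataclasses import dataclass

def type_qualified_name(obj) -> str:
    # Stand-in for .NET's assembly-qualified type name.
    return f"{type(obj).__module__}.{type(obj).__qualname__}"

@dataclass
class FakeV1:
    """Minimal model of a legacy V1 serializer's manifest-relevant surface."""
    has_string_manifest: bool = False   # models SerializerWithStringManifest
    include_manifest: bool = False      # models Serializer.IncludeManifest
    custom_manifest: str = ""

    def manifest(self, obj) -> str:
        return self.custom_manifest

def manifest_for(serializer: FakeV1, obj) -> str:
    """The V1 manifest dispatch rule the adapter reproduces."""
    if serializer.has_string_manifest:      # custom, serializer-defined string
        return serializer.manifest(obj)
    if serializer.include_manifest:         # plain serializer, IncludeManifest=true
        return type_qualified_name(obj)
    return ""                               # otherwise: empty manifest
```

Each branch maps to one of the three V1 flavors, so a V2 caller never needs the `is SerializerWithStringManifest` type check.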
Serialization.cs changes:

- Internal storage migrated to SerializerV2 (auto-wraps V1 on registration from HOCON, SerializationSetup, AddSerializer, AddSerializationMap)
- FindSerializerFor / FindSerializerForType / GetSerializerById / GetSerializerByName return SerializerV2
- ManifestFor(SerializerV2, object) overload added (just delegates to Manifest()); legacy ManifestFor(Serializer, object) preserved for back compat
- AddSerializer / AddSerializationMap each have V1 + V2 overloads (V1 auto-wraps)
- Deserialize(byte[], int, string) simplified to uniform V2 dispatch — the V1 adapter handles the type-vs-manifest dance internally

Akka.Remote.MessageSerializer.cs:

- Single uniform path for manifest dispatch via SerializerV2.Manifest(); no more `is SerializerWithStringManifest` type check or `IncludeManifest` branch

Call site fixes (mechanical):

- ~25 cast sites in tests/benchmarks switched to .AsV1<T>() (covers Hyperion, Newtonsoft, Cluster, Sharding, ClusterClient, PubSub, Singleton, ReplicatedDataSerializer, custom test serializers, and Akka.Remote.Tests primitive/misc)
- Field declarations on Replicator._serializer and LocalSnapshotStore._wrapperSerializer changed to SerializerV2
- ActorSystemImpl.WarnIfJsonIsDefaultSerializer uses a SerializerV1Adapter pattern match
- Akka.Persistence.Custom example simplified to use uniform V2 Manifest() dispatch
- ClusterMessageSerializer.GetObjectManifest takes SerializerV2

The adapter overrides the bridge methods (ToBinary, FromBinary(byte[], string), FromBinary(byte[], Type)) to delegate to the inner V1 directly. Its V2-native Deserialize materializes the ReadOnlySequence<byte> to byte[] before calling FromBinary, since the wrapped V1 is byte[]-native — no performance win is possible for V1, only API parity. V2-native serializers (coming next: the ByteArraySerializer + PrimitiveSerializers ports) get the actual zero-copy benefit.

Build: 0 errors, 0 warnings on full solution.
Existing tests asserted on the V1 serializer type via .Should().BeOfType<T>() or by direct cast. With V2 dispatch wrapping V1 serializers in SerializerV1Adapter, those assertions now see the adapter type and fail. Mechanical fix across the affected test files: replace .Should().BeOfType<V1>() with .AsV1<V1>() (which throws if the V2 instance isn't a SerializerV1Adapter wrapping V1). The implicit assertion has the same intent — verify the right V1 serializer is bound — without requiring the test to know about the adapter wrapping.

Affected test files:

- Akka.Tests: SerializationSpec, SerializationSetupSpec, CustomSerializerSpec
- Akka.Remote.Tests: DaemonMsgCreateSerializerSpec, MessageContainerSerializerSpec, MiscMessageSerializerSpec, ProtobufSerializerSpec, SystemMessageSerializationSpec
- Akka.Cluster.Tests: ReliableDeliverySerializerSpecs
- Akka.Cluster.Tools.Tests: ClusterClientSerializerSpec
- Akka.DistributedData.Tests: ReplicatedDataSerializerSpec
- Akka.Cluster.Sharding.Tests: DDataClusterShardingConfigSpec

`using Akka.Serialization;` added where missing to bring the AsV1 / TryAsV1 extensions into scope.

Test status:

- Akka.Tests: 1248 passing, 23 skipped, 0 failing (full suite)
- Akka.Remote.Tests serialization: 105 passing, 1 skipped, 0 failing
- Akka.Cluster.Tests serialization: 49 passing, 0 failing
- Akka.Cluster.Tools.Tests serialization: 51 passing, 0 failing
Both serializers now extend SerializerV2 directly instead of the legacy Serializer base class, while preserving their serializer IDs (4 and 17 respectively) and producing byte-identical wire format. They are the V2-native reference implementations used to validate the API.

ByteArraySerializer (Akka core, ID 4):

- Identity transform — the byte[] is the wire format
- Serialize(IBufferWriter<byte>, byte[]) copies the input into the writer (the writer contract requires it; the V2 path costs one copy)
- Deserialize(ReadOnlySequence<byte>, manifest) materializes a fresh byte[] via seq.ToArray() — callers may retain the returned reference, so we cannot alias to potentially pooled backing memory
- ToBinary/FromBinary bridges overridden to skip the buffer round trip and pass the byte[] through directly (V1's zero-alloc behavior is preserved for the bridge path)
- Manifest() returns string.Empty (no manifest needed)
- Null handling drops V1's null passthrough — Serialization.cs routes null through NullSerializer, so the path was unreachable

PrimitiveSerializers (Akka.Remote, ID 17):

- Covers string / int32 / int64 with the same six manifest aliases as V1 (S/I/L plus the long-form .NET Core and .NET Framework type-name variants that legacy peers emit)
- String serialize: Encoding.UTF8.GetBytes(string, Span<byte>) into the writer's span — no intermediate byte[] allocation
- Int32/Int64 serialize: BinaryPrimitives.WriteInt*LittleEndian into fixed-width spans
- String deserialize: Encoding.UTF8.GetString(ReadOnlySequence<byte>) on net6+, handles split codepoints across multi-segment input
- Int32/Int64 deserialize: BinaryPrimitives.ReadInt*LittleEndian; falls back to a stack copy when the value spans a segment boundary
- ToBinary/FromBinary bridges overridden — strings use Encoding.UTF8.GetBytes(string), ints use BitConverter on little-endian platforms (matching V1 byte-for-byte) and a manual little-endian fallback otherwise
- use-legacy-behavior config flag preserved
- SizeHint(o) returns precise sizes for fixed-width values and GetMaxByteCount(string.Length) for strings — gives the ArrayBufferWriter inside the bridge a tight initial allocation

Test fixes:

- SerializationSpec.cs / SerializationSpec.AllowUnregisteredTypesSpec: ByteArraySerializer is V2-native now, so the previous AsV1<ByteArraySerializer>() pattern is invalid (the type constraint T : Serializer rejects V2 types). Replaced with a direct Should().BeOfType<ByteArraySerializer>() on the result.
- PrimitiveSerializersSpec.cs: same — switched from .AsV1<PrimitiveSerializers>() to .Should().BeOfType<PrimitiveSerializers>().Subject (FluentAssertions pattern that returns the typed subject).

Test status:

- Akka.Tests serialization: 65 passing, 0 failing
- Akka.Remote.Tests PrimitiveSerializersSpec: 17 passing, 0 failing
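The "byte-identical wire format" claim boils down to three primitive encodings: raw UTF-8 for strings and fixed-width little-endian for int32/int64. A language-neutral sketch of those encodings (illustrative Python, not Akka code; function names are my own):

```python
import struct

# Fixed-width little-endian and raw-UTF-8 primitive encodings, matching
# the wire format described above.

def encode_int32(v: int) -> bytes:
    return struct.pack("<i", v)        # 4 bytes, little-endian

def encode_int64(v: int) -> bytes:
    return struct.pack("<q", v)        # 8 bytes, little-endian

def encode_string(s: str) -> bytes:
    return s.encode("utf-8")           # raw UTF-8, no length prefix on the wire

def decode_int32(b: bytes) -> int:
    return struct.unpack("<i", b)[0]

def decode_int64(b: bytes) -> int:
    return struct.unpack("<q", b)[0]
```

Because both V1 (BitConverter on little-endian platforms) and V2 (BinaryPrimitives.WriteInt*LittleEndian) produce exactly these bytes, old and new peers interoperate without a format version bump.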
Direct unit tests for SerializerV1Adapter that exercise the wrapping behavior independently of Serialization's registration plumbing. Full HOCON / V1 auto-wrap path coverage continues to live in SerializationSpec and CustomSerializerSpec — this spec is for the adapter's own contract.

Coverage:

- Round-trip through buffer API (Serialize/Deserialize) for plain V1 with and without IncludeManifest, and for SerializerWithStringManifest
- Round-trip through byte[] bridge (ToBinary/FromBinary) for both string-manifest and Type-typed FromBinary overloads
- Manifest behavior matches V1 dispatch (TypeQualifiedName for IncludeManifest=true plain serializers, custom string for SerializerWithStringManifest, empty for IncludeManifest=false)
- Identifier preserved from inner V1
- Inner property returns the wrapped instance unchanged
- ToBinary/FromBinary bridge overrides produce byte-identical output to the inner V1 (so the bridge skip is correct)
- Multi-segment ReadOnlySequence<byte> input handled correctly
- AsV1<T>/TryAsV1<T> extension methods unwrap correctly and have the expected null-vs-throw failure semantics

Three V1 fixture classes cover the three V1 dispatch flavors; a small ReadOnlySequenceSegment<byte> helper synthesizes multi-segment input without depending on a Pipe. 15/15 tests pass.
Benchmark (src/benchmark/Akka.Benchmarks/Serialization/SerializerV2EnvelopeBenchmarks.cs):

Simulates what EndpointWriter will do once Spec 3 wires the Streams TCP transport to call SerializerV2.Serialize(IBufferWriter<byte>) directly: writes a Remote-shaped envelope to an ArrayBufferWriter<byte>, wraps the result as a ReadOnlySequence<byte>, reads the header via SequenceReader<byte>, and hands the payload slice to SerializerV2.Deserialize(ReadOnlySequence<byte>, manifest). Compares against the V1-bridge path (ToBinary → byte[] → FromBinary) on the same serializer instance and the same payload.

Envelope shape: [serializerId: int32 LE][manifestLen: int32 LE][manifest: utf8][payload: bytes-to-end]. Payload length is implicit — the outer frame boundary is the boundary, matching how the real Streams TCP transport will frame messages in Spec 3.

Payload matrix exercises every V2-native primitive path:

- string short (5 chars), medium (256), long (4 KB)
- int32, int64
- byte[] small (16 B), medium (1 KB), large (16 KB)

No Akka.Remote / DotNetty / socket dependencies — a pure shape benchmark to validate that the V2 buffer API earns its keep on allocations and throughput before downstream specs build on it. Reusable for Spec 3 integration: drop in a real FrameBufferWriter in place of ArrayBufferWriter and re-run to confirm no regression. Configured with MicroBenchmarkConfig (MemoryDiagnoser + GitHub markdown exporter); run via the standard BenchmarkDotNet console runner.

API baselines (src/core/Akka.API.Tests/verify/):

- CoreAPISpec.ApproveCore.DotNet.verified.txt: ByteArraySerializer now extends SerializerV2; Serialization gets AddSerializer/AddSerializationMap V2 overloads, FindSerializerFor[Type] return SerializerV2, a ManifestFor V2 overload; new public types SerializerV2, SerializerV1Adapter, SerializerV2Extensions.
- CoreAPISpec.ApproveRemote.DotNet.verified.txt: PrimitiveSerializers now extends SerializerV2 with the buffer Serialize/Deserialize methods.
Test status: 18/18 Akka.API.Tests passing; 0 errors / 0 warnings on full solution build.
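The envelope layout the benchmark frames is simple enough to sketch end to end. An illustrative Python model (not the C# benchmark code; `write_envelope`/`read_envelope` are my own names) of the [serializerId][manifestLen][manifest][payload] shape, with the payload length left implicit as described:

```python
import struct

# Header: serializerId (int32 LE) + manifestLen (int32 LE).
HEADER = struct.Struct("<ii")

def write_envelope(serializer_id: int, manifest: str, payload: bytes) -> bytes:
    """[serializerId: int32 LE][manifestLen: int32 LE][manifest: utf8][payload]"""
    m = manifest.encode("utf-8")
    return HEADER.pack(serializer_id, len(m)) + m + payload

def read_envelope(frame: bytes):
    serializer_id, mlen = HEADER.unpack_from(frame, 0)
    manifest = frame[HEADER.size:HEADER.size + mlen].decode("utf-8")
    payload = frame[HEADER.size + mlen:]   # implicit length: runs to frame end
    return serializer_id, manifest, payload
```

Leaving the payload length implicit works only because the transport's outer frame already bounds the message, which is exactly the assumption the Spec 3 Streams TCP framing makes.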
/// </summary>
/// <param name="o">The object whose manifest is requested.</param>
/// <returns>The manifest string, or <see cref="string.Empty"/> if no manifest is needed.</returns>
public abstract string Manifest(object o);
Nitpick: Would be nice to have a ReadOnlyMemory or ReadOnlySpan overload here for other uses.
/// </summary>
/// <param name="o">The object whose encoded size is being estimated.</param>
/// <returns>An estimate of the encoded byte length.</returns>
public virtual int SizeHint(object o) => 256;
One minor concern with this is whether it will lead to excessive boxing for consumers who misuse structs...
/// <param name="buffer">The byte sequence containing the serialized object.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public abstract object Deserialize(ReadOnlySequence<byte> buffer, string manifest);
There might come a time where being able to accept ReadOnlyMemory<char> manifest will be useful... so I at least want to risk bringing it up...
/// </summary>
/// <param name="buffer">The buffer to write into.</param>
/// <param name="obj">The object to serialize.</param>
public abstract void Serialize(IBufferWriter<byte> buffer, object obj);
I don't hate this but it would be nice if we could get some form of SerializeEnvelope on here to keep things uniform and avoid branching.... but maybe that's too big...
Aaronontheweb left a comment
I think we're pretty far off from proving the concept on V2 serialization here
/// </summary>
/// <param name="o">The object whose encoded size is being estimated.</param>
/// <returns>An estimate of the encoded byte length.</returns>
public virtual int SizeHint(object o) => 256;
Need some way of signaling, for backwards compat, "we don't know what the size of this object is - no size hint available"
/// </summary>
/// <param name="buffer">The buffer to write into.</param>
/// <param name="obj">The object to serialize.</param>
public abstract void Serialize(IBufferWriter<byte> buffer, object obj);
some thoughts:

- Ensure WriteMessagesAsync/SaveAsync is called asynchronously in Async… #8163: the correct fix for this type of flow-control problem in Akka.Persistence is for Serialize / Deserialize to be ValueTask-returning async functions. This naturally allows us to kick the serialization work out of band while avoiding some of the flow-control problems #8163 introduced (and had to revert here: Revert Task.Yield() from AsyncWriteJournal and SnapshotStore (cherry-pick to dev) #8189).
- We should return some type of result here IMHO: either the length of the written bytes, a result object that includes that information, or something else. Before, we could get that information via the length of the returned byte[]; now we don't get it back, since it's encapsulated inside the IBufferWriter<byte>.
/// </summary>
/// <param name="obj">The object to serialize.</param>
/// <returns>A byte array containing the serialized object.</returns>
public virtual byte[] ToBinary(object obj)
Remove this - SerializerV2 doesn't need to be backwards compatible. That's a job for the SerializerV1Adapter. We're adapting V1 to V2, not V2 to V1.
/// <param name="bytes">The serialized object's bytes.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public virtual object FromBinary(byte[] bytes, string manifest)

/// <param name="bytes">The serialized object's bytes.</param>
/// <param name="type">The expected runtime type, or <c>null</c> if unspecified.</param>
/// <returns>The deserialized object.</returns>
public virtual object FromBinary(byte[] bytes, Type? type)
Remove this - same comment as above.
/// <param name="buffer">The byte sequence containing the serialized object.</param>
/// <param name="manifest">The manifest hint, or <see cref="string.Empty"/>.</param>
/// <returns>The deserialized object.</returns>
public abstract object Deserialize(ReadOnlySequence<byte> buffer, string manifest);

/// Serializes the object and decorates serialized <see cref="IActorRef"/> instances using
/// the given <paramref name="address"/>.
/// </summary>
public byte[] ToBinaryWithAddress(Address address, object obj)
Where does this method get called usually and is there a better way of doing this than an implicit ThreadStatic variable? I believe this exists primarily for multi-transport Akka.Remote systems. Could we just require this context to be passed in explicitly in those callsites?
Per PR akkadotnet#8203 review feedback. The void return was hiding load-bearing information — callers (especially wrapped-payload outer serializers patching length prefixes) need to know how many bytes the Serialize call wrote to the buffer. They could fish it out of the writer state, but that's an indirect read that breaks if the writer is shared with other writes happening on the same call.

This is the only API change being made before benchmarking. Other surface critiques (async/ValueTask, bridge removal, transport-info threading) remain held until perf data validates the basic V2 design. The Deserialize signature is unchanged — the read side doesn't have an analogous patch-after-the-fact concern.

Affected:

- SerializerV2.Serialize: abstract int Serialize(IBufferWriter<byte>, object)
- ByteArraySerializer.Serialize returns byte[].Length
- PrimitiveSerializers.Serialize returns bytes-written per primitive type
- SerializerV1Adapter.Serialize returns inner.ToBinary(obj).Length
- Bridge ToBinary uses the returned count to size the ToArray slice
- Benchmark + V1Adapter tests adjusted (return value discarded where callers don't need it)

Tests: Akka.Tests serialization 80/80 passing.
Validates the SerializerV2 design against today's Akka.Remote MessageSerializer + AkkaPduCodec wrap pipeline. Reuses the existing Protobuf message types (AckAndEnvelopeContainer / RemoteEnvelope / Payload) without modification — the V2 path produces byte-equivalent wire output that Google.Protobuf parses transparently.

What the spike contains:

- PatchingBufferWriter: IBufferWriter<byte> with a PatchSpan(offset, len) accessor for in-place length-prefix patching.
- ProtoWire: hand-rolled Protobuf wire-format primitives (tag, varint, fixed-width varint placeholder + patch, fixed64, string). Mirrors what CodedOutputStream does internally, but writes against IBufferWriter directly (Google.Protobuf's WriteContext.Initialize(IBufferWriter) is internal and not callable from user code).
- V2SerializerRegistry: Type -> SerializerV2 / ID -> SerializerV2 lookup. Static dispatch, no reflection at serialize time, no Type.GetType from manifest strings at deserialize time.
- V2RemoteEnvelopeWriter: writes the full AckAndEnvelopeContainer pipeline directly into a PatchingBufferWriter. For each nested length-delimited field (envelope, payload, message bytes), it reserves a fixed-width 5-byte varint placeholder, runs the inner write (using the bytes-written int return on SerializerV2.Serialize to know exactly how much was written), then patches the length prefix in place.

How the patching technique works around Protobuf's nested-message length-prefix problem:

Protobuf's length-delimited wire format requires the inner message's byte count to be known BEFORE the length prefix is written. Canonical varints are minimum-width, so a placeholder can't be patched retroactively without knowing how wide the varint will be. The trick: always write the length prefix as a FIXED-WIDTH 5-byte varint. 5 bytes give 35 bits of data — plenty for any uint32. Small values are encoded as over-long varints (continuation bits set on the first 4 bytes, the value's low bits in byte 0, zeros in the rest).
Google.Protobuf's CodedInputStream accepts up to 5 bytes for a uint32 varint and ORs the data bits regardless of canonicity, so the over-long form parses to the same value as the minimum-width form. That lets us reserve 5 bytes, run the inner write, then patch the varint in place using the int returned by Serialize. One pass, no scratch buffer, no intermediate byte[].

Wire-overhead cost: at most 4 extra bytes per length-delimited nested field. For Akka.Remote's three nesting levels, that's at most 12 bytes per message — noise vs payload size.

Wire compat is verified at benchmark setup: V2 output is parsed via AckAndEnvelopeContainer.Parser.ParseFrom and compared field-by-field against the V1 output (recipient path, payload bytes, serializer ID). Throws at setup if anything diverges.

Benchmark results (V1 = real MessageSerializer + AkkaPduProtobuffCodec, V2 = the spike, both producing wire-equivalent AckAndEnvelopeContainer bytes):

| Payload      | V1 time   | V2 time   | Ratio | V1 alloc | V2 alloc | Alloc |
|--------------|-----------|-----------|-------|----------|----------|-------|
| StringShort  | 3,522 ns  | 2,375 ns  | 0.68x | 1,512 B  | 840 B    | 0.56x |
| StringMedium | 5,092 ns  | 4,007 ns  | 0.79x | 2,776 B  | 1,600 B  | 0.58x |
| StringLong   | 14,142 ns | 11,991 ns | 0.87x | 21,976 B | 13,120 B | 0.60x |
| BytesSmall   | 1,665 ns  | 482 ns    | 0.30x | 688 B    | 0 B      | 0.00x |
| BytesLarge   | 10,806 ns | 1,001 ns  | 0.13x | 33,432 B | 0 B      | 0.00x |

byte[] payloads (the canonical wrapped-payload pattern when the inner is a binary blob) show the dramatic win: 3.5-8x faster and ZERO managed allocations. string payloads show smaller but real improvements (15-32% faster, 40-44% fewer allocations), with some residual allocation I haven't profiled yet (probably warmup/buffer-growth artifacts in BDN's measurement; not blocking the design validation).

The spike is benchmark-project-only — it does not touch the running Akka.Remote infrastructure.
Akka.Benchmarks already has InternalsVisibleTo from Akka.Remote, so the spike can invoke the real MessageSerializer.Serialize and AkkaPduProtobuffCodec.ConstructMessage directly for the V1 baseline.

Files:

- src/benchmark/Akka.Benchmarks/Serialization/V2ProtoSpike.cs
- src/benchmark/Akka.Benchmarks/Serialization/V2ProtoBenchmarks.cs
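The fixed-width-varint trick described above is easy to demonstrate outside .NET. A hedged Python sketch (my own function names, not the spike's C# code) that encodes a length as an always-5-byte over-long varint, shows the reserve-then-patch flow, and confirms a standard OR-the-data-bits varint reader parses it identically to the canonical form:

```python
def write_fixed5_varint(value: int) -> bytes:
    """Encode a uint32 as an always-5-byte (possibly over-long) protobuf varint."""
    out = bytearray(5)
    for i in range(4):
        out[i] = 0x80 | (value & 0x7F)   # continuation bit forced on
        value >>= 7
    out[4] = value & 0x7F                # top 7 of the 35 payload bits
    return bytes(out)

def read_varint32(data, pos=0):
    """Standard varint reader: ORs the data bits, canonical or over-long."""
    result = shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):               # last byte: continuation bit clear
            return result & 0xFFFFFFFF, pos
        shift += 7

# Reserve-then-patch flow for one length-delimited nested field:
buf = bytearray(b"\x12")                 # example field tag byte
start = len(buf)
buf += b"\x00" * 5                       # reserve the 5-byte placeholder
inner = b"inner message bytes"
buf += inner                             # the "inner write"; its length is only
                                         # known after the fact
buf[start:start + 5] = write_fixed5_varint(len(inner))   # patch in place
```

The single pass with no scratch buffer is exactly the property the spike relies on; the cost is the up-to-4-byte-per-field wire overhead noted above.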
Extends the V2 wrap-pipeline spike with a hand-rolled receive-side
parser that mirrors AkkaPduProtobuffCodec.DecodeMessage +
MessageSerializer.Deserialize, but without constructing any of the
intermediate Protobuf message objects.
New types
- ProtoWire read helpers: ReadVarint32, ReadTag, ReadFixed64,
ReadLengthDelimited, ReadString, SkipField. All take
`ref ReadOnlySpan<byte>` and advance the span past consumed bytes.
Accept both canonical and over-long varints, so V2 can parse V1's
wire output as well as its own.
- V2DeserializedEnvelope: result struct (RecipientPath, SenderPath,
Seq, Payload). Mirrors what V1's AkkaPduCodec.DecodeMessage produces
but without the IActorRef resolution step (deferred to the dispatcher).
- V2RemoteEnvelopeReader: parses AckAndEnvelopeContainer wire bytes
directly into the result struct. Has two entry points:
Read(ReadOnlySpan<byte>) - for cases without a Memory backing
Read(ReadOnlyMemory<byte>) - zero-copy slicing for the inner-payload bytes
The Memory overload slices the original buffer at the inner-payload
offset and wraps as a ReadOnlySequence<byte> for the V2 inner
serializer's Deserialize. No intermediate byte[] for the inner
payload — V1 allocates two (the ByteString backing plus ToByteArray).
Dispatch is by integer serializer ID (registry.GetById), not by
Type.GetType(manifest) — no reflection on receive side, no
BinaryFormatter-class attack surface.
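The zero-copy payload slicing the Memory overload performs can be modeled with a Python memoryview. This is an illustrative sketch of the technique, not the C# reader; `_varint` and `slice_payload_zero_copy` are hypothetical names, and the input is simplified to a bare [varint length][payload] field.

```python
def _varint(data, pos):
    # Minimal varint reader (accepts canonical and over-long encodings).
    result = shift = 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not (b & 0x80):
            return result, pos
        shift += 7

def slice_payload_zero_copy(frame: bytes) -> memoryview:
    """Given [varint length][payload], return the payload as a view over the
    original frame, with no intermediate copy (the ReadOnlyMemory slice idea)."""
    view = memoryview(frame)
    length, pos = _varint(view, 0)
    return view[pos:pos + length]        # shares the frame's backing memory
```

The V1 path instead materializes the payload twice (the ByteString backing plus ToByteArray); the view-based approach defers any copy to the inner serializer, which may not need one at all.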
Benchmark additions
V2ProtoBenchmarks now runs both directions for every payload kind:
- V1 read: AckAndEnvelopeContainer.Parser.ParseFrom + extract proto
fields + MessageSerializer.Deserialize (the real Akka.Remote receive
path)
- V2 read: V2RemoteEnvelopeReader.Read using the Memory overload
Setup verifies cross-version compat: the V2 reader correctly parses
V1's canonical-varint wire bytes. Round-trip equality is asserted for
both string and byte[] payloads (byte[] via SequenceEqual, since
default object Equals on byte[] is reference equality).
Results (V1 = real Akka.Remote path, V2 = spike)
Writes:
StringShort: 3849ns -> 2374ns (0.62x), 1512 B -> 840 B (0.56x)
StringMedium: 4855ns -> 3690ns (0.76x), 2776 B -> 1600 B (0.58x)
StringLong: 14058ns -> 11182ns (0.81x), 21976 B -> 13120 B (0.60x)
BytesSmall: 1519ns -> 487ns (0.32x), 688 B -> 0 B (0.00x)
BytesLarge: 9204ns -> 1087ns (0.16x), 33432 B -> 0 B (0.00x)
Reads:
StringShort: 3418ns -> 2688ns (0.79x), 3416 B -> 2976 B (0.87x)
StringMedium: 9801ns -> 7787ns (0.79x), 4936 B -> 4240 B (0.86x)
StringLong: 18267ns -> 16190ns (0.89x), 56720 B -> 52184 B (0.92x)
BytesSmall: 1284ns -> 1275ns (0.99x), 664 B -> 216 B (0.33x)
BytesLarge: 9092ns -> 7051ns (0.78x), 33400 B -> 16584 B (0.50x)
Headlines
- byte[] writes: 3-8x faster, ZERO managed allocations
- byte[] reads: ~22% faster, 50-67% fewer allocations
- string writes: 24-38% faster, 40-44% fewer allocations
- string reads: 11-21% faster, modest allocation reduction (the
result-string allocation dominates regardless of pipeline)
Both directions show real Akka.Remote-equivalent benefits. The
wrapped-payload pattern is essentially free in V2 for byte[] payloads.
V2 wrap-pipeline spike — throughput numbers

Single-threaded serialization throughput on the real

Hardware: AMD Ryzen 9 9900X, .NET 10, BenchmarkDotNet

Throughput (messages/sec per thread)

Send-side ceiling:

Receive-side ceiling:

Round-trip (send + receive on the same thread):

Why byte[] payloads dominate the win

The wrapped-payload pattern (

Allocation deltas (round-trip, send + receive)

For a cluster pushing 100K msg/sec on byte[]-heavy traffic, V2 saves ~5 GB/sec of managed allocations — that's Gen-2 / pause-time relief that shows up in latency tails, not in single-thread throughput.

What this does NOT measure

Wire format

Next step

Contained integration pass: introduce

Spike:
Moves V2 spike code from the benchmark project into Akka.Remote so it can be referenced from production code, adds ConstructMessageV2 to AkkaPduProtobuffCodec, wires EndpointWriter.WriteSend to use it, and validates that RemotePingPong runs through V2 successfully.

Changes:

- src/core/Akka.Remote/Serialization/V2/V2Codec.cs: spike code relocated from the benchmark project. Types are internal. Adds Ack support to V2RemoteEnvelopeWriter so the Akka.Remote ack-piggyback path works. V2SerializerRegistry takes a Serialization instance fallback so it can resolve any registered serializer (V2-native or V1Adapter-wrapped) without explicit Register calls.
- src/core/Akka.Remote/Transport/AkkaPduCodec.cs: new ConstructMessageV2 method on AkkaPduProtobuffCodec. Skips the V1 SerializedMessage proto construction and AckAndEnvelopeContainer.ToByteString() in favor of hand-writing the wire format via PatchingBufferWriter. ThreadStatic buffer for per-thread pooling — EndpointWriter actors run on dispatcher threads, one buffer per thread, reset between calls. The final ByteString.CopyFrom is the only remaining unavoidable allocation (matches V1's ToByteString cost).
- src/core/Akka.Remote/Endpoint.cs: EndpointWriter.WriteSend now calls ((AkkaPduProtobuffCodec)_codec).ConstructMessageV2(...) with the raw message object, bypassing SerializeMessage / SerializedMessage entirely.

Validation:

- dotnet test src/core/Akka.Remote.Tests: 362 passing, 5 skipped, 0 failing. V2 send produces wire-compat bytes that V1 receive parses correctly — end-to-end Akka.Remote round-trips work with V2 on the send side.
- RemotePingPong on AMD Ryzen 9 9900X (12 physical cores, ServerGC):

| Clients | V1 msg/s  | V2 msg/s  | Delta |
|--------:|----------:|----------:|------:|
|       1 |   299,851 |   291,971 |   -3% |
|       5 |   361,795 |   415,455 |  +15% |
|      10 | 1,124,228 | 1,263,424 |  +12% |
|      15 | 1,348,921 | 1,306,621 |   -3% |
|      20 | 1,367,054 | 1,350,895 |   -1% |
|      25 | 1,333,334 | 1,316,830 |   -1% |
|      30 | 1,334,817 | 1,321,586 |   -1% |

V2 wins meaningfully (12-15%) at mid client counts, where serialization dominates EndpointWriter throughput. At high client counts (15+) the per-thread plateau (~1.35M msg/s) is bottlenecked by something other than serialization — likely GC pressure or dispatcher contention — and the V2 win doesn't translate. At 1 client, network round-trip dominates, so V2's marginal serialization win is invisible.

Known limitations of this integration:

- ConstructMessageV2 still calls SerializeActorRef per call, which allocates ActorRefData + a Path string. Caching these per-association on the EndpointWriter would eliminate that allocation for V2 (V1 cannot cache because it builds the whole proto graph fresh).
- ByteString.CopyFrom at the end allocates a final byte[] for the wire bytes. UnsafeByteOperations.UnsafeWrap could eliminate this but requires sole ownership of the underlying byte[] — incompatible with the ThreadStatic buffer pool. Switching to ArrayPool rentals + UnsafeWrap would solve this but expands scope.
- The receive side stays on V1's AckAndEnvelopeContainer.Parser.ParseFrom + MessageSerializer.Deserialize. V2 receive integration is a follow-on.

This is experimental work for learning purposes. The real V2 production PR will follow once we've absorbed what this integration teaches.
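The ThreadStatic per-thread buffer pattern used in ConstructMessageV2 has a direct analogue in most runtimes. A hedged Python sketch using `threading.local` (names `get_send_buffer`/`_tls` are my own; this illustrates the pattern, not the C# code):

```python
import threading

_tls = threading.local()

def get_send_buffer() -> bytearray:
    """Per-thread reusable write buffer, reset between calls.

    Mirrors the ThreadStatic-buffer idea: EndpointWriter actors run on
    dispatcher threads, so one buffer per thread needs no locking."""
    buf = getattr(_tls, "buf", None)
    if buf is None:
        buf = _tls.buf = bytearray()
    buf.clear()   # reset contents between calls on the same thread
    return buf
```

The trade-off noted in the limitations applies here too: because the buffer is owned by the thread and reused, its backing storage can never be handed off wholesale (the UnsafeWrap incompatibility); handing bytes onward requires a copy.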
V2 integrated into EndpointWriter — RemotePingPong results

Wired V2 into the real

Validation: Akka.Remote.Tests

362 passing, 5 skipped, 0 failing with V2 wired into

RemotePingPong throughput

Hardware: AMD Ryzen 9 9900X, 12 physical / 24 logical cores, .NET 10, ServerGC.

(Confirms @Aaronontheweb's 1.3M msg/s figure — V1 peaks at 1.37M msg/s at 20 clients on this hardware. The dev-branch baseline of ~680K msg/s at 30 clients in

What this tells us

Known limitations of this integration

These would all be addressed in a clean production V2 PR — they're capped here because this branch is experimental:

Implications for the real V2 PR

The experimental integration validates two important things:

To unlock V2's full potential, the production PR would need to:

This branch is now in a state where the next step is closing it out and starting the clean PR with the learnings baked in.
Adds DecodeMessageV2 to AkkaPduProtobuffCodec that parses the
AckAndEnvelopeContainer wire bytes directly into an AckAndMessage
(same shape V1 produces) without allocating the proto graph
(AckAndEnvelopeContainer / RemoteEnvelope / Payload / 2x ActorRefData
objects). Inner payload bytes are wrapped zero-copy via
UnsafeByteOperations.UnsafeWrap of a slice of the wire memory.
Downstream pipeline is unchanged: the AckedReceiveBuffer / DeliverAndAck /
Dispatch path operates on the V2-built AckAndMessage exactly the same
way it operates on the V1-built one. Reliable delivery (re-delivery
across reconnects, ordering, dedup) is preserved.
V2 receive parsing is inline in the codec (helpers ParseEnvelopeMetadata,
ParseAckMetadata, ParseEnvelopeFields, ParsePayloadFields,
ExtractActorRefPath) rather than reusing V2RemoteEnvelopeReader.Read,
because the reader's main entry point eagerly deserializes the inner
payload via the V2 serializer registry. Keeping the inline parser
metadata-only lets the SerializedMessage stay as the unit handed to
the dispatcher — MessageSerializer.Deserialize runs in Dispatch the same
way V1 does.
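The metadata-only parse described above amounts to skimming tags and varint lengths and slicing the wire memory, instead of materializing proto objects. A self-contained sketch of that pattern (a hypothetical helper in the spirit of `ParsePayloadFields`, not the PR's actual code):

```csharp
using System;

static class MetadataOnlyParseSketch
{
    // Skim past the tag and varint length of a length-delimited protobuf
    // field and return the payload as a zero-copy slice of the wire memory —
    // the same slice UnsafeByteOperations.UnsafeWrap can then expose as a
    // ByteString without copying.
    public static ReadOnlyMemory<byte> ReadLengthDelimited(ReadOnlyMemory<byte> wire, ref int pos)
    {
        pos++; // skip the 1-byte tag (valid for field numbers 1-15)
        int len = 0, shift = 0;
        var span = wire.Span;
        byte b;
        do { b = span[pos++]; len |= (b & 0x7F) << shift; shift += 7; } while ((b & 0x80) != 0);
        var slice = wire.Slice(pos, len); // no copy: aliases the wire buffer
        pos += len;
        return slice;
    }
}
```

Because the slice aliases the received buffer, this only stays safe while the receive path treats the wire bytes as immutable — which is the ownership constraint the UnsafeWrap discussion above hinges on.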
Validation
dotnet test src/core/Akka.Remote.Tests: 362 passing, 5 skipped, 0 failing.
V2 on both send AND receive doesn't break any Akka.Remote behavior,
including the reliable delivery / ack paths.
RemotePingPong on AMD Ryzen 9 9900X, .NET 10, ServerGC:
| Clients | V1 msg/s | V2 send msg/s | V2 send+recv msg/s |
|--------:|----------:|--------------:|-------------------:|
| 1 | 299,851 | 291,971 | 303,031 |
| 5 | 361,795 | 415,455 | 390,321 |
| 10 | 1,124,228 | 1,263,424 | 1,204,094 |
| 15 | 1,348,921 | 1,306,621 | 1,353,180 |
| 20 | 1,367,054 | 1,350,895 | 1,359,620 |
| 25 | 1,333,334 | 1,316,830 | 1,308,558 |
| 30 | 1,334,817 | 1,321,586 | 1,300,109 |
Adding V2 on receive shows small wins at low/mid client counts (8% at 5,
7% at 10) and is within noise of V1 (-3% to +1%) at high client counts.
The plateau at ~1.35M msg/s persists in all configurations — confirming
that something downstream of serialization (DotNetty buffering, GC
pressure from other allocations, dispatcher contention, or actor
scheduling) is the binding constraint at this load level. The V2 design's
per-message savings can't translate to aggregate throughput once
serialization stops being the bottleneck.
This matches the broader Spec 3 hypothesis: realizing V2's full perf
envelope requires the Streams TCP transport rewrite. DotNetty does its
own internal buffering / copying that absorbs V2's zero-copy advantages
before they reach the wire.
Known limitations of this V2 receive integration
- MessageSerializer.Deserialize still runs in Dispatch — calls
payload.Message.ToByteArray() which materializes the wrapped inner
bytes. UnsafeWrap is zero-copy on the ByteString construction but the
later .ToByteArray() copies. Eliminating this would require pushing
V2 dispatch into Endpoint.Dispatch with a deserialize-at-dispatch path
that uses ReadOnlySequence directly. Out of scope for this experimental
branch.
## Warmed-up RemotePingPong numbers + allocation profile

Re-ran RemotePingPong with 3 iterations on both V1 and V2 paths to remove JIT-warmup variance from the comparison. Best-of-3 per client count:

V2 peak: ~1.42M msg/s vs V1 ~1.39M. Consistent +2% to +9% across all client counts, with the biggest wins at the low, serialization-dominated end (1-10 clients) and again at 25-30 clients, where GC-pressure relief gives back headroom. The single-iteration cold numbers reported earlier in this PR (V1 1.37M / V2 1.35M at 20 clients) were dominated by warmup noise — once both paths are warmed up, V2 is genuinely faster end-to-end.

### Allocation profile (V2, 10s capture at 30 clients)

Top allocators by bytes:

### Where V2 tightening has room

V2-fixable (10-15% allocation reduction available):

Not V2-fixable (transport rewrite territory — Spec 3):

### Verdict on this experimental branch

The V2 design is validated end-to-end:

Ready to land this as the "experimental research artifact" and start the clean V2 PR with these learnings baked in.
```csharp
/// Tag = (field_number << 3) | wire_type, varint-encoded. For field numbers 1–15
/// the tag fits in one byte.
/// </summary>
public static int WriteTag(IBufferWriter<byte> buffer, int fieldNumber, byte wireType)
```
FWIW back in COVID I found that it was better, at least for akka PDU hotwiring, to have pre-computed bytes/bytestrings for the PDU bits.
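To make the precomputation point concrete: the tag for a given field is a compile-time constant, so for a fixed PDU layout the encoded tag bytes never need to be re-derived per message. A standalone sketch of the tag math (`TagMath` is hypothetical, not the PR's code):

```csharp
static class TagMath
{
    // Protobuf tag: (field_number << 3) | wire_type, then varint-encoded.
    // For a fixed PDU shape these are constants, so the encoded tag bytes
    // (and any other fixed framing) can be precomputed once up front.
    public static uint Tag(int fieldNumber, byte wireType) => ((uint)fieldNumber << 3) | wireType;
}
```

For example `Tag(1, 2)` (field 1, length-delimited) is `0x0A` — a single constant byte a codec can emit directly, which is the essence of the precomputed-bytes approach.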
```csharp
public static int WriteVarint32(IBufferWriter<byte> buffer, uint value)
{
    var span = buffer.GetSpan(5);
    var written = 0;
    while (value >= 0x80)
    {
        span[written++] = (byte)(value | 0x80);
        value >>= 7;
    }
    span[written++] = (byte)value;
    buffer.Advance(written);
    return written;
}
```
This is *why* it was better to use precomputed bits for the PDU; base-128 varint encoding does not lend itself well to minimal branching in code, unless you are brave enough to try switch statements based on range and see if that works better...
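The "switch on range" variant the comment alludes to might look like the following — a hypothetical sketch, not the PR's code, and whether it actually beats the data-dependent loop would need benchmarking:

```csharp
using System;
using System.Numerics;

static class RangeSwitchVarint
{
    // Compute the encoded length up front from the highest set bit, then
    // emit bytes with a single switch instead of a data-dependent loop.
    public static int VarintLength(uint value)
        => value == 0 ? 1 : (BitOperations.Log2(value) / 7) + 1;

    public static int Write(Span<byte> dest, uint value)
    {
        switch (VarintLength(value))
        {
            case 1:
                dest[0] = (byte)value;
                return 1;
            case 2:
                dest[0] = (byte)(value | 0x80);
                dest[1] = (byte)(value >> 7);
                return 2;
            case 3:
                dest[0] = (byte)(value | 0x80);
                dest[1] = (byte)((value >> 7) | 0x80);
                dest[2] = (byte)(value >> 14);
                return 3;
            case 4:
                dest[0] = (byte)(value | 0x80);
                dest[1] = (byte)((value >> 7) | 0x80);
                dest[2] = (byte)((value >> 14) | 0x80);
                dest[3] = (byte)(value >> 21);
                return 4;
            default:
                dest[0] = (byte)(value | 0x80);
                dest[1] = (byte)((value >> 7) | 0x80);
                dest[2] = (byte)((value >> 14) | 0x80);
                dest[3] = (byte)((value >> 21) | 0x80);
                dest[4] = (byte)(value >> 28);
                return 5;
        }
    }
}
```

The trade: the switch is predictable for same-magnitude values (common for lengths in a steady message stream) but pays a `Log2` per call, so it is a measure-first optimization.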
```csharp
public static void PatchFixedWidthVarint(Span<byte> placeholder, uint value)
{
    // 5 bytes × 7 data bits = 35 bits — plenty for a uint32.
    placeholder[0] = (byte)((value & 0x7F) | 0x80);
    placeholder[1] = (byte)(((value >> 7) & 0x7F) | 0x80);
    placeholder[2] = (byte)(((value >> 14) & 0x7F) | 0x80);
    placeholder[3] = (byte)(((value >> 21) & 0x7F) | 0x80);
    placeholder[4] = (byte)((value >> 28) & 0x7F);
}
```
curious that this logic diverges so much from the bufferwriter write
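The divergence is deliberate, as far as the branch explains it: the fixed-width form always occupies 5 bytes, continuation bit forced on the first four, so a length written after the body is known fits the reserved placeholder without shifting bytes — and protobuf readers accept the resulting over-long varint. A standalone round-trip sketch (mirrors the PR's `PatchFixedWidthVarint`; the `Read` helper is illustrative, standing in for any standard varint reader):

```csharp
using System;

static class FixedWidthVarintSketch
{
    // Always 5 bytes, continuation bit set on the first four — the padded
    // ("over-long") encoding that lets a length be patched in place.
    public static void PatchFixedWidth(Span<byte> p, uint value)
    {
        p[0] = (byte)((value & 0x7F) | 0x80);
        p[1] = (byte)(((value >> 7) & 0x7F) | 0x80);
        p[2] = (byte)(((value >> 14) & 0x7F) | 0x80);
        p[3] = (byte)(((value >> 21) & 0x7F) | 0x80);
        p[4] = (byte)((value >> 28) & 0x7F);
    }

    // A standard varint reader decodes the over-long form to the same value,
    // which is why a Google.Protobuf parser on the other side never notices.
    public static uint Read(ReadOnlySpan<byte> src)
    {
        uint value = 0; var shift = 0; var i = 0;
        byte b;
        do { b = src[i++]; value |= (uint)(b & 0x7F) << shift; shift += 7; } while ((b & 0x80) != 0);
        return value;
    }
}
```

So the minimal-length `WriteVarint32` is used when the value is known up front, and the padded form only for placeholders that get patched later.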
## What this PR is for

- Integrate the V2 spike into the real `MessageSerializer` + `AkkaPduCodec` wrap pipeline.
- Stay byte-compatible with the `AckAndEnvelopeContainer`/`RemoteEnvelope`/`Payload` wire format.
- Wire it into `EndpointWriter` so RemotePingPong actually exercises it, and capture aggregate throughput data the per-message benchmark can't reach.

## What's on the branch right now
### Foundation (will likely survive into the clean PR in some form)

- `SerializerV2` — buffer-aware base class (`Serialize(IBufferWriter<byte>, object) → int`, `Deserialize(ReadOnlySequence<byte>, string) → object`, `Manifest`, `Identifier`). Virtual `byte[]` bridges keep legacy callsites working during transition.
- `SerializerV1Adapter` — wraps `Serializer`/`SerializerWithStringManifest` as `SerializerV2`. Reproduces V1 manifest dispatch internally; overrides `ToBinary`/`FromBinary` to skip the buffer round trip when the inner serializer is `byte[]`-native. An `Inner` property exposes the wrapped V1 serializer.
- `SerializerV2Extensions` — `AsV1<T>()`/`TryAsV1<T>()` helpers for callsites holding strongly-typed V1 references.
- `Serialization.cs` — internal storage migrated to `SerializerV2`. HOCON + `SerializationSetup` V1 instances auto-wrap on registration. `FindSerializerFor*` returns `SerializerV2`. `AddSerializer`/`AddSerializationMap` get V1 + V2 overloads. `Deserialize(byte[], int, string)` simplified to uniform V2 dispatch.
- `MessageSerializer.cs` (Akka.Remote) — single uniform `Manifest()` dispatch path.
- `ByteArraySerializer` (ID 4) and `PrimitiveSerializers` (ID 17) — ported to V2-native, byte-identical wire format.

### The spike (this is the experimental piece)
- `PatchingBufferWriter` — `IBufferWriter<byte>` with `PatchSpan(offset, length)` for in-place length-prefix patching.
- `ProtoWire` — hand-rolled Protobuf wire-format helpers: tag, varint, fixed-width 5-byte varint placeholder + patch, fixed64, length-delimited, string. Mirrors `CodedOutputStream` internals but operates against `IBufferWriter<byte>` directly.
- `V2SerializerRegistry` — Type → SerializerV2 / ID → SerializerV2 lookup. Static dispatch at serialize time, ID-keyed dispatch at deserialize. No `Type.GetType(manifest)` reflection.
- `V2RemoteEnvelopeWriter`/`V2RemoteEnvelopeReader` — write and read the `AckAndEnvelopeContainer`/`RemoteEnvelope`/`Payload` wire format directly via `IBufferWriter<byte>`/`ReadOnlyMemory<byte>`. Inner V2 serializer invoked inline against the same buffer.
- `V2ProtoBenchmarks` — V1 path uses the real `MessageSerializer.Serialize` + `AkkaPduProtobuffCodec.ConstructMessage`; V2 path uses the spike. Both produce wire-equivalent `AckAndEnvelopeContainer` bytes (verified at setup via `AckAndEnvelopeContainer.Parser.ParseFrom`).

### Key learning so far
Per-message serialization throughput (single-threaded, AMD Ryzen 9 9900X):
Send-side allocations drop to zero for byte[] payloads. Receive-side allocations drop by 50–67%. Wire format is byte-equivalent — V1 peers parse V2 output, V2 reader parses V1 output. Full numbers in this comment.
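To make the `SerializerV2` surface described under Foundation concrete, here is a minimal sketch of that shape. The abstract members follow the signatures listed above; the member bodies and the `Utf8StringSerializerSketch` subclass are illustrative only, not the branch's actual source:

```csharp
using System;
using System.Buffers;

// Sketch of the SerializerV2 shape: buffer-first primary API with virtual
// byte[] bridges so legacy callsites keep working during transition.
public abstract class SerializerV2Sketch
{
    public abstract int Identifier { get; }
    public abstract string Manifest(object obj);

    // Primary API: write into a pooled buffer, read from a sequence that may
    // span multiple segments.
    public abstract int Serialize(IBufferWriter<byte> buffer, object obj);
    public abstract object Deserialize(ReadOnlySequence<byte> bytes, string manifest);

    // Virtual bridges: default to routing byte[] callers through the buffer API.
    public virtual byte[] ToBinary(object obj)
    {
        var writer = new ArrayBufferWriter<byte>();
        Serialize(writer, obj);
        return writer.WrittenSpan.ToArray();
    }

    public virtual object FromBinary(byte[] bytes, string manifest)
        => Deserialize(new ReadOnlySequence<byte>(bytes), manifest);
}

// Illustrative concrete serializer: UTF-8 strings, in the spirit of the
// PrimitiveSerializers string path (ID and manifest here are hypothetical).
public sealed class Utf8StringSerializerSketch : SerializerV2Sketch
{
    public override int Identifier => 17;
    public override string Manifest(object obj) => "S";

    public override int Serialize(IBufferWriter<byte> buffer, object obj)
    {
        var bytes = System.Text.Encoding.UTF8.GetBytes((string)obj);
        buffer.Write(bytes);
        return bytes.Length;
    }

    public override object Deserialize(ReadOnlySequence<byte> bytes, string manifest)
        => System.Text.Encoding.UTF8.GetString(bytes.ToArray());
}
```

The design point the sketch captures: V2-native serializers never see a `byte[]` unless a legacy caller forces one, while a V1 wrapped in the adapter pays the bridge cost only at the boundary.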
## What still needs to land on this branch

- Move the spike types (`PatchingBufferWriter`, `ProtoWire`, `V2SerializerRegistry`, `V2RemoteEnvelopeWriter`, `V2RemoteEnvelopeReader`) from the benchmark project into `Akka.Remote/Serialization/V2` so they can be referenced from production code.
- Populate `V2SerializerRegistry` from `Serialization.Serialization` at startup so the registry knows about all V2-native and V1-wrapped serializers.
- Add `ConstructMessageV2` to `AkkaPduProtobuffCodec` and wire `EndpointWriter.WriteSend` to call it (skipping the intermediate `SerializedMessage` proto object).
- Update `openspec/IMPLEMENTATION_ORDER.md`.
- Verify V1 `DecodeMessage` parses V2 wire bytes correctly (it should, because Google.Protobuf accepts over-long varints).

## What this PR will NOT become
- A mergeable production change: this branch carries research scope docs (`e6b49676` for the spec-narrowing decision), and benchmark scaffolding.
- The real V2 PR will start clean from `Akka.Remote/Serialization/V2`, ship the necessary subset, and have a proper review cycle.

## What this PR is producing
- Integration learnings: how `V2SerializerRegistry` initializes, where `ConstructMessageV2` lives, how V1 and V2 paths coexist during transition.

## Commits (current state, may keep evolving)
- `e6b49676` — Narrow milestone scope; create `serializer-v2-codegen` placeholder
- `1d5d2a04` — `SerializerV2` + `SerializerV1Adapter` foundation + infrastructure
- `1c1c36a7` — Update existing tests for V1Adapter wrapping
- `cc4bed24` — Port `ByteArraySerializer` + `PrimitiveSerializers` to V2
- `e1f7261f` — `SerializerV1AdapterSpec` unit tests
- `6b7d8d02` — Transport-envelope benchmark + API baselines
- `92719443` — V2 wrap-pipeline spike (`PatchingBufferWriter` + `AckAndEnvelopeContainer` wire format)
- `35e109aa` — V2 spike read path + benchmarks for both directions

## Design docs
- `openspec/changes/serializer-v2/proposal.md`
- `openspec/changes/serializer-v2/design.md`
- `openspec/changes/serializer-v2-codegen/proposal.md` (deferred scope)