use IO::Buffer #2

kazuho · 2023-01-07T07:00:22Z

Use IO::Buffer for retaining packet image as well as internal structures (e.g., pseudo header, L4 port tuple).
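As a rough illustration of what keeping a packet image in an IO::Buffer looks like, the sketch below reads the L4 port pair out of a hand-built UDP header. The header bytes and variable names are made up for this example and are not taken from the PR. `:U16` is IO::Buffer's big-endian 16-bit type, which matches network byte order.

```ruby
# Hypothetical 8-byte UDP header: src port, dst port, length, checksum.
header = [12345, 53, 8, 0].pack("n4")

# Wrap the bytes in an IO::Buffer.
packet = IO::Buffer.for(header)

# :U16 reads an unsigned 16-bit big-endian (network order) integer.
src_port = packet.get_value(:U16, 0)
dst_port = packet.get_value(:U16, 2)

# A slice is a view onto the same memory, so the port tuple costs no byte copy.
l4_tuple = packet.slice(0, 4)

puts "src=#{src_port} dst=#{dst_port} tuple_size=#{l4_tuple.size}"
```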

Benchmark results:

	main	IO::Buffer
reflector (Core i5-1240p, bare metal)	1.57Gbps	1.79Gbps
NAT (Core i7-9750H, VMware fusion to host)	182Mbps	183Mbps

kazuho · 2023-01-07T07:10:25Z

nattable.rb

+    b.copy(l3_tuple)
+    b.copy(packet.l4.tuple, l3_tuple_size)
+
+    b.get_string


The reason we are not seeing a speedup with NAT is probably these lines. Honestly, I suspect that this function and remote_key_from_packet have become slower by switching to IO::Buffer.

Here, we are building a key to lookup a NAT table, by concatenating the IP address tuple and the port tuple.

It could be that Ruby's String class has optimizations for handling and concatenating tiny strings, while IO::Buffer has nothing comparable.

Also, IO::Buffer cannot be used as a hash key, so we have to call IO::Buffer#get_string. That may be costing us as well.
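To make the comparison concrete, here is a minimal sketch of the two ways of building the lookup key. The tuple values and the `nat_table` hash are stand-ins invented for this example. The real code reads the tuples from the packet.

```ruby
# Stand-in tuples (hypothetical values): IPv4 src/dst addresses and src/dst ports.
l3_bytes = [192, 0, 2, 1, 198, 51, 100, 7].pack("C*")
l4_bytes = [40000, 443].pack("n2")

# String-based key: a single concatenation.
string_key = l3_bytes + l4_bytes

# IO::Buffer-based key: allocate a buffer, copy both tuples in, then copy out again.
l3_tuple = IO::Buffer.for(l3_bytes)
l4_tuple = IO::Buffer.for(l4_bytes)
b = IO::Buffer.new(l3_tuple.size + l4_tuple.size)
b.copy(l3_tuple)
b.copy(l4_tuple, l3_tuple.size)
buffer_key = b.get_string # a fresh String, so it can be used as a Hash key

nat_table = { string_key => :mapping }
puts nat_table[buffer_key].inspect # :mapping
```

The IO::Buffer path creates a buffer object and then a String on top of it, which is the extra work suspected above.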

Maybe IO::Buffer should implement hash and eql? based on the byte contents? cc @ioquatix

We could do that but for a hash key it can be tricky, since it can be mutated.

A String can be mutated too, but Hash actually does .dup.freeze when a non-frozen String is put into the Hash.
I'm not sure mutation is an issue, but it would indeed be awkward for IO::Buffer to cache the hash. Maybe there is no need to cache it in the IO::Buffer, though.
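The String behaviour mentioned here can be checked with a few lines. This is only a demonstration of how Hash treats String keys, with a made-up key value. It says nothing about how IO::Buffer would or should behave.

```ruby
key = +"10.0.0.1:80"   # unary plus gives a mutable String
table = {}
table[key] = :entry

stored = table.keys.first
puts stored.frozen?      # true: the Hash holds a frozen String
puts stored.equal?(key)  # false: it is a different object from `key`

key << "80"              # mutating the original does not disturb the table
puts table["10.0.0.1:80"].inspect # :entry
```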

Anyway, this seems a good case for transfer_string.

I think one could use IO::Buffer.for here to work around the lack of transfer_string:

    l3_tuple = packet.tuple
    l3_tuple_size = l3_tuple.size
    string = "\0".b * (l3_tuple_size + 4) # "\0".b could be stored in a constant
    IO::Buffer.for(string) do |b|
      b.copy(l3_tuple)
      b.copy(packet.l4.tuple, l3_tuple_size)
    end
    string

This should avoid the extra bytes copy.
Of course there is still an allocation of l3_tuple_size + 4 bytes but there were 2 of them in the code above.

I think this is less elegant/more convoluted than .transfer_string (and it may be hard to support on some Ruby implementations without an extra copy on .for) but it's one way that already works now.
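For anyone who wants to try the workaround outside the project, here is a self-contained variant with stand-in tuples in place of `packet.tuple` and `packet.l4.tuple` (the byte values are made up). It assumes Ruby 3.2 or later, where the block form of IO::Buffer.for yields a writable buffer backed by the String.

```ruby
# Stand-in tuples (hypothetical values) wrapped as IO::Buffer, as packet.tuple would be.
l3_tuple = IO::Buffer.for([10, 0, 0, 1, 10, 0, 0, 2].pack("C*"))
l4_tuple = IO::Buffer.for([1234, 80].pack("n2"))

l3_tuple_size = l3_tuple.size
key = "\0".b * (l3_tuple_size + 4)

# Block form: the buffer writes straight into `key`'s memory, and `key`
# is locked against modification until the block returns.
IO::Buffer.for(key) do |b|
  b.copy(l3_tuple)
  b.copy(l4_tuple, l3_tuple_size)
end

puts key.bytes.inspect
```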

Thank you for the suggestion! Applied in 7dca4c3.

Ideally, I would prefer doing (packet.tuple + packet.l4.tuple).transfer_string, because then there would be zero intermediary objects and all the lengths and offsets can be calculated in C.

But this is better than what I wrote.

eregon · 2023-01-07T15:18:26Z

IMHO this looks quite a bit cleaner than using String for the usage here.

Re IO::Buffer#get_string, maybe we should have a way to reuse the buffer of IO::Buffer for the string?
Maybe with some method that freezes the buffer and gets the string?
Or even a method which clears the buffer and instead uses the underlying char* for the String (i.e., transferring the char* to the String, and no longer using it for the IO::Buffer), like IO::Buffer#transfer but returning a String?
WDYT @ioquatix?

ioquatix · 2023-01-07T23:06:37Z

I think both ideas are acceptable, but let me try them out. @kazuho do you mind opening an issue on bugs.ruby-lang.org with your requirements?

ioquatix · 2023-01-07T23:55:12Z

You should see if caching and reusing IO::Buffer instances gives you a performance advantage. It would be interesting to see if it helps.
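One way to try this could look like the sketch below. `KeyBuilder` is a hypothetical helper, not something from the PR, and the 64-byte capacity is an arbitrary assumption that comfortably fits an address tuple plus a port tuple.

```ruby
# Hypothetical helper: one long-lived scratch buffer instead of one buffer per packet.
class KeyBuilder
  def initialize(capacity = 64)
    @scratch = IO::Buffer.new(capacity)
  end

  # l3_tuple and l4_tuple are IO::Buffer instances; returns a binary String key.
  def key_for(l3_tuple, l4_tuple)
    @scratch.copy(l3_tuple)
    @scratch.copy(l4_tuple, l3_tuple.size)
    @scratch.get_string(0, l3_tuple.size + l4_tuple.size)
  end
end

builder = KeyBuilder.new
l3 = IO::Buffer.for([10, 0, 0, 1, 10, 0, 0, 2].pack("C*"))
l4 = IO::Buffer.for([1234, 80].pack("n2"))
key = builder.key_for(l3, l4)
puts key.bytesize
```

This removes the per-packet IO::Buffer allocation, but each key is still a new String produced by a byte copy in get_string.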

ioquatix · 2023-01-08T10:33:34Z

I thought about this more.

Most binary formats will have a body string packed into a packet of data, e.g. WebSockets. Sometimes it's compressed (or needs to be compressed).

Converting a full buffer to a string e.g. #transfer_string might not be that useful in practice due to the binary framing surrounding the string.

eregon · 2023-01-08T11:52:44Z

> Converting a full buffer to a string e.g. #transfer_string might not be that useful in practice due to the binary framing surrounding the string.

As you can see in this PR, there are multiple usages of .get_string without any offset, so #transfer_string would be much better there.
But even if one needs an offset and length, that could be achieved without copying bytes via lazy substrings: https://bugs.ruby-lang.org/issues/19315. That already works on CRuby if the substring goes until the end of the string and that issue is to make it always work, like it does in TruffleRuby and probably JRuby.
So buffer.transfer_string[20, 100] would not copy any bytes.

EDIT:
Actually since there is IO::Buffer#slice, it could be: buffer.slice(20, 100).transfer_string.
Except that that would need to clear buffer as well, and potentially other slices of buffer which seems not feasible.
I think one way is to mark the original buffer (buffer in this case) as read-only on .transfer_string, and then it's safe to share the bytes with the String. If the String wants to mutate them, it'll copy (standard shared String COW).
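The sharing that makes this tricky is easy to see with the existing API. The sketch below only uses IO::Buffer#slice, #set_string and #get_string. The sizes and contents are arbitrary.

```ruby
buffer = IO::Buffer.new(8)          # zero-filled, 8 bytes
view = buffer.slice(2, 4)           # bytes 2..5 of `buffer`, no copy

view.set_string("abcd")             # write through the slice
puts buffer.get_string(2, 4)        # abcd: the parent sees the same bytes
```

Because a slice and its parent share memory, handing the slice's bytes to a String would leave the parent (and any sibling slices) pointing at memory the String now owns.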

ioquatix · 2023-01-08T20:55:39Z

get_string returns a mutable copy but we could certainly add a zero copy interface that returns frozen strings. However, it would obviously depend on the lifetime and mutability of the buffer. It's probably expected that IO::Buffers are reused for subsequent packets of data, so it might not be a good design. I'll have to try it out when I get around to implementing QUIC and HTTP/2 with the updated interfaces.
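A small check of the current copying behaviour, using arbitrary contents:

```ruby
buffer = IO::Buffer.new(4)
buffer.set_string("abcd")

copy = buffer.get_string
puts copy.frozen?                    # false: the String is mutable

buffer.set_value(:U8, 0, "z".ord)    # overwrite the first byte of the buffer
puts buffer.get_string               # zbcd
puts copy                            # abcd: unaffected by later writes to the buffer
```

A zero-copy variant could not offer this independence unless the buffer were frozen or no longer reused, which is the lifetime concern raised above.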

eregon · 2023-01-08T21:03:46Z

> I'll have to try it out when I get around to implementing QUIC and HTTP/2 with the updated interfaces.

This project seems also about binary network protocol parsing, so I think it could be fine to try it here.
Basically this seems a near-ideal use case for IO::Buffer, so I think it's worth exploring how to use and potentially improve IO::Buffer for this project, so it's efficient and easy to use.

ioquatix · 2023-01-09T02:25:18Z

nattable.rb

-    b.copy(src_addr)
-    b.copy(packet.l4.tuple, addr_size)
+    key = ZERO_STR.byteslice(0, addr_size + 4)
+    IO::Buffer.for(key) do |b|


It's a neat idea, I didn't think about it. Maybe we can do something similar:

    IO::Buffer.string(size) do |buffer|
      # ...
    end # => string

At the end, the buffer would be transferred to string zero copy.

That interface SGTM (though I wonder if there should be a static function on String that generates a fixed-length, zero-filled byte string). I'm all for optimizations that reduce the conversion cost between the two types.

That seems a nice addition, and it's easy to support without extra copying on TruffleRuby, unlike the current usage here (would need one copy to go from Rope to byte[] internally).

kazuho · 2023-01-09T02:52:30Z

With all the changes, when running the NAT on bare metal (Raspberry Pi 2b @ 900MHz), I do see slightly better performance with IO::Buffer:

	average	stdev
main@0727aa7	22.1	1.02
this PR@7dca4c3	23.2	1.41

Units are Mbps. Ruby 3.2.0 without yjit (it is not supported on armv7).

kazuho · 2023-01-09T10:21:02Z

Interestingly, the difference is now smaller, if there is one at all:

	average	stdev
main@b3b96d2	26.7	0.44
this PR@86cfab7	27.1	0.31

(rat.rb; units are Mbps; raspberry pi 2b @ 900MHz yjit off)

PS. With reflector.rb, the winner depends on whether yjit is used:

	average	stdev
main@9d4161d; yjit on	1.92	0.01
this PR@86cfab7; yjit on	2.05	0.01
main@9d4161d; yjit off	1.49	0.01
this PR@86cfab7; yjit off	1.39	0.00

(reflector.rb; units are Gbps; Core i5-1240P)

ioquatix · 2023-01-10T07:32:17Z

How do you run the benchmark?

kazuho · 2023-01-10T10:43:33Z

@ioquatix Here it is: https://github.com/kazuho/rat/wiki/test-setup. I've copy-pasted it from my notes, so the details could be slightly off.

kazuho · 2023-01-11T06:05:01Z

As of main @ 2252490 vs. this PR @ cc41d29:

reflector.rb (Core i5-1240P; Gbps):

	average	stdev
main	1.56	0.01
main + yjit	2.07	0.01
this PR	1.46	0.00
this PR + yjit	2.17	0.01

rat.rb (Raspberry Pi 2b; Mbps):

	average	stdev
main	31.4	0.36
this PR	30.8	0.29

ioquatix · 2023-02-23T06:33:04Z

What do you think about introducing ruby/ruby#7364?

kazuho · 2023-02-24T04:51:59Z

@ioquatix Woot. I think that'd be a nice addition for IO::Buffer.

I'm not sure whether that change would make this branch run as fast as master, but I'll rerun the benchmark. My theory is that IPv4 address tuples are tiny enough (at most 12 bytes) to be embedded as strings inside VALUEs, which could be tough to outcompete.

ioquatix · 2023-02-24T06:22:15Z

Allocating a 12-byte string which is used by reference is probably only a single allocation. But a temporary IO::Buffer is also an allocation, because Ruby doesn't have any kind of stack allocation for VALUE AFAIK. I don't know whether getting the RSTRING_PTR can cause an external allocation or not. We can probably check it.
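One rough way to look at the string side of this is ObjectSpace.memsize_of from the objspace extension. This is a sketch under the assumption that CRuby is used, where the reported size includes separately allocated string storage. The exact numbers vary by Ruby version and platform, so none are claimed here.

```ruby
require "objspace"

small = "\0".b * 12     # the size of an IPv4 address-plus-port key
large = "\0".b * 4096   # far too big to be embedded in the object slot

# On CRuby, memsize_of includes any separately malloc'ed string storage,
# so an embedded string reports only its slot size.
small_size = ObjectSpace.memsize_of(small)
large_size = ObjectSpace.memsize_of(large)
puts "12-byte string: #{small_size} bytes, 4096-byte string: #{large_size} bytes"
```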

ioquatix · 2024-02-05T02:34:54Z

IO::Buffer.string was introduced in Ruby 3.3, so you can try it out.
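A minimal sketch of how the key construction could use it. `build_key` and the tuple values are hypothetical. On Rubies older than 3.3 the method falls back to the IO::Buffer.for workaround discussed earlier in this thread.

```ruby
# Builds a NAT-style key from two IO::Buffer tuples.
def build_key(l3_tuple, l4_tuple)
  size = l3_tuple.size + l4_tuple.size

  if IO::Buffer.respond_to?(:string)
    # Ruby 3.3+: IO::Buffer.string creates the String, yields a buffer
    # backed by it, and returns the String.
    IO::Buffer.string(size) do |b|
      b.copy(l3_tuple)
      b.copy(l4_tuple, l3_tuple.size)
    end
  else
    # Older Rubies: fill a preallocated String through IO::Buffer.for.
    key = "\0".b * size
    IO::Buffer.for(key) do |b|
      b.copy(l3_tuple)
      b.copy(l4_tuple, l3_tuple.size)
    end
    key
  end
end

l3 = IO::Buffer.for([10, 0, 0, 1, 10, 0, 0, 2].pack("C*"))
l4 = IO::Buffer.for([1234, 80].pack("n2"))
puts build_key(l3, l4).bytes.inspect
```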

ioquatix · 2025-03-07T08:25:37Z

Would be interesting to see updated benchmarks.

kazuho added 3 commits January 7, 2023 09:34

switch to IO::Buffer

80fcd3f

delay dup

e1bd5e4

*_from_tuple can be slow

a6b38a8

use IO::Buffer#for to generate keys, as suggested by @eregon

7dca4c3

ioquatix reviewed Jan 9, 2023

kazuho added 3 commits January 9, 2023 07:31

fix overflow error as well as optimizing adjustments

ddd08f0

Merge branch 'main' into kazuho/io-buffer

d878fa5

no need to dup

86cfab7

Merge branch 'main' into kazuho/io-buffer

2927025

Merge branch 'main' into kazuho/io-buffer

cc41d29

kazuho added 7 commits January 11, 2023 15:43

no default values, move handling of odd-length out of IP.sum16 as it is a hot function

1b3f739

Merge branch 'main' into kazuho/io-buffer

70e74bc

Merge branch 'main' into kazuho/io-buffer

5380c41

Merge branch 'main' into kazuho/io-buffer

99e3878

packet.tuple is IO::Buffer

4954481

declaration should not allow parameter omission

86abb0f

Merge branch 'main' into kazuho/io-buffer

2072059

kazuho added 2 commits January 13, 2023 15:34

Merge branch 'main' into kazuho/io-buffer

9ec5674

Merge branch 'main' into kazuho/io-buffer

53fe71e

ioquatix moved this from In Progress to Done in Open Source Mar 7, 2025
