kazuho commented Jan 7, 2023

Use IO::Buffer for retaining the packet image as well as internal structures (e.g., pseudo header, L4 port tuple).

Benchmark results:

| | main | IO::Buffer |
|---|---|---|
| reflector (Core i5-1240P, bare metal) | 1.57 Gbps | 1.79 Gbps |
| NAT (Core i7-9750H, VMware Fusion to host) | 182 Mbps | 183 Mbps |

nattable.rb (outdated)

    b.copy(l3_tuple)
    b.copy(packet.l4.tuple, l3_tuple_size)

    b.get_string

kazuho, Jan 7, 2023

The reason we are not seeing a speedup with NAT is probably these lines. Honestly, I suspect that this function and remote_key_from_packet have become slower after switching to IO::Buffer.

Here, we are building a key to look up a NAT table by concatenating the IP address tuple and the port tuple.

It could be that Ruby's String class has optimizations for handling tiny strings and concatenating them, while IO::Buffer has nothing similar.

Also, IO::Buffer cannot be used as a hash key and we have to call IO::Buffer#get_string. That can be costing us as well.
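For comparison, the String-based approach can be sketched as plain concatenation. This is a hypothetical illustration, not the project's code: the helper name `nat_key` and the byte layout (an 8-byte IPv4 address tuple followed by a 4-byte port tuple) are assumptions.

```ruby
# Hypothetical helper: a NAT-table key made by concatenating two binary Strings.
def nat_key(addr_tuple, port_tuple)
  addr_tuple + port_tuple # String#+ returns a new String holding both byte runs
end

# Assumed layout: 8-byte IPv4 address tuple (src, dst) + 4-byte port tuple (src, dst).
addr_tuple = [10, 0, 0, 1, 10, 0, 0, 2].pack("C*")
port_tuple = [12345, 80].pack("n2")

table = {}
table[nat_key(addr_tuple, port_tuple)] = :entry

# Strings hash and compare by content, so an equal key built later finds the entry.
table[nat_key(addr_tuple.dup, port_tuple.dup)] # => :entry
```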

> Also, IO::Buffer cannot be used as a hash key and we have to call IO::Buffer#get_string. That can be costing us as well.

Maybe IO::Buffer should implement hash and eql? based on the byte contents? cc @ioquatix

We could do that, but using it as a hash key can be tricky, since the buffer can be mutated.
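A minimal sketch of both the content-based `hash`/`eql?` idea and the mutation hazard, using a hypothetical wrapper class `ContentKeyedBuffer` (IO::Buffer itself is not modified). The hash is recomputed from the current bytes on every call rather than cached.

```ruby
# Hypothetical wrapper, not an IO::Buffer API: hash/eql? derived from byte contents.
class ContentKeyedBuffer
  attr_reader :buffer

  def initialize(buffer)
    @buffer = buffer
  end

  # Recomputed from the current bytes on every call; nothing is cached.
  def hash
    @buffer.get_string.hash
  end

  def eql?(other)
    other.is_a?(ContentKeyedBuffer) && @buffer.get_string == other.buffer.get_string
  end
end

a = IO::Buffer.new(4)
a.set_string("abcd")
b = IO::Buffer.new(4)
b.set_string("abcd")

table = { ContentKeyedBuffer.new(a) => :found }
hit = table[ContentKeyedBuffer.new(b)]  # same bytes in a different buffer

a.set_string("zzzz")                    # mutate the buffer already used as a key
miss = table[ContentKeyedBuffer.new(b)] # the stored key no longer holds "abcd"
```

Once a buffer that is already stored as a key is mutated, lookups with the original bytes silently fail.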

A String can be mutated too, but Hash actually does .dup.freeze when putting a non-frozen String into the Hash.
I'm not sure mutation is an issue. Caching the hash would indeed be awkward for IO::Buffer, but maybe there is no need to cache it in the IO::Buffer at all.

Anyway, this seems like a good case for transfer_string.
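The Hash behavior described above can be observed directly with core Ruby only:

```ruby
key = +"10.0.0.1:80"  # unary + gives a mutable (unfrozen) String
table = {}
table[key] = :entry

stored = table.keys.first
stored.frozen?        # => true  (Hash stored a frozen copy)
key.frozen?           # => false (the caller's String is untouched)
stored.equal?(key)    # => false (different objects)

key << "!"            # mutating the original does not affect the stored key
table["10.0.0.1:80"]  # => :entry
```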

I think one could use IO::Buffer.for here to work around the lack of transfer_string:

    l3_tuple = packet.tuple
    l3_tuple_size = l3_tuple.size
    string = "\0".b * (l3_tuple_size + 4) # "\0".b could be stored in a constant

    IO::Buffer.for(string) do |b|
      b.copy(l3_tuple)
      b.copy(packet.l4.tuple, l3_tuple_size)
    end

    string

This should avoid the extra byte copy.
Of course, there is still an allocation of l3_tuple_size + 4 bytes, but there were two of them in the code above.

I think this is less elegant/more convoluted than .transfer_string (and it may be hard to support on some Ruby implementations without an extra copy on .for) but it's one way that already works now.
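A self-contained version of the suggestion above, runnable outside the project. The tuples are hypothetical stand-ins for `packet.tuple` and `packet.l4.tuple` (assumed here to be IO::Buffer instances), and `ZERO`/`build_key` are illustrative names. The block form of `IO::Buffer.for` needs Ruby 3.2 or later.

```ruby
# Illustrative names: ZERO, build_key, and the sample tuples are not from the project.
ZERO = "\0".b.freeze

# Writes l3_tuple followed by l4_tuple into a fresh binary String via IO::Buffer.for.
def build_key(l3_tuple, l4_tuple)
  l3_size = l3_tuple.size
  key = ZERO * (l3_size + l4_tuple.size) # new, unfrozen, zero-filled binary String

  # With a block and an unfrozen String, IO::Buffer.for yields a writable buffer
  # over the String's own bytes, so the copies below land directly in `key`.
  IO::Buffer.for(key) do |b|
    b.copy(l3_tuple)          # address tuple at offset 0
    b.copy(l4_tuple, l3_size) # port tuple right after it
  end

  key
end

# Hypothetical tuples: IPv4 src/dst addresses and big-endian src/dst ports.
l3 = IO::Buffer.for([10, 0, 0, 1, 10, 0, 0, 2].pack("C*").freeze)
l4 = IO::Buffer.for([12345, 80].pack("n2").freeze)

key = build_key(l3, l4)
```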

Thank you for the suggestion! Applied in 7dca4c3.

Ideally, I would prefer doing (packet.tuple + packet.l4.tuple).transfer_string, because then there would be zero intermediary objects and all the lengths and offsets can be calculated in C.

But this is better than what I wrote.

eregon commented Jan 7, 2023

IMHO this looks quite a bit cleaner than using String for this use case.

Re IO::Buffer#get_string, maybe we should have a way to reuse the IO::Buffer's memory for the String?
Maybe with a method that freezes the buffer and returns the string?
Or even a method that clears the buffer and instead uses the underlying char* for the String (i.e., transferring the char* to the String and no longer using it for the IO::Buffer), like IO::Buffer#transfer but returning a String?
WDYT @ioquatix?
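For reference, the existing IO::Buffer#transfer moves ownership of the memory to a new buffer and leaves the original empty; a String-returning variant, as proposed, would presumably follow the same ownership rule. A small sketch of the existing behavior:

```ruby
src = IO::Buffer.new(4)
src.set_string("abcd")

moved = src.transfer  # a new IO::Buffer now owns the memory
moved.get_string      # => "abcd"
src.null?             # => true: the original no longer points at any memory
src.size              # => 0
```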

ioquatix commented Jan 7, 2023

I think both ideas are acceptable, but let me try it out. @kazuho, do you mind opening an issue on bugs.ruby-lang.org with your requirements?

ioquatix commented Jan 7, 2023

You should see if caching and reusing IO::Buffer instances gives you a performance advantage. It would be interesting to see if it helps.
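One way to try the caching idea is to allocate a single IO::Buffer up front and reuse it for every key. `KeyBuilder` is a hypothetical helper; `get_string` still copies, so this only saves the per-key IO::Buffer allocation, and the shared buffer makes it unsafe to use from several threads at once.

```ruby
# Hypothetical helper: one IO::Buffer allocated once and reused for every key.
class KeyBuilder
  def initialize(capacity)
    @buffer = IO::Buffer.new(capacity)
  end

  # Writes both tuples into the shared buffer and returns a copy of the used bytes.
  def build(addr_tuple, port_tuple)
    @buffer.set_string(addr_tuple, 0)
    @buffer.set_string(port_tuple, addr_tuple.bytesize)
    @buffer.get_string(0, addr_tuple.bytesize + port_tuple.bytesize)
  end
end

builder = KeyBuilder.new(16)
k1 = builder.build([10, 0, 0, 1, 10, 0, 0, 2].pack("C*"), [12345, 80].pack("n2"))
k2 = builder.build([10, 0, 0, 3, 10, 0, 0, 4].pack("C*"), [1, 2].pack("n2"))
```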

ioquatix commented Jan 8, 2023

I thought about this more.

Most binary formats will have a body string packed into a packet of data, e.g. WebSockets. Sometimes it's compressed (or needs to be compressed).

Converting a full buffer to a string e.g. #transfer_string might not be that useful in practice due to the binary framing surrounding the string.
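To illustrate the framing point with a hypothetical frame (a 2-byte big-endian length header followed by the payload): only part of the buffer is the string of interest, so extracting it needs an offset and a length.

```ruby
# Hypothetical frame layout: 2-byte big-endian length header, then the payload.
payload = "hello".b
frame = IO::Buffer.new(2 + payload.bytesize)
frame.set_value(:U16, 0, payload.bytesize) # :U16 = unsigned 16-bit, big-endian
frame.set_string(payload, 2)

length = frame.get_value(:U16, 0)  # => 5
body = frame.get_string(2, length) # => "hello" (a copy of just the payload bytes)
```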

eregon commented Jan 8, 2023

> Converting a full buffer to a string e.g. #transfer_string might not be that useful in practice due to the binary framing surrounding the string.

As you can see in this PR, there are multiple usages of .get_string without any offset, so #transfer_string would be much better there.
But even if one needs an offset and a length, that could be achieved without copying bytes via lazy substrings: https://bugs.ruby-lang.org/issues/19315. That already works on CRuby when the substring extends to the end of the string; that issue proposes making it work in all cases, as it already does on TruffleRuby and probably JRuby.
So buffer.transfer_string[20, 100] would not copy any bytes.

EDIT:
Actually, since there is IO::Buffer#slice, it could be buffer.slice(20, 100).transfer_string.
However, that would also need to clear buffer, and potentially other slices of buffer, which does not seem feasible.
I think one way is to mark the original buffer (buffer in this case) as read-only on .transfer_string; then it's safe to share the bytes with the String. If the String wants to mutate them, it will copy them (standard shared-String copy-on-write).
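The difficulty with `slice(...).transfer_string` comes from slices being views onto the parent's memory rather than copies, as this snippet shows (the frame contents are made up):

```ruby
frame = IO::Buffer.new(8)
frame.set_string("HDRpayld")

payload = frame.slice(3, 5) # a view into frame's memory; no bytes copied
payload.get_string          # => "payld"

frame.set_string("PAY", 3)  # writing through the parent...
payload.get_string          # => "PAYld" (...is visible through the slice)
```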

ioquatix commented Jan 8, 2023

get_string returns a mutable copy, but we could certainly add a zero-copy interface that returns frozen strings. However, that would obviously depend on the lifetime and mutability of the buffer. IO::Buffers are probably expected to be reused for subsequent packets of data, so it might not be a good design. I'll have to try it out when I get around to implementing QUIC and HTTP/2 with the updated interfaces.
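The first point is easy to check: the String returned by `get_string` is an independent, unfrozen copy, so later writes to the buffer (for example when it is reused for the next packet) do not affect it.

```ruby
buffer = IO::Buffer.new(4)
buffer.set_string("ping")

copy = buffer.get_string  # a new String holding a copy of the 4 bytes
copy.frozen?              # => false, so it can be mutated freely
copy << "!"

buffer.set_string("pong") # reuse the buffer, e.g. for the next packet
copy                      # => "ping!" (unaffected by the reuse)
```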

eregon commented Jan 8, 2023

> I'll have to try it out when I get around to implementing QUIC and HTTP/2 with the updated interfaces.

This project also seems to be about binary network protocol parsing, so I think it would be fine to try it here.
This seems a near-ideal use case for IO::Buffer, so I think it's worth exploring how to use, and potentially improve, IO::Buffer for this project so that it's efficient and easy to use.

    b.copy(src_addr)
    b.copy(packet.l4.tuple, addr_size)
    key = ZERO_STR.byteslice(0, addr_size + 4)
    IO::Buffer.for(key) do |b|

ioquatix, Jan 9, 2023

It's a neat idea; I hadn't thought of it. Maybe we can do something similar:

    IO::Buffer.string(size) do |buffer|
      # ...
    end # => string

At the end of the block, the buffer would be transferred to the string with zero copy.
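With the `IO::Buffer.string` API that later shipped in Ruby 3.3, the key construction could look like the sketch below. It requires Ruby 3.3 or later; `build_key` and the sample tuples are hypothetical.

```ruby
# Requires Ruby 3.3+. build_key and the sample tuples are illustrative.
def build_key(l3_tuple, l4_tuple)
  l3_size = l3_tuple.size
  # IO::Buffer.string allocates a String of the given length, yields a buffer
  # over its bytes, and returns the String once the block finishes.
  IO::Buffer.string(l3_size + l4_tuple.size) do |b|
    b.copy(l3_tuple)
    b.copy(l4_tuple, l3_size)
  end
end

l3 = IO::Buffer.for([10, 0, 0, 1, 10, 0, 0, 2].pack("C*").freeze)
l4 = IO::Buffer.for([12345, 80].pack("n2").freeze)
key = build_key(l3, l4)
```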

kazuho, Jan 9, 2023

That interface SGTM (though I wonder if String should have a class method that creates fixed-length, zero-filled bytes). I'm all in on optimizations that reduce the conversion cost between the two types.

That seems like a nice addition, and it's easy to support without extra copying on TruffleRuby, unlike the current usage here (which needs one internal copy to go from a Rope to a byte[]).

kazuho commented Jan 9, 2023

With all the changes, when running the NAT on bare metal (Raspberry Pi 2b @ 900MHz), I do see slightly better performance with IO::Buffer:

| | average | stdev |
|---|---|---|
| main@0727aa7 | 22.1 | 1.02 |
| this PR@7dca4c3 | 23.2 | 1.41 |

Units are Mbps. Ruby 3.2.0 without YJIT (YJIT is not supported on armv7).

kazuho commented Jan 9, 2023

Interestingly, the difference is now smaller, if there is one at all:

| | average | stdev |
|---|---|---|
| main@b3b96d2 | 26.7 | 0.44 |
| this PR@86cfab7 | 27.1 | 0.31 |

(rat.rb; units are Mbps; Raspberry Pi 2b @ 900MHz, YJIT off)

P.S. With reflector.rb, the winner depends on whether YJIT is enabled:

| | average | stdev |
|---|---|---|
| main@9d4161d, YJIT on | 1.92 | 0.01 |
| this PR@86cfab7, YJIT on | 2.05 | 0.01 |
| main@9d4161d, YJIT off | 1.49 | 0.01 |
| this PR@86cfab7, YJIT off | 1.39 | 0.00 |

(reflector.rb; units are Gbps; Core i5-1240P)

ioquatix commented

How do you run the benchmark?

kazuho commented Jan 10, 2023

@ioquatix Here it is: https://github.com/kazuho/rat/wiki/test-setup. I copy-pasted it from my notes, so some details could be slightly off.

kazuho commented Jan 11, 2023

As of main @ 2252490 vs. this PR @ cc41d29:

reflector.rb (Core i5-1240P; Gbps):

| | average | stdev |
|---|---|---|
| main | 1.56 | 0.01 |
| main + YJIT | 2.07 | 0.01 |
| this PR | 1.46 | 0.00 |
| this PR + YJIT | 2.17 | 0.01 |

rat.rb (Raspberry Pi 2b; Mbps):

| | average | stdev |
|---|---|---|
| main | 31.4 | 0.36 |
| this PR | 30.8 | 0.29 |
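The relative differences in the tables above can be computed as follows (averages only; the stdev column is ignored here).

```ruby
# [main, this PR] averages from the results above (Gbps for reflector.rb, Mbps for rat.rb).
results = {
  "reflector.rb"        => [1.56, 1.46],
  "reflector.rb + yjit" => [2.07, 2.17],
  "rat.rb"              => [31.4, 30.8],
}

# Percentage change of this PR relative to main, rounded to one decimal place.
changes = results.transform_values { |main, pr| ((pr - main) / main * 100).round(1) }
changes.each { |name, pct| puts format("%-20s %+.1f%%", name, pct) }
```

This works out to roughly -6.4% for reflector.rb without YJIT, +4.8% with YJIT, and -1.9% for rat.rb.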

ioquatix commented

What do you think about introducing ruby/ruby#7364?

kazuho commented Feb 24, 2023

@ioquatix Woot. I think that'd be a nice addition for IO::Buffer.

I'm not sure that change would make this branch run as fast as master, but I'll rerun the benchmark. My theory is that IPv4 address tuples are tiny (at most 12 bytes) and can be embedded as strings inside VALUEs, so it could be tough to outcompete that.

ioquatix commented Feb 24, 2023

Allocating a 12-byte string that is used by reference is probably only a single allocation. But a temporary IO::Buffer is also an allocation, because Ruby doesn't have any kind of stack allocation for VALUE AFAIK. I don't know whether getting the RSTRING_PTR can cause an external allocation. We can probably check it.
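One way to check the allocation question from Ruby is to compare CRuby's `GC.stat(:total_allocated_objects)` counter before and after each variant. This counts Ruby objects only, so memory malloc'd outside the object heap would not show up; the `allocations` helper and the tuples are illustrative.

```ruby
# Number of Ruby objects allocated while the block runs (CRuby-specific counter).
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Illustrative 8-byte address tuple and 4-byte port tuple.
addr = [10, 0, 0, 1, 10, 0, 0, 2].pack("C*")
ports = [12345, 80].pack("n2")

string_concat = allocations { addr + ports }

io_buffer = allocations do
  b = IO::Buffer.new(addr.bytesize + ports.bytesize) # temporary IO::Buffer object
  b.set_string(addr, 0)
  b.set_string(ports, addr.bytesize)
  b.get_string                                       # String copied out of the buffer
end

puts "String#+: #{string_concat}, IO::Buffer: #{io_buffer}"
```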

ioquatix commented Feb 5, 2024

IO::Buffer.string was introduced in Ruby 3.3, so you can try it out.

ioquatix commented Mar 7, 2025

Would be interesting to see updated benchmarks.

ioquatix moved this from In Progress to Done in Open Source on Mar 7, 2025