use IO::Buffer #2
nattable.rb
Outdated
b.copy(l3_tuple)
b.copy(packet.l4.tuple, l3_tuple_size)
b.get_string
Probably, the reason we are not seeing a speedup with NAT is these lines. Honestly, I suspect that this function and remote_key_from_packet have become slower by switching to IO::Buffer.
Here, we are building a key to lookup a NAT table, by concatenating the IP address tuple and the port tuple.
It could be that Ruby's String class has optimizations for handling and concatenating tiny strings, while IO::Buffer has nothing similar.
Also, IO::Buffer cannot be used as a hash key, so we have to call IO::Buffer#get_string. That may be costing us as well.
> Also, IO::Buffer cannot be used as a hash key and we have to call IO::Buffer#get_string. That can be costing us as well.
Maybe IO::Buffer should implement hash and eql? based on the byte contents? cc @ioquatix
We could do that but for a hash key it can be tricky, since it can be mutated.
A String can be mutated too, but Hash actually does .dup.freeze when putting a non-frozen String into the Hash.
I'm not sure mutation is an issue, but indeed it would be awkward for IO::Buffer to cache the hash, maybe there is no need to cache it in the IO::Buffer though.
Anyway, this seems a good case for transfer_string.
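The Hash behaviour mentioned above (an unfrozen String key is stored as a frozen copy) can be checked with a small sketch. The method name and the sample bytes are hypothetical and chosen only for illustration:

```ruby
# Returns [stored_key_frozen?, stored_key_is_same_object?, lookup_still_works?]
def string_key_behaviour
  table = {}
  key = "\x0a\x00\x00\x01".b   # String#b returns a new, unfrozen binary string
  table[key] = :entry          # Hash#[]= stores a frozen copy of an unfrozen String key

  stored = table.keys.first
  key << "\x02".b              # mutate the caller's string after insertion

  [stored.frozen?, stored.equal?(key), table.key?("\x0a\x00\x00\x01".b)]
end

p string_key_behaviour  # => [true, false, true]
```

Because the table holds its own frozen copy, mutating the original string afterwards does not corrupt the lookup. An IO::Buffer key would need either the same copy-on-insert treatment or some other guarantee against mutation.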
I think one could use IO::Buffer.for here to work around the lack of transfer_string:
l3_tuple = packet.tuple
l3_tuple_size = l3_tuple.size
string = "\0".b * (l3_tuple_size + 4) # "\0".b could be stored in a constant
IO::Buffer.for(string) do |b|
  b.copy(l3_tuple)
  b.copy(packet.l4.tuple, l3_tuple_size)
end
string
This should avoid the extra byte copy.
Of course there is still an allocation of l3_tuple_size + 4 bytes but there were 2 of them in the code above.
I think this is less elegant/more convoluted than .transfer_string (and it may be hard to support on some Ruby implementations without an extra copy on .for) but it's one way that already works now.
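For anyone who wants to try the IO::Buffer.for approach outside this codebase, here is a minimal self-contained sketch. It assumes Ruby 3.2 or later (where the block form of IO::Buffer.for yields a writable buffer). The names ZERO_BYTE and build_key and the sample tuples are hypothetical stand-ins for packet.tuple and packet.l4.tuple, and the L4 size is taken from the buffer rather than hard-coded as 4:

```ruby
ZERO_BYTE = "\0".b.freeze

# Builds a hash-lookup key by copying two IO::Buffer tuples directly into a
# preallocated binary String, without an intermediate get_string copy.
def build_key(l3_tuple, l4_tuple)
  key = ZERO_BYTE * (l3_tuple.size + l4_tuple.size)  # String#* returns a new, unfrozen string
  IO::Buffer.for(key) do |b|
    b.copy(l3_tuple)                 # write the address tuple at offset 0
    b.copy(l4_tuple, l3_tuple.size)  # write the port tuple right after it
  end
  key
end

# Hypothetical sample data: two IPv4 addresses and two 16-bit ports.
l3 = IO::Buffer.for("\x0a\x00\x00\x01\x0a\x00\x00\x02".b)
l4 = IO::Buffer.for("\x1f\x90\x00\x50".b)

table = { build_key(l3, l4) => :nat_entry }
p table[build_key(l3, l4)]
```

The resulting key is an ordinary String, so it hashes and compares by content and can be used as a Hash key directly.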
Thank you for the suggestion! Applied in 7dca4c3.
Ideally, I would prefer doing (packet.tuple + packet.l4.tuple).transfer_string, because then there would be zero intermediary objects and all the lengths and offsets can be calculated in C.
But this is better than what I wrote.
IMHO this looks quite a bit cleaner than using Re
I think both ideas are acceptable, but let me try it out. @kazuho do you mind opening an issue on bugs.ruby-lang.org with your requirements?
You should see if caching and reusing
I thought about this more. Most binary formats will have a body string packed into a packet of data, e.g. WebSockets. Sometimes it's compressed (or needs to be compressed). Converting a full buffer to a string e.g.
As you can see in this PR, there are multiple usages of EDIT:
This project also seems to be about binary network protocol parsing, so I think it could be fine to try it here.
b.copy(src_addr)
b.copy(packet.l4.tuple, addr_size)
key = ZERO_STR.byteslice(0, addr_size + 4)
IO::Buffer.for(key) do |b|
It's a neat idea, I didn't think about it. Maybe we can do something similar:
IO::Buffer.string(size) do |buffer|
# ...
end # => string
At the end, the buffer would be transferred to the string with zero copy.
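A hedged sketch of how calling code might use such an interface: to my knowledge, newer Ruby versions (3.3 and later) ship IO::Buffer.string with this shape, but the helper below checks at runtime and falls back to IO::Buffer.for when it is unavailable. The helper name with_string_buffer is hypothetical:

```ruby
# Yields a writable IO::Buffer of `size` bytes and returns the filled String.
# Uses IO::Buffer.string when the running Ruby provides it; otherwise falls
# back to a preallocated String wrapped with IO::Buffer.for.
def with_string_buffer(size, &block)
  if IO::Buffer.respond_to?(:string)
    IO::Buffer.string(size, &block)
  else
    string = "\0".b * size
    IO::Buffer.for(string, &block)
    string
  end
end

key = with_string_buffer(6) do |buffer|
  buffer.set_string("abcd".b)    # bytes 0..3
  buffer.set_string("xy".b, 4)   # bytes 4..5
end
p key
```

The block writes every byte of the buffer, so the result does not depend on whether the backing string starts out zero-filled.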
That interface SGTM (though I wonder if there should be a static function as part of String that generates a fixed-length, zero-filled byte string). I'm all for optimizations that reduce the conversion cost between the two types.
That seems a nice addition, and it's easy to support without extra copying on TruffleRuby, unlike the current usage here (would need one copy to go from Rope to byte[] internally).
With all the changes, when running the NAT on bare metal (Raspberry Pi 2b @ 900MHz), I do see slightly better performance with IO::Buffer:
Units are Mbps. Ruby 3.2.0 without YJIT (it is not supported on armv7).
Interestingly, the difference is now smaller, if there is any:
(rat.rb; units are Mbps; Raspberry Pi 2b @ 900MHz; YJIT off)
PS. With reflector.rb, the winner differs depending on whether YJIT is used:
(reflector.rb; units are Gbps; Core i5-1240P)
How do you run the benchmark?
@ioquatix https://github.com/kazuho/rat/wiki/test-setup here it is. I've copy-pasted it from my memo, so the details could be slightly off.
…t is a hot function
What do you think about introducing ruby/ruby#7364
@ioquatix Woot. I think that'd be a nice addition for IO::Buffer. I'm not sure if that change would make this branch run as fast as master, but I'll rerun the benchmark. My theory is that IPv4 address tuples are tiny enough (at most 12 bytes) to be embedded as strings inside VALUEs, and it could be tough to outcompete that.
Allocating a string of 12 bytes which is used by reference is probably only a single allocation. But a temporary IO::Buffer is also an allocation, because Ruby doesn't have any kind of stack allocation for VALUE AFAIK. I don't know if getting the
Would be interesting to see updated benchmarks.
Use IO::Buffer for retaining packet image as well as internal structures (e.g., pseudo header, L4 port tuple).
Benchmark results: