add performance focused APIs #17

@BurntSushi

In building out Jiff, my focus has generally not been on performance. There are some Criterion benchmarks, but they are by no means exhaustive. I would love to have more added.

While I think there's always room to optimize the implementation (and if you have ideas that require major refactoring, please file an issue), this issue is more about API changes or additions that are needed to improve performance. For example, I'm thinking about things like "datetime arithmetic requires the use of Span, and Span is a very bulky type." At time of writing, its size is 64 bytes. I presume that this means that there is some non-trivial amount of time being spent in various operations on Span memcpying around this enormous bag of bytes.

Getting rid of Span or splitting it up isn't really an option as far as I'm concerned. So in order to make things go faster, I think we really just need to be able to provide APIs that operate on more primitive data types. Like, for example, an i64 of nanoseconds. This would come with various trade-offs, so I'd expect these APIs to be a bit more verbosely named.
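
To make the trade-off concrete, here is a minimal sketch of what a span-based addition and a primitive `i64`-of-nanoseconds addition might look like side by side. Everything in it is hypothetical: `Instant`, `ClockSpan`, `checked_add_span` and `checked_add_nanoseconds` are stand-ins invented for illustration, not Jiff's types or API, and the real `Span` carries more than the clock units modeled here.

```rust
// Hypothetical stand-ins for illustration only; these are NOT Jiff's types.

/// A point in time as nanoseconds since some epoch, held in an i128 so the
/// arithmetic below mirrors the i128 math mentioned in this issue.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Instant {
    nanos: i128,
}

/// A multi-unit duration. Only clock units are modeled here.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct ClockSpan {
    pub hours: i64,
    pub minutes: i64,
    pub seconds: i64,
    pub milliseconds: i64,
    pub microseconds: i64,
    pub nanoseconds: i64,
}

impl ClockSpan {
    /// Collapse every unit down to a single count of nanoseconds.
    /// The products cannot overflow an i128 for any i64 inputs.
    fn total_nanos(self) -> i128 {
        let mut n = i128::from(self.hours) * 3_600_000_000_000;
        n += i128::from(self.minutes) * 60_000_000_000;
        n += i128::from(self.seconds) * 1_000_000_000;
        n += i128::from(self.milliseconds) * 1_000_000;
        n += i128::from(self.microseconds) * 1_000;
        n += i128::from(self.nanoseconds);
        n
    }
}

impl Instant {
    pub fn from_nanos(nanos: i128) -> Instant {
        Instant { nanos }
    }

    pub fn as_nanos(self) -> i128 {
        self.nanos
    }

    /// Span-based path: the whole struct is passed by value and every unit
    /// is folded into one integer before the addition happens.
    pub fn checked_add_span(self, span: ClockSpan) -> Option<Instant> {
        self.nanos
            .checked_add(span.total_nanos())
            .map(Instant::from_nanos)
    }

    /// Primitive path: a single widening conversion and one checked add.
    /// The longer name signals that the caller picked the unit.
    pub fn checked_add_nanoseconds(self, nanos: i64) -> Option<Instant> {
        self.nanos
            .checked_add(i128::from(nanos))
            .map(Instant::from_nanos)
    }
}
```

Both paths produce the same result for the same elapsed time. The difference is only in how much data crosses the call boundary and how much unit folding happens first. Whether that difference is measurable in practice is exactly what a benchmark would need to show.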

Here's a probably non-exhaustive list of API design choices that likely have an impact on performance:

  • A Timestamp is always a 96-bit integer.
  • A Span is a whopping 64 bytes. A Span is Copy and most APIs accept a Span instead of a &Span.
  • Dealing with spans generally takes a lot of work. So when you add, say, a Span to a Timestamp, it's not just a simple matter of ~one integer addition. There's work needed to be done to collapse all of the clock units on Span down to a single integer number of nanoseconds, and then i128 math is used. Perhaps there is a way to optimize a Span for the "single unit" case, although doing this without indirection will make a Span even bigger, and doing it with indirection will remove the Copy bound. Another alternative might be leaving Span as-is, but adding non-Span APIs to operate on durations. But that has some design work needed as well. See "add more integration with std::time::Duration?" (#21).
  • A Zoned embeds a TimeZone which means a Zoned is not Copy and cloning/dropping a Zoned is, while cheap, not as cheap as a small Copy type. (This also means most things take a &Zoned instead of a Zoned.)
  • A default TimeZoneDatabase uses a thread safe shared cache internally. This means time zone lookups by name require some kind of synchronization. (Usually this overhead can be worked around by asking for all the time zones you need up-front, but this is difficult if you're deserializing zoned datetime strings.)
  • The use of ranged integers internally makes some kinds of layout optimizations more difficult than they would be otherwise. For example, it might be tempting to bit pack the representation of some types, but if you do that, you'd have to either define a new ranged integer abstraction for bitpacking (plausible... but maybe complicated) or abdicate range checking altogether. You might think you can just convert to and from ranged integers as you need them, but the whole point of Jiff's ranged integer abstraction is that they track min/max values when debug_assertions are enabled. So if you drop the ranged integer type to do bit-packing and then re-create the ranged integer after unpacking, you will have lost the min/max values that were originally attached. Anyway, see "consider ripping out ranged integer types" (#11) for more on this.
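
The last point, about bit-packing losing the attached min/max values, can be sketched roughly as follows. This is a simplified model under stated assumptions: `Ranged`, `Month`, `Day`, `with_bounds`, `pack` and `unpack` are all hypothetical names, the bounds are tracked unconditionally rather than only under debug_assertions, and none of it reflects how Jiff's internal ranged integers are actually implemented.

```rust
// Hypothetical model for illustration only; NOT Jiff's ranged integer type.

/// A value constrained to MIN..=MAX that also carries the tightest bounds
/// known for this particular value (tracked unconditionally in this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Ranged<const MIN: i64, const MAX: i64> {
    val: i64,
    min: i64,
    max: i64,
}

impl<const MIN: i64, const MAX: i64> Ranged<MIN, MAX> {
    /// Create a value whose tracked bounds are just the type's bounds.
    pub fn new(val: i64) -> Option<Self> {
        if val < MIN || val > MAX {
            return None;
        }
        Some(Ranged { val, min: MIN, max: MAX })
    }

    /// Attach narrower bounds, e.g. "this day belongs to a 28-day month".
    pub fn with_bounds(self, min: i64, max: i64) -> Option<Self> {
        if min < MIN || max > MAX || self.val < min || self.val > max {
            return None;
        }
        Some(Ranged { val: self.val, min, max })
    }

    pub fn get(self) -> i64 {
        self.val
    }

    pub fn tracked_bounds(self) -> (i64, i64) {
        (self.min, self.max)
    }
}

pub type Month = Ranged<1, 12>;
pub type Day = Ranged<1, 31>;

/// Pack the month into the bits above a 5-bit day field.
/// Only the raw values survive; the tracked bounds are dropped.
pub fn pack(month: Month, day: Day) -> u16 {
    ((month.get() as u16) << 5) | (day.get() as u16)
}

/// Rebuild ranged values from the packed bits. The best this can do is
/// fall back to the type-level bounds.
pub fn unpack(bits: u16) -> Option<(Month, Day)> {
    let month = Month::new(i64::from(bits >> 5))?;
    let day = Day::new(i64::from(bits & 0b1_1111))?;
    Some((month, day))
}
```

In this model, a `Day` narrowed to `(1, 28)` comes back from `unpack(pack(..))` with the same value but with tracked bounds of `(1, 31)`. Preserving the narrower bounds would mean storing them in the packed form too, which defeats much of the point of packing.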

Anyway, before we get too focused on API design, I would like to see concrete use cases first in the form of executable code and/or benchmarks.

Labels: help wanted, question
