Implement bytes.startswith in mypyc #20387
According to mypy_primer, this change doesn't affect type check results on a corpus of open source code. ✅
const char *self_buf = PyBytes_AS_STRING(self);
const char *subobj_buf = PyBytes_AS_STRING(subobj);

if (subobj_len == 0) {
Maybe this `if` check should go above the two `PyBytes_AS_STRING` lines? We can exit without making those calls if the check is true.
Good call, updated. I split the checks around each `PyBytes_GET_SIZE` call a bit further to optimize for the empty-argument case. It probably won't save much, but I don't think it hurts readability.
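For readers following the thread, here is a rough pure-Python model of the check order discussed above. It is only a sketch: the function name `startswith_model` is hypothetical, the exact order in the updated C code is an assumption based on the comments here, and `len()` and slicing stand in for `PyBytes_GET_SIZE` and the `memcmp` comparison.

```python
def startswith_model(self_bytes: bytes, subobj: bytes) -> bool:
    """Hypothetical model of the fast-path check order, not the actual C code."""
    # Stands in for PyBytes_GET_SIZE(subobj).
    subobj_len = len(subobj)
    if subobj_len == 0:
        # An empty prefix always matches, so exit before touching any buffer.
        return True

    # Stands in for PyBytes_GET_SIZE(self).
    self_len = len(self_bytes)
    if subobj_len > self_len:
        # A prefix longer than the receiver can never match.
        return False

    # Stands in for fetching both buffers and comparing the first
    # subobj_len bytes with memcmp.
    return self_bytes[:subobj_len] == subobj
```

The point of the ordering is that both early exits return before any buffer pointer is needed, so the empty-argument case costs a single size lookup.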
# Test empty cases
assert test.startswith(b'')
assert b''.startswith(b'')
assert not b''.startswith(test)
Please also test with a `bytearray` as 1) the receiver object and 2) the argument. That way we also exercise the slow path.
Added a few checks to cover those as well.
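The added checks are not shown in this thread. As a sketch of what such coverage might look like, the plain-Python snippet below uses a `bytearray` on each side of the call. The function name and the value of `test` are made up for illustration; the asserted results are standard CPython behaviour.

```python
def check_startswith_bytearray() -> None:
    # Hypothetical sample value; the real test data is not shown in the thread.
    test = b"some string"

    # 1) bytearray as the receiver object
    assert bytearray(test).startswith(b"some")
    assert not bytearray(test).startswith(b"string")

    # 2) bytearray as the argument
    assert test.startswith(bytearray(b"some"))
    assert not test.startswith(bytearray(b"string"))

    # Empty cases with a bytearray on either side
    assert test.startswith(bytearray())
    assert bytearray().startswith(b"")
    assert not bytearray().startswith(test)


check_startswith_bytearray()
```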
Rounding out #20387 and implementing `bytes.endswith`. Simple benchmark shows a ~6.4x improvement.

Tested with the following benchmark code:

```
import time

def bench(suffix: bytes, a: list[bytes], n: int) -> int:
    i = 0
    for x in range(n):
        for b in a:
            if b.endswith(suffix):
                i += 1
    return i

a = [b"foo", b"barasdfsf", b"foobar", b"ab", b"asrtert", b"sertyeryt"]
n = 5 * 1000 * 1000
suffix = b"foo"

bench(suffix, a, n)
t0 = time.time()
bench(suffix, a, n)
td = time.time() - t0
print(f"{td}s")
```

Output:

```
$ python bench.py
0.9002199172973633s
$ python -c "import bench"
0.13828086853027344s
```
Implements `bytes.startswith` in mypyc. It could potentially be more efficient without relying on `memcmp`, but I'm not sure.

Tested with the following benchmark code, which shows a ~6.3x performance improvement compared to standard Python:
Output: