Perf: Optimize string construction in array formatting functions #30730
Conversation
Did you confirm that this indeed improves performance?
Yes, and there is a significant speedup. I tested with:

```python
import numpy as np
import sys
import time

def run_heavy_benchmark():
    # 1. Create a sufficiently large string
    long_str = "a" * 10_000

    # 2. Create an array filled with these strings
    rows = 50
    cols = 50
    data = np.full((rows, cols), long_str, dtype=object)

    print(f"Array shape: {data.shape}")
    print(f"String length per cell: {len(long_str)}")
    print("-" * 40)
    print("Starting conversion (this might hang without the fix)...")

    start_time = time.time()
    try:
        result = np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
    except MemoryError:
        print("Crashed with MemoryError! (Expected on old implementation)")
        return
    end_time = time.time()

    print("-" * 40)
    print(f"Final String Length: {len(result):,} chars")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

if __name__ == "__main__":
    run_heavy_benchmark()
```

The results were as follows: [timing output omitted]. I tested using CPython, but I'd assume the performance impact is even more significant on other Python implementations, since string concatenation is already optimized in CPython according to https://peps.python.org/pep-0008/#programming-recommendations.
eendebakpt left a comment:
I can confirm the performance improvement for the benchmark from the OP. Is the benchmark representative of real-world cases? (e.g. why would someone put very large strings in numpy arrays of dtype object?)

For smaller cases the performance impact of the PR is neutral (tested with `long_str = "abcde" * 40` and `rows = 20; cols = 20`).

The implementation itself looks OK, and as the OP remarked, using `join` is a standard approach to efficient string concatenation.
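For context, a minimal standalone sketch of the two concatenation patterns being compared here; this is illustrative code, not from the PR:

```python
import timeit

def concat_plus(parts):
    # Repeated += may reallocate and copy the accumulated string,
    # which is quadratic in the worst case. CPython often optimizes
    # this in place when the target has a single reference, which is
    # why the difference there is small.
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # Collect the pieces and build the result once: linear time.
    return "".join(parts)

parts = ["a" * 100] * 50_000
print("+=   :", timeit.timeit(lambda: concat_plus(parts), number=3))
print("join :", timeit.timeit(lambda: concat_join(parts), number=3))
```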
You're right; for CPython, the impact of the optimization is negligible, especially for short strings. However, on PyPy the impact is much higher. I've implemented an additional change to prevent list-creation overhead, which was causing performance to degrade slightly in CPython for arrays with smaller strings. I've also extended the benchmark to include a variety of string lengths, array shapes, and both dtype possibilities (`object` and auto-detected).

Using PyPy: [results table omitted]

Using CPython: [results table omitted]

We can observe that for CPython there is a slight performance improvement in most cases. On the other hand, the impact on PyPy performance is rather large. Below you can find the exact benchmark I used:

```python
import numpy as np
import sys
import time

def benchmark(shape, str_len, dtype, runs=5):
    """Runs np.array2string benchmark for a given configuration."""
    long_str = "a" * str_len
    data = np.full(shape, long_str, dtype=dtype)
    # Capture actual dtype
    actual_dtype = str(data.dtype)
    times = []
    for _ in range(runs):
        start = time.time()
        try:
            np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
            times.append(time.time() - start)
        except MemoryError:
            return "MemoryError", actual_dtype
    avg_time = sum(times) / len(times)
    return avg_time, actual_dtype

def main():
    # Configurations to test
    shapes = [(100, 100), (200, 200), (400, 400)]
    str_lens = [100, 500, 1000, 5000]
    dtypes = [object, None]  # object vs auto-detected
    runs = 5

    print(f"{'Dtype':<12} | {'StrLen':<7} | {'Shape':<12} | {'Avg Time (ms)':<14}")
    print("-" * 55)
    for dtype in dtypes:
        for str_len in str_lens:
            for shape in shapes:
                avg_time, actual_dtype = benchmark(shape, str_len, dtype, runs=runs)
                if isinstance(avg_time, str):
                    result_str = avg_time
                else:
                    result_str = f"{avg_time * 1000:.3f}"
                print(f"{actual_dtype:<12} | {str_len:<7} | {str(shape):<12} | {result_str:<14}")

if __name__ == "__main__":
    main()
```
(force-pushed from b0cb425 to 56e2fa4)
This seems fine to me, and using `join` is the standard approach here.

I don't think we've historically considered array printing and formatting to be performance-critical. The fact that it's implemented in Python (and C calls into Python to do this) is perhaps indicative of how we treat this. Maybe it's worth reconsidering that and adding some string printing and formatting benchmarks? Currently there aren't any. Should probably be another PR though.
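For illustration, a sketch of what such a benchmark could look like in the asv format that NumPy's benchmark suite uses; the class name, sizes, and dtypes here are assumptions, not part of the PR:

```python
import numpy as np

class ArrayToString:
    # asv-style benchmark, parameterized over array size and dtype.
    params = [[100, 1_000, 10_000], ["int64", "float64"]]
    param_names = ["size", "dtype"]

    def setup(self, size, dtype):
        rng = np.random.default_rng(42)
        self.arr = rng.integers(0, 1_000, size).astype(dtype)

    def time_array2string(self, size, dtype):
        # threshold keeps the full array from being summarized with "..."
        np.array2string(self.arr, threshold=size + 1)
```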
Also, performance on PyPy doesn't matter much: we no longer support PyPy since we dropped support for Python 3.11, and there probably won't ever be a PyPy 3.12.
I would suggest doing most performance comparisons with integers or float values. That is the normal thing to print out. I would not worry about printing speed for very long strings. If that adds too much overhead, then sure, use object arrays with strings maybe (but those strings can still be of typical numerical length, like <=20 characters).

EDIT: To be clear, I don't mind refactoring this even if it is a bit slower. There was a previous PR that attempted it and didn't have any speed benefit (or almost none) while making things slightly messier.
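A minimal sketch of that kind of numeric comparison; the sizes and repeat count are arbitrary choices, not from the thread:

```python
import time
import numpy as np

def time_format(arr, repeats=5):
    # Average wall-clock time of formatting the full array.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        np.array2string(arr, threshold=arr.size + 1)
        times.append(time.perf_counter() - start)
    return sum(times) / len(times)

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000):
    ints = rng.integers(0, 1_000, n)
    floats = rng.random(n)
    print(f"n={n:>6}: int64 {time_format(ints)*1e3:.2f} ms, "
          f"float64 {time_format(floats)*1e3:.2f} ms")
```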
That makes sense; I'll run these benchmarks. I'm a little busy right now, so I'll do that in a couple of days.
This PR optimizes the string construction logic in `_extendLine`, `_extendLine_pretty`, and `_formatArray` by replacing incremental string concatenation (`s += ...`) with list accumulation and `.join()`.

**The Issue:** In Python, strings are immutable. The previous implementation used `+=` to append to the `s` string inside recursive calls and loops. This forces the interpreter to create a new string object and copy the entire content on every iteration, resulting in quadratic time complexity, O(N^2). This significantly degrades performance when formatting large arrays or deep structures.

**The Solution:** I refactored the functions to use a list (`s_list`) as an accumulator. Lines are appended to the list, and the final string is constructed once using `''.join(s_list)` at the end of the formatting process. This ensures linear time complexity, O(N), in memory allocation.

This is in line with standard Python performance recommendations: https://peps.python.org/pep-0008/#programming-recommendations
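For illustration, a simplified sketch of the accumulator pattern described above; the names and line-wrapping logic are a loose model, not the actual NumPy implementation:

```python
def _extend_line_sketch(s_list, line, word, line_width, next_line_prefix):
    # Instead of returning a growing string `s`, completed lines are
    # appended to `s_list`; only `line` (the line under construction)
    # remains a plain string.
    if len(line) + len(word) > line_width:
        s_list.append(line.rstrip() + "\n")
        line = next_line_prefix
    return line + word

def format_words(words, line_width=20, prefix="  "):
    s_list = []
    line = prefix
    for w in words:
        line = _extend_line_sketch(s_list, line, w + " ", line_width, prefix)
    s_list.append(line)
    return "".join(s_list)  # the result string is built exactly once

print(format_words(["alpha", "beta", "gamma", "delta", "epsilon"]))
```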