
Conversation

@mohasarc
Copy link

This PR optimizes the string construction logic in _extendLine, _extendLine_pretty, and _formatArray by replacing incremental string concatenation (s += ...) with list accumulation and .join().

The Issue: In Python, strings are immutable, so every s += ... inside the recursive calls and loops creates a new string object and copies the entire accumulated content. This makes the overall work quadratic, O(N^2), in the size of the output, which significantly degrades performance when formatting large arrays or deep structures.

The Solution: I refactored the functions to use a list (s_list) as an accumulator. Pieces are appended to the list, and the final string is constructed once with ''.join(s_list) at the end of the formatting process. This keeps the total copying linear, O(N), in the size of the output.

This is in line with the standard Python performance recommendations: https://peps.python.org/pep-0008/#programming-recommendations
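
To illustrate the pattern this PR applies, here is a minimal sketch with illustrative names (not the actual _extendLine/_formatArray code):

def format_rows_concat(rows):
    # Old pattern: += copies the entire accumulated string on every iteration,
    # so total work grows quadratically with the output size.
    s = ""
    for row in rows:
        s += row + "\n"
    return s

def format_rows_join(rows):
    # New pattern: collect pieces in a list (amortized O(1) appends)
    # and build the final string once with a single join.
    parts = []
    for row in rows:
        parts.append(row)
        parts.append("\n")
    return "".join(parts)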

@jorenham
Member

Did you confirm that this indeed improves performance?

@mohasarc
Author

Yes, and there is a significant speedup.

I tested on a MacBook with an M2 chip and Python 3.11.11, using the following code for benchmarking:

import numpy as np
import sys
import time

def run_heavy_benchmark():
    # 1. Create a sufficiently long string
    long_str = "a" * 10_000

    # 2. Create a 2-D object array filled with this string
    rows = 50
    cols = 50
    data = np.full((rows, cols), long_str, dtype=object)

    print(f"Array shape: {data.shape}")
    print(f"String length per cell: {len(long_str)}")
    print("-" * 40)
    print("Starting conversion (this might hang without the fix)...")

    start_time = time.time()

    try:
        result = np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
    except MemoryError:
        print("Crashed with MemoryError! (Expected on old implementation)")
        return

    end_time = time.time()
    
    print("-" * 40)
    print(f"Final String Length: {len(result):,} chars")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

if __name__ == "__main__":
    run_heavy_benchmark()

The results were as follows:

50x50 array

Before: 0.1072 seconds
After: 0.0867 seconds

500x500 array

Before: 59.4363 seconds
After: 33.3848 seconds

I tested using CPython, but I'd assume the performance impact is more significant on other Python implementations, since string concatenation is already optimized in CPython according to https://peps.python.org/pep-0008/#programming-recommendations.

Contributor

@eendebakpt eendebakpt left a comment


I can confirm the performance improvement for the benchmark from the OP. Is the benchmark representative of real-world cases? (e.g. why would someone put very large strings in numpy arrays of dtype object?)

For smaller cases the performance impact of the PR is neutral (tested with long_str = "abcde" * 40 and rows = 20; cols = 20).

The implementation itself looks OK, and as the OP remarked, using join is a standard approach to efficient string concatenation.

@mohasarc
Author

You're right; for CPython, the impact of the optimization is negligible, especially for short strings. However, in PyPy, the impact is much higher.

I've implemented an additional change to prevent list creation overhead, which was causing performance to degrade slightly in CPython for arrays with smaller strings.
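
Roughly, the idea is to thread a single accumulator list through the recursion instead of allocating a new list in every call, for example (a hypothetical sketch, not the actual diff):

import numpy as np

def _format_nested(a, out=None):
    # Hypothetical sketch only, not the real _formatArray: the accumulator
    # list is created once at the top level and reused by recursive calls.
    top_level = out is None
    if top_level:
        out = []
    if a.ndim == 1:
        out.append("[" + ", ".join(str(x) for x in a) + "]")
    else:
        out.append("[")
        for i, sub in enumerate(a):
            if i:
                out.append(", ")
            _format_nested(sub, out)
        out.append("]")
    return "".join(out) if top_level else None

print(_format_nested(np.arange(6).reshape(2, 3)))  # [[0, 1, 2], [3, 4, 5]]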

I've also extended the benchmark to include a variety of string lengths, array shapes, and both dtype possibilities (dtype=object and the default). Then, I tested with both CPython and PyPy. The results are as follows:

Using PyPy

| Dtype | StrLen | Shape | Before (ms) | After (ms) |
|-------|--------|-------|-------------|------------|
| object | 100 | (100, 100) | 28.678 | 22.420 |
| object | 100 | (200, 200) | 181.311 | 78.322 |
| object | 100 | (400, 400) | 1404.370 | 387.183 |
| object | 500 | (100, 100) | 121.665 | 65.792 |
| object | 500 | (200, 200) | 752.913 | 308.560 |
| object | 500 | (400, 400) | 5438.833 | 1471.256 |
| object | 1000 | (100, 100) | 224.758 | 122.421 |
| object | 1000 | (200, 200) | 1167.583 | 524.106 |
| object | 1000 | (400, 400) | 8660.068 | 2824.459 |
| object | 5000 | (100, 100) | 815.186 | 563.605 |
| object | 5000 | (200, 200) | 5426.055 | 2688.147 |
| object | 5000 | (400, 400) | 142674.519 | 19974.628 |
| <U100 | 100 | (100, 100) | 86.409 | 59.029 |
| <U100 | 100 | (200, 200) | 489.234 | 332.261 |
| <U100 | 100 | (400, 400) | 1599.828 | 1021.941 |
| <U500 | 500 | (100, 100) | 196.879 | 163.089 |
| <U500 | 500 | (200, 200) | 949.469 | 655.126 |
| <U500 | 500 | (400, 400) | 79291.969 | 2866.978 |
| <U1000 | 1000 | (100, 100) | 383.444 | 261.022 |
| <U1000 | 1000 | (200, 200) | 1611.339 | 1110.126 |
| <U1000 | 1000 | (400, 400) | 9579.738 | 5269.808 |
| <U5000 | 5000 | (100, 100) | 1429.401 | 1123.077 |
| <U5000 | 5000 | (200, 200) | 7332.455 | 5293.393 |
| <U5000 | 5000 | (400, 400) | 45051.155 | 26732.417 |

Using CPython

| Dtype | StrLen | Shape | Before (ms) | After (ms) |
|-------|--------|-------|-------------|------------|
| object | 100 | (100, 100) | 9.234 | 7.911 |
| object | 100 | (200, 200) | 38.667 | 34.075 |
| object | 100 | (400, 400) | 171.439 | 169.277 |
| object | 500 | (100, 100) | 20.282 | 18.209 |
| object | 500 | (200, 200) | 117.366 | 113.162 |
| object | 500 | (400, 400) | 685.050 | 647.188 |
| object | 1000 | (100, 100) | 36.138 | 40.654 |
| object | 1000 | (200, 200) | 199.724 | 189.959 |
| object | 1000 | (400, 400) | 1139.380 | 1114.547 |
| object | 5000 | (100, 100) | 162.781 | 147.551 |
| object | 5000 | (200, 200) | 867.590 | 878.764 |
| object | 5000 | (400, 400) | 6811.729 | 5916.547 |
| <U100 | 100 | (100, 100) | 16.754 | 14.577 |
| <U100 | 100 | (200, 200) | 71.728 | 61.639 |
| <U100 | 100 | (400, 400) | 254.428 | 249.150 |
| <U500 | 500 | (100, 100) | 31.052 | 30.705 |
| <U500 | 500 | (200, 200) | 154.542 | 168.961 |
| <U500 | 500 | (400, 400) | 785.722 | 793.822 |
| <U1000 | 1000 | (100, 100) | 53.329 | 56.266 |
| <U1000 | 1000 | (200, 200) | 255.221 | 255.550 |
| <U1000 | 1000 | (400, 400) | 1463.481 | 1452.615 |
| <U5000 | 5000 | (100, 100) | 210.411 | 204.498 |
| <U5000 | 5000 | (200, 200) | 1197.290 | 1125.832 |
| <U5000 | 5000 | (400, 400) | 8756.840 | 8180.154 |

We can observe that for CPython there is a slight performance improvement in most cases. On the other hand, the impact on PyPy performance is rather large.

Below you can find the exact benchmark I used:

import numpy as np
import sys
import time

def benchmark(shape, str_len, dtype, runs=5):
    """Runs np.array2string benchmark for a given configuration."""
    long_str = "a" * str_len
    data = np.full(shape, long_str, dtype=dtype)
    
    # Capture actual dtype
    actual_dtype = str(data.dtype)
    
    times = []
    for _ in range(runs):
        start = time.time()
        try:
            np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
            times.append(time.time() - start)
        except MemoryError:
            return "MemoryError", actual_dtype
    
    avg_time = sum(times) / len(times)
    return avg_time, actual_dtype

def main():
    # Configurations to test
    shapes = [(100, 100), (200, 200), (400, 400)]
    str_lens = [100, 500, 1000, 5000]
    dtypes = [object, None] # object vs auto-detected
    runs = 5

    print(f"{'Dtype':<12} | {'StrLen':<7} | {'Shape':<12} | {'Avg Time (ms)':<14}")
    print("-" * 55)

    for dtype in dtypes:
        for str_len in str_lens:
            for shape in shapes:
                avg_time, actual_dtype = benchmark(shape, str_len, dtype, runs=runs)
                
                if isinstance(avg_time, str):
                    result_str = avg_time
                else:
                    result_str = f"{avg_time * 1000:.3f}"
                
                print(f"{actual_dtype:<12} | {str_len:<7} | {str(shape):<12} | {result_str:<14}")

if __name__ == "__main__":
    main()

@mohasarc mohasarc force-pushed the perf/optimize-array-formatting-strings branch from b0cb425 to 56e2fa4 on January 28, 2026 at 10:57
@ngoldbaum
Member

ngoldbaum commented Jan 28, 2026

This seems fine to me. Using list.append like this benefits from amortized O(1) appends when accumulating the strings, while the current approach is accidentally O(N^2), I think.

I don't think we've historically considered array printing and formatting to be performance-critical. The fact that it's implemented in Python (and C calls into Python to do this) is perhaps indicative of how we treat this.

Maybe it's worth reconsidering that and adding some string printing and formatting benchmarks? Currently there aren't any. Should probably be another PR though.
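
For example, an asv benchmark along these lines could cover it (a sketch only; the class and parameter names here are illustrative, not existing benchmarks):

import numpy as np

class ArrayToString:
    # Sketch of a possible asv benchmark for array formatting; not part of
    # the current benchmark suite.
    params = [[(100, 100), (400, 400)]]
    param_names = ["shape"]

    def setup(self, shape):
        self.arr = np.arange(np.prod(shape), dtype=np.float64).reshape(shape)

    def time_array2string(self, shape):
        np.array2string(self.arr, threshold=self.arr.size)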

@ngoldbaum
Member

Also, performance on PyPy doesn't matter much: we no longer support PyPy since we dropped support for Python 3.11, and there probably won't ever be a PyPy 3.12.

@ngoldbaum ngoldbaum added this to the 2.5.0 Release milestone Jan 28, 2026
@seberg
Member

seberg commented Jan 29, 2026

I would suggest doing most performance comparisons with integers or float values. That is the normal thing to print out. I would not worry about printing speed for very long strings.

If that adds too much overhead, then sure, maybe use object arrays with strings (but those strings can still be of typical numerical length, like <= 20 characters).

EDIT: To be clear, I don't mind refactoring this even if it is a bit slower. There was a previous PR that attempted it and had no speed benefit (or almost none) while making things slightly messier. Using StringIO to build the final result may also be interesting, but IIRC it is also no faster (though I guess it might be on other implementations, given that += may be optimized only in CPython).
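
As a rough sketch of the kind of numeric comparison suggested here (illustrative only, not part of the PR):

import sys
import time
import numpy as np

# Time np.array2string on typical numeric arrays (integers and floats),
# which is the common case for array printing.
for dtype in (np.int64, np.float64):
    a = np.arange(500 * 500, dtype=dtype).reshape(500, 500)
    start = time.perf_counter()
    np.array2string(a, threshold=sys.maxsize)
    print(f"{np.dtype(dtype).name}: {time.perf_counter() - start:.4f} s")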

@mohasarc
Author

I would suggest doing most performance comparisons with integers or float values. That is the normal thing to print out. I would not worry about printing speed for very long strings.

That makes sense; I'll run these benchmarks. I'm a little busy right now, so I'll do that in a couple of days.
