
Conversation

@mohasarc
Copy link

This PR optimizes the string construction logic in _extendLine, _extendLine_pretty, and _formatArray by replacing incremental string concatenation (s += ...) with list accumulation and .join().

The Issue: In Python, strings are immutable, so every s += ... inside the recursive calls and loops creates a new string object and copies the entire accumulated content. This makes the overall work quadratic, O(N^2), in the size of the output, which significantly degrades performance when formatting large arrays or deep structures.

The Solution: I refactored the functions to use a list (s_list) as an accumulator. Pieces are appended to the list, and the final string is constructed once with ''.join(s_list) at the end of the formatting process. This keeps the total copying linear, O(N), in the size of the output.

This is in line with the standard Python performance recommendations: https://peps.python.org/pep-0008/#programming-recommendations
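
To illustrate the pattern this PR applies, here is a minimal sketch with illustrative names (not the actual _extendLine/_formatArray code):

def format_rows_concat(rows):
    # Old pattern: += copies the entire accumulated string on every iteration,
    # so total work grows quadratically with the output size.
    s = ""
    for row in rows:
        s += row + "\n"
    return s

def format_rows_join(rows):
    # New pattern: collect pieces in a list (amortized O(1) appends)
    # and build the final string once with a single join.
    parts = []
    for row in rows:
        parts.append(row)
        parts.append("\n")
    return "".join(parts)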

@jorenham
Member

Did you confirm that this indeed improves performance?

@mohasarc
Author

Yes, and there is a significant speedup.

I tested on a MacBook with an M2 chip and Python 3.11.11, using the following code for benchmarking:

import numpy as np
import sys
import time

def run_heavy_benchmark():
    # 1. Create a sufficiently long string
    long_str = "a" * 10_000

    # 2. Create a 2-D object array filled with this string
    rows = 50
    cols = 50
    data = np.full((rows, cols), long_str, dtype=object)

    print(f"Array shape: {data.shape}")
    print(f"String length per cell: {len(long_str)}")
    print("-" * 40)
    print("Starting conversion (this might hang without the fix)...")

    start_time = time.time()

    try:
        result = np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
    except MemoryError:
        print("Crashed with MemoryError! (Expected on old implementation)")
        return

    end_time = time.time()
    
    print("-" * 40)
    print(f"Final String Length: {len(result):,} chars")
    print(f"Time taken: {end_time - start_time:.4f} seconds")

if __name__ == "__main__":
    run_heavy_benchmark()

The results were as follows:

50x50 array

Before: 0.1072 seconds
After: 0.0867 seconds

500x500 array

Before: 59.4363 seconds
After: 33.3848 seconds

I tested using CPython, but I'd assume the performance impact is more significant on other Python implementations, since string concatenation is already optimized in CPython according to https://peps.python.org/pep-0008/#programming-recommendations.

Contributor

@eendebakpt eendebakpt left a comment


I can confirm the performance improvement for the benchmark from the OP. Is the benchmark representative of real-world cases? (e.g. why would someone put very large strings in numpy arrays of dtype object?)

For smaller cases the performance impact of the PR is neutral (tested with long_str = "abcde" * 40 and rows = 20; cols = 20).

The implementation itself looks OK, and as the OP remarked, using join is a standard approach to efficient string concatenation.

@mohasarc
Author

You're right; for CPython, the impact of the optimization is negligible, especially for short strings. However, in PyPy, the impact is much higher.

I've implemented an additional change to prevent list creation overhead, which was causing performance to degrade slightly in CPython for arrays with smaller strings.
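
Roughly, the idea is to thread a single accumulator list through the recursion instead of allocating a new list in every call, for example (a hypothetical sketch, not the actual diff):

import numpy as np

def _format_nested(a, out=None):
    # Hypothetical sketch only, not the real _formatArray: the accumulator
    # list is created once at the top level and reused by recursive calls.
    top_level = out is None
    if top_level:
        out = []
    if a.ndim == 1:
        out.append("[" + ", ".join(str(x) for x in a) + "]")
    else:
        out.append("[")
        for i, sub in enumerate(a):
            if i:
                out.append(", ")
            _format_nested(sub, out)
        out.append("]")
    return "".join(out) if top_level else None

print(_format_nested(np.arange(6).reshape(2, 3)))  # [[0, 1, 2], [3, 4, 5]]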

I've also extended the benchmark to include a variety of string lengths, array shapes, and both dtype possibilities (dtype=object and the default). Then, I tested with both CPython and PyPy. The results are as follows:

Using PyPy

| Dtype | StrLen | Shape | Before (ms) | After (ms) |
|-------|--------|-------|-------------|------------|
| object | 100 | (100, 100) | 28.678 | 22.420 |
| object | 100 | (200, 200) | 181.311 | 78.322 |
| object | 100 | (400, 400) | 1404.370 | 387.183 |
| object | 500 | (100, 100) | 121.665 | 65.792 |
| object | 500 | (200, 200) | 752.913 | 308.560 |
| object | 500 | (400, 400) | 5438.833 | 1471.256 |
| object | 1000 | (100, 100) | 224.758 | 122.421 |
| object | 1000 | (200, 200) | 1167.583 | 524.106 |
| object | 1000 | (400, 400) | 8660.068 | 2824.459 |
| object | 5000 | (100, 100) | 815.186 | 563.605 |
| object | 5000 | (200, 200) | 5426.055 | 2688.147 |
| object | 5000 | (400, 400) | 142674.519 | 19974.628 |
| <U100 | 100 | (100, 100) | 86.409 | 59.029 |
| <U100 | 100 | (200, 200) | 489.234 | 332.261 |
| <U100 | 100 | (400, 400) | 1599.828 | 1021.941 |
| <U500 | 500 | (100, 100) | 196.879 | 163.089 |
| <U500 | 500 | (200, 200) | 949.469 | 655.126 |
| <U500 | 500 | (400, 400) | 79291.969 | 2866.978 |
| <U1000 | 1000 | (100, 100) | 383.444 | 261.022 |
| <U1000 | 1000 | (200, 200) | 1611.339 | 1110.126 |
| <U1000 | 1000 | (400, 400) | 9579.738 | 5269.808 |
| <U5000 | 5000 | (100, 100) | 1429.401 | 1123.077 |
| <U5000 | 5000 | (200, 200) | 7332.455 | 5293.393 |
| <U5000 | 5000 | (400, 400) | 45051.155 | 26732.417 |

Using CPython

| Dtype | StrLen | Shape | Before (ms) | After (ms) |
|-------|--------|-------|-------------|------------|
| object | 100 | (100, 100) | 9.234 | 7.911 |
| object | 100 | (200, 200) | 38.667 | 34.075 |
| object | 100 | (400, 400) | 171.439 | 169.277 |
| object | 500 | (100, 100) | 20.282 | 18.209 |
| object | 500 | (200, 200) | 117.366 | 113.162 |
| object | 500 | (400, 400) | 685.050 | 647.188 |
| object | 1000 | (100, 100) | 36.138 | 40.654 |
| object | 1000 | (200, 200) | 199.724 | 189.959 |
| object | 1000 | (400, 400) | 1139.380 | 1114.547 |
| object | 5000 | (100, 100) | 162.781 | 147.551 |
| object | 5000 | (200, 200) | 867.590 | 878.764 |
| object | 5000 | (400, 400) | 6811.729 | 5916.547 |
| <U100 | 100 | (100, 100) | 16.754 | 14.577 |
| <U100 | 100 | (200, 200) | 71.728 | 61.639 |
| <U100 | 100 | (400, 400) | 254.428 | 249.150 |
| <U500 | 500 | (100, 100) | 31.052 | 30.705 |
| <U500 | 500 | (200, 200) | 154.542 | 168.961 |
| <U500 | 500 | (400, 400) | 785.722 | 793.822 |
| <U1000 | 1000 | (100, 100) | 53.329 | 56.266 |
| <U1000 | 1000 | (200, 200) | 255.221 | 255.550 |
| <U1000 | 1000 | (400, 400) | 1463.481 | 1452.615 |
| <U5000 | 5000 | (100, 100) | 210.411 | 204.498 |
| <U5000 | 5000 | (200, 200) | 1197.290 | 1125.832 |
| <U5000 | 5000 | (400, 400) | 8756.840 | 8180.154 |

We can observe that for CPython there is a slight performance improvement in most cases. On the other hand, the impact on PyPy performance is rather large.

Below you can find the exact benchmark I used:

import numpy as np
import sys
import time

def benchmark(shape, str_len, dtype, runs=5):
    """Runs np.array2string benchmark for a given configuration."""
    long_str = "a" * str_len
    data = np.full(shape, long_str, dtype=dtype)
    
    # Capture actual dtype
    actual_dtype = str(data.dtype)
    
    times = []
    for _ in range(runs):
        start = time.time()
        try:
            np.array2string(data, threshold=sys.maxsize, max_line_width=sys.maxsize)
            times.append(time.time() - start)
        except MemoryError:
            return "MemoryError", actual_dtype
    
    avg_time = sum(times) / len(times)
    return avg_time, actual_dtype

def main():
    # Configurations to test
    shapes = [(100, 100), (200, 200), (400, 400)]
    str_lens = [100, 500, 1000, 5000]
    dtypes = [object, None] # object vs auto-detected
    runs = 5

    print(f"{'Dtype':<12} | {'StrLen':<7} | {'Shape':<12} | {'Avg Time (ms)':<14}")
    print("-" * 55)

    for dtype in dtypes:
        for str_len in str_lens:
            for shape in shapes:
                avg_time, actual_dtype = benchmark(shape, str_len, dtype, runs=runs)
                
                if isinstance(avg_time, str):
                    result_str = avg_time
                else:
                    result_str = f"{avg_time * 1000:.3f}"
                
                print(f"{actual_dtype:<12} | {str_len:<7} | {str(shape):<12} | {result_str:<14}")

if __name__ == "__main__":
    main()

@mohasarc mohasarc force-pushed the perf/optimize-array-formatting-strings branch from b0cb425 to 56e2fa4 on January 28, 2026 at 10:57
@ngoldbaum
Member

ngoldbaum commented Jan 28, 2026

This seems fine to me. Using list.append like this benefits from amortized O(1) appends when accumulating the strings, while the current approach is accidentally O(N^2), I think.

I don't think we've historically considered array printing and formatting to be performance-critical. The fact that it's implemented in Python (and C calls into Python to do this) is perhaps indicative of how we treat this.

Maybe it's worth reconsidering that and adding some string printing and formatting benchmarks? Currently there aren't any. Should probably be another PR though.
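
For example, an asv benchmark along these lines could cover it (a sketch only; the class and parameter names here are illustrative, not existing benchmarks):

import numpy as np

class ArrayToString:
    # Sketch of a possible asv benchmark for array formatting; not part of
    # the current benchmark suite.
    params = [[(100, 100), (400, 400)]]
    param_names = ["shape"]

    def setup(self, shape):
        self.arr = np.arange(np.prod(shape), dtype=np.float64).reshape(shape)

    def time_array2string(self, shape):
        np.array2string(self.arr, threshold=self.arr.size)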

@ngoldbaum
Member

Also, performance on PyPy doesn't matter much: we no longer support PyPy since we dropped support for Python 3.11, and there probably won't ever be a PyPy 3.12.

@ngoldbaum ngoldbaum added this to the 2.5.0 Release milestone Jan 28, 2026
@seberg
Member

seberg commented Jan 29, 2026

I would suggest doing most performance comparisons with integers or float values. That is the normal thing to print out. I would not worry about printing speed for very long strings.

If that adds too much overhead, then sure, maybe use object arrays with strings (but those strings can still be of typical numerical length, like <= 20 characters).

EDIT: To be clear, I don't mind refactoring this even if it is a bit slower. There was a previous PR that attempted it and had no speed benefit (or almost none) while making things slightly messier. Using StringIO to build the final result may also be interesting, but IIRC it is also no faster (though I guess it might be on other implementations, given that += may be optimized only in CPython).
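
As a rough sketch of the kind of numeric comparison suggested here (illustrative only, not part of the PR):

import sys
import time
import numpy as np

# Time np.array2string on typical numeric arrays (integers and floats),
# which is the common case for array printing.
for dtype in (np.int64, np.float64):
    a = np.arange(500 * 500, dtype=dtype).reshape(500, 500)
    start = time.perf_counter()
    np.array2string(a, threshold=sys.maxsize)
    print(f"{np.dtype(dtype).name}: {time.perf_counter() - start:.4f} s")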

@mohasarc
Author

I would suggest doing most performance comparisons with integers or float values. That is the normal thing to print out. I would not worry about printing speed for very long strings.

That makes sense; I'll run these benchmarks. I'm a little busy right now, so I'll do that in a couple of days.
