Contributor

@stvoutsin stvoutsin commented Jul 22, 2025

Description

This pull request attempts to improve the performance of VOTable binary parsing by implementing optimized Cython converters.

Initial benchmarks show run-time reductions on the order of:

  • 53% on numeric-heavy tables
  • 43-47% on mixed data type tables
  • 30-35% on string-heavy tables
  • The improvements are consistent across the BINARY and BINARY2 formats

Problem:
We’re seeing relatively slow performance when parsing large VOTables using the BINARY2 serialization format.
Profiling results from py-spy show a significant portion of time is spent in astropy.io.votable.converters.binparse. Here’s an excerpt of the profiler output:

py-spy top -- python test.py
Collecting samples from 'python test.py' (python v3.12.7)
Total Samples 1600
GIL: 100.00%, Active: 100.00%, Threads: 1
  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                           
 54.00%  57.00%    7.69s     8.26s   binparse (astropy/io/votable/converters.py)
 24.00% 100.00%    3.68s    14.66s   parsebinary (astropy/io/votable/tree.py)
  4.00%   4.00%   0.940s    0.940s   bitarray_to_bool (astropy/io/votable/converters.py)
  8.00%   8.00%   0.890s    0.910s   setitem (numpy/ma/core.py)
  0.00%   0.00%   0.470s    0.670s   getbinary_data_stream (astropy/io/votable/tree.py)
  2.00%   2.00%   0.450s    0.450s   careful_read (astropy/io/votable/tree.py)
  0.00%   0.00%   0.430s    0.430s   new (numpy/_core/records.py)
  

As shown, the hot path is dominated by the binparse function in converters.py.
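
To illustrate why a per-value parse loop like this tends to dominate, here is a minimal pure-Python sketch. It is not the astropy implementation: the function names are hypothetical, and it assumes a simplified layout of fixed-width big-endian doubles with no null flags or variable-length fields. It decodes the same bytes once with a call per cell and once with a single precompiled struct per row:

```python
import struct
import timeit


def parse_per_cell(buf, n_cols):
    """Decode one double per call, the way a per-field converter would."""
    one = struct.Struct(">d")  # VOTable binary data is big-endian
    n_rows = len(buf) // (one.size * n_cols)
    offset = 0
    rows = []
    for _ in range(n_rows):
        row = []
        for _ in range(n_cols):
            row.append(one.unpack_from(buf, offset)[0])
            offset += one.size
        rows.append(row)
    return rows


def parse_per_row(buf, n_cols):
    """Decode a whole row of doubles with one precompiled struct."""
    row_struct = struct.Struct(f">{n_cols}d")
    return [list(row) for row in row_struct.iter_unpack(buf)]


if __name__ == "__main__":
    # Hypothetical table shape, chosen only to make the timing visible.
    n_rows, n_cols = 2_000, 50
    values = [float(i) for i in range(n_rows * n_cols)]
    buf = struct.pack(f">{len(values)}d", *values)

    # Both strategies must decode identical data.
    assert parse_per_cell(buf, n_cols) == parse_per_row(buf, n_cols)
    for func in (parse_per_cell, parse_per_row):
        seconds = timeit.timeit(lambda: func(buf, n_cols), number=5)
        print(f"{func.__name__}: {seconds:.3f}s for 5 runs")
```

How large the gap is depends on the machine and the table shape, so treat any timing from this sketch as indicative only. The real BINARY2 format also prefixes each row with null-flag bits, which is one reason a converter cannot simply treat the stream as a flat array.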

Solution:

  • Added "Fast" Cython converters for all numeric types (double, float, int, long, short, unsignedByte, boolean, bit)

Compatibility:
Should be 100% backward compatible, since the API has not been modified.

Testing:
Added comprehensive benchmarks to astropy-benchmarks (astropy/astropy-benchmarks#141). They show performance improvements of roughly 30-50% for most table types and data patterns.

Astropy:main

[ 0.00%] · For astropy commit 8599d499 <main>:
[ 0.00%] ·· Benchmarking virtualenv-py3.11
[ 3.85%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary             2.19±0s
[ 7.69%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary2            3.16±0s
[11.54%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary           2.45±0s
[15.38%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary2          3.31±0s
[19.23%] ··· votable.TimeVOTableMixed.time_mixed_binary                        4.37±0s
[23.08%] ··· votable.TimeVOTableMixed.time_mixed_binary2                       5.77±0s
[26.92%] ··· votable.TimeVOTableNumeric.time_numeric_binary                    3.37±0s
[30.77%] ··· votable.TimeVOTableNumeric.time_numeric_binary2                   4.24±0s
[34.62%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary         2.98±0s
[38.46%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary2        3.82±0s
[42.31%] ··· votable.TimeVOTableSmallOverhead.time_small_binary                10.5±0ms
[46.15%] ··· votable.TimeVOTableSmallOverhead.time_small_binary2               13.9±0ms
[50.00%] ··· votable.TimeVOTableStringIntensive.time_string_intensive_binary2  4.42±0s

u/stvoutsin/binary2-cython

[ 0.00%] · For astropy commit 82cd430e <u/stvoutsin/binary2-cython>:
[ 0.00%] ·· Benchmarking virtualenv-py3.11
[ 3.85%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary             1.82±0s
[ 7.69%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary2            2.38±0s
[11.54%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary           1.63±0s
[15.38%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary2          2.20±0s
[19.23%] ··· votable.TimeVOTableMixed.time_mixed_binary                        2.51±0s
[23.08%] ··· votable.TimeVOTableMixed.time_mixed_binary2                       3.05±0s
[26.92%] ··· votable.TimeVOTableNumeric.time_numeric_binary                    1.57±0s
[30.77%] ··· votable.TimeVOTableNumeric.time_numeric_binary2                   2.01±0s
[34.62%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary         1.96±0s
[38.46%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary2        2.45±0s
[42.31%] ··· votable.TimeVOTableSmallOverhead.time_small_binary                6.12±0ms
[46.15%] ··· votable.TimeVOTableSmallOverhead.time_small_binary2               8.35±0ms
[50.00%] ··· votable.TimeVOTableStringIntensive.time_string_intensive_binary2  3.26±0s

For better visualization:

| Benchmark                  | Before | After  | Improvement |
|----------------------------|--------|--------|-------------|
| Numeric (BINARY)           | 3.37s  | 1.57s  | 53% faster  |
| Numeric (BINARY2)          | 4.24s  | 2.01s  | 53% faster  |
| Mixed (BINARY)             | 4.37s  | 2.51s  | 43% faster  |
| Mixed (BINARY2)            | 5.77s  | 3.05s  | 47% faster  |
| Long Strings (BINARY)      | 2.45s  | 1.63s  | 33% faster  |
| Long Strings (BINARY2)     | 3.31s  | 2.20s  | 33% faster  |
| Short Strings (BINARY)     | 2.98s  | 1.96s  | 34% faster  |
| Short Strings (BINARY2)    | 3.82s  | 2.45s  | 36% faster  |
| Boolean Fields (BINARY)    | 2.19s  | 1.82s  | 17% faster  |
| Boolean Fields (BINARY2)   | 3.16s  | 2.38s  | 25% faster  |
| Small Overhead (BINARY)    | 10.5ms | 6.12ms | 42% faster  |
| Small Overhead (BINARY2)   | 13.9ms | 8.35ms | 40% faster  |
| String Intensive (BINARY2) | 4.42s  | 3.26s  | 26% faster  |

I've also tested parsing outside of the astropy-benchmarks PR, using a VOTable with 50,000 rows and approx. 1,000 columns (~50 million cells).
With this table, parsing took 30 seconds, down from 65 seconds with the version currently on main.

Fixes #18442

Any thoughts on this approach? Are there alternatives that I haven't considered? Perhaps someone with more Cython experience can let me know if I've made any obvious mistakes, or if there are better ways of doing any of this.

  • By checking this box, the PR author has requested that maintainers do NOT use the "Squash and Merge" button. Maintainers should respect this when possible; however, the final decision is at the discretion of the maintainer that merges the PR.

Contributor

Thank you for your contribution to Astropy! 🌌 This checklist is meant to remind the package maintainers who will review this pull request of some common things to look for.

  • Do the proposed changes actually accomplish desired goals?
  • Do the proposed changes follow the Astropy coding guidelines?
  • Are tests added/updated as required? If so, do they follow the Astropy testing guidelines?
  • Are docs added/updated as required? If so, do they follow the Astropy documentation guidelines?
  • Is rebase and/or squash necessary? If so, please provide the author with appropriate instructions. Also see instructions for rebase and squash.
  • Did the CI pass? If no, are the failures related? If you need to run daily and weekly cron jobs as part of the PR, please apply the "Extra CI" label. Codestyle issues can be fixed by the bot.
  • Is a change log needed? If yes, did the change log check pass? If no, add the "no-changelog-entry-needed" label. If this is a manual backport, use the "skip-changelog-checks" label unless special changelog handling is necessary.
  • Is this a big PR that makes a "What's new?" entry worthwhile and if so, is (1) a "what's new" entry included in this PR and (2) the "whatsnew-needed" label applied?
  • At the time of adding the milestone, if the milestone set requires a backport to release branch(es), apply the appropriate "backport-X.Y.x" label(s) before merge.

@stvoutsin stvoutsin force-pushed the u/stvoutsin/binary2-cython branch from 82cd430 to 1dfea18 Compare July 22, 2025 23:38
@pllim pllim added this to the v7.2.0 milestone Jul 23, 2025
@@ -842,6 +828,12 @@ class Double(FloatingPoint):

    format = "f8"

    def binparse(self, read):
Member

Seems a little repetitive across the board. Looks like the only difference is a number. Can this method be inherited but access some self._expected_len (name negotiable) that is set by the subclass?

Contributor Author

Great suggestion. Looking back at it now, I can see there is a lot of duplication. I've moved things up to the parent Numeric class; let me know if that looks good, or if you see any adjustments that need to be made.
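
For context, the refactoring discussed in this thread could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in the PR: the `_STRUCT_FORMATS` mapping, the use of `struct`, and the simplified return value (no mask or null handling) are all hypothetical, and only the `_expected_len` name comes from the review comment.

```python
import io
import struct


class Numeric:
    """Hypothetical base class; subclasses only declare ``format``."""

    format = None  # NumPy-style type code, e.g. "f8"

    # Assumed mapping from type code to a big-endian struct format.
    _STRUCT_FORMATS = {"f8": ">d", "f4": ">f", "i8": ">q", "i4": ">i", "i2": ">h"}

    def __init__(self):
        self._struct = struct.Struct(self._STRUCT_FORMATS[self.format])
        self._expected_len = self._struct.size  # bytes per value

    def binparse(self, read):
        """Read exactly one value via ``read``, a callable taking a byte count."""
        data = read(self._expected_len)
        if len(data) != self._expected_len:
            raise EOFError(f"expected {self._expected_len} bytes, got {len(data)}")
        return self._struct.unpack(data)[0]


class Double(Numeric):
    format = "f8"


class Float(Numeric):
    format = "f4"


class Short(Numeric):
    format = "i2"


if __name__ == "__main__":
    # One double followed by one short, both big-endian and unpadded.
    stream = io.BytesIO(struct.pack(">dh", 1.5, -7))
    print(Double().binparse(stream.read))
    print(Short().binparse(stream.read))
```

The point of the design is that each subclass contributes only its type code, while the length check and decoding live in one inherited method.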

Member

Once we have the benchmark down, should maybe put some ballpark values here.

@pllim pllim added the Performance, Extra CI (Run cron CI as part of PR), and benchmark (Run benchmarks for a PR) labels Jul 23, 2025
@pllim
Member

pllim commented Jul 23, 2025

Hmm, looks like @eerovaher proposed an alternative at #18455. Are you interested in having a look, @stvoutsin? Thanks, all!

@pllim
Member

pllim commented Jul 24, 2025

astropy/astropy-benchmarks#141 is merged so theoretically next push would trigger a new benchmark run with the new relevant benchmarks. 🤞

@stvoutsin stvoutsin marked this pull request as draft July 24, 2025 19:38
@stvoutsin stvoutsin force-pushed the u/stvoutsin/binary2-cython branch from f6f4267 to 11d9403 Compare July 24, 2025 23:19
@stvoutsin stvoutsin force-pushed the u/stvoutsin/binary2-cython branch from ba78237 to 1babe2e Compare July 24, 2025 23:47
@stvoutsin stvoutsin marked this pull request as ready for review July 25, 2025 00:20
@pllim
Member

pllim commented Jul 25, 2025

Hmm benchmark job says nothing has "significantly changed". Is this expected? I thought it would say "this and that is much faster" or something.

[98.70%] ··· ...imeVOTableBooleanFields.time_booleans_binary            2.31±0s
[98.76%] ··· ...meVOTableBooleanFields.time_booleans_binary2            3.29±0s
[98.82%] ··· ...eVOTableLongStrings.time_long_strings_binary         2.35±0.01s
[98.88%] ··· ...VOTableLongStrings.time_long_strings_binary2         3.19±0.02s
[98.95%] ··· votable.TimeVOTableMixed.time_mixed_binary              4.53±0.01s
[99.01%] ··· votable.TimeVOTableMixed.time_mixed_binary2             5.84±0.02s
[99.07%] ··· votable.TimeVOTableNumeric.time_numeric_binary          3.61±0.02s
[99.13%] ··· votable.TimeVOTableNumeric.time_numeric_binary2         4.46±0.01s
[99.19%] ··· ...OTableShortStrings.time_short_strings_binary         2.91±0.02s
[99.26%] ··· ...TableShortStrings.time_short_strings_binary2            3.78±0s
[99.32%] ··· ...e.TimeVOTableSmallOverhead.time_small_binary        10.6±0.07ms
[99.38%] ··· ....TimeVOTableSmallOverhead.time_small_binary2         13.8±0.1ms
[99.44%] ··· ...tringIntensive.time_string_intensive_binary2         4.41±0.03s

@stvoutsin stvoutsin marked this pull request as draft July 25, 2025 13:09
@stvoutsin
Contributor Author

"significantly changed"

Apologies, I think I introduced a regression while trying to fix an issue that caused test failures in the CI.
I've changed the approach, so let's see if this run gives better results.
At least locally, I get:

[ 3.85%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary             1.40±0s
[ 7.69%] ··· votable.TimeVOTableBooleanFields.time_booleans_binary2            1.94±0s
[11.54%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary           1.26±0s
[15.38%] ··· votable.TimeVOTableLongStrings.time_long_strings_binary2          1.75±0s
[19.23%] ··· votable.TimeVOTableMixed.time_mixed_binary                        2.01±0s
[23.08%] ··· votable.TimeVOTableMixed.time_mixed_binary2                       2.60±0s
[26.92%] ··· votable.TimeVOTableNumeric.time_numeric_binary                    1.27±0s
[30.77%] ··· votable.TimeVOTableNumeric.time_numeric_binary2                   1.80±0s
[34.62%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary         1.53±0s
[38.46%] ··· votable.TimeVOTableShortStrings.time_short_strings_binary2        2.07±0s
[42.31%] ··· votable.TimeVOTableSmallOverhead.time_small_binary                5.33±0ms
[46.15%] ··· votable.TimeVOTableSmallOverhead.time_small_binary2               7.81±0ms
[50.00%] ··· votable.TimeVOTableStringIntensive.time_string_intensive_binary2  2.80±0s

I will also test with larger row counts in the sample data to see how the run-time difference between main and this branch scales.

@stvoutsin
Contributor Author

I've run the astropy-benchmarks locally with 200k, 500k and 1M sample row sizes (https://github.com/astropy/astropy-benchmarks/blob/main/benchmarks/votable.py#L13). Results shown here:

u/stvoutsin/binary2-cython vs main branch performance comparison

| Test                                          | Row Count | Cython | Main   | Improvement |
|-----------------------------------------------|-----------|--------|--------|-------------|
| BooleanFields.time_booleans_binary            | 200k      | 1.40s  | 1.66s  | 15.7%       |
| BooleanFields.time_booleans_binary            | 800k      | 5.71s  | 6.75s  | 15.4%       |
| BooleanFields.time_booleans_binary            | 1M        | 7.52s  | 8.38s  | 10.3%       |
| BooleanFields.time_booleans_binary2           | 200k      | 1.94s  | 2.43s  | 20.2%       |
| BooleanFields.time_booleans_binary2           | 800k      | 7.94s  | 9.37s  | 15.3%       |
| BooleanFields.time_booleans_binary2           | 1M        | 10.00s | 11.70s | 14.5%       |
| LongStrings.time_long_strings_binary          | 200k      | 1.26s  | 1.93s  | 34.7%       |
| LongStrings.time_long_strings_binary          | 800k      | 5.08s  | 7.72s  | 34.2%       |
| LongStrings.time_long_strings_binary          | 1M        | 6.47s  | 9.67s  | 33.1%       |
| LongStrings.time_long_strings_binary2         | 200k      | 1.75s  | 2.49s  | 29.7%       |
| LongStrings.time_long_strings_binary2         | 800k      | 7.25s  | 10.10s | 28.2%       |
| LongStrings.time_long_strings_binary2         | 1M        | 9.03s  | 13.00s | 30.5%       |
| Mixed.time_mixed_binary                       | 200k      | 2.01s  | 3.53s  | 43.1%       |
| Mixed.time_mixed_binary                       | 800k      | 8.52s  | 14.30s | 40.4%       |
| Mixed.time_mixed_binary                       | 1M        | 10.80s | 18.40s | 41.3%       |
| Mixed.time_mixed_binary2                      | 200k      | 2.60s  | 4.69s  | 44.6%       |
| Mixed.time_mixed_binary2                      | 800k      | 11.50s | 18.50s | 37.8%       |
| Mixed.time_mixed_binary2                      | 1M        | 14.00s | 22.60s | 38.1%       |
| Numeric.time_numeric_binary                   | 200k      | 1.27s  | 2.98s  | 57.4%       |
| Numeric.time_numeric_binary                   | 800k      | 5.26s  | 11.80s | 55.4%       |
| Numeric.time_numeric_binary                   | 1M        | 6.86s  | 14.80s | 53.6%       |
| Numeric.time_numeric_binary2                  | 200k      | 1.80s  | 3.49s  | 48.4%       |
| Numeric.time_numeric_binary2                  | 800k      | 7.43s  | 14.20s | 47.7%       |
| Numeric.time_numeric_binary2                  | 1M        | 9.45s  | 18.00s | 47.5%       |
| ShortStrings.time_short_strings_binary        | 200k      | 1.53s  | 2.29s  | 33.2%       |
| ShortStrings.time_short_strings_binary        | 800k      | 6.59s  | 9.17s  | 28.1%       |
| ShortStrings.time_short_strings_binary        | 1M        | 8.56s  | 11.30s | 24.2%       |
| ShortStrings.time_short_strings_binary2       | 200k      | 2.07s  | 3.05s  | 32.1%       |
| ShortStrings.time_short_strings_binary2       | 800k      | 8.77s  | 12.10s | 27.5%       |
| ShortStrings.time_short_strings_binary2       | 1M        | 11.20s | 14.60s | 23.3%       |
| SmallOverhead.time_small_binary               | 200k      | 5.3ms  | 9.4ms  | 43.1%       |
| SmallOverhead.time_small_binary               | 800k      | 5.5ms  | 9.2ms  | 40.0%       |
| SmallOverhead.time_small_binary               | 1M        | 5.5ms  | 9.0ms  | 38.8%       |
| SmallOverhead.time_small_binary2              | 200k      | 7.8ms  | 11.4ms | 31.5%       |
| SmallOverhead.time_small_binary2              | 800k      | 8.1ms  | 11.5ms | 29.2%       |
| SmallOverhead.time_small_binary2              | 1M        | 7.8ms  | 11.5ms | 32.1%       |
| StringIntensive.time_string_intensive_binary2 | 200k      | 2.80s  | 3.71s  | 24.5%       |
| StringIntensive.time_string_intensive_binary2 | 800k      | 12.50s | 14.10s | 11.3%       |
| StringIntensive.time_string_intensive_binary2 | 1M        | 16.20s | 18.20s | 11.0%       |

@stvoutsin stvoutsin marked this pull request as ready for review July 26, 2025 10:48
@stvoutsin stvoutsin force-pushed the u/stvoutsin/binary2-cython branch from 350e752 to 96b3904 Compare July 26, 2025 10:52
@pllim
Member

pllim commented Jul 26, 2025

Benchmark run looking good. Thanks!

| Change | Before [db7a3e2] | After [30bfc4a] | Ratio | Benchmark (Parameter)                                     |
|--------|------------------|-----------------|-------|-----------------------------------------------------------|
| -      | 2.82±0s          | 2.00±0s         | 0.71  | votable.TimeVOTableShortStrings.time_short_strings_binary |
| -      | 3.09±0s          | 2.03±0.01s      | 0.66  | votable.TimeVOTableLongStrings.time_long_strings_binary2  |
| -      | 13.6±0.09ms      | 8.81±0.04ms     | 0.65  | votable.TimeVOTableSmallOverhead.time_small_binary2       |
| -      | 10.4±0.1ms       | 6.27±0.03ms     | 0.61  | votable.TimeVOTableSmallOverhead.time_small_binary        |
| -      | 5.60±0s          | 3.35±0.01s      | 0.6   | votable.TimeVOTableMixed.time_mixed_binary2               |
| -      | 4.43±0.01s       | 2.26±0s         | 0.51  | votable.TimeVOTableNumeric.time_numeric_binary2           |
| -      | 3.51±0.01s       | 1.69±0s         | 0.48  | votable.TimeVOTableNumeric.time_numeric_binary            |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

Development

Successfully merging this pull request may close these issues.

Performance bottleneck in VOTable parsing with BINARY2 serialization (converters.py)