[QC] optimize numpy operations #3621
to speed up computation and reduce memory consumption

Conversation
ThomasLecocq left a comment:
Hi Roman,
Are those covered by tests (originally)?
It does indeed look very simple & efficient!
obspy/signal/quality_control.py (Outdated)

```python
self.meta['sample_mean'] = full_samples.mean()
...
full_samples = np.concatenate([tr.data for tr in self.data])
self.meta['sample_median'] = np.median(full_samples)
```
Could this and the two following lines be replaced by np.percentile(full_samples, [25, 50, 75])? Is that faster? (I would suppose it would be computing the distribution only once.)
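A minimal sketch of that suggestion, with synthetic data standing in for the concatenated trace samples (the quartile unpacking is assumed from context):

```python
import numpy as np

# Synthetic stand-in for the concatenated trace samples.
full_samples = np.random.randn(1_000_000)

# One call computes all three quartiles in a single pass instead of
# partitioning the array once per statistic.
lower_quartile, median, upper_quartile = np.percentile(
    full_samples, [25, 50, 75])
```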
Benchmarks with an array of 100_000_000 elements suggest a speedup by a factor of 2 when using your suggestion; I would definitely go for this.
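A rough sketch of such a benchmark (assumed setup; the original benchmark code is not shown in this thread):

```python
import time
import numpy as np

full_samples = np.random.randn(100_000_000)

# Three separate calls, each processing the full array.
t0 = time.perf_counter()
np.median(full_samples)
np.percentile(full_samples, 25)
np.percentile(full_samples, 75)
t1 = time.perf_counter()

# A single call computing all three percentiles at once.
np.percentile(full_samples, [25, 50, 75])
t2 = time.perf_counter()

print(f"separate: {t1 - t0:.2f} s, combined: {t2 - t1:.2f} s")
```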
I also wonder whether we should make use of the fact that np.mean(full_samples**2) is already calculated and can be reused instead of computing np.std from scratch:
Instead of calling np.std(...),
squared = np.mean(full_samples**2) is stored, and then
self.meta['sample_stdev'] = np.sqrt(squared - np.mean(full_samples)**2),
as this avoids computing np.mean(full_samples**2) twice.
My benchmarks suggest that this also gives a factor of ~2 for computing the standard deviation. I suggest making this change as well before merging the code.
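A minimal sketch of this identity, Var(x) = E[x**2] - E[x]**2, again with synthetic data (note that this one-pass formula can lose precision through cancellation when the mean is large relative to the spread):

```python
import numpy as np

full_samples = np.random.randn(1_000_000)

mean = full_samples.mean()
squared = np.mean(full_samples**2)  # assumed to be computed anyway

# Standard deviation via the variance identity, reusing the mean of squares.
stdev = np.sqrt(squared - mean**2)

# Agrees with np.std() (population std, ddof=0) up to floating-point error.
assert np.isclose(stdev, np.std(full_samples))
```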
Agree!
Ok, I just updated the pull request. The updated code should reflect today's discussion.
oh and for the near future, please branch & PR against master, not maintenance - we'll get rid of this branch soon & release directly from master.
Update to further improve performance as discussed.
Looks good. Could you add a line in the changelog too? It's always nice to report on performance improvements when we release :-)
Done.
What does this PR do?
Some existing code is replaced with numpy calls. This speeds up computation and reduces memory consumption.
Why was it initiated? Any relevant Issues?
Computation was slow and memory consumption was high for large input files.
PR Checklist
- Correct base branch selected? master for new features, maintenance_... for bug fixes.
- First time contributors have added their name to CONTRIBUTORS.txt.
- Add the ready for review label when you are ready for the PR to be reviewed.