@raciner raciner commented Sep 23, 2025

to speed up computation and reduce memory consumption

What does this PR do?

Some existing code is replaced with numpy calls. This speeds up computation and reduces memory consumption.

Why was it initiated? Any relevant Issues?

The speed was slow and the memory consumption high for large input files.

PR Checklist

  • Correct base branch selected? master for new features, maintenance_... for bug fixes
  • This PR is not directly related to an existing issue (which has no PR yet).
  • All tests still pass.
  • First time contributors have added your name to CONTRIBUTORS.txt.
  • Add the yellow ready for review label when you are ready for the PR to be reviewed.

@raciner raciner added the ready for review, performance, and .signal labels Sep 23, 2025
Contributor

@ThomasLecocq ThomasLecocq left a comment

Hi Roman,

Are those covered by tests (originally)?

It does indeed look very simple & efficient!

self.meta['sample_mean'] = full_samples.mean()

full_samples = np.concatenate([tr.data for tr in self.data])
self.meta['sample_median'] = np.median(full_samples)
Contributor

@ThomasLecocq ThomasLecocq Oct 29, 2025

Could this and the following two lines be replaced by np.percentile(full_samples, [25, 50, 75])? Is that faster? (I would suppose it computes the distribution only once.)
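A minimal sketch of the suggestion above, assuming full_samples is a one-dimensional NumPy array. The synthetic data, array size, and timing harness are illustrative and not taken from the PR. It checks that a single np.percentile call with a list of percentiles returns the same three values as the separate calls, and times both variants:

```python
import timeit

import numpy as np

rng = np.random.default_rng(seed=0)
# Stand-in for the concatenated trace data; size and dtype are assumptions.
full_samples = rng.integers(-2**20, 2**20, size=1_000_000, dtype=np.int32)

# Three separate calls, each working on the full array.
median = np.median(full_samples)
lower_quartile = np.percentile(full_samples, 25)
upper_quartile = np.percentile(full_samples, 75)

# One call that returns all three percentiles at once.
q25, q50, q75 = np.percentile(full_samples, [25, 50, 75])

# Both variants should agree on the results.
assert np.allclose([q25, q50, q75], [lower_quartile, median, upper_quartile])


def separate_calls():
    return (
        np.median(full_samples),
        np.percentile(full_samples, 25),
        np.percentile(full_samples, 75),
    )


def combined_call():
    return np.percentile(full_samples, [25, 50, 75])


# Timings depend on the machine and the NumPy version.
t_separate = timeit.timeit(separate_calls, number=3)
t_combined = timeit.timeit(combined_call, number=3)
print(f"separate: {t_separate:.3f} s, combined: {t_combined:.3f} s")
```

Whether the combined call is actually faster depends on the array size and NumPy version, so it is worth timing on data of realistic size.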

Author

@raciner raciner Oct 29, 2025

Benchmarks with an array of 100_000_000 elements suggest a speedup of a factor of 2 when using your suggestion, I would definitely go for this.

I also wonder whether we should make use of the fact that np.mean(full_samples**2) is already calculated and reuse it instead of computing np.std from scratch:

Instead of calling np.std(...), store

squared = np.mean(full_samples**2)

and then compute

self.meta['sample_stdev'] = np.sqrt(squared - np.mean(full_samples)**2)

as this potentially avoids computing np.mean(full_samples**2) twice.

My benchmarks suggest that this also gives a speedup of a factor of ~2 for computing the standard deviation. I suggest making this change as well before merging the code.
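A minimal sketch of the shortcut described above, using the identity std = sqrt(mean(x**2) - mean(x)**2). The synthetic float64 data and the variable names other than full_samples are assumptions for illustration, not the merged code:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Hypothetical stand-in for the concatenated samples. float64 is used so
# that squaring cannot overflow an integer dtype.
full_samples = rng.normal(loc=10.0, scale=3.0, size=1_000_000)

mean = np.mean(full_samples)
# Computed once and reused for both the RMS and the standard deviation.
mean_of_squares = np.mean(full_samples**2)

sample_rms = np.sqrt(mean_of_squares)
# max(..., 0.0) guards against a tiny negative value caused by rounding.
sample_stdev = np.sqrt(max(mean_of_squares - mean**2, 0.0))

# np.std defaults to ddof=0 (population standard deviation), which is
# what the identity above yields.
reference = np.std(full_samples)
assert np.isclose(sample_stdev, reference)
print(sample_rms, sample_stdev, reference)
```

The identity trades one pass over the data for some numerical robustness: when the mean is large compared to the spread, the subtraction can lose precision, and integer samples should be converted to a floating-point dtype before squaring to avoid overflow.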

Contributor

Agree !

Author

Ok, I just updated the pull request. The updated code should reflect today's discussion.

@ThomasLecocq ThomasLecocq added the author action needed label and removed the ready for review label Oct 29, 2025
@ThomasLecocq ThomasLecocq changed the title from "replaced existing code with numpy operations which do the same thing" to "[QC] optimize numpy operations" Oct 29, 2025
@ThomasLecocq
Contributor

oh and for the near future, please branch & PR against master, not maintenance - we'll get rid of this branch soon & release directly from master.

@raciner
Author

raciner commented Oct 29, 2025

Update to further improve performance as discussed.

@raciner raciner added the ready for review label and removed the author action needed label Oct 29, 2025
@ThomasLecocq
Contributor

looks good, could you add a line in the changelog too, it's always nice to report on performance improvements when we release :-)

@raciner
Author

raciner commented Oct 29, 2025

Done.

@ThomasLecocq ThomasLecocq merged commit 970621b into maintenance_1.4.x Oct 31, 2025
29 checks passed
@ThomasLecocq ThomasLecocq deleted the fast_mseed_metadata2 branch October 31, 2025 09:41