
Conversation

@swan343 commented Jun 26, 2025

Hi!

I noticed that the documentation here mentions the seed argument, which was present in v0.1.0 but is no longer present in the latest version. This PR adds the seed argument back. With a specified non-zero seed, I now get exactly the same output across multiple preprocess/normalize runs; without it, the outputs differ.

@DarnellGranberry (Collaborator) commented:

Hi @swan343, it looks like the removal was part of a big update @tbepler made in afa606f, so let's check whether it was intentional.

@tbepler (Owner) commented Jun 30, 2025

I removed the seed argument because of multiprocessing. If the random number generator is initialized within each process, you will not get the same result when processing a list of micrographs if you change the number of processes being used or run everything in the main thread. Given this potential inconsistency, I felt it best not to expose the option at all.
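A toy sketch of the concern (hypothetical round-robin scheduling; this is not Topaz's actual worker logic): if each worker gets its own RNG seeded with the same seed, the draw an image receives depends on how images are dealt to workers, so changing the worker count changes the output.

```python
import numpy as np

def per_image_draws(num_workers, num_images, seed=5424):
    # one identically seeded RNG per worker; images dealt round-robin
    rngs = [np.random.RandomState(seed) for _ in range(num_workers)]
    return [float(rngs[i % num_workers].rand()) for i in range(num_images)]

serial = per_image_draws(1, 4)    # one stream consumed in order
parallel = per_image_draws(2, 4)  # two identical streams interleaved
print(serial == parallel)         # False: image 1 gets a different draw
```

With 2 workers, image 1 receives the *first* draw of worker 1's stream (identical to image 0's draw) instead of the *second* draw of a single shared stream, so the per-image results diverge.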

@swan343 (Author) commented Jun 30, 2025

Hi @tbepler, thanks for the response! From my limited testing, the results are repeatable with multiprocessing when the seed argument is given (np.random.RandomState should, in theory, always return the same pseudo-random number generator regardless of when and where it is constructed). Here's a sample test script. I do notice that for at least some .mrc files the GPU yields different results than the CPU; I'm not sure whether that variation also applies across different computers (perhaps that's what you meant?). Using the same computer/device, though, the results are identical.

import glob
import hashlib
import os
import shutil


def get_md5(file_path: str) -> str:
    """Compute the MD5 checksum of a file in 8 KiB chunks."""
    md5 = hashlib.md5()
    with open(file_path, 'rb') as f:
        while True:
            data = f.read(8192)
            if not data:
                break
            md5.update(data)
    return md5.hexdigest()


def run_preprocess(num_workers: int) -> None:
    os.system(' '.join([
        'topaz', 'preprocess', '--device=-1', '--seed=5424',
        f'--num-workers={num_workers}', '--format=png',
        '-s', '4', '-o=/tmp/lu3435571etv8n.tmp/res',
        '/tmp/lu3435571etv8n.tmp/test/*.mrc',
    ]))


# reference run
result = {}
run_preprocess(12)
for path in sorted(glob.glob('/tmp/lu3435571etv8n.tmp/res/*.png')):
    result[path] = get_md5(path)

# repeat with different worker counts and compare checksums
for i in range(10):
    print('testing iteration', i)
    shutil.rmtree('/tmp/lu3435571etv8n.tmp/res', ignore_errors=True)
    os.mkdir('/tmp/lu3435571etv8n.tmp/res')
    run_preprocess(i)
    for path in sorted(glob.glob('/tmp/lu3435571etv8n.tmp/res/*.png')):
        if get_md5(path) != result[path]:
            print("md5 doesn't match! @ iter", i, path, get_md5(path), result[path])
            raise Exception("md5 doesn't match!")
    print('iteration', i, 'passed')

@tbepler (Owner) commented Jul 1, 2025

@swan343 my main concern is actually about consistency between runs with different numbers of workers, or with no workers. With the same number of workers, the results should be the same, because each process's random number generator is independently seeded with the same seed. However, if you change the number of workers, the results will change: for example, starting 2 workers and running them in parallel from the same state is not the same as running 1 worker through all the images from that same starting state.

This is a pretty minor replicability issue, but it would cause inconsistency between runs even when the seed is set, which would be unexpected.

@swan343 (Author) commented Jul 1, 2025

@tbepler thanks for the reply. By workers, are you referring to --num-workers or something else? I can generate exactly the same output with --num-workers ranging from 0 to 9 using the modified script above, on CPU or GPU (which I believe overrides the worker count to 1 anyway).

@tbepler (Owner) commented Jul 3, 2025

Ah, I see. You want to reset the random number generator to the same seed for every image. This means every image will start from the same random initialization. Is that what you want?
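Resetting the generator per image can be sketched like this (a hypothetical process_image helper, not the PR's actual code): because the RNG state at the start of each image is fixed, the output depends only on the image and the seed, never on worker count or scheduling order.

```python
import numpy as np

def process_image(pixels, seed=5424):
    # reseed per image: the added noise depends only on the seed,
    # not on which worker handles the image or in what order
    rng = np.random.RandomState(seed)
    return pixels + rng.standard_normal(pixels.shape)

img = np.zeros((4, 4))
a = process_image(img)
b = process_image(img)  # identical, regardless of call order
assert np.array_equal(a, b)
```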

Do you really need to set the random seed for this function? How much variability are you seeing in the results? I would expect a little, but not major qualitative differences. If there are major differences, setting the seed will lock in that one solution, but I'm not sure it addresses the core problem, which is that the solution is unstable.

I think this is fine, though. I put a few comments on your PR. We can merge it after those are fixed.

@swan343 (Author) commented Jul 3, 2025

Yes, that's exactly the intent. The variability isn't large per se: I do particle masking and see about a 2-5% difference depending on the detection technique and validation criteria. With a fixed seed, the variation is eliminated completely. Appreciate the feedback!
