Fix/ncf mlperf tweaks #5334

robieta · 2018-09-19T17:14:48Z

Clean up the run script. Fix the HR threshold and add an ml_perf flag.
Make exit behavior more stable.
Add a seed to reduce run-to-run variation. Some variation is still unaccounted for, but this removes at least part of it.
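As a rough illustration of why a seed removes some, but not all, run-to-run variation, here is a minimal sketch using Python's standard `random` module. It assumes the randomness of interest is something like shuffling training examples; `shuffled_ids` is a hypothetical helper, not part of the NCF code.

```python
import random
from typing import List, Optional


def shuffled_ids(num_items: int, seed: Optional[int] = None) -> List[int]:
  """Returns a shuffled list of item ids (hypothetical helper).

  With a fixed seed the order is reproducible across calls; with seed=None
  the generator is seeded from system entropy, so each call can differ.
  """
  rng = random.Random(seed)  # Private generator; global state is untouched.
  ids = list(range(num_items))
  rng.shuffle(ids)  # In-place shuffle driven only by the seeded generator.
  return ids


# Same seed -> same order on every call.
assert shuffled_ids(10, seed=42) == shuffled_ids(10, seed=42)
```

A seed only pins down randomness that flows through the seeded generator. Other sources, such as thread scheduling or nondeterministic GPU kernels, may still vary between runs, which could account for some of the remaining variation.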

reedwm · 2018-09-19T17:29:26Z

official/recommendation/data_preprocessing.py

-  atexit.register(_shutdown, proc=proc)
-  atexit.register(tf.gfile.DeleteRecursively,
-                  ncf_dataset.cache_paths.cache_root)
+  cleanup_called = {"finished": False}


Where do you set this to True?

Oops. Good catch.
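The `cleanup_called` dict in the diff above points to a guard that lets cleanup run at most once, whether it is called explicitly or by `atexit` when the interpreter exits. A minimal sketch of that pattern follows, assuming the only work is removing a cache directory (the real code also shuts down a subprocess). `make_cleanup_fn` and the temp-dir setup are hypothetical, not the actual `data_preprocessing.py` implementation. Note that the flag must be set to `True` once cleanup finishes, which is the step the review comment caught as missing.

```python
import atexit
import os
import shutil
import tempfile


def make_cleanup_fn(cache_root):
  """Builds a cleanup function that is safe to call more than once."""
  # A dict lets the closure mutate state without `nonlocal`.
  cleanup_called = {"finished": False}

  def cleanup():
    if cleanup_called["finished"]:
      return  # Already cleaned up (e.g. called explicitly before exit).
    shutil.rmtree(cache_root, ignore_errors=True)
    cleanup_called["finished"] = True  # Later calls become no-ops.

  atexit.register(cleanup)  # Safety net if the caller never calls it.
  return cleanup


cache_root = tempfile.mkdtemp()
cleanup_fn = make_cleanup_fn(cache_root)
cleanup_fn()
cleanup_fn()  # Second call returns immediately.
assert not os.path.exists(cache_root)
```

Using a dict rather than a plain boolean lets the inner function update the flag without `nonlocal`, which Python 2 does not support.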

reedwm · 2018-09-19T17:30:50Z

official/recommendation/ncf_main.py

@@ -390,6 +396,8 @@ def run_ncf(_):
    if model_helpers.past_stop_threshold(FLAGS.hr_threshold, hr):
      break

+  cleanup_fn()  # Cleanup data construction artifacts and subprocess.


No need to call this, since it's registered with atexit

This comes from Shawn. He is running in a loop and accumulating files, because the atexit handler hasn't fired yet. I neglected that use case when I decided to rely on atexit.
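To illustrate the use case described above: `atexit` handlers only run when the interpreter exits, so a driver that calls the training function repeatedly in one process accumulates one set of artifacts per iteration unless cleanup is called explicitly. A minimal sketch, where `run_once` is a hypothetical stand-in for `run_ncf` that creates a temporary directory and returns a cleanup function. Wrapping the call in `try`/`finally` is one way to make sure cleanup happens even if an iteration fails.

```python
import atexit
import os
import shutil
import tempfile


def run_once():
  """Hypothetical stand-in for one training run that leaves artifacts."""
  cache_root = tempfile.mkdtemp(prefix="ncf_cache_")

  def cleanup_fn():
    shutil.rmtree(cache_root, ignore_errors=True)

  atexit.register(cleanup_fn)  # Only fires when the process exits.
  return cache_root, cleanup_fn


leftover = []
for _ in range(3):
  cache_root, cleanup_fn = run_once()
  try:
    pass  # Training and evaluation would happen here.
  finally:
    cleanup_fn()  # Explicit call: don't wait for interpreter exit.
  leftover.append(os.path.exists(cache_root))

print(leftover)  # [False, False, False]
```

Without the explicit call in `finally`, all three directories would still exist until the process ends.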

reedwm · 2018-09-19T17:32:38Z

official/recommendation/ncf_main.py

  num_gpus = flags_core.get_num_gpus(FLAGS)
  batch_size = distribution_utils.per_device_batch_size(
      int(FLAGS.batch_size), num_gpus)
  eval_batch_size = int(FLAGS.eval_batch_size or FLAGS.batch_size)
-  ncf_dataset = data_preprocessing.instantiate_pipeline(
+  ncf_dataset, cleanup_fn = data_preprocessing.instantiate_pipeline(


Pass deterministic=FLAGS.seed is not None

Good catch. I wonder if this is the reason that I wasn't seeing deterministic behavior. (I'll rerun to test.)
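The suggestion ties determinism to whether a seed was supplied: if the user passes a seed, the data pipeline is also asked to behave deterministically. A minimal sketch of that flag derivation follows, using `argparse` in place of the absl flags the model actually uses. `instantiate_pipeline_stub` is hypothetical and only echoes its argument.

```python
import argparse


def instantiate_pipeline_stub(deterministic=False):
  """Hypothetical stand-in for data_preprocessing.instantiate_pipeline."""
  return {"deterministic": deterministic}


def parse(argv):
  parser = argparse.ArgumentParser()
  # Default None means "no seed supplied", distinct from an explicit seed of 0.
  parser.add_argument("--seed", type=int, default=None)
  return parser.parse_args(argv)


for argv in ([], ["--seed", "0"], ["--seed", "42"]):
  flags = parse(argv)
  # Compare against None rather than testing truthiness: seed 0 is valid.
  pipeline = instantiate_pipeline_stub(deterministic=flags.seed is not None)
  print(argv, pipeline["deterministic"])
```

Writing `deterministic=bool(flags.seed)` instead would silently treat `--seed 0` as "no seed", which is why the `is not None` comparison matters.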

robieta requested review from karmel and a team as code owners September 19, 2018 17:14

googlebot added the cla: yes label Sep 19, 2018

robieta requested review from reedwm and removed request for karmel September 19, 2018 17:14

reedwm requested changes Sep 19, 2018


robieta added the kokoro:force-run label Sep 19, 2018

kokoro-team removed the kokoro:force-run label Sep 19, 2018

Taylor Robie added 6 commits September 19, 2018 14:19

bug fixes and add seed

dcb280a

more random corrections

a6ed9f7

make cleanup more robust

325355d

return cleanup fn

1a3e623

delint and address PR comments.

bb2e9f8

delint and fix tests

11c5a43

robieta force-pushed the fix/ncf_mlperf_tweaks branch from 1b180ca to 11c5a43 September 19, 2018 21:20

delinting is never done

eb84633

reedwm approved these changes Sep 19, 2018


Taylor Robie added 2 commits September 19, 2018 16:04

add pipeline hashing

c56dd5e

delint

a788c60

robieta merged commit 4dc1080 into master Sep 20, 2018

robieta deleted the fix/ncf_mlperf_tweaks branch September 20, 2018 01:07
