Add dataset info and hyper parameter logging for benchmark. #4152

qlzh727 · 2018-05-02T21:24:54Z

Probably will hit a merge conflict with Taylor's flag change.

karmel · 2018-05-03T18:09:31Z

official/resnet/cifar10_main.py

@@ -243,7 +243,7 @@ def main(flags_obj):
                    or input_fn)

  resnet_run_loop.resnet_main(
-      flags_obj, cifar10_model_fn, input_function,
+      flags_obj, cifar10_model_fn, input_function, 'CIFAR-10',


Maybe make this a module-level var or a class-level var? Ditto for Imagenet below.
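A minimal sketch of the module-level variable suggested here. The constant name `DATASET_NAME` and the stub `resnet_main_stub` are assumptions for illustration; the stub only echoes its arguments and stands in for `resnet_run_loop.resnet_main`, which is not reproduced.

```python
# Hypothetical module-level constant; the name DATASET_NAME is an assumption.
DATASET_NAME = 'CIFAR-10'


def resnet_main_stub(flags_obj, model_function, input_function, dataset_name,
                     shape=None):
  """Stand-in for resnet_run_loop.resnet_main; only records what it receives."""
  return {'dataset_name': dataset_name, 'shape': shape}


def main(flags_obj):
  # The call site references the constant instead of repeating the literal.
  return resnet_main_stub(flags_obj, None, None, DATASET_NAME)


result = main(flags_obj=None)
```

Keeping the name in one place means the logging code and any tests can refer to the same value.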

karmel · 2018-05-03T18:10:57Z

official/resnet/resnet_run_loop.py

@@ -402,8 +404,16 @@ def resnet_main(flags_obj, model_function, input_function, shape=None):
          'dtype': flags_core.get_tf_dtype(flags_obj)
      })

+  hyperparams = {


Nitting here, but this is a mix of hyperparams and run params: dtype, version, and synthetic data are not what would typically be considered hyperparams.

Talked offline. I will rename the data schema to run_parameter, which covers both hyper and non-hyper params. Will do this in a separate PR.

karmel · 2018-05-03T18:11:22Z

official/resnet/resnet_run_loop.py

@@ -361,6 +361,8 @@ def resnet_main(flags_obj, model_function, input_function, shape=None):
    input_function: the function that processes the dataset and returns a
      dataset that the estimator can train on. This will be wrapped with
      all the relevant flags for running and passed to estimator.
+    dataset: the name of the dataset for training and evaluation. This is used


Maybe dataset_name to be explicit? Dataset sounds like the actual data.

karmel · 2018-05-03T18:14:00Z

official/utils/logs/logger.py

    """Collect most of the TF runtime information for the local env.

    The schema of the run info follows official/benchmark/datastore/schema.

    Args:
      model_name: string, the name of the model.
+      dataset_name: string, the name of dataset for training and evaluation.
+      hyperparams: dict, the dictionary of hyper parameters for the model.


Noted above, but maybe generalize this naming, or split out hyper from run, though the latter requires some semantic parsing of the params in ways most people don't care about.

Ack, will do this in a separate PR.

karmel · 2018-05-03T18:17:41Z

official/utils/logs/logger.py

+        str: {"name": name, "string_value": value},
+        int: {"name": name, "long_value": value},
+        bool: {"name": name, "bool_value": str(value)},
+        float: {"name": name, "float_value": value},


This seems fairly complicated, and gives each value a differently named key depending on its type. Is it necessary? Why not just include the value and assume that we will worry about type elsewhere?

The type will be inherited by the table, so that SQL can work properly based on the value type. Currently I don't see a strong use case for data manipulation on the hyperparameter values. I could also convert this to a simple key-to-str(value) dict in a future PR.
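To make the trade-off in this exchange concrete, here is a sketch of both options, based on the diff lines quoted above. The function names `convert_to_json_dict` and `convert_to_str_dict` and the sample parameter values are assumptions for illustration, not the code that was merged.

```python
def convert_to_json_dict(name, value):
  """Maps a parameter to a typed record, keyed by the value's exact type."""
  type_check = {
      str: {"name": name, "string_value": value},
      int: {"name": name, "long_value": value},
      bool: {"name": name, "bool_value": str(value)},
      float: {"name": name, "float_value": value},
  }
  # Types outside the table fall back to their string form.
  return type_check.get(type(value), {"name": name, "string_value": str(value)})


def convert_to_str_dict(params):
  """The simpler alternative: every value is stored as a string."""
  return {name: str(value) for name, value in params.items()}


# Sample values are made up for the example.
params = {"batch_size": 128, "synthetic_data": False, "dtype": "float32"}
typed = [convert_to_json_dict(n, v) for n, v in sorted(params.items())]
flat = convert_to_str_dict(params)
```

The lookup uses `type(value)` rather than `isinstance`, so a `bool` lands in the `bool_value` entry even though `bool` is a subclass of `int`.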

karmel · 2018-05-03T18:18:35Z

official/utils/logs/logger.py

+    return type_check.get(type(value),
+                          {"name": name, "string_value": str(value)})
+  if hyperparams:
+    run_info["hyperparameter"] = [


nit: regardless of name we choose, should probably be pluralized.

robieta

Just a couple minor suggestions.

robieta · 2018-05-03T18:19:05Z

official/resnet/resnet_run_loop.py

@@ -350,7 +350,7 @@ def validate_batch_size_for_multi_gpu(batch_size):
    raise ValueError(err)


-def resnet_main(flags_obj, model_function, input_function, shape=None):
+def resnet_main(flags_obj, model_function, input_function, dataset, shape=None):


nit: can we call this dataset_name for clarity?

robieta · 2018-05-03T18:21:35Z

official/resnet/resnet_run_loop.py

@@ -402,8 +404,16 @@ def resnet_main(flags_obj, model_function, input_function, shape=None):
          'dtype': flags_core.get_tf_dtype(flags_obj)
      })

+  hyperparams = {
+      'batch_size': flags_obj.batch_size,
+      'dtype': flags_obj.dtype,


I think this should be flags_core.get_tf_dtype(flags_obj).name. That way we're logging the official name rather than whatever abbreviation we choose.
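A sketch of the difference between logging the flag value and logging the dtype's own name. To stay runnable without TensorFlow, `FakeDType`, `DTYPE_MAP`, and `get_dtype_stub` are hypothetical stand-ins for a TensorFlow dtype object and for `flags_core.get_tf_dtype`; the abbreviations `fp16` and `fp32` are assumed flag values.

```python
import collections
import types

# Stand-in for a TensorFlow DType, which exposes its canonical name as `.name`.
FakeDType = collections.namedtuple('FakeDType', ['name'])

# Assumed mapping from flag abbreviation to dtype; illustrative only.
DTYPE_MAP = {'fp16': FakeDType('float16'), 'fp32': FakeDType('float32')}


def get_dtype_stub(flags_obj):
  """Stand-in for flags_core.get_tf_dtype: resolves the flag to a dtype."""
  return DTYPE_MAP[flags_obj.dtype]


flags_obj = types.SimpleNamespace(dtype='fp16')

abbreviated = flags_obj.dtype             # what the original diff would log
logged = get_dtype_stub(flags_obj).name   # what the review suggests logging
```

Logging the resolved name keeps stored records stable if the flag abbreviations ever change.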

robieta · 2018-05-03T18:22:27Z

official/resnet/resnet_run_loop.py

+      'resnet_size': flags_obj.resnet_size,
+      'synthetic_data': flags_obj.use_synthetic_data,
+      'train_epochs': flags_obj.train_epochs,
+      'version': flags_obj.version,


Log this as resnet_version for clarity?

Sure. Done.
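For illustration, the parameter dict with the renamed key might look like the sketch below. `build_run_params` is a hypothetical helper, `flags_obj` is a `SimpleNamespace` stand-in for the parsed flags, and the flag values are invented for the example.

```python
import types


def build_run_params(flags_obj, dtype_name):
  """Collects the run parameters to log, using explicit key names."""
  return {
      'batch_size': flags_obj.batch_size,
      'dtype': dtype_name,
      'resnet_size': flags_obj.resnet_size,
      # Logged as resnet_version rather than the ambiguous 'version'.
      'resnet_version': flags_obj.version,
      'synthetic_data': flags_obj.use_synthetic_data,
      'train_epochs': flags_obj.train_epochs,
  }


# Stand-in for the parsed flags object; values are made up.
flags_obj = types.SimpleNamespace(
    batch_size=128, resnet_size=32, use_synthetic_data=False,
    train_epochs=250, version=2)

run_params = build_run_params(flags_obj, dtype_name='float32')
```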

qlzh727 · 2018-05-03T19:16:20Z

Addressed the comment for data schema rename.

qlzh727 · 2018-05-03T20:59:09Z

Schema updated for the BigQuery table; will rerun the backfill for all the existing data.

karmel · 2018-05-03T20:59:56Z

official/utils/logs/logger.py

+        str: {"name": name, "string_value": value},
+        int: {"name": name, "long_value": value},
+        bool: {"name": name, "bool_value": str(value)},
+        float: {"name": name, "float_value": value},


* Add dataset info and hyper parameter logging for benchmark. * Address review comments. * Address the view comment for data schema name. * Fix test cases. * Lint fix.


Add dataset info and hyper parameter logging for benchmark.

e1e3a67

qlzh727 requested a review from karmel May 2, 2018 21:24

qlzh727 requested a review from a team as a code owner May 2, 2018 21:24

googlebot added the cla: yes label May 2, 2018

Merge branch 'master' into hyperparam-update

a226581

qlzh727 requested a review from robieta May 3, 2018 18:13

karmel suggested changes May 3, 2018


robieta suggested changes May 3, 2018


qlzh727 added 2 commits May 3, 2018 11:54

Address review comments.

4e586b9

Address the view comment for data schema name.

830f5f4

qlzh727 added 2 commits May 3, 2018 13:48

Fix test cases.

08e1220

Lint fix.

15e8d14

karmel approved these changes May 3, 2018


robieta approved these changes May 3, 2018


qlzh727 merged commit eb0c0df into tensorflow:master May 3, 2018

qlzh727 deleted the hyperparam-update branch May 3, 2018 21:19
