Add benchmark logger that does stream upload to bigquery. #4210
Conversation
FYI, if you compare the first commit and the second one, you will be able to see the diff for the moved file.
Skip the import of benchmark_uploader when bigquery is not installed.
I think a README would be quite helpful here. It's not clear to me how all of this fits together, or why multiple flavors of uploaders are necessary in the first place.
Sure. Will do in a following commit.
Actually, while thinking about the README, you are totally right about "why" multiple types of uploaders are needed. Let me combine them into one.
@@ -72,19 +61,17 @@ def upload_benchmark_run(self, dataset_name, table_name, run_id):
        the data will be uploaded.
      run_id: string, a unique ID that will be attached to the data, usually
        this is a UUID4 format.
      run_json: dict, the JSON data that contains the benchmark run info.
The naming of this is confusing-- is it JSON or a python dict? If the latter, why is it called json?
In Python, there isn't a specific type for JSON. Python uses a dict to represent JSON data, and of course it has some restrictions about the types of the keys and values in the dict.
I think we should absolutely not refer to dicts as JSON. Some dicts can be converted to JSON, but I tend to think that json in the variable names implies that the variable refers to the serialized string.
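To make the naming point concrete, here is a minimal Python illustration (the variable names are only for this example, not from the PR):

import json

# A Python dict: a native mapping type, not JSON.
run_info = {"model_id": "1234", "dataset": "cifar10"}

# JSON proper is the serialized string produced by json.dumps(); calling the
# dict "run_json" blurs that distinction.
run_info_str = json.dumps(run_info)

# Round-tripping the string gives a dict again.
assert json.loads(run_info_str) == run_info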
def upload_metric(self, dataset_name, table_name, run_id):
run_json["model_id"] = run_id
table_ref = self._bq_client.dataset(dataset_name).table(table_name)
Comment me-- what is happening here?
This is finding a unique table reference based on the dataset and table name, so that we can insert the data into it in the next step.
errors = self._bq_client.insert_rows_json(table_ref, metric_json_list)
if errors:
  tf.logging.error(
      "Failed to upload benchmark info to bigquery: {}".format(errors))
This is the same as above; why not just have the above method call this method with the wrapped json object?
Done, extracted the common piece into a function.
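The extracted function itself is not shown in this excerpt; a rough sketch of what the shared insert path could look like, based on the diff fragments above (the class name and the helper name _upload_json_rows are hypothetical, not necessarily what landed in the PR):

import tensorflow as tf
from google.cloud import bigquery


class BigQueryUploader(object):
  """A minimal sketch of the uploader with the shared insert path factored out."""

  def __init__(self, gcp_project=None):
    self._bq_client = bigquery.Client(project=gcp_project)

  def _upload_json_rows(self, dataset_name, table_name, json_rows):
    # Resolve the table reference, then stream the rows in one API call.
    table_ref = self._bq_client.dataset(dataset_name).table(table_name)
    errors = self._bq_client.insert_rows_json(table_ref, json_rows)
    if errors:
      tf.logging.error(
          "Failed to upload benchmark info to bigquery: {}".format(errors))

  def upload_benchmark_run(self, dataset_name, table_name, run_id, run_json):
    # Tag the row with the run id and reuse the shared insert path.
    run_json["model_id"] = run_id
    self._upload_json_rows(dataset_name, table_name, [run_json])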
expected_file = os.path.join(
    self._logging_dir, logger.METRIC_LOG_FILE_NAME)
with tf.gfile.GFile(expected_file) as f:
with tf.gfile.GFile(metric_json_file) as f:
  lines = f.readlines()
It's easy to imagine this consuming all of the memory for a really big file. Why not do this as a generator? (That is, for line in f.read(), or whatever the correct syntax is, rather than reading in the whole file upfront.)
The side effect of that is that we will have one API call per line. I could implement some buffering and limit the max number of lines we load for each upload, if this is what you want. At the moment, we don't see a memory issue since we usually only have O(100) lines of metric records for a small model. For big models like ImageNet, it will be covered by the streaming upload.
But isn't this exactly the same as:
for line in f:
  if line.strip():
    metrics.append(json.loads(line))
Except that readlines requires all lines read into memory, and the above creates a generator that reads line by line?
Done. Thanks for the suggestion.
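For completeness, a small sketch that combines the line-by-line read with the buffering idea floated earlier in the thread (the batch size and the function name are illustrative, not part of the PR):

import json

import tensorflow as tf

_UPLOAD_BATCH_SIZE = 100  # Hypothetical; buffering was only discussed as an option.


def read_metric_batches(metric_json_file):
  """Yield lists of parsed metric dicts without loading the whole file."""
  batch = []
  with tf.gfile.GFile(metric_json_file) as f:
    for line in f:  # Iterates lazily, one line at a time.
      if line.strip():
        batch.append(json.loads(line))
      if len(batch) >= _UPLOAD_BATCH_SIZE:
        yield batch
        batch = []
  if batch:
    yield batch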
official/utils/flags/_benchmark.py
name="benchmark_logger_type", default="BaseBenchmarkLogger", | ||
enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger", | ||
"BenchmarkBigQueryLogger"], | ||
help=help_wrap("The type of benchmark logger to use. Default to use " |
nit: Defaults to using
Done.
official/utils/flags/_benchmark.py
"BenchmarkBigQueryLogger"], | ||
help=help_wrap("The type of benchmark logger to use. Default to use " | ||
"BaseBenchmarkLogger which logs to STDOUT. Different " | ||
"loggers will require other flags to be able to work.")) |
Can we add some fancy validators for those flags?
I see that I'm winning hearts and minds.
Done. Added validation for the flag combination.
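The validator itself is outside this excerpt; a sketch of what such a check can look like with absl flags, assuming the rule is "BenchmarkFileLogger requires --benchmark_log_dir" (the exact rule in the PR may differ):

from absl import flags

flags.DEFINE_enum(
    name="benchmark_logger_type", default="BaseBenchmarkLogger",
    enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger",
                 "BenchmarkBigQueryLogger"],
    help="The type of benchmark logger to use.")
flags.DEFINE_string(
    name="benchmark_log_dir", default=None,
    help="Directory where the benchmark log files are written.")


@flags.multi_flags_validator(
    ["benchmark_logger_type", "benchmark_log_dir"],
    message="--benchmark_log_dir must be set when using BenchmarkFileLogger.")
def _check_benchmark_log_dir(flags_dict):
  # Only the file logger needs a log directory; other loggers pass trivially.
  if flags_dict["benchmark_logger_type"] != "BenchmarkFileLogger":
    return True
  return bool(flags_dict["benchmark_log_dir"])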
_benchmark_logger = BaseBenchmarkLogger()
elif flag_obj.benchmark_logger_type == 'BenchmarkFileLogger':
Do we still need the filelogger as an option? If we are not going to use it, why complicate this with a third option that we have to maintain? That seems like it would allow us to strip half the code here and above.
Yes. The file logger is still useful for small models so that we can save the BigQuery upload quota. The streaming one will be useful for long-running benchmarks.
Why not always stream?
The BigQuery API has a usage quota; even though we currently have a small load, I think it's still worthwhile to not spam the service. For a small dataset like CIFAR-10, which takes about 40 minutes to finish, I would still prefer to upload the data after the run.
official/utils/logs/logger.py
if not isinstance(value, numbers.Number):
  tf.logging.warning(
      "Metric value to log should be a number. Got %s", type(value))
  return
If we need to maintain the File version, I would rather not have this check in multiple places-- extract to function?
Done.
official/utils/logs/logger.py
"global_step": global_step, | ||
"timestamp": datetime.datetime.utcnow().strftime( | ||
_DATE_TIME_FORMAT_PATTERN), | ||
"extras": extras}] |
Or, better still, looks like this is all shared code; extract or make these inherit from a shared object, rather than repeat.
Done.
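A sketch of the shared piece, combining the numeric check and the metric dict construction from the fragments above (the function name, the exact field set, and the value of the timestamp format constant are assumptions):

import datetime
import numbers

import tensorflow as tf

# The PR refers to a _DATE_TIME_FORMAT_PATTERN constant; this value is assumed.
_DATE_TIME_FORMAT_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def _process_metric_to_json(name, value, unit=None, global_step=None,
                            extras=None):
  """Build the metric dict shared by the file and BigQuery loggers."""
  if not isinstance(value, numbers.Number):
    tf.logging.warning(
        "Metric value to log should be a number. Got %s", type(value))
    return None
  return {
      "name": name,
      "value": float(value),
      "unit": unit,
      "global_step": global_step,
      "timestamp": datetime.datetime.utcnow().strftime(
          _DATE_TIME_FORMAT_PATTERN),
      "extras": extras}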
Also, just to confirm: this should be non-blocking, correct? That is, it shouldn't affect the speed of training at all?
I think the hooks are executed on a separate thread during training time, if I understand it correctly, so the training performance shouldn't be affected. During the eval stage, this will add some overhead because of the BigQuery upload for the eval result.
And I was wrong about the hook behavior. It seems that the hook does block on the after_run method, so computation running in the hook will have a potential performance impact. Let me update the metric logging to execute on a different thread.
Updated to use a new thread for the BigQuery upload.
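As a sketch of what that change can look like (the commit list mentions switching to six.moves for the thread import; the attribute names on self and the upload_metric argument list are assumptions here, and _process_metric_to_json refers to the helper sketched above):

from six.moves import _thread as thread

# Inside a hypothetical BenchmarkBigQueryLogger:
def log_metric(self, name, value, unit=None, global_step=None, extras=None):
  metric = _process_metric_to_json(name, value, unit, global_step, extras)
  if metric:
    # Hand the BigQuery call to a separate thread so the SessionRunHook's
    # after_run() is not blocked; upload errors are logged, not raised.
    thread.start_new_thread(
        self._bq_uploader.upload_metric,
        (self._bq_dataset, self._metric_table, self._run_id, [metric]))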
official/utils/logs/logger.py
run_info = _gather_run_info(model_name, dataset_name, run_params)
# Starting new thread for bigquery upload in case it might take long time
# and impact the benchmark and performance measurement.
thread.start_new_thread(
Q1: If we spawn a thread to do this, what happens to errors raised? They will not interrupt the run, correct? What will happen in that scenario?
Q2: Under normal circumstances, we allow up to num_cpus for input processing, whereas this will require one separate CPU thread. In practice, it's really unlikely that this will matter, but it's important to be aware of possible heisenbug situations, where we are somehow slowing down runs by trying to observe them, especially when we get to CPU benchmarks. Thoughts? I guess we can compare the speed of non-streaming to streaming runs periodically to check for such an effect, but can this be done in a more automated way?
- We currently log the error and do not stop the training process.
- The CPU benchmark might be harder, since all the processes running on the machine can potentially impact the performance. We can set up a separate run for the non-streaming benchmark when we start testing on CPU.
Okay-- let's just make sure to record here in the comments that users should be aware of this potential performance implication, especially for CPU training, and run tests accordingly.
Done.
Adding a comment for the potential performance impact for models on CPU.
Thanks.
Fix the order of flag saver to avoid the randomness. The test is broken when the benchmark_logger_type is set first, and validated when the benchmark_log_dir is not set yet.
Add benchmark logger that does stream upload to bigquery. (#4210)
* Move the benchmark_uploader to new location.
* Update benchmark logger to streaming upload.
* Fix lint and unit test error.
* delint.
* Update the benchmark uploader test. Skip the import of benchmark_uploader when bigquery is not installed.
* Merge the 2 classes of benchmark uploader into 1.
* Address review comments.
* delint.
* Execute bigquery upload in a separate thread.
* Change to use python six.moves for importing.
* Address review comments and delint.
* Address review comment. Adding comment for potential performance impact for model on CPU.
* Fix random failure on py3.
* Fix the order of flag saver to avoid the randomness. The test is broken when the benchmark_logger_type is set first, and validated when the benchmark_log_dir is not set yet.
All three types of benchmark loggers are behind the control flag benchmark_logger_type.
The benchmark_uploader is split into a lib and a main module.
Added test cases with mocks for the bigquery related classes.
The git mv does not seem to preserve the file history, even though I tried to just move the file to the new location in the first commit.