Add benchmark logger that does stream upload to bigquery. #4210


Merged
merged 14 commits into tensorflow:master May 11, 2018

Conversation

qlzh727
Member

@qlzh727 qlzh727 commented May 9, 2018

  1. All three types of benchmark loggers are behind the control flag:
    benchmark_logger_type.

  2. The benchmark_uploader is split into lib and main modules.

  3. Added test cases with mocks for the bigquery-related classes.

  4. git mv does not seem to preserve the file history, even though I tried to just move the file to the new location in the first commit.

@qlzh727
Member Author

qlzh727 commented May 9, 2018

FYI, if you compare the first commit and the second one, you will be able to see the diff for the moved file.

qlzh727 added 3 commits May 8, 2018 19:48
Skip the import of benchmark_uploader when bigquery is not installed.
@robieta
Contributor

robieta commented May 9, 2018

I think a README would be quite helpful here. It's not clear to me how all of this fits together, or why multiple flavors of uploaders are necessary in the first place.

@qlzh727
Member Author

qlzh727 commented May 9, 2018

Sure. Will do in a following commit.

@qlzh727
Member Author

qlzh727 commented May 9, 2018

Actually, while thinking about the README, you are totally right about "why" multiple types of uploader are needed. Let me combine them into one.

Contributor

@karmel karmel left a comment

@@ -72,19 +61,17 @@ def upload_benchmark_run(self, dataset_name, table_name, run_id):
the data will be uploaded.
run_id: string, a unique ID that will be attached to the data, usually
this is a UUID4 format.
run_json: dict, the JSON data that contains the benchmark run info.
Contributor

The naming of this is confusing-- is it JSON or a python dict? If the latter, why is it called json?

Member Author

In Python, there isn't a specific JSON type. Python uses a dict to represent JSON data, and of course there are some restrictions on the types of the keys and values in the dict.

Contributor

I think we should absolutely not refer to dicts as JSON. Some dicts can be converted to JSON, but I tend to think that json in the variable names implies that the variable refers to the serialized string.
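For what it's worth, the distinction is easy to show in a couple of lines; a minimal sketch (the variable names here are illustrative, not taken from this PR):

import json

# A plain Python dict: structured data, not JSON.
run_info = {"model_id": "resnet50_gpu_1", "num_gpus": 1}

# Serializing produces a JSON string; strictly speaking, only this is JSON.
run_info_json = json.dumps(run_info)

# Deserializing the string gives back an equivalent dict.
assert json.loads(run_info_json) == run_info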


def upload_metric(self, dataset_name, table_name, run_id):
run_json["model_id"] = run_id
table_ref = self._bq_client.dataset(dataset_name).table(table_name)
Contributor

Comment me-- what is happening here?

Member Author

This finds a unique table reference based on the dataset and table names, so that we can insert the data into it in the next step.

errors = self._bq_client.insert_rows_json(table_ref, metric_json_list)
if errors:
  tf.logging.error(
      "Failed to upload benchmark info to bigquery: {}".format(errors))
Contributor

This is the same as above; why not just have the above method call this method with the wrapped json object?

Member Author

Done, extracted the common piece into a function.
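For reference, a minimal sketch of what the extracted helper method could look like inside the uploader class, assuming a google-cloud-bigquery client stored on self._bq_client; the method name _upload_json_rows is illustrative, not necessarily what the PR uses:

def _upload_json_rows(self, dataset_name, table_name, json_rows):
  """Inserts a list of row dicts into the given table, logging any failure."""
  # Resolve a reference to the target table from the dataset and table names.
  table_ref = self._bq_client.dataset(dataset_name).table(table_name)
  # insert_rows_json streams the rows and returns a list of per-row errors.
  errors = self._bq_client.insert_rows_json(table_ref, json_rows)
  if errors:
    tf.logging.error(
        "Failed to upload benchmark info to bigquery: {}".format(errors))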

expected_file = os.path.join(
self._logging_dir, logger.METRIC_LOG_FILE_NAME)
with tf.gfile.GFile(expected_file) as f:
with tf.gfile.GFile(metric_json_file) as f:
lines = f.readlines()
Contributor

It's easy to imagine this consuming all of the memory for a really big file. Why not do this as a generator? (That is, for line in f.read(), or whatever the correct syntax is, rather than reading in the whole file upfront.)

Member Author

The side effect of that is we will have one API call per line. I could implement some buffering and limit the max number of lines we load for each upload, if that is what you want. At the moment, we don't see a memory issue since we usually only have O(100) lines of metric records for a small model. For big models like ImageNet, it will be covered by the streaming upload.

Contributor

But isn't this exactly the same as:

for line in f:
  if line.strip():
    metrics.append(json.loads(line))

Except that readlines requires all lines read into memory, and the above creates a generator that reads line by line?

Member Author

Done. Thanks for the suggestion.
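For context, a self-contained sketch of the agreed pattern, assuming a newline-delimited JSON metric file (the function name is illustrative): the file is read line by line rather than via readlines(), and the caller can still upload the collected rows in a single insert call afterwards.

import json

import tensorflow as tf  # TF 1.x, for tf.gfile

def read_metric_rows(metric_json_file):
  """Reads newline-delimited JSON metrics without slurping the whole file."""
  metrics = []
  with tf.gfile.GFile(metric_json_file) as f:
    for line in f:        # the file object yields one line at a time
      if line.strip():    # skip blank lines
        metrics.append(json.loads(line))
  return metrics

The resulting list still lives in memory, but as noted above the offline upload path only expects on the order of hundreds of rows; long runs go through the streaming logger instead.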

name="benchmark_logger_type", default="BaseBenchmarkLogger",
enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger",
"BenchmarkBigQueryLogger"],
help=help_wrap("The type of benchmark logger to use. Default to use "
Contributor

nit: Defaults to using

Member Author

Done.

"BenchmarkBigQueryLogger"],
help=help_wrap("The type of benchmark logger to use. Default to use "
"BaseBenchmarkLogger which logs to STDOUT. Different "
"loggers will require other flags to be able to work."))
Contributor

Can we add some fancy validators for those flags?

Contributor

I see that I'm winning hearts and minds.

Member Author

Done. Added validation for the flag combination.
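A minimal sketch of the kind of flag-combination check being discussed, using absl's multi-flag validator; the exact condition (requiring benchmark_log_dir whenever the file logger is selected) is illustrative, not necessarily the one added in this PR:

from absl import flags

@flags.multi_flags_validator(
    ["benchmark_logger_type", "benchmark_log_dir"],
    message="--benchmark_log_dir must be set when "
            "--benchmark_logger_type=BenchmarkFileLogger is used.")
def _check_benchmark_log_dir(flags_dict):
  # Only the file logger writes to disk, so only it needs a log directory.
  if flags_dict["benchmark_logger_type"] == "BenchmarkFileLogger":
    return bool(flags_dict["benchmark_log_dir"])
  return True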

_benchmark_logger = BaseBenchmarkLogger()
elif flag_obj.benchmark_logger_type == 'BenchmarkFileLogger':
Contributor

Do we still need the filelogger as an option? If we are not going to use it, why complicate this with a third option that we have to maintain? That seems like it would allow us to strip half the code here and above.

Member Author

Yes. The file logger is still useful for small models so that we can save the bigquery upload quota. The streaming one will be useful for long-running benchmarks.

Contributor

Why not always stream?

Member Author

The BigQuery API has a usage quota; even though we currently have a small load, I think it's still worthwhile not to spam the service. For a small dataset like cifar10, which takes 40 minutes to finish, I would still prefer to upload the data after the run.

if not isinstance(value, numbers.Number):
  tf.logging.warning(
      "Metric value to log should be a number. Got %s", type(value))
  return
Contributor

If we need to maintain the File version, I would rather not have this check in multiple places-- extract to function?

Member Author

Done.

"global_step": global_step,
"timestamp": datetime.datetime.utcnow().strftime(
_DATE_TIME_FORMAT_PATTERN),
"extras": extras}]
Contributor

Or, better still, looks like this is all shared code; extract or make these inherit from a shared object, rather than repeat.

Member Author

Done.
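A hedged sketch of the kind of shared helper that removes the duplication; the function name, field names, and the timestamp pattern value are illustrative (the real module defines its own _DATE_TIME_FORMAT_PATTERN), but the validation and record layout mirror the excerpts above:

import datetime
import numbers

import tensorflow as tf  # TF 1.x

# Assumed ISO-8601-style pattern; stands in for the module's own constant.
_DATE_TIME_FORMAT_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"

def _metric_to_dict(name, value, unit=None, global_step=None, extras=None):
  """Validates a metric value and formats it as one uploadable record."""
  if not isinstance(value, numbers.Number):
    tf.logging.warning(
        "Metric value to log should be a number. Got %s", type(value))
    return None
  return {
      "name": name,
      "value": value,
      "unit": unit,
      "global_step": global_step,
      "timestamp": datetime.datetime.utcnow().strftime(
          _DATE_TIME_FORMAT_PATTERN),
      "extras": extras,
  }

Both the file logger and the BigQuery logger can then call the helper and differ only in where they send the resulting dict.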

@karmel
Contributor

karmel commented May 10, 2018

Also, just to confirm: this should be non-blocking, correct? That is, it shouldn't affect the speed of training at all?

@qlzh727
Member Author

qlzh727 commented May 10, 2018

I think the hooks are executed on a separate thread during training, if I understand it correctly, so the training performance shouldn't be affected. During the eval stage, this will add some overhead because of the bigquery upload for the eval result.

@qlzh727
Member Author

qlzh727 commented May 10, 2018

And I was wrong about the hook behavior. It seems that the hook does block on the after_run method, so computation running in the hook will have a potential performance impact.

Let me update the metric logging to execute on a different thread.

@qlzh727
Member Author

qlzh727 commented May 10, 2018

Updated to use a new thread for the bigquery upload.


run_info = _gather_run_info(model_name, dataset_name, run_params)
# Starting new thread for bigquery upload in case it might take long time
# and impact the benchmark and performance measurement.
thread.start_new_thread(
Contributor

Q1: If we spawn a thread to do this, what happens to errors raised? They will not interrupt the run, correct? What will happen in that scenario?
Q2: Under normal circumstances, we allow up to num_cpus for input processing, whereas this will require one separate CPU thread. In practice, it's really unlikely that this will matter, but it's important to be aware of possible heisenbug situations, where we are somehow slowing down runs by trying to observe them, especially when we get to CPU benchmarks. Thoughts? I guess we can compare the speed of non-streaming to streaming runs periodically to check for such an effect, but can this be done in a more automated way?

Member Author

  1. We currently log the error and do not stop the training process (see the sketch below).
  2. The CPU benchmark might be harder, since every process running on the machine can potentially impact performance. We can set up a separate run for the non-streaming benchmark when we start testing on CPU.
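To make that concrete, a minimal sketch of the fire-and-forget pattern, assuming a hypothetical upload callable; the worker catches and logs any exception so a failed upload never propagates into the training loop:

from six.moves import _thread as thread

import tensorflow as tf  # TF 1.x

def _start_background_upload(upload_fn, *args):
  """Runs upload_fn(*args) on a new thread and swallows any errors."""
  def _worker():
    try:
      upload_fn(*args)
    except Exception as e:  # pylint: disable=broad-except
      # Log and continue; the benchmark run should not be interrupted.
      tf.logging.error("Benchmark upload failed: %s", e)
  thread.start_new_thread(_worker, ())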

Contributor

Okay-- let's just make sure to record here in the comments that users should be aware of this potential performance implication, especially for CPU training, and run tests accordingly.

Member Author

Done.

Adding comment for potential performance impact for model on CPU.
Contributor

@karmel karmel left a comment

Thanks.

The test is broken when the benchmark_logger_type is set first, and
validated when the benchmark_log_dir is not set yet.
@qlzh727 qlzh727 merged commit 0270cac into tensorflow:master May 11, 2018
@qlzh727 qlzh727 deleted the pr-fix branch May 11, 2018 22:07
omegafragger pushed a commit to omegafragger/models that referenced this pull request May 15, 2018
…#4210)

* Move the benchmark_uploader to new location.

* Update benchmark logger to streaming upload.

* Fix lint and unit test error.

* delint.

* Update the benchmark uploader test.

Skip the import of benchmark_uploader when bigquery is not installed.

* Merge the 2 classes of benchmark uploader into 1.

* Address review comments.

* delint.

* Execute bigquery upload in a separate thread.

* Change to use python six.moves for importing.

* Address review comments and delint.

* Address review comment.

Adding comment for potential performance impact for model on CPU.

* Fix random failure on py3.

* Fix the order of flag saver to avoid the randomness.

The test is broken when the benchmark_logger_type is set first, and
validated when the benchmark_log_dir is not set yet.