Add benchmark logger that does stream upload to bigquery. #4210
Conversation
FYI, if you compare the first commit and the second one, you will be able to see the diff for the moved file.
Skip the import of benchmark_uploader when bigquery is not installed.
I think a README would be quite helpful here. It's not clear to me how all of this fits together, or why multiple flavors of uploaders are necessary in the first place.
Sure. Will do in a following commit.
Actually, while thinking about the README, you are totally right about "why" multiple types of uploaders are needed. Let me combine them into one.
@@ -72,19 +61,17 @@ def upload_benchmark_run(self, dataset_name, table_name, run_id):
        the data will be uploaded.
      run_id: string, a unique ID that will be attached to the data, usually
        this is a UUID4 format.
      run_json: dict, the JSON data that contains the benchmark run info.
The naming of this is confusing-- is it JSON or a python dict? If the latter, why is it called json?
In Python, there isn't a specific type for JSON. Python uses a dict to represent JSON data, and of course it has some restrictions about the types of the keys and values in the dict.
I think we should absolutely not refer to dicts as JSON. Some dicts can be converted to JSON, but I tend to think that json in the variable names implies that the variable refers to the serialized string.
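To make the naming point concrete, here is a minimal Python illustration (the variable names are only for this example, not from the PR):

import json

# A Python dict: a native mapping type, not JSON.
run_info = {"model_id": "1234", "dataset": "cifar10"}

# JSON proper is the serialized string produced by json.dumps(); calling the
# dict "run_json" blurs that distinction.
run_info_str = json.dumps(run_info)

# Round-tripping the string gives a dict again.
assert json.loads(run_info_str) == run_info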
def upload_metric(self, dataset_name, table_name, run_id):
run_json["model_id"] = run_id
table_ref = self._bq_client.dataset(dataset_name).table(table_name)
Comment me-- what is happening here?
This is finding a unique table reference based on the dataset and table name, so that we can insert the data into it in the next step.
errors = self._bq_client.insert_rows_json(table_ref, metric_json_list)
if errors:
  tf.logging.error(
      "Failed to upload benchmark info to bigquery: {}".format(errors))
This is the same as above; why not just have the above method call this method with the wrapped json object?
Done, extracted the common piece into a function.
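The extracted function itself is not shown in this excerpt; a rough sketch of what the shared insert path could look like, based on the diff fragments above (the class name and the helper name _upload_json_rows are hypothetical, not necessarily what landed in the PR):

import tensorflow as tf
from google.cloud import bigquery


class BigQueryUploader(object):
  """A minimal sketch of the uploader with the shared insert path factored out."""

  def __init__(self, gcp_project=None):
    self._bq_client = bigquery.Client(project=gcp_project)

  def _upload_json_rows(self, dataset_name, table_name, json_rows):
    # Resolve the table reference, then stream the rows in one API call.
    table_ref = self._bq_client.dataset(dataset_name).table(table_name)
    errors = self._bq_client.insert_rows_json(table_ref, json_rows)
    if errors:
      tf.logging.error(
          "Failed to upload benchmark info to bigquery: {}".format(errors))

  def upload_benchmark_run(self, dataset_name, table_name, run_id, run_json):
    # Tag the row with the run id and reuse the shared insert path.
    run_json["model_id"] = run_id
    self._upload_json_rows(dataset_name, table_name, [run_json])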
expected_file = os.path.join(
    self._logging_dir, logger.METRIC_LOG_FILE_NAME)
with tf.gfile.GFile(expected_file) as f:
with tf.gfile.GFile(metric_json_file) as f:
  lines = f.readlines()
It's easy to imagine this consuming all of the memory for a really big file. Why not do this as a generator? (That is, for line in f.read(), or whatever the correct syntax is, rather than reading in the whole file upfront.)
The side effect of that is that we will have one API call per line. I could implement some buffering and limit the max number of lines we load for each upload, if this is what you want. At the moment, we don't see a memory issue since we usually only have O(100) lines of metric records for a small model. For big models like ImageNet, it will be covered by the streaming upload.
But isn't this exactly the same as:
for line in f:
  if line.strip():
    metrics.append(json.loads(line))
Except that readlines requires all lines read into memory, and the above creates a generator that reads line by line?
Done. Thanks for the suggestion.
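For completeness, a small sketch that combines the line-by-line read with the buffering idea floated earlier in the thread (the batch size and the function name are illustrative, not part of the PR):

import json

import tensorflow as tf

_UPLOAD_BATCH_SIZE = 100  # Hypothetical; buffering was only discussed as an option.


def read_metric_batches(metric_json_file):
  """Yield lists of parsed metric dicts without loading the whole file."""
  batch = []
  with tf.gfile.GFile(metric_json_file) as f:
    for line in f:  # Iterates lazily, one line at a time.
      if line.strip():
        batch.append(json.loads(line))
      if len(batch) >= _UPLOAD_BATCH_SIZE:
        yield batch
        batch = []
  if batch:
    yield batch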
official/utils/flags/_benchmark.py
name="benchmark_logger_type", default="BaseBenchmarkLogger", | ||
enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger", | ||
"BenchmarkBigQueryLogger"], | ||
help=help_wrap("The type of benchmark logger to use. Default to use " |
nit: Defaults to using
Done.
official/utils/flags/_benchmark.py
"BenchmarkBigQueryLogger"], | ||
help=help_wrap("The type of benchmark logger to use. Default to use " | ||
"BaseBenchmarkLogger which logs to STDOUT. Different " | ||
"loggers will require other flags to be able to work.")) |
Can we add some fancy validators for those flags?
I see that I'm winning hearts and minds.
Done. Added validation for the flag combination.
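The validator itself is outside this excerpt; a sketch of what such a check can look like with absl flags, assuming the rule is "BenchmarkFileLogger requires --benchmark_log_dir" (the exact rule in the PR may differ):

from absl import flags

flags.DEFINE_enum(
    name="benchmark_logger_type", default="BaseBenchmarkLogger",
    enum_values=["BaseBenchmarkLogger", "BenchmarkFileLogger",
                 "BenchmarkBigQueryLogger"],
    help="The type of benchmark logger to use.")
flags.DEFINE_string(
    name="benchmark_log_dir", default=None,
    help="Directory where the benchmark log files are written.")


@flags.multi_flags_validator(
    ["benchmark_logger_type", "benchmark_log_dir"],
    message="--benchmark_log_dir must be set when using BenchmarkFileLogger.")
def _check_benchmark_log_dir(flags_dict):
  # Only the file logger needs a log directory; other loggers pass trivially.
  if flags_dict["benchmark_logger_type"] != "BenchmarkFileLogger":
    return True
  return bool(flags_dict["benchmark_log_dir"])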
_benchmark_logger = BaseBenchmarkLogger()
elif flag_obj.benchmark_logger_type == 'BenchmarkFileLogger':
Do we still need the filelogger as an option? If we are not going to use it, why complicate this with a third option that we have to maintain? That seems like it would allow us to strip half the code here and above.
Yes. The file logger is still useful for small models so that we can save the BigQuery upload quota. The streaming one will be useful for long-running benchmarks.
Why not always stream?
The BigQuery API has a usage quota; even though we currently have a small load, I think it's still worthwhile to not spam the service. For a small dataset like CIFAR-10, which takes about 40 minutes to finish, I would still prefer to upload the data after the run.
official/utils/logs/logger.py
if not isinstance(value, numbers.Number):
  tf.logging.warning(
      "Metric value to log should be a number. Got %s", type(value))
  return
If we need to maintain the File version, I would rather not have this check in multiple places-- extract to function?
Done.
official/utils/logs/logger.py
"global_step": global_step, | ||
"timestamp": datetime.datetime.utcnow().strftime( | ||
_DATE_TIME_FORMAT_PATTERN), | ||
"extras": extras}] |
Or, better still, looks like this is all shared code; extract or make these inherit from a shared object, rather than repeat.
Done.
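A sketch of the shared piece, combining the numeric check and the metric dict construction from the fragments above (the function name, the exact field set, and the value of the timestamp format constant are assumptions):

import datetime
import numbers

import tensorflow as tf

# The PR refers to a _DATE_TIME_FORMAT_PATTERN constant; this value is assumed.
_DATE_TIME_FORMAT_PATTERN = "%Y-%m-%dT%H:%M:%S.%fZ"


def _process_metric_to_json(name, value, unit=None, global_step=None,
                            extras=None):
  """Build the metric dict shared by the file and BigQuery loggers."""
  if not isinstance(value, numbers.Number):
    tf.logging.warning(
        "Metric value to log should be a number. Got %s", type(value))
    return None
  return {
      "name": name,
      "value": float(value),
      "unit": unit,
      "global_step": global_step,
      "timestamp": datetime.datetime.utcnow().strftime(
          _DATE_TIME_FORMAT_PATTERN),
      "extras": extras}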
Also, just to confirm: this should be non-blocking, correct? That is, it shouldn't affect the speed of training at all?
I think the hooks are executed on a separate thread during training time, if I understand it correctly, so the training performance shouldn't be affected. During the eval stage, this will add some overhead because of the BigQuery upload for the eval result.
And I was wrong about the hook behavior. It seems that the hook does block on the after_run method, so computation running in the hook will have a potential performance impact. Let me update the metric logging to execute on a different thread.
Updated to use a new thread for the BigQuery upload.
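As a sketch of what that change can look like (the commit list mentions switching to six.moves for the thread import; the attribute names on self and the upload_metric argument list are assumptions here, and _process_metric_to_json refers to the helper sketched above):

from six.moves import _thread as thread

# Inside a hypothetical BenchmarkBigQueryLogger:
def log_metric(self, name, value, unit=None, global_step=None, extras=None):
  metric = _process_metric_to_json(name, value, unit, global_step, extras)
  if metric:
    # Hand the BigQuery call to a separate thread so the SessionRunHook's
    # after_run() is not blocked; upload errors are logged, not raised.
    thread.start_new_thread(
        self._bq_uploader.upload_metric,
        (self._bq_dataset, self._metric_table, self._run_id, [metric]))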
official/utils/logs/logger.py
run_info = _gather_run_info(model_name, dataset_name, run_params)
# Starting new thread for bigquery upload in case it might take long time
# and impact the benchmark and performance measurement.
thread.start_new_thread(
Q1: If we spawn a thread to do this, what happens to errors raised? They will not interrupt the run, correct? What will happen in that scenario?
Q2: Under normal circumstances, we allow up to num_cpus for input processing, whereas this will require one separate CPU thread. In practice, it's really unlikely that this will matter, but it's important to be aware of possible heisenbug situations, where we are somehow slowing down runs by trying to observe them, especially when we get to CPU benchmarks. Thoughts? I guess we can compare the speed of non-streaming to streaming runs periodically to check for such an effect, but can this be done in a more automated way?
- We currently log the error and do not stop the training process.
- The CPU benchmark might be harder, since all the processes running on the machine can potentially impact the performance. We can set up a separate run for the non-streaming benchmark when we start testing on CPU.
Okay-- let's just make sure to record here in the comments that users should be aware of this potential performance implication, especially for CPU training, and run tests accordingly.
Done.
Adding a comment for the potential performance impact for models on CPU.
Thanks.
Fix the order of flag saver to avoid the randomness. The test is broken when the benchmark_logger_type is set first, and validated when the benchmark_log_dir is not set yet.
Add benchmark logger that does stream upload to bigquery. (#4210)
* Move the benchmark_uploader to new location.
* Update benchmark logger to streaming upload.
* Fix lint and unit test error.
* delint.
* Update the benchmark uploader test. Skip the import of benchmark_uploader when bigquery is not installed.
* Merge the 2 classes of benchmark uploader into 1.
* Address review comments.
* delint.
* Execute bigquery upload in a separate thread.
* Change to use python six.moves for importing.
* Address review comments and delint.
* Address review comment. Adding comment for potential performance impact for model on CPU.
* Fix random failure on py3.
* Fix the order of flag saver to avoid the randomness. The test is broken when the benchmark_logger_type is set first, and validated when the benchmark_log_dir is not set yet.
All three types of benchmark loggers are behind the control flag benchmark_logger_type.
The benchmark_uploader is split into a lib and a main module.
Added test cases with mocks for the bigquery related classes.
The git mv does not seem to preserve the file history, even though I tried to just move the file to the new location in the first commit.