Make benchmarking script more fault-tolerant #7674
Conversation
Branch updated: 25c621a to 3822e6b (compare)
Branch updated: 3822e6b to 5c141e2 (compare)
antiagainst left a comment
LGTM, thanks for adding this! But given that this keep-going feature does not have a countdown, I assume it's only meant for local usage.
My main comment is mostly around having more comments explaining the intent. I know it's an art knowing when and where to comment without being excessive, but generally if the logic is branchy and lengthy I'd suggest adding more comments to help readers keep following the logic. :)
Not sure what you mean by a countdown? My main use case was for running locally, although I think it would also be nice to be able to do this on CI so you could see the results. It would require plumbing more things through though, so I wouldn't do it right now.
Yeah, like I said, I don't really love this. There are a lot of switches and parameters all over the place 😕 I'll add some more comments and docstrings.
(╯°□°)╯︵ ┻━┻ I hate dynamic languages
FYI: this breaks uploading to the dashboard: https://buildkite.com/iree/iree-benchmark/builds/1534#68e90f49-9c6e-4c4a-81ff-36d8a4895f57. Looks like we missed one place that should be updated. Fix here: #7694.
Argh, thanks for fixing!
This is not the prettiest Python I've ever written, but it does allow restarting benchmark runs rather than losing all progress after a single failure. This makes the workflow of starting a benchmarking run and then coming back when it is finished far more workable.
I tried out incremental output of the final JSON results and reloading from that, but decided against it because I had to manually construct the JSON (there is no native incremental support) and use context managers to ensure structures were closed even on a failure exit. Overall it ended up being pretty gross. Since we were already using temporary files for captures, this seemed like a reasonable way to go.
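To make the restart behavior concrete, here is a minimal sketch of the resume-from-temporary-files idea described above, assuming one result file per benchmark. The names (`run_all`, `run_one_benchmark`, `tmp_dir`) are hypothetical and are not the identifiers used in the actual IREE benchmarking script.

```python
# Hypothetical sketch, not the actual IREE benchmarking script.
import json
import pathlib

def run_all(benchmarks, run_one_benchmark, tmp_dir: pathlib.Path,
            output: pathlib.Path):
    """Runs benchmarks, persisting each result so a rerun can resume."""
    tmp_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for name in benchmarks:
        result_file = tmp_dir / f"{name}.json"
        if result_file.exists():
            # A previous (possibly interrupted) run already finished this
            # benchmark; reuse its result instead of rerunning it.
            results.append(json.loads(result_file.read_text()))
            continue
        result = run_one_benchmark(name)  # may raise; prior progress stays on disk
        # Persist the individual result immediately so a later crash does not
        # lose the work done so far.
        result_file.write_text(json.dumps(result))
        results.append(result)
    # The aggregate JSON is only written once every benchmark has a result,
    # avoiding hand-rolled incremental JSON output.
    output.write_text(json.dumps(results, indent=2))
```

A keep-going style flag could additionally catch and record per-benchmark failures instead of aborting, along the lines of the feature discussed in the review above.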