@ahmed-mez
Contributor

@ahmed-mez ahmed-mez commented May 6, 2019

This PR optimizes the openmetrics parser using the logic introduced in #282 to optimize the prometheus parser.

Here are some benchmarks using timeit (total time in seconds for 100,000 calls):

call (x100000): _parse_sample('simple_metric 1.513767429e+09')

Simple example with prometheus parser: 0.2489180564880371
Simple example with openmetrics parser: 1.1144659519195557
Simple example with the optimized openmetrics parser: 0.5948491096496582

call (x100000): _parse_sample('kube_service_labels{label_app="kube-state-metrics",label_chart="kube-state-metrics-0.5.0",label_heritage="Tiller",label_release="ungaged-panther",namespace="default",service="ungaged-panther-kube-state-metrics"} 1')

KSM metric example with prometheus parser: 1.6796550750732422
KSM metric example with openmetrics parser: 6.6183180809021
KSM metric example with optimized openmetrics parser: 2.0289480686187744
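A minimal sketch of how timings like these can be collected with timeit follows. It assumes a hypothetical stand-in, parse_simple_sample, rather than the parser's real _parse_sample, so it only illustrates the measurement pattern: 100,000 calls per example, with timeit.timeit reporting the total in seconds.

```python
import timeit


def parse_simple_sample(line: str):
    """Hypothetical stand-in for a sample parser that splits 'name value'.

    Not the real _parse_sample; used only to show how the benchmark is wired.
    """
    name, value = line.split(" ", 1)
    return name, float(value)


def bench(func, arg: str, number: int = 100_000) -> float:
    # timeit.timeit returns the total elapsed seconds for `number` calls.
    return timeit.timeit(lambda: func(arg), number=number)


if __name__ == "__main__":
    seconds = bench(parse_simple_sample, "simple_metric 1.513767429e+09")
    print(f"Simple example (x100000): {seconds:.3f}s")
```

Comparing two parsers then amounts to calling bench once per parser with the same input line.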

@ahmed-mez ahmed-mez force-pushed the master branch 2 times, most recently from e6d7944 to 7832fed May 6, 2019 10:37
Contributor

@brian-brazil brian-brazil left a comment

I don't think this is correct, on several points.

Contributor

Internal functions and constants should begin with _

Contributor

Shouldn't this comparison be against 0?

This is also going to be n^2 overall if there are many backslashes.
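To illustrate a single-pass alternative: rather than checking, for each candidate quote, whether the characters before it are backslashes (which can rescan or copy the same prefix repeatedly when there are many backslashes), a forward scan can skip the character after every backslash and stop at the first unescaped quote, so each character is visited once. find_closing_quote is a hypothetical helper, not code from this PR.

```python
def find_closing_quote(text: str, start: int) -> int:
    """Return the index of the first unescaped '"' at or after `start`, or -1.

    Hypothetical helper: a single left-to-right scan, O(n) in len(text).
    """
    i = start
    n = len(text)
    while i < n:
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the backslash and the character it escapes
        elif ch == '"':
            return i
        else:
            i += 1
    return -1
```

An escaped backslash followed by a quote (\\") correctly ends the value, because the scan consumes the two backslashes as one escape pair before reaching the quote.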

Contributor Author

It's n^2 in the worst case, yes, when many characters are escaped. I've made a change to call the function with a smaller argument: _is_character_escaped(value_substr[:i], i) instead of _is_character_escaped(value_substr, i).
Let me know if you can think of a better way to optimize this 👍

Contributor

Why are you lstripping here?

Contributor

What if label_name is zero length?
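One way to reject a zero-length name is to validate each parsed name against the Prometheus label-name pattern [a-zA-Z_][a-zA-Z0-9_]*, which the empty string cannot match. is_valid_label_name is a hypothetical helper for illustration.

```python
import re

# Label-name grammar: a letter or underscore, then letters, digits or underscores.
# fullmatch() requires at least one character, so "" is always rejected.
_LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def is_valid_label_name(name: str) -> bool:
    """Return True if `name` is a non-empty, well-formed label name."""
    return _LABEL_NAME_RE.fullmatch(name) is not None
```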

Contributor

Yeah, this is n^2. Work from the start in one loop.
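A sketch of what working from the start in one loop could look like, assuming the only escapes in label values are \\, \" and \n. The parser walks the label set once, left to right, and never searches back over text it has already consumed. parse_labels_one_pass is hypothetical, not the prometheus_client implementation. It also reflects several points from this review: zero-length names and inputs like {a} are rejected, the quote must come right after the equals sign, and a comma must come right after the closing quote. Checking the characters of each name is left out for brevity.

```python
# Escape sequences assumed to be valid inside label values.
_ESCAPES = {"\\": "\\", '"': '"', "n": "\n"}


def parse_labels_one_pass(text: str) -> dict:
    """Parse a label set like '{a="1",b="2"}' in one left-to-right pass.

    Hypothetical sketch. Raises ValueError on malformed input such as '{a}'.
    """
    if len(text) < 2 or text[0] != "{" or text[-1] != "}":
        raise ValueError("label set must be wrapped in braces")
    labels = {}
    i, end = 1, len(text) - 1  # `end` is the index of the closing brace
    while i < end:
        # The label name runs up to the next '='; it must not be empty.
        eq = text.find("=", i, end)
        if eq <= i:  # -1 (no '=' at all) or a zero-length name
            raise ValueError(f"expected a label name and '=' at position {i}")
        name = text[i:eq]
        # The opening quote must come right after the '='.
        if eq + 1 >= end or text[eq + 1] != '"':
            raise ValueError(f"expected '\"' after '=' for label {name!r}")
        i = eq + 2
        chars = []
        while True:
            if i >= end:
                raise ValueError(f"unterminated value for label {name!r}")
            ch = text[i]
            if ch == "\\":
                nxt = text[i + 1] if i + 1 < end else ""
                if nxt not in _ESCAPES:
                    raise ValueError(f"invalid escape in label {name!r}")
                chars.append(_ESCAPES[nxt])
                i += 2
            elif ch == '"':
                i += 1  # step past the closing quote
                break
            else:
                chars.append(ch)
                i += 1
        labels[name] = "".join(chars)
        # The closing quote is followed either by ',' or by the closing brace.
        if i < end:
            if text[i] != ",":
                raise ValueError(f"expected ',' after label {name!r}")
            i += 1
    return labels
```

Characters such as '=', ',' and '}' inside a quoted value are copied verbatim, because the value loop only stops at an unescaped quote.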

Contributor

This is still n^2

Contributor

Couldn't you check each value, rather than the whole string?

Contributor

What if " # " is part of a label value?

You should also add a testcase for this so it doesn't trip up someone else.
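Splitting the line on " # " cuts inside any label value that contains that sequence. A minimal sketch of one fix, assuming the exemplar separator can only appear after the label set has closed: skip over the quoted label values first, then search for " # ". split_exemplar is a hypothetical name.

```python
def split_exemplar(line: str):
    """Split a sample line into (sample, exemplar); exemplar is None if absent.

    Hypothetical sketch: ' # ' is only searched for after the label set has
    been closed, so a ' # ' inside a quoted label value is ignored.
    """
    search_from = 0
    brace = line.find("{")
    first_space = line.find(" ")
    # A label set, if any, starts before the first space (right after the name).
    if brace != -1 and (first_space == -1 or brace < first_space):
        i, in_quotes = brace + 1, False
        while i < len(line):
            ch = line[i]
            if in_quotes:
                if ch == "\\":
                    i += 1  # skip the escaped character as well
                elif ch == '"':
                    in_quotes = False
            elif ch == '"':
                in_quotes = True
            elif ch == "}":
                break
            i += 1
        search_from = i + 1  # first position after the closing '}'
    sep = line.find(" # ", search_from)
    if sep == -1:
        return line, None
    return line[:sep], line[sep + 3:]
```

For the line a{b="c # d"} 1 # {trace_id="x"} 2, a plain split on " # " breaks inside the label value, while split_exemplar returns the sample a{b="c # d"} 1 and the exemplar {trace_id="x"} 2. A line like this also makes a natural regression test case.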

Contributor

Tabs aren't supported.
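A likely way for tabs to be accepted by accident is str.split() with no argument, which splits on any run of whitespace. The snippet below contrasts it with splitting on a single space, using a hypothetical label-free sample line that has a tab where a space belongs.

```python
# Hypothetical label-free sample line that uses a tab instead of a space.
line = "simple_metric\t1.513767429e+09"

# str.split() with no argument splits on any whitespace run, tabs included,
# so the malformed line would parse as if it were valid.
lenient = line.split()

# Splitting on a single literal space leaves the tab inside the only token,
# so the line no longer yields a separate value token and can be rejected.
strict = line.split(" ")
```

Here lenient is ['simple_metric', '1.513767429e+09'], so the malformed line slips through, while strict is a single token that still contains the tab.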

Contributor

This is not a clean way to do this. Use find instead.
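For context: str.index raises ValueError when the substring is absent, so using it to test for presence needs a try/except, whereas str.find returns -1 and a plain comparison is enough. split_at is a hypothetical helper showing the find-based pattern.

```python
def split_at(text: str, sep: str):
    """Split `text` at the first `sep`, or return (text, None) if it is absent.

    Hypothetical helper: str.find returns -1 on a miss, so no try/except
    around str.index is needed.
    """
    pos = text.find(sep)
    if pos == -1:
        return text, None
    return text[:pos], text[pos + len(sep):]
```

str.find also accepts optional start and end positions, which suits a parser that keeps a running offset.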

Contributor

Why the strip?

@ahmed-mez
Contributor Author

Thank you for reviewing the PR. I've made the requested changes and added a test case; the parsing logic is more solid now.
I look forward to a second review. Thanks!

Contributor

value_substr[-1] is more succinct

Contributor

This is still n^2

Contributor

Why the lstrip?

Contributor

This is still here; don't look inside error strings.

Contributor

This is guaranteed to be right after the equals.

Contributor

Shouldn't the MetricFamily code catch this already?

Contributor

This is guaranteed to be after the ", if present.

Contributor

Use find instead of index

Contributor

What if it (incorrectly) does?

Contributor Author

Changed it to if text.count(seperator) == 0:
That should guarantee it 👍

@ahmed-mez
Contributor Author

Here are the benchmarks after the changes made during this review; performance improved further, to about 3.9x on the KSM example:

call (x100000): _parse_sample('simple_metric 1.513767429e+09')

Simple example with prometheus parser: 0.24088597297668457
Simple example with openmetrics parser: 1.116285800933838
Simple example with the optimized openmetrics parser: 0.48735499382019043

call (x100000): _parse_sample('kube_service_labels{label_app="kube-state-metrics",label_chart="kube-state-metrics-0.5.0",label_heritage="Tiller",label_release="ungaged-panther",namespace="default",service="ungaged-panther-kube-state-metrics"} 1')

KSM metric example with prometheus parser: 1.608799934387207
KSM metric example with openmetrics parser: 6.636054039001465
KSM metric example with optimized openmetrics parser: 1.7176191806793213
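The speedup figures follow from these timings as the original OpenMetrics parser time divided by the optimized time. Redoing that arithmetic on the reported numbers shows that the headline factors come from the KSM example (about 3.3x in the first run and 3.9x in the final one), while the simple example improved by about 1.9x and 2.3x.

```python
# Timings in seconds for 100,000 calls, copied from the two benchmark runs.
first_run = {
    "simple": {"openmetrics": 1.1144659519195557, "optimized": 0.5948491096496582},
    "ksm": {"openmetrics": 6.6183180809021, "optimized": 2.0289480686187744},
}
final_run = {
    "simple": {"openmetrics": 1.116285800933838, "optimized": 0.48735499382019043},
    "ksm": {"openmetrics": 6.636054039001465, "optimized": 1.7176191806793213},
}


def speedups(run: dict) -> dict:
    """Ratio of original to optimized OpenMetrics parser time per example."""
    return {name: t["openmetrics"] / t["optimized"] for name, t in run.items()}
```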

@brian-brazil
Contributor

That looks about right. Could you expand the unit tests to ensure we're covering everything for both the old and new ways of parsing labels? Also, it'd be great if you could add tests for anything this PR got wrong at any point: if you made a mistake, others likely will too, and these tests are going to be used as the OpenMetrics test suite.

@ahmed-mez ahmed-mez changed the title from "optimize openmetrics text parsing (~3.3x perf)" to "optimize openmetrics text parsing (~4x perf)" May 15, 2019
@ahmed-mez
Contributor Author

Added test cases; I think the PR is ready for a final review :)

Contributor

potentially contains

Contributor

How would this handle something like {a} ?

Contributor Author

@ahmed-mez ahmed-mez May 16, 2019

Made the required changes and added some test cases like that.

Contributor

As these tests will become the official regression suite for OpenMetrics, it'd be best to test a full line rather than just a function.

Contributor Author

Added multiple test cases that assert which functions are called 👍

ahmed-mez added 10 commits May 16, 2019 15:47
Signed-off-by: Ahmed Mezghani <[email protected]>
@brian-brazil brian-brazil merged commit 6740213 into prometheus:master May 17, 2019
@brian-brazil
Contributor

Thanks!

@ahmed-mez
Contributor Author

Great, @brian-brazil :) Any plans to release soon?

@brian-brazil
Contributor

I'll add it to my todo list.

2 participants