optimize openmetrics text parsing (~4x perf) #402

ahmed-mez · 2019-05-06T09:36:50Z

This PR optimizes the openmetrics parser using the logic introduced in #282 to optimize the prometheus parser.

Here are some benchmarks using timeit:

call (x100000): _parse_sample('simple_metric 1.513767429e+09')

Simple example with prometheus parser: 0.2489180564880371
Simple example with openmetrics parser: 1.1144659519195557
Simple example with the optimized openmetrics parser: 0.5948491096496582

call (x100000): _parse_sample('kube_service_labels{label_app="kube-state-metrics",label_chart="kube-state-metrics-0.5.0",label_heritage="Tiller",label_release="ungaged-panther",namespace="default",service="ungaged-panther-kube-state-metrics"} 1')

KSM metric example with prometheus parser: 1.6796550750732422
KSM metric example openmetrics parser: 6.6183180809021
KSM metric example optimized openmetrics parser: 2.0289480686187744
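
For readers who want to reproduce numbers of this kind, a minimal sketch of a timeit harness follows. The `parse_simple` function is a hypothetical stand-in, not one of the parsers compared above; swap in the real `_parse_sample` to measure it.

```python
import timeit


def parse_simple(line):
    """Hypothetical stand-in parser: splits 'name value' on the first space."""
    name, _, value = line.partition(" ")
    return name, float(value)


def bench(func, line, number=100000):
    """Return the total seconds taken by `number` calls of func(line)."""
    return timeit.timeit(lambda: func(line), number=number)


if __name__ == "__main__":
    elapsed = bench(parse_simple, "simple_metric 1.513767429e+09")
    print("Simple example with stand-in parser: %s" % elapsed)
```

The printed value is the total time for all calls, which matches the shape of the figures quoted above.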

brian-brazil

I don't think this is correct, on several points.

brian-brazil · 2019-05-07T10:16:15Z

prometheus_client/openmetrics/parser.py

+}
+
+
+def replace_escape_sequence(match):


internal functions and constants should begin with _

brian-brazil · 2019-05-07T10:21:00Z

prometheus_client/openmetrics/parser.py

-    labelvalue = []
+def _is_character_escaped(s, charpos):
+    num_bslashes = 0
+    while (charpos > num_bslashes and


Shouldn't this comparison be against 0?

This is also going to be n^2 overall if there's many backslashes.

It's n^2 in the worst case, yes, when too many characters are escaped. I've made a change to call the function with smaller arguments: _is_character_escaped(value_substr[:i], i) instead of _is_character_escaped(value_substr, i)
Let me know if you can think of a better way to optimize this 👍
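
To make the bounds question concrete, here is a minimal sketch of an escape check that counts the backslashes immediately before a position. It is an illustration written for this thread, not necessarily the code in the PR, and the name `is_escaped` is hypothetical. The loop has to stop when it reaches the start of the string, which is where the lower bound comes in.

```python
def is_escaped(s, pos):
    """Return True if s[pos] is preceded by an odd number of backslashes."""
    count = 0
    i = pos - 1
    # Walk left while we are still inside the string and see backslashes.
    while i >= 0 and s[i] == "\\":
        count += 1
        i -= 1
    return count % 2 == 1
```

Each call costs time proportional to the run of backslashes directly before `pos`. Note that slicing a Python string copies the slice, so passing `value_substr[:i]` adds work proportional to `i` on every call.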

brian-brazil · 2019-05-07T10:26:23Z

prometheus_client/openmetrics/parser.py

+            # The label name is before the equal
+            value_start = sub_labels.index("=")
+            label_name = sub_labels[:value_start]
+            sub_labels = sub_labels[value_start + 1:].lstrip()


Why are you lstripping here?

brian-brazil · 2019-05-07T10:27:03Z

prometheus_client/openmetrics/parser.py

+            value_substr = sub_labels[quote_start:]
+
+            # Check for extra commas
+            if label_name[0] == ',' or value_substr[len(value_substr)-1] == ',':


What if label_name is zero length?
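
The concern can be shown in a few lines: indexing an empty string raises IndexError, while `startswith` and `endswith` simply return False. The helper name `has_extra_comma` is hypothetical and only illustrates a safer form of the check.

```python
def has_extra_comma(label_name, value_substr):
    """Return True if a stray comma leads the name or trails the value."""
    # str.startswith/endswith are safe on empty strings, unlike s[0] or s[-1].
    return label_name.startswith(",") or value_substr.endswith(",")


try:
    ""[0]
except IndexError:
    print("indexing an empty label name raises IndexError")

print(has_extra_comma("", 'value"'))  # prints False, no exception
```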

brian-brazil · 2019-05-07T10:27:46Z

prometheus_client/openmetrics/parser.py

+            i = 0
+            while i < len(value_substr):
+                i = value_substr.index('"', i)
+                if not _is_character_escaped(value_substr, i):


Yeah, this is n^2. Work from the start in one loop.

This is still n^2

It's n^2 in the worst case, yes, when too many characters are escaped. I've made a change to call the function with smaller arguments: _is_character_escaped(value_substr[:i], i) instead of _is_character_escaped(value_substr, i)
Let me know if you can think of a better way to optimize this 👍
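
One way to follow the suggestion to work from the start in one loop is sketched below. It is an assumption about what a linear scan could look like, not the code that was merged, and `find_closing_quote` is a hypothetical name. The scan skips the character after each backslash, so every character is visited at most once.

```python
def find_closing_quote(value_substr):
    """Return the index of the first unescaped '"', or -1 if there is none.

    Assumes value_substr starts just after the opening quote.
    """
    i = 0
    n = len(value_substr)
    while i < n:
        ch = value_substr[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if ch == '"':
            return i
        i += 1
    return -1
```

Because the index only moves forward, the cost stays linear in the length of the value regardless of how many backslashes it contains.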

brian-brazil · 2019-05-07T10:29:16Z

prometheus_client/openmetrics/parser.py

+            quote_end = i + 1
+            label_value = sub_labels[quote_start:quote_end]
+            # Replace escaping if needed
+            if escaping:


Couldn't you check each value, rather than the whole string?

brian-brazil · 2019-05-07T10:30:12Z

prometheus_client/openmetrics/parser.py

+        # `index` and `rindex` methods raise a ValueError with
+        # `substring not found` message if text doesn't contain label braces
+        label_start = text.index("{")
+        label_end = text.rindex("}", 0, text.find(" # "))  # ignore exemplar label braces


What if " # " is part of a label value?

You should also add a testcase for this so it doesn't trip up someone else.
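
A small example makes the failure mode concrete. The metric line below is made up for illustration; it has no exemplar, yet `find(" # ")` still matches inside the quoted label value, so the search for the closing brace is cut short.

```python
text = 'a{foo="bar # baz"} 1'

# Position of " # ", used by the diff above as the upper bound for rindex.
bound = text.find(" # ")

# rfind is used here instead of rindex so a miss yields -1 rather than raising.
label_end = text.rfind("}", 0, bound)

print(bound)      # 10: inside the label value, not an exemplar separator
print(label_end)  # -1: the real closing brace lies past the bound
```

With `rindex` in place of `rfind`, the same input raises ValueError even though the line is well formed.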

brian-brazil · 2019-05-07T10:31:50Z

prometheus_client/openmetrics/parser.py

+            # Detect what separator is used
+            separator = " "
+            if separator not in text:
+                separator = "\t"


Tabs aren't supported.

brian-brazil · 2019-05-07T10:32:39Z

prometheus_client/openmetrics/parser.py

+        return Sample(name, labels, value, timestamp, exemplar)
+
+    except ValueError as e:
+        if str(e).startswith("substring not found"):


This is not a clean way to do this. Use find instead.
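
The difference between the two approaches can be sketched as follows. `str.find` returns -1 on a miss, so the caller can branch on the result instead of catching ValueError and comparing its message, whose exact text is an implementation detail. The function name `split_name_and_rest` is hypothetical, and the fallback on the first space is an assumption made for the example.

```python
def split_name_and_rest(text):
    """Split a sample line into (name, rest) at the first '{', if any."""
    label_start = text.find("{")
    if label_start == -1:
        # No label braces: fall back to splitting on the first space.
        name, _, rest = text.partition(" ")
        return name, rest
    return text[:label_start], text[label_start:]
```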

brian-brazil · 2019-05-07T10:32:46Z

prometheus_client/openmetrics/parser.py

+        label_start = text.index("{")
+        label_end = text.rindex("}", 0, text.find(" # "))  # ignore exemplar label braces
+        # The name is before the labels
+        name = text[:label_start].strip()


Why the strip?

ahmed-mez · 2019-05-09T09:05:41Z

Thank you for reviewing the PR. I've made the requested changes and added a test case; the parsing logic is more solid now.
I look forward to a second review. Thanks!

brian-brazil · 2019-05-09T10:23:33Z

prometheus_client/openmetrics/parser.py

+                raise ValueError
+
+            # Check for extra commas
+            if label_name[0] == ',' or value_substr[len(value_substr) - 1] == ',':


value_substr[-1] is more succinct

brian-brazil · 2019-05-09T10:23:46Z

prometheus_client/openmetrics/parser.py

+            i = 0
+            while i < len(value_substr):
+                i = value_substr.index('"', i)
+                if not _is_character_escaped(value_substr, i):


This is still n^2

brian-brazil · 2019-05-09T10:25:59Z

prometheus_client/openmetrics/parser.py

+            # Remove the processed label from the sub-slice for next iteration
+            sub_labels = sub_labels[quote_end + 1:]
+            next_comma = sub_labels.find(",") + 1
+            sub_labels = sub_labels[next_comma:].lstrip()


Why the lstrip?

brian-brazil · 2019-05-09T10:26:53Z

prometheus_client/openmetrics/parser.py

+        return Sample(name, labels, value, timestamp, exemplar)
+
+    except ValueError as e:
+        if str(e).find("substring not found") > -1:


This is still here, don't look inside error strings

brian-brazil · 2019-05-10T14:20:29Z

prometheus_client/openmetrics/parser.py

+            sub_labels = sub_labels[value_start + 1:]
+
+            # Find the first quote after the equal
+            quote_start = sub_labels.index('"') + 1


This is guaranteed to be right after the equals.

brian-brazil · 2019-05-10T14:21:01Z

prometheus_client/openmetrics/parser.py

+            value_substr = sub_labels[quote_start:]
+
+            # Check for empty label name
+            if len(label_name) == 0:


Shouldn't the MetricFamily code catch this already?

brian-brazil · 2019-05-10T14:23:47Z

prometheus_client/openmetrics/parser.py

+
+            # Remove the processed label from the sub-slice for next iteration
+            sub_labels = sub_labels[quote_end + 1:]
+            next_comma = sub_labels.find(",") + 1


This is guaranteed to be after the ", if present.

brian-brazil · 2019-05-10T14:24:08Z

prometheus_client/openmetrics/parser.py

-    value = []
+    # Detect the labels in the text
+    try:
+        # `index` method raises a ValueError with


Use find instead of index

brian-brazil · 2019-05-10T14:24:32Z

prometheus_client/openmetrics/parser.py

+        name = text[:label_start]
+        seperator = " # "
+        if not name.endswith("_bucket") or text.count(seperator) == 0:
+            # Line doesn't contain an exemplar


What if it (incorrectly) does?

changed it to if text.count(seperator) == 0:
it should guarantee that 👍

ahmed-mez · 2019-05-14T09:13:49Z

Here is the benchmark after the changes we made in this PR. Performance is even better now, ~3.9x:

call (x100000): _parse_sample('simple_metric 1.513767429e+09')

Simple example with prometheus parser: 0.24088597297668457
Simple example with openmetrics parser: 1.116285800933838
Simple example with the optimized openmetrics parser: 0.48735499382019043

call (x100000): _parse_sample('kube_service_labels{label_app="kube-state-metrics",label_chart="kube-state-metrics-0.5.0",label_heritage="Tiller",label_release="ungaged-panther",namespace="default",service="ungaged-panther-kube-state-metrics"} 1')

KSM metric example with prometheus parser: 1.608799934387207
KSM metric example openmetrics parser: 6.636054039001465
KSM metric example optimized openmetrics parser: 1.7176191806793213

brian-brazil · 2019-05-14T11:46:20Z

That looks about right. Could you expand the unit tests to ensure we're covering everything for both the old and new way of parsing labels? Also, it'd be great if you could add tests for anything this PR got wrong at any point: if you've made a mistake, others likely will too, and the tests are going to be used as the OpenMetrics test suite.

ahmed-mez · 2019-05-16T08:27:20Z

Added test cases, I guess the PR is ready for a final review :)

brian-brazil · 2019-05-16T09:25:58Z

prometheus_client/openmetrics/parser.py

+        label = text[label_start + 1:label_end]
+        labels = _parse_labels(label)
+    else:
+        # Line contains an exemplar


potentially contains

brian-brazil · 2019-05-16T09:26:59Z

prometheus_client/openmetrics/parser.py

+def _parse_labels(text):
+    labels = {}
+    # Return if we don't have valid labels
+    if "=" not in text:


How would this handle something like {a} ?

made the required changes and added some test cases like that
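
For illustration, the sketch below shows why an early return on a missing `=` needs care: a label body such as `a` is malformed, but a bare `"=" not in text` check cannot tell it apart from an empty label set. The function `check_label_body` is hypothetical and is not the code from this PR; it assumes an empty label set `{}` is acceptable.

```python
def check_label_body(text):
    """Classify the text between '{' and '}' before parsing it in detail."""
    if not text:
        # '{}' is treated as an empty but well-formed label set (assumption).
        return "empty"
    if "=" not in text:
        # Something like '{a}': non-empty, yet no name="value" pair.
        raise ValueError("Invalid labels: %r" % text)
    return "has-labels"
```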

brian-brazil · 2019-05-16T09:28:21Z

tests/openmetrics/test_parser.py

+
+    @unittest.skipIf(sys.version_info < (3, 3), "Test requires Python 3.3+.")
+    def test_fallback_to_state_machine_label_parsing(self):
+        from unittest.mock import patch


As these tests will become the official regression suite for OpenMetrics, it'd be best to test a full line rather than just a function.

added multiple test cases to assert which functions are called 👍

Signed-off-by: Ahmed Mezghani <[email protected]>

brian-brazil · 2019-05-17T09:38:59Z

Thanks!

ahmed-mez · 2019-05-17T09:42:37Z

Great! @brian-brazil :) any plans to release soon?

brian-brazil · 2019-05-17T09:46:38Z

I'll add it to my todo list.

ahmed-mez mentioned this pull request May 6, 2019

Openmetrics text parser performance #401

Closed

ahmed-mez force-pushed the master branch 2 times, most recently from e6d7944 to 7832fed, May 6, 2019 10:37

brian-brazil reviewed May 7, 2019


ahmed-mez force-pushed the master branch from bb08fef to 1130220, May 9, 2019 09:01

ahmed-mez mentioned this pull request May 9, 2019

Update prometheus client to 0.6.0, handle counter metric name change DataDog/integrations-core#3700

Closed

6 tasks

brian-brazil reviewed May 9, 2019


ahmed-mez force-pushed the master branch from c8aa7a3 to 4090c66, May 9, 2019 11:40

brian-brazil reviewed May 10, 2019


ahmed-mez force-pushed the master branch from de3a73f to 811b743, May 10, 2019 15:42

ahmed-mez force-pushed the master branch from eb3e2e1 to 0c52438, May 14, 2019 16:51

ahmed-mez changed the title ~~optimize openmetrics text parsing (~3.3x perf)~~ optimize openmetrics text parsing (~4x perf) May 15, 2019

brian-brazil reviewed May 16, 2019


ahmed-mez added 10 commits May 16, 2019 15:47

optimize openmetrics text parsing

1bb2751

Signed-off-by: Ahmed Mezghani <[email protected]>

consider pypy exception msg

4b18b23

Signed-off-by: Ahmed Mezghani <[email protected]>

more solid parsing + add test case

f5ecac9

Signed-off-by: Ahmed Mezghani <[email protected]>

exeception handling

7f97c55

Signed-off-by: Ahmed Mezghani <[email protected]>

optimize detecting escaped chars

6b377d5

Signed-off-by: Ahmed Mezghani <[email protected]>

changes after review

8aff6e7

Signed-off-by: Ahmed Mezghani <[email protected]>

more testcases

da13cde

Signed-off-by: Ahmed Mezghani <[email protected]>

better labels length counting logic

62d8850

Signed-off-by: Ahmed Mezghani <[email protected]>

better edge cases handling

59dfaf2

Signed-off-by: Ahmed Mezghani <[email protected]>

more test cases

6b4dae9

Signed-off-by: Ahmed Mezghani <[email protected]>

ahmed-mez force-pushed the master branch from d1b85d6 to 6b4dae9, May 16, 2019 13:47

brian-brazil merged commit 6740213 into prometheus:master May 17, 2019
