Default / lazy parser not cached #253


Closed
Rafiot opened this issue Jan 31, 2025 · 6 comments · Fixed by #255

@Rafiot

Rafiot commented Jan 31, 2025

I'm guessing I'm doing something wrong, because it makes very little sense given the documentation, but in short: on my machine, the slowest parser is ua-parser-rs.

Here is what I'm doing in IPython (I get the same results with the plain Python interpreter; IPython just makes timing easier):

from ua_parser import parse
ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'

%timeit parse(ua_string).with_defaults()

  • With ua-parser-rs:

    734 ms ± 27.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

  • With just google-re2:

    188 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  • After uninstalling ua-parser-rs and google-re2:

    1.07 ms ± 33.3 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

I'm on Ubuntu 24.10 with Python 3.12.7.

Do you have any idea what's going on there? Something related to the cache being reloaded on every call? I couldn't find a way to avoid that.

@masklinn
Contributor

masklinn commented Jan 31, 2025

Note: I'm leaving this comment for posterity, and the information in it is technically true, but it's a reply to a complete misunderstanding of the report. If you're facing the same issue you can skip it; the comments afterwards are the investigation proper, and the fix has been released as part of 1.0.1.

I'm guessing I'm doing something wrong [...] Something related to the cache being reloaded on every call? I couldn't find a way to avoid that.

For what that's worth,¹ only the basic Python parser is cached by default:² in my benchmarks, while a (sufficiently large) cache does improve the performance of the native parsers, I didn't consider the additional memory worth it given how little effect it had on what I considered a real-world benchmark.

Since you're looping on a single user agent string, it makes sense that the one cached parser would be faster than the two native ones: it's basically an ideal situation for a cache (once the UA has been parsed, every future call is just a cache hit). However, the difference in scale you report is significantly larger than I would have expected, and so is the difference between re2 and regex. I'll have to see how IPython interacts with things when I'm able.

In the meantime, could you provide details on the machine you're running things on (e.g. CPU model / architecture)? I've mostly been benching on my dev machine (it might have been smart to have some benches I could run on GHA; I should do that).

And could you try the bench script provided with ua-parser? Most of my benchmarking has been done using https://raw.githubusercontent.com/ua-parser/uap-python/refs/heads/master/samples/useragents.txt so I'd be grateful if you could download that file and run:

python -mua_parser bench useragents.txt

Then maybe try to run a pared-down configuration on id.txt (that's the same user agent repeated 100000 times):

python -mua_parser bench id.txt --caches none sieve --cachesizes 1

Note that it might take a fairly long time (possibly minutes).

Footnotes

  1. It's documented, but not made super clear in the readme, as I assumed it wouldn't generally be visible or relevant.

  2. You can create a custom parser with caching though; cf. caches and other advanced parser customisations in the documentation.
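The cache-hit effect discussed above can be illustrated with a plain `functools.lru_cache` wrapper around a stand-in parse function (a sketch only: `parse_ua` and its return value are hypothetical, not ua_parser's API). Once a given UA string has been resolved, every repeat lookup is a cache hit, which is exactly the situation a single-UA `%timeit` loop creates:

```python
from functools import lru_cache

calls = 0  # counts how often the "expensive" work actually runs

@lru_cache(maxsize=128)
def parse_ua(ua: str) -> tuple:
    """Hypothetical stand-in for an expensive resolver call."""
    global calls
    calls += 1
    # trivial "parse" for illustration: the product token before the first slash
    return (ua.split("/", 1)[0],)

ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36"
for _ in range(1000):
    parse_ua(ua)

print(calls)  # the work ran once; the other 999 calls were cache hits
```

With a varied real-world stream of UA strings the hit rate drops, which is why the benchmarks below sweep cache sizes rather than assume caching always pays off.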

@Rafiot
Author

Rafiot commented Feb 1, 2025

For context, my real-world use case is here: https://github.com/Lookyloo/lookyloo/blob/main/lookyloo/helpers.py#L389
It is triggered when I submit a URL to capture without specifying a user agent: the capture falls back to the default user agent from Chromium, which is then parsed.
I noticed the API took ~1s to respond when it should be immediate (and it is with the basic Python parser).

Either way, it's not really relevant in this context, and the amount of UAs I need to parse doesn't require anything highly performant. But I have long-running processes, so I'm happy to use the Rust parser if I can make sure it is loaded only once.

And I think this is the way to go (?):

import ua_parser

base = ua_parser.regex.Resolver(ua_parser.loaders.load_lazy_builtins())
ua_parser.parser = ua_parser.Parser(base)
ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'

%timeit ua_parser.parse(ua_string).with_defaults()

=> 24 μs ± 383 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


CPU on my machine: Intel(R) Core(TM) i7-8665U CPU @ 1.90GHz / x86_64
CPU on the server: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz / x86_64


Benchmarks (on my machine)
useragents.txt: 75158 lines, 20322 unique (27%)
basic              : 50.65s ( 674us/line)
basic-lru-10       : 52.29s ( 696us/line)
basic-lru-20       : 52.49s ( 698us/line)
basic-lru-50       : 51.95s ( 691us/line)
basic-lru-100      : 50.82s ( 676us/line)
basic-lru-200      : 47.56s ( 633us/line)
basic-lru-500      : 41.02s ( 546us/line)
basic-lru-1000     : 34.04s ( 453us/line)
basic-lru-2000     : 28.64s ( 381us/line)
basic-lru-5000     : 20.56s ( 274us/line)
basic-s3fifo-10    : 53.35s ( 710us/line)
basic-s3fifo-20    : 52.96s ( 705us/line)
basic-s3fifo-50    : 47.29s ( 629us/line)
basic-s3fifo-100   : 44.54s ( 593us/line)
basic-s3fifo-200   : 40.24s ( 535us/line)
basic-s3fifo-500   : 34.81s ( 463us/line)
basic-s3fifo-1000  : 30.15s ( 401us/line)
basic-s3fifo-2000  : 25.08s ( 334us/line)
basic-s3fifo-5000  : 18.84s ( 251us/line)
basic-sieve-10     : 56.27s ( 749us/line)
basic-sieve-20     : 53.51s ( 712us/line)
basic-sieve-50     : 48.31s ( 643us/line)
basic-sieve-100    : 44.51s ( 592us/line)
basic-sieve-200    : 40.12s ( 534us/line)
basic-sieve-500    : 34.16s ( 455us/line)
basic-sieve-1000   : 28.32s ( 377us/line)
basic-sieve-2000   : 23.23s ( 309us/line)
basic-sieve-5000   : 18.25s ( 243us/line)
re2                :  4.58s (  61us/line)
re2-lru-10         :  5.02s (  67us/line)
re2-lru-20         :  4.93s (  66us/line)
re2-lru-50         :  5.02s (  67us/line)
re2-lru-100        :  4.57s (  61us/line)
re2-lru-200        :  4.37s (  58us/line)
re2-lru-500        :  3.72s (  50us/line)
re2-lru-1000       :  3.35s (  45us/line)
re2-lru-2000       :  2.69s (  36us/line)
re2-lru-5000       :  2.01s (  27us/line)
re2-s3fifo-10      :  4.65s (  62us/line)
re2-s3fifo-20      :  4.54s (  60us/line)
re2-s3fifo-50      :  4.38s (  58us/line)
re2-s3fifo-100     :  4.03s (  54us/line)
re2-s3fifo-200     :  3.93s (  52us/line)
re2-s3fifo-500     :  3.16s (  42us/line)
re2-s3fifo-1000    :  2.71s (  36us/line)
re2-s3fifo-2000    :  2.31s (  31us/line)
re2-s3fifo-5000    :  1.95s (  26us/line)
re2-sieve-10       :  4.57s (  61us/line)
re2-sieve-20       :  4.63s (  62us/line)
re2-sieve-50       :  4.33s (  58us/line)
re2-sieve-100      :  4.14s (  55us/line)
re2-sieve-200      :  3.73s (  50us/line)
re2-sieve-500      :  3.08s (  41us/line)
re2-sieve-1000     :  2.65s (  35us/line)
re2-sieve-2000     :  2.31s (  31us/line)
re2-sieve-5000     :  1.84s (  25us/line)
regex              :  2.58s (  34us/line)
regex-lru-10       :  2.75s (  37us/line)
regex-lru-20       :  2.69s (  36us/line)
regex-lru-50       :  2.68s (  36us/line)
regex-lru-100      :  2.62s (  35us/line)
regex-lru-200      :  2.82s (  38us/line)
regex-lru-500      :  2.39s (  32us/line)
regex-lru-1000     :  2.12s (  28us/line)
regex-lru-2000     :  1.74s (  23us/line)
regex-lru-5000     :  1.42s (  19us/line)
regex-s3fifo-10    :  2.94s (  39us/line)
regex-s3fifo-20    :  2.93s (  39us/line)
regex-s3fifo-50    :  2.69s (  36us/line)
regex-s3fifo-100   :  2.54s (  34us/line)
regex-s3fifo-200   :  2.32s (  31us/line)
regex-s3fifo-500   :  1.99s (  27us/line)
regex-s3fifo-1000  :  1.71s (  23us/line)
regex-s3fifo-2000  :  1.46s (  19us/line)
regex-s3fifo-5000  :  1.18s (  16us/line)
regex-sieve-10     :  2.85s (  38us/line)
regex-sieve-20     :  2.88s (  38us/line)
regex-sieve-50     :  2.66s (  35us/line)
regex-sieve-100    :  2.53s (  34us/line)
regex-sieve-200    :  2.27s (  30us/line)
regex-sieve-500    :  1.96s (  26us/line)
regex-sieve-1000   :  1.72s (  23us/line)
regex-sieve-2000   :  1.48s (  20us/line)
regex-sieve-5000   :  1.23s (  16us/line)
legacy             : 45.64s ( 607us/line)

@masklinn
Contributor

masklinn commented Feb 1, 2025

For context, my real-world use case is here: https://github.com/Lookyloo/lookyloo/blob/main/lookyloo/helpers.py#L389 It is triggered when I submit a URL to capture without specifying a user agent: the capture falls back to the default user agent from Chromium, which is then parsed. I noticed the API took ~1s to respond when it should be immediate (and it is with the basic Python parser).

Oooh, I see now, I completely misunderstood your report (probably because mentions of caching prime me, given how much time I've spent on that). I'm dreadfully sorry: you're talking about caching the parser itself after lazily instantiating it!

At a glance, it looks like I broke the parser memoization in #230: I forgot to keep the assignment of the parser onto the parser global: 6fb7b58#diff-f71b6cb0226b7966e6f4e7aa9b42e7482dc8eb9410323f3f0536dde7f04130c1L135-R140
That is consistent with what you report: the regex and re2 parsers have high instantiation overhead because they have to construct the regex filter, whereas the basic / pure Python parser pretty much just stores the matchers it's handed.

Thanks for the report! And sorry again for the misunderstanding.

And I think it is the way to go (?):

import ua_parser

base = ua_parser.regex.Resolver(ua_parser.loaders.load_lazy_builtins())
ua_parser.parser = ua_parser.Parser(base)
ua_string = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.104 Safari/537.36'

That looks about right, yes. There should actually be a from_matchers method on ua_parser which builds the resolver for you using the "best available resolver" heuristics.

@masklinn masklinn changed the title Cache not cached (?) Default / lazy parser not cached Feb 1, 2025
@Rafiot
Author

Rafiot commented Feb 1, 2025

Do not worry at all! I should have explained my issue better.

For now, I'll just use the default parser (as it is not blocking), but I will go back to the Rust one as soon as the bug is fixed: as it's on long-running processes, it makes sense to have one long(ish) initialization and quick parsing.

masklinn added a commit to masklinn/uap-python that referenced this issue Feb 1, 2025
Reported by @Rafiot: the lazy parser is not memoised. This has limited
effect on the basic / pure Python parser, as its initialisation is
trivial, but it *significantly* impacts the re2 and regex parsers, as
they need to process regexes into a filter tree.

The memoization was mistakenly removed in ua-parser#230: while refactoring
initialisation I removed the setting of the `parser` global.

- add a test to ensure the parser is correctly memoised, not
  re-instantiated every time
- reinstate setting the global
- add a mutex on `__getattr__`, it should only be used on first access
  and avoids two threads creating an expensive parser at the same
  time (which is a waste of CPU)

Fixes ua-parser#253
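The pattern the commit message describes can be sketched in plain Python with a memoised global behind a double-checked lock (a sketch only: the real fix hooks module-level attribute access via `__getattr__`, and `_build_parser` / `get_parser` are stand-in names, not ua_parser's API):

```python
import threading

_lock = threading.Lock()
_parser = None
build_count = 0  # tracks how many times the expensive build actually ran


def _build_parser():
    """Hypothetical stand-in for the expensive construction
    (e.g. compiling regexes into a filter tree)."""
    global build_count
    build_count += 1
    return object()


def get_parser():
    """Memoised accessor: the global is assigned on first access, and a
    mutex ensures two threads racing on that first access don't both
    build the expensive parser."""
    global _parser
    if _parser is None:             # fast path once initialised
        with _lock:
            if _parser is None:     # re-check under the lock
                _parser = _build_parser()
    return _parser


# even with concurrent first accesses, the parser is built exactly once
threads = [threading.Thread(target=get_parser) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(build_count)  # 1
```

Without the re-check under the lock (or without the lock entirely), several threads hitting the cold path at once could each pay the expensive construction cost, which is the CPU waste the last bullet point guards against.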
@masklinn
Contributor

masklinn commented Feb 1, 2025

FWIW I just published 1.0.1, which should fix the issue. Using your test case (kinda: I'm just using timeit at the CLI), on 1.0.0 I get:

  • 589 usec per loop on the basic parser
  • 85.6 msec per loop on the re2 parser
  • 349 msec per loop on the regex parser

Which does track with your observations, at least in terms of scaling.

With 1.0.1 off of PyPI:

  • 1.54 usec per loop on the basic parser
  • 24.9 usec per loop on the re2 parser
  • 13.6 usec per loop on the regex parser

And thanks yet again for the report, and sorry for the trouble.

@Rafiot
Author

Rafiot commented Feb 1, 2025

Excellent, thank you very much, it works as expected!
