Lazily compile regexes to prevent expensive compilation at import. #95

Closed
wants to merge 1 commit into from

@rtibbles commented Sep 6, 2021

Currently all regexes are compiled at module import, so every import pays the cost of compiling every possible regex up front.

In local testing on my dev machine this leads to importing ua_parser taking ~2.5 seconds.

Deferring regex compilation has two benefits:

  • Avoids doing all regex compilation at module import time, reducing module import to ~0.009 seconds, a speedup of roughly 300x
  • Regexes are compiled as needed, so even on first execution not all regexes are compiled unless the matching regex is the last in the list

One possible future optimization based on this would be to do some sort of ordering of parsers based on browser/os/device prevalence to ensure that common lookups require less compilation, but as the ordering is determined in uap_core, and sensitive to the needs of potentially conflicting regexes, I did not make any attempt to implement that here.
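To illustrate the idea, here is a minimal sketch of deferred compilation, not the code from this PR. `LazyPattern`, `first_match`, and the three toy patterns are hypothetical names made up for the example; only the standard `re` module is assumed.

```python
import re


class LazyPattern:
    """Hypothetical wrapper: compiles its regex on first use, not at construction."""

    def __init__(self, pattern, flags=0):
        self.pattern = pattern
        self.flags = flags
        self._compiled = None  # filled in on first access

    @property
    def compiled(self):
        if self._compiled is None:
            self._compiled = re.compile(self.pattern, self.flags)
        return self._compiled

    def search(self, text):
        return self.compiled.search(text)


def first_match(parsers, text):
    """Try parsers in order; only those reached before a match get compiled."""
    for parser in parsers:
        match = parser.search(text)
        if match:
            return parser, match
    return None, None


# Toy patterns for illustration only, not taken from uap_core.
PARSERS = [
    LazyPattern(r"Firefox/(\d+)"),
    LazyPattern(r"Chrome/(\d+)"),
    LazyPattern(r"Safari/(\d+)"),
]

parser, match = first_match(PARSERS, "Mozilla/5.0 Chrome/93.0 Safari/537.36")
```

In this sketch the first two patterns are compiled (the first is tried and fails, the second matches), while the third is never compiled because the loop returns before reaching it.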

@mattrobenolt
Member

So while I get the sentiment, the flipside is also not great.

The primary use case for this library was a web service. In a web service, I'd personally prefer to take the hit at import time rather than have compilation happen lazily over time.

I'd rather the server take a bit longer to start up and begin serving requests than have the first N requests take more time to serve.

So ultimately, I think there's not a real way to win here and cover all concerns.

I'd propose if you would rather not take the hit on import time, wrap the import in your function so it's called on demand.

Something like:

def do_the_thing():
    from ua_parser import user_agent_parser as uap
    uap.Parse(...)

but this way you control when it gets imported and choose to take the hit when needed.

I think the only reasonable alternative is to control this behavior with an environment variable, so the environment variable can be checked when the module is loaded, and toggle the behavior based on that.

Otherwise, this change is going to have a negative effect on folks that prefer to take the hit on import.
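A rough sketch of what such a toggle could look like, purely as an assumption: the variable name `UA_PARSER_LAZY_COMPILE`, the `Matcher` class, and `build_matchers` are all hypothetical and do not exist in ua_parser.

```python
import os
import re

# Hypothetical variable name; the library defines no such setting.
LAZY_ENV_VAR = "UA_PARSER_LAZY_COMPILE"


def lazy_enabled(environ=os.environ):
    """Return True when the (hypothetical) variable is set to a truthy value."""
    return environ.get(LAZY_ENV_VAR, "").strip().lower() in {"1", "true", "yes"}


class Matcher:
    def __init__(self, pattern, lazy):
        self.pattern = pattern
        # Eager mode pays the compilation cost here, at construction time.
        self._compiled = None if lazy else re.compile(pattern)

    def search(self, text):
        # Lazy mode pays the cost on the first search instead.
        if self._compiled is None:
            self._compiled = re.compile(self.pattern)
        return self._compiled.search(text)


def build_matchers(patterns, environ=os.environ):
    """Check the toggle once, then build every matcher in that mode."""
    lazy = lazy_enabled(environ)
    return [Matcher(p, lazy) for p in patterns]
```

With the variable unset, the default stays eager, so existing users who prefer to pay at import would see no change in behavior.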

Thoughts?

@masklinn
Contributor

I'll close this for now, for the reasons @mattrobenolt explained, and because providing for a choice here would likely require a fair amount of design work. So would trying to find a better way to organise or look up things (which might also be an option).

I'd be interested in your use case though, @rtibbles: what is the situation where you are unhappy paying the price upfront, but happy to pay pretty much the same one later on during the run?

@masklinn closed this Apr 26, 2022
@masklinn
Contributor

masklinn commented May 1, 2022

With that said, @rtibbles, I'd be interested in having more information about your system and possibly the size of your _regexes.py file. Do you have something special there?

Because on my machine with a 7860-line _regexes.py:

> time python3.6 -c 'import ua_parser.user_agent_parser'
python3.6 -c 'import ua_parser.user_agent_parser'  0.13s user 0.04s system 91% cpu 0.180 total
> time python3.7 -c 'import ua_parser.user_agent_parser'
python3.7 -c 'import ua_parser.user_agent_parser'  0.12s user 0.04s system 92% cpu 0.170 total
> time python3.8 -c 'import ua_parser.user_agent_parser'
python3.8 -c 'import ua_parser.user_agent_parser'  0.11s user 0.04s system 91% cpu 0.165 total
> time python3.9 -c 'import ua_parser.user_agent_parser'
python3.9 -c 'import ua_parser.user_agent_parser'  0.12s user 0.04s system 91% cpu 0.167 total
> time python3.10 -c 'import ua_parser.user_agent_parser'
python3.10 -c 'import ua_parser.user_agent_parser'  0.11s user 0.04s system 92% cpu 0.163 total

The import times you're referring to are an order of magnitude slower, which seems odd. Is it some sort of strangely slow SoC?
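For anyone wanting to reproduce the comparison from Python rather than the shell, a small sketch along these lines should work. `time_import` is a hypothetical helper, not part of ua_parser; it measures wall-clock time for a fresh interpreter, so interpreter startup is included, as it is in the `time python3.x -c ...` runs.

```python
import subprocess
import sys
import time


def time_import(module_name, python=sys.executable):
    """Wall-clock seconds for a fresh interpreter to import module_name.

    Includes interpreter startup, so compare like with like.
    """
    start = time.perf_counter()
    subprocess.run([python, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start
```

Calling `time_import("ua_parser.user_agent_parser")` requires the package to be installed for the chosen interpreter. On Python 3.7 and later, `python -X importtime -c 'import ua_parser.user_agent_parser'` gives a per-module breakdown instead of a single total.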

@masklinn masklinn mentioned this pull request May 2, 2022