
[Bug]: Avoid generating font cache on import time #28485


Closed
FRidh opened this issue Jun 28, 2024 · 11 comments

Comments

@FRidh

FRidh commented Jun 28, 2024

Bug summary

As a side effect of import, Matplotlib generates a font cache. This cache is needed for plotting to work. Unfortunately, it also means that any (transitive) dependency that eagerly imports matplotlib causes the cache to be generated, even when the user never uses matplotlib.

I opened an issue about this 8 years ago (#7592). At the time, the issue concerned Linux. It now seems the behaviour has changed on macOS, and it happens there as well.

Code for reproduction

import matplotlib

Actual outcome

/nix/store/774r81hjzn2p64jnsdyk0cl2rbjabghj-python3-3.12.3-env/lib/python3.12/site-packages/matplotlib/font_manager.py:270: in _get_macos_fonts
    subprocess.check_output(["system_profiler", "-xml", "SPFontsDataType"]))
/nix/store/65ackbgqn02p6fy75rksjbp17zj6440j-python3-3.12.3/lib/python3.12/subprocess.py:466: in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
/nix/store/65ackbgqn02p6fy75rksjbp17zj6440j-python3-3.12.3/lib/python3.12/subprocess.py:548: in run
    with Popen(*popenargs, **kwargs) as process:
/nix/store/65ackbgqn02p6fy75rksjbp17zj6440j-python3-3.12.3/lib/python3.12/subprocess.py:1026: in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
/nix/store/65ackbgqn02p6fy75rksjbp17zj6440j-python3-3.12.3/lib/python3.12/subprocess.py:1955: in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
E   FileNotFoundError: [Errno 2] No such file or directory: 'system_profiler'
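The last frame of the traceback is raised by subprocess itself: when the executable cannot be found on PATH, Popen raises FileNotFoundError before any child process starts. The sketch below reproduces this locally with a deliberately nonexistent command name (an assumption standing in for a missing system_profiler; probe is a hypothetical helper) and shows that FileNotFoundError is a subclass of OSError, which is what an except OSError clause catches.

```python
import subprocess
import sys

# Hypothetical command name, standing in for a system_profiler that is not on PATH.
MISSING_TOOL = "system-profiler-not-installed-xyz"


def probe(cmd):
    """Run cmd and describe how it failed, instead of letting the error propagate."""
    try:
        subprocess.check_output(cmd)
    except FileNotFoundError:
        # Raised by Popen when the executable cannot be found; no child ever starts.
        return "missing executable"
    except subprocess.CalledProcessError as exc:
        # Raised when the child does start but exits with a non-zero status.
        return f"exited with status {exc.returncode}"
    return "ok"


if __name__ == "__main__":
    print(probe([MISSING_TOOL, "-xml", "SPFontsDataType"]))  # missing executable
    print(probe([sys.executable, "-c", "raise SystemExit(3)"]))  # exited with status 3
    print(issubclass(FileNotFoundError, OSError))  # True
```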

Expected outcome

Preferably no side effect at import time. Or, as a workaround, an option such as an environment variable to disable the side effect.

Additional information

Obviously, if packages did not import matplotlib eagerly, this would not be an issue. However, as an end user, this cannot be controlled: anywhere in the dependency tree, a package may import a library that it does not strictly need. But that does not mean libraries should perform side effects at import time. Having an environment variable to disable building the cache would already be great, as it would provide a way out other than patching a (transitive) dependency.
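The requested escape hatch could look roughly like the sketch below. The variable name MYLIB_SKIP_FONT_CACHE and the init_font_cache helper are hypothetical; nothing in this thread says Matplotlib offers such an option. The point is only that the expensive step sits behind a single check of the environment, so a test pipeline could opt out without patching any dependency.

```python
import os

# Hypothetical opt-out variable; not an existing Matplotlib setting.
SKIP_ENV_VAR = "MYLIB_SKIP_FONT_CACHE"
TRUTHY = {"1", "true", "yes", "on"}


def init_font_cache(build, environ=None):
    """Return build() unless the opt-out variable is set, in which case return None.

    `build` is whatever expensive callable scans the system fonts; `environ`
    defaults to os.environ and is a parameter only to make testing easy.
    """
    environ = os.environ if environ is None else environ
    if environ.get(SKIP_ENV_VAR, "").strip().lower() in TRUTHY:
        return None  # the cache must still be built later, before the first plot
    return build()
```

Skipping only moves the cost: something still has to build the cache before fonts are actually needed.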

Operating system

OSX

Matplotlib Version

3.9.0

Matplotlib Backend

No response

Python version

3.12

Jupyter version

No response

Installation

None

@tacaswell
Member

I stand by my comment from 2016 (#7592 (comment)): we need this font cache to make (almost) any reasonable plot. If we find an existing cache on import, we use it rather than regenerating it.
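The "use an existing cache, otherwise build one" behaviour can be sketched generically as follows. The JSON layout, CACHE_VERSION, load_or_build and the scan callable are assumptions for illustration; Matplotlib's own cache file has its own format and invalidation rules.

```python
import json
from pathlib import Path

CACHE_VERSION = 1  # hypothetical schema version; bump it to force a rebuild


def load_or_build(cache_path, scan):
    """Return cached font data from cache_path, calling scan() only on a cache miss."""
    cache_path = Path(cache_path)
    try:
        data = json.loads(cache_path.read_text())
        if isinstance(data, dict) and data.get("version") == CACHE_VERSION:
            return data["fonts"]
    except (OSError, ValueError, KeyError):
        pass  # missing, unreadable or malformed cache: rebuild below
    fonts = scan()  # the expensive step, e.g. asking the OS for its font list
    try:
        cache_path.write_text(json.dumps({"version": CACHE_VERSION, "fonts": fonts}))
    except OSError:
        pass  # read-only location: still return the result, just don't persist it
    return fonts
```

In a fresh container or CI runner the cached branch is never taken, which is why pre-generating the file at build time helps there.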

Ensuring that we know where to find fonts has always been the behavior on all platforms. What has changed on macOS is that we now ask the system to tell us which fonts it has (rather than trying to explore the disk ourselves) via #27230. Given that this is failing, I think the issue is that nix is too restrictive about which other programs are available, and its packaging of Matplotlib needs to be updated to include access to system_profiler. Even if we did as requested (deferring checking/generating the cache until the first time we need it), it would not solve the actual problem that the nix packaging is broken on macOS. @FRidh Can you please report this downstream to nix? I do not know if it is technically possible, but it may also be worth working with nix upstream to generate and include a version of the font cache as part of their build process.

I am going to close this because I do not think we should take any action: in most cases this is a one-time-ever operation, and delaying the expense only helps in the case where a) you are in a context where the cache does not persist b) we are imported but not actually used. The case of a non-persistent cache can be handled by pre-generating it in the build process, e.g. for containers. In this case, delaying the generation would mask a show-stopping issue until later, when I would rather it be caught on import.

@tacaswell closed this as not planned on Jun 28, 2024
@timhoffm
Member

> where a) you are in a context where the cache does not persist b) we are imported but not actually used.

This may be a common case for test pipelines, where an environment is built from scratch. Typically, other packages will import us globally, but more often than not you don't test plotting functionality. This is just a guess and would need confirmation before considering any action.

@FRidh
Author

FRidh commented Jun 29, 2024

I am actually a Nixpkgs maintainer. Indeed, as mentioned in the previous comment, it is increasingly common to run tests and analysis in such pipelines. Now, whenever a package imports matplotlib eagerly it breaks. This is the main issue here, and it could easily be avoided by having matplotlib honour some variable.

@tacaswell
Member

> Now, whenever a package imports matplotlib eagerly it breaks.

I do not see how eager or non-eager importing matters here; from this report, I would expect any import of Matplotlib (intentional or otherwise) to fail under the nix packaging. It may be that this only shows up when the import is extraneous, but this still looks fundamentally like a nix packaging issue.

> This may be a common case for test pipelines, where an environment is built from scratch.

That may be true, but at import time we have no way to tell whether we are actually going to be used in the process, and I do not think we should make any compromises that impact users who are actually using us.

Maybe we should go all-in on lazy imports (which would spread out the import time), but if we do that it should be library-wide, not just for fonts.
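For reference, the standard library supports the kind of lazy imports mentioned here through importlib.util.LazyLoader; the sketch below follows the recipe in the importlib documentation, so the module body only runs on first attribute access. colorsys is used purely as a small stand-in module, and lazy_import is a helper name chosen here; applying this across a package the size of Matplotlib would need far more care than this shows.

```python
import importlib.util
import sys


def lazy_import(name):
    """Register `name` in sys.modules but defer executing it until first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # only arms the lazy hook; the module body has not run yet
    return module


colorsys = lazy_import("colorsys")
# The module body executes here, on the first attribute lookup.
print(colorsys.rgb_to_hsv(1.0, 0.0, 0.0))  # (0.0, 1.0, 1.0)
```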

For fontconfig fonts we use a try/except that fails gracefully if the subprocess fails; we should probably do the same with system_profiler:

import logging
import os
import plistlib
import subprocess
from functools import lru_cache
from pathlib import Path

_log = logging.getLogger(__name__)


@lru_cache
def _get_fontconfig_fonts():
    """Cache and list the font paths known to ``fc-list``."""
    try:
        if b'--format' not in subprocess.check_output(['fc-list', '--help']):
            _log.warning(  # fontconfig 2.7 implemented --format.
                'Matplotlib needs fontconfig>=2.7 to query system fonts.')
            return []
        out = subprocess.check_output(['fc-list', '--format=%{file}\\n'])
    except (OSError, subprocess.CalledProcessError):
        return []
    return [Path(os.fsdecode(fname)) for fname in out.split(b'\n')]


@lru_cache
def _get_macos_fonts():
    """Cache and list the font paths known to ``system_profiler SPFontsDataType``."""
    d, = plistlib.loads(
        subprocess.check_output(["system_profiler", "-xml", "SPFontsDataType"]))
    return [Path(entry["path"]) for entry in d["_items"]]
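A system_profiler query that degrades the same way as the fontconfig one might look like the sketch below. It assumes an empty list is an acceptable fallback (as it is for fc-list); get_macos_fonts_safe is a hypothetical name, and this is not necessarily the change that was eventually merged.

```python
import plistlib
import subprocess
from functools import lru_cache
from pathlib import Path


@lru_cache(maxsize=None)
def get_macos_fonts_safe():
    """List font paths reported by system_profiler, or [] if it cannot be used."""
    try:
        out = subprocess.check_output(["system_profiler", "-xml", "SPFontsDataType"])
        d, = plistlib.loads(out)
        return [Path(entry["path"]) for entry in d["_items"]]
    except (OSError, subprocess.CalledProcessError, ValueError, KeyError):
        # OSError covers FileNotFoundError when system_profiler is not on PATH
        # (e.g. a sandbox or a restricted build environment); ValueError covers
        # a malformed plist or an unexpected number of top-level entries.
        return []
```

On a machine without system_profiler (e.g. Linux, or a sandbox that hides /usr/sbin) this returns an empty list instead of raising at import.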

@timhoffm
Member

timhoffm commented Jul 2, 2024

Alternatively to lazy imports, we could also load the fontManager / font cache lazily.

In a quick test, assigning None in these three lines makes import matplotlib.pyplot as plt run without error.

fontManager = _load_fontmanager()
findfont = fontManager.findfont
get_font_names = fontManager.get_font_names

This means that we currently load the fontManager at import time without using it during import. We could thus refactor FontManager to do the work on first use instead of at creation.
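One way to do the work on first use instead of at creation is sketched below. LazyFontManager and the scan callable are hypothetical stand-ins, not Matplotlib's FontManager API; the three module-level assignments mirror the ones quoted above and show that binding findfont and get_font_names does not trigger the scan.

```python
import threading


class LazyFontManager:
    """Hypothetical font manager that defers its expensive scan until first use."""

    def __init__(self, scan):
        self._scan = scan          # callable returning {font name: font path}
        self._fonts = None         # filled in on the first lookup
        self._lock = threading.Lock()

    def _loaded(self):
        # Double-checked locking so that concurrent first calls scan only once.
        if self._fonts is None:
            with self._lock:
                if self._fonts is None:
                    self._fonts = self._scan()
        return self._fonts

    def findfont(self, name, fallback=None):
        return self._loaded().get(name, fallback)

    def get_font_names(self):
        return sorted(self._loaded())


def _scan_system_fonts():
    # Placeholder for the real, slow scan (fc-list, system_profiler, ...).
    return {"DejaVu Sans": "/fonts/DejaVuSans.ttf"}


fontManager = LazyFontManager(_scan_system_fonts)
findfont = fontManager.findfont              # binding a method does not scan
get_font_names = fontManager.get_font_names
```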

@lindsayad

Our test harness broke on Mac in the last week because of these system_profiler errors (e.g. libMesh/libmesh#3894). In our case we directly import matplotlib.pyplot as plt. In all these CI recipes we obtain Matplotlib through conda. We're trying to figure out what changed; any changes here to avoid these errors would also be great.

@tacaswell
Member

#28498 was opened (possibly without knowing about this issue?); it will keep us from failing on import.

> We're trying to figure out what changed;

We finally got mpl 3.9.0 built on conda-forge yesterday (https://anaconda.org/conda-forge/matplotlib-base/files). I am a bit more concerned about this happening with conda packaging. @lindsayad Do you modify PATH as part of your CI (per #28498 (comment))?

> Alternatively to lazy imports, we could also load the fontManager / font cache lazily.

I remain skeptical of the cost/benefit ratio of being lazy about building the font manager. The only thing we are explicitly lazy about now is importing GUI toolkits (which we do because that can be a mutually exclusive and irreversible action). Once #28498 is merged, this would only slightly shift when the cost of generating the cache is paid; it is a cost you pay only the first time you import mpl on a given system (assuming configdir is writable); and it is possible to pre-generate the cache (e.g. in containers). That said, I won't block a PR to do this.
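The "assuming configdir is writable" caveat matters for sandboxes and read-only images. A common fallback, sketched here with the hypothetical helper pick_cache_dir, is to use the preferred directory when it is writable and otherwise a fresh temporary directory. In the fallback case the cache does not persist and is rebuilt on every run, which is exactly the situation that pre-generating the cache avoids.

```python
import os
import tempfile


def pick_cache_dir(preferred):
    """Return `preferred` if it is (or can be made) a writable directory, else a temp dir."""
    try:
        os.makedirs(preferred, exist_ok=True)
    except OSError:
        pass  # e.g. read-only filesystem, or a regular file already sits at that path
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    # Fallback: usable for this process only, so the cache is rebuilt next time.
    return tempfile.mkdtemp(prefix="fontcache-")
```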

@loganharbour

> Do you modify PATH as part of your CI (per #28498 (comment))?

We don't. However, all of our Mac CI runs are executed within the macOS sandbox to comply with some zero-trust requirements. As a result, just about everything in /usr/sbin, including system_profiler, is not bind-mounted.

With this, I think #28498 is a sufficient solution for our problem. We may end up making system_profiler available, but we haven't gotten to that point yet.

@FRidh
Author

FRidh commented Jul 3, 2024

The main effect of the issue reported here has been resolved with #28498 (by another Nixpkgs maintainer, by the way). Thanks for the discussion here.

@dvgica

dvgica commented Jul 4, 2024

I would also like the ability to turn off this cache generation, ideally through an environment variable. My situation is that I'm running an API service with some ML models that use optbinning. For reasons I don't understand, optbinning eagerly imports matplotlib even though it is clearly unused in this scenario. Generating the font cache adds 30 seconds to service startup time.

Yes, I can pre-populate the cache at service build time. But it would be nice if everyone in similar scenarios didn't have to deal with this prepopulation step. The env var could have a big warning "USE AT OWN RISK, THIS WILL BREAK MATPLOTLIB" or something.
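To check whether a dependency pulls Matplotlib in at import time (as described here for optbinning), one can import it in a fresh interpreter and inspect sys.modules. The helper imports_transitively is hypothetical, and its answer depends on the installed versions, so nothing is claimed here about what any particular package does.

```python
import subprocess
import sys


def imports_transitively(module, target):
    """Return True if importing `module` in a fresh interpreter also loads `target`.

    Raises subprocess.CalledProcessError if `module` itself fails to import.
    """
    code = (
        "import importlib, sys; "
        f"importlib.import_module({module!r}); "
        f"print({target!r} in sys.modules)"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    # Stdlib example: importing json also loads its decoder submodule.
    print(imports_transitively("json", "json.decoder"))  # True
```

In the affected environment, a call such as imports_transitively("optbinning", "matplotlib") would answer the question for that specific setup.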

@tacaswell
Member

@dvgica Can you comment on #28488 (comment) ?
