RFC: Built-in module extending and removing weak links / umodules #9018

Closed
@jimmo

We use the "u" prefix for built-in modules for several reasons (see #7499 (comment) for the full story). The primary reason though is to allow foo.py to exist on the filesystem and extend the built-in ufoo. This feature is called "weak links" because the name foo used to be a "weak link" to ufoo (i.e. the link is "broken" by having the file foo.py on the filesystem). Today the feature, when enabled, is automatic for any built-in named ufoo.

There are a few drawbacks:

  • import foo is slower than necessary because it must search the filesystem to (usually) not find foo.py in sys.path only to eventually find ufoo in the builtins table.
  • This is a CPython incompatibility (in CPython you cannot replace a built-in from Python via the filesystem, although you can hook builtins.__import__).
  • It doesn't work particularly well if you want to apply multiple extensions from different sources (i.e. you can't really compose this approach)
  • It's weird that when you write import foo; print(foo) you get ufoo. Also help('modules') lists everything as ufoo.
  • It's just generally confusing and difficult to document and explain. We are slowly improving this, but there's still a lot of code out there writing "import ufoo".
  • Weak links don't apply to frozen modules, so it doesn't align well with non-built-in-but-frozen-and-kind-of-like-builtin modules like "uasyncio". You should be able to write import asyncio, and also it should be possible to extend asyncio with optional features.
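To make the first drawback concrete, here is a minimal Python model of the lookup order described above. It is only a sketch: resolve_import and its arguments are hypothetical names for illustration, and it ignores sys.modules caching and any other details of the real import code.

```python
def resolve_import(name, filesystem_modules, builtin_modules, sys_path=("", "/lib")):
    """Simplified model of weak-link resolution for `import name`.

    Returns (source, resolved_name, filesystem_probes).
    """
    probes = 0
    # The filesystem is searched first, one probe per sys.path entry.
    for _entry in sys_path:
        probes += 1
        if name in filesystem_modules:
            return ("filesystem", name, probes)
    # Only after the filesystem search fails does the weak link apply:
    # `foo` falls back to the built-in `ufoo`.
    weak_name = "u" + name
    if weak_name in builtin_modules:
        return ("builtin", weak_name, probes)
    raise ImportError("no module named " + name)


# Usual case: no foo.py on the filesystem, so every path entry is probed
# before the built-in ufoo is found.
print(resolve_import("foo", set(), {"ufoo"}))

# Extension case: foo.py exists and "breaks" the weak link.
print(resolve_import("foo", {"foo"}, {"ufoo"}))
```

In the usual case the model pays one probe per sys.path entry and still ends up at ufoo, which is also why print(foo) shows the "u" name.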

Our goal here is usually just to provide "optional" implementations of functionality that aren't general-purpose enough to include in standard firmware, or possibly are just better suited to implementation in Python (i.e. they could still be good candidates for being frozen?). But in general we're "filling in" missing functionality from the CPython equivalent module, and therefore there's no precedent for this in CPython. Note that CircuitPython does not currently enable the "weak link" feature and doesn't provide a way to extend built-in modules from Python.

I would argue that based on the reasons above, the "ufoo" mechanism doesn't support this use case particularly well, and the other historical reasons for the "u" prefix aren't compelling either. So let's say that in a future release (2.0?) we remove all traces of the "u" prefix for built-in modules (i.e. rename all built-ins, remove the weak links feature, remove the last remaining traces from the documentation). This is obviously a hugely breaking change and would need to be done gradually.

So that means we need to solve what to do about extending built-in modules. There are four options that I can think of:

  1. Don't allow this. It's worked so far for CircuitPython, and it's entirely possible that many MicroPython users have never needed/wanted it either (I know this isn't quite true).

  2. Allow a way to "runtime patch" built-in modules. So instead of replacing "foo", you could import a module (e.g. import foo_ext) that would install/replace its own additional methods/classes into some built-in module foo. This is currently impossible because the modules' globals tables are in ROM. We have a partial mechanism for this that is selectively enabled for the sys module (to allow e.g. sys.ps1), and also the builtins module is special cased to allow this for all elements.

It seems feasible to implement this for all built-in modules without too much RAM or runtime performance cost. It composes well -- for example we can provide a bunch of different os_ext_foo, os_ext_bar, os_ext_baz to provide different extensions to the os module, and the user can choose which ones to install, and then "activate" them by importing them once at the top of main.py (or even boot.py). It's perhaps a bit awkward that you need this activation step though, but from that point on import os will have all the extensions ready.
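From the user's side, the "activate by importing once" flow might look like the sketch below. It runs on CPython, where modules are already mutable; under this option the same assignment would have to work on a ROM-backed built-in. The module demo_os stands in for the built-in os, and install_ext_foo / install_ext_bar are hypothetical stand-ins for the bodies of os_ext_foo.py and os_ext_bar.py.

```python
import sys
import types

# Stand-in for a built-in module (hypothetical; used instead of the real os
# so the example does not modify the interpreter's own modules).
demo_os = types.ModuleType("demo_os")
demo_os.getcwd = lambda: "/"
sys.modules["demo_os"] = demo_os


def install_ext_foo():
    """What a hypothetical os_ext_foo.py would do at import time."""
    import demo_os

    def foo():
        return "hello foo"

    demo_os.foo = foo  # install the extension into the existing module


def install_ext_bar():
    """A second, independent extension from a different source."""
    import demo_os

    def bar():
        return "hello bar"

    demo_os.bar = bar


# "Activation", e.g. at the top of main.py or boot.py.
install_ext_foo()
install_ext_bar()

# From this point on, a plain import sees both extensions.
import demo_os as patched

print(patched.foo(), patched.bar(), patched.getcwd())
```

Because each extension only adds its own names to the shared module, the two do not need to know about each other.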

It also addresses the asyncio case. We can rename the frozen uasyncio, and then provide optional extensions which install themselves at runtime (except for asyncio it's simple because it's frozen not built-in so the modules are already mutable).

In more detail, the idea would be to have a lazily constructed MP_STATE_VM(module_overrides) table (either a single layer of module+qstr -> obj or two layers of module -> (qstr -> obj)) that is consulted in preference to the module's globals dict. As a performance optimisation (especially for tab completion, which does a lot of module lookups) we need a single bit somewhere to indicate that a module has had at least one override.
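The two-layer variant can be modelled in a few lines of Python. This is a sketch of the idea only, not the proposed C implementation: BuiltinModule, ModuleOverrides, has_override, store and load are all hypothetical names, and a read-only mapping stands in for a globals table in ROM.

```python
from types import MappingProxyType


class BuiltinModule:
    """Model of a built-in module whose globals table is read-only (ROM)."""

    def __init__(self, name, rom_globals):
        self.name = name
        self.rom_globals = MappingProxyType(dict(rom_globals))
        self.has_override = False  # the "single bit" fast-path flag


class ModuleOverrides:
    """Model of a two-layer table: module name -> (attribute name -> object)."""

    def __init__(self):
        self.table = None  # lazily constructed on the first store

    def store(self, module, attr, value):
        if self.table is None:
            self.table = {}
        self.table.setdefault(module.name, {})[attr] = value
        module.has_override = True

    def load(self, module, attr):
        # Fast path: modules that were never patched skip the table entirely.
        if module.has_override:
            overrides = self.table[module.name]
            if attr in overrides:
                return overrides[attr]
        return module.rom_globals[attr]


os_mod = BuiltinModule("os", {"getcwd": "rom getcwd"})
overrides = ModuleOverrides()

before = overrides.load(os_mod, "getcwd")      # served from the ROM table
overrides.store(os_mod, "foo", "ram foo")      # add a new attribute
overrides.store(os_mod, "getcwd", "ram getcwd")  # replace an existing one
after = (overrides.load(os_mod, "foo"), overrides.load(os_mod, "getcwd"))
```

The only RAM used is for the overridden entries themselves; untouched modules cost one flag check per lookup.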

  3. Provide a better way to force importing of a built-in, to replace the existing from ufoo import * approach. One simple example would be from builtins.foo import * or from micropython.foo import * (this is a fairly straightforward mod to objmodule.c and builtinimport.c to detect the "builtins." prefix and resolve the module via mp_module_get_builtin). Another alternative is to make builtins available in micropython.builtins or something (this would also provide a mechanism to programmatically query built-ins, which is occasionally requested). Another way could be to empty sys.path before importing (although this doesn't currently work, because an empty sys.path is equivalent to ['']).

This doesn't solve the composability, or the fact that we still need to search the filesystem, or how to extend asyncio.

It also has a RAM cost for extending a built-in module, because the "from builtins.foo import *" needs to copy the entire globals dict of the built-in into the Python module that's replacing it.
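The copying is easy to observe on CPython. The sketch below uses math as a stand-in for a built-in and an ordinary dict as the namespace of the replacing module. It shows only that a star import creates one new entry per public name, not how much RAM that costs on any particular port.

```python
import math

# Namespace of the hypothetical replacement module.
replacement_globals = {}
exec("from math import *", replacement_globals)

# Every public name of the built-in now has its own entry in the new dict.
copied = sorted(k for k in replacement_globals if k != "__builtins__")
public = sorted(k for k in vars(math) if not k.startswith("_"))

print(len(copied), "names copied from math")
```

Each extension written this way pays for a full copy of the table, even if it adds a single function.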

  4. Do this via builtins.__import__. As mentioned above, it is actually technically possible in CPython to hook import, even for builtins, and then return an extended module. For example, in "os_ext_foo.py":
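```python
import builtins, sys

# Re-export everything from the current os module.
from os import *

# Add os.foo
def foo():
    print('hello foo')

# Keep the previous import function so the hook can delegate to it.
_orig_import = builtins.__import__

def _hook(name, globals=None, locals=None, fromlist=(), level=0):
    # Return this module in place of os; defer everything else.
    if name == "os":
        return sys.modules[__name__]
    return _orig_import(name, globals, locals, fromlist, level)

builtins.__import__ = _hook
```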

Then:

```python
import os_ext_foo

import os
os.foo()
```

This composes well, avoids the filesystem search for builtin import, works for asyncio, but has a fairly high RAM cost for duplicating the globals table (potentially once for each extension) (edit: solution below), as well as the cost for the hook code.
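As a rough check of the composition claim on CPython, the sketch below loads two hook-based extensions in sequence. It targets math rather than os to keep the demo small. The names math_ext_foo, math_ext_bar, EXT_TEMPLATE and load_extension are hypothetical, and the modules are built in memory rather than read from files.

```python
import builtins
import sys
import types

# Source shared by both hypothetical extensions; {name} is the added function.
EXT_TEMPLATE = '''
import builtins, sys
from math import *

def {name}():
    return "hello {name}"

_orig_import = builtins.__import__

def _hook(name, globals=None, locals=None, fromlist=(), level=0):
    if name == "math" and level == 0:
        return sys.modules[__name__]
    return _orig_import(name, globals, locals, fromlist, level)

builtins.__import__ = _hook
'''


def load_extension(module_name, func_name):
    """Create an in-memory module and run the extension source in it."""
    module = types.ModuleType(module_name)
    sys.modules[module_name] = module
    exec(EXT_TEMPLATE.format(name=func_name), module.__dict__)
    return module


saved_import = builtins.__import__
try:
    load_extension("math_ext_foo", "foo")
    # This star import already goes through foo's hook, so bar inherits foo().
    load_extension("math_ext_bar", "bar")
    import math as patched
    result = (patched.foo(), patched.bar(), patched.sqrt(9.0))
finally:
    builtins.__import__ = saved_import  # undo the hooks for the rest of the program

print(result)
```

The second extension's star import copies the first extension's namespace, which is where the "once for each extension" duplication comes from.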

CC a few people who have been involved in this topic in the past: @mattytrentini @andrewleech @tannewt @jepler @dhalbert @dlech @stinos

Labels: enhancement (Feature requests, new feature implementations)
