[mypyc] Add native char type + codepoint fast paths for str ops #3

Closed
tobymao wants to merge 1 commit into release-1.20 from fast-codegen

@tobymao (Collaborator) commented Apr 13, 2026

Summary

Adds a first-class char native type to mypyc, modeled on i64: stored unboxed as an int32 codepoint, with -1 as the empty-string sentinel, and bidirectional str ↔ char promotion. Unblocks several codepoint-level fast paths in per-char loops.

Motivation

sqlglot's tokenizer hot loop is dominated by per-char work — each iteration does s[i], character classification (.isspace() / .isdigit() / ...), and equality compares against 1-char literals. Under stock mypyc each of those allocates a 1-char PyObject and dispatches through generic unicode handling. With char, all of these become int ops on a codepoint.

Changes

Core type plumbing

  • MYPYC_NATIVE_CHAR_NAMES alongside MYPYC_NATIVE_INT_NAMES
  • str ↔ char bidirectional _promote in semanal_classprop (mirrors int ↔ i64)
  • str covers char in subtypes.covers_at_runtime; str/char overlap in meet
  • char_rprimitive (int32, is_native_int, error_overlap=False)
  • mypy_extensions.char stub with the actually-used methods

Boxing / unboxing

  • CPyChar_FromObject (accepts 0/1-char str, returns -113 on type error, -1 for empty)
  • CPyChar_ToStr (uses interned empty-str singleton for -1)
  • bool(char) checks != -1 so "\0" stays truthy
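
The boxing rules above can be modeled in plain Python. This is a minimal sketch, not the C implementation: `char_from_object`, `char_to_str`, and `char_bool` are hypothetical stand-ins for `CPyChar_FromObject`, `CPyChar_ToStr`, and the `bool(char)` check, and a raised `TypeError` stands in for the -113 error return.

```python
EMPTY = -1  # sentinel for the empty string, as described in this PR


def char_from_object(obj: object) -> int:
    """Model of unboxing: a 0/1-char str becomes a codepoint (-1 for "")."""
    if not isinstance(obj, str) or len(obj) > 1:
        # The C helper is described as returning -113 here; this sketch raises.
        raise TypeError("expected a str of length 0 or 1")
    return ord(obj) if obj else EMPTY


def char_to_str(cp: int) -> str:
    """Model of boxing: a codepoint becomes a 1-char str, -1 becomes ""."""
    return "" if cp == EMPTY else chr(cp)


def char_bool(cp: int) -> bool:
    """Truthiness compares against the sentinel, not against 0."""
    return cp != EMPTY
```

Comparing against the sentinel rather than 0 is what keeps `"\0"` (codepoint 0) truthy while `""` stays falsy, matching `bool()` on the equivalent `str` values.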

Codegen fast paths (in transform_comparison_expr)

  • char == char / char == "x" → int compare of codepoint
  • s[i] == "x" where s: str → int compare against codepoint (no 1-char PyObject alloc)
  • ord(s[i]) refactored to share the same codepoint read path
  • char.isspace/isdigit/isalnum/isalpha/isidentifier/upper method_ops route to codepoint-taking C helpers in str_extra_ops.h
  • CPyChar_Upper is ASCII-only, which matches the common {chr(i): chr(i).upper() for i in range(97,123)} idiom; fall back to str(c).upper() for full Unicode casing
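
The effect of these fast paths can be illustrated at the Python level. The sketch below is an assumption-laden model, not the generated C: `char_upper_ascii` and `index_equals` are hypothetical names, and in real Python `ord(s[i])` still allocates a 1-char string, which the compiled code is described as avoiding.

```python
def char_upper_ascii(cp: int) -> int:
    """ASCII-only upper: maps a-z (97..122) to A-Z, leaves other codepoints alone."""
    return cp - 32 if 97 <= cp <= 122 else cp


def index_equals(s: str, i: int, literal: str) -> bool:
    """Model of `s[i] == "x"` as an integer compare of two codepoints."""
    return ord(s[i]) == ord(literal)


# The ASCII-only helper agrees with str.upper() on the a-z idiom ...
ascii_table = {chr(i): chr(char_upper_ascii(i)) for i in range(97, 123)}
# ... but not on characters whose uppercase form needs full Unicode casing.
sharp_s_unchanged = char_upper_ascii(ord("ß")) == ord("ß")
```

The `ß` case shows why the full-Unicode fallback matters: `"ß".upper()` is the two-character string `"SS"`, which a single codepoint cannot represent.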

New IR transform passes (run after lower_ir, before dep collection)

  • char_str_index_fold: folds Unbox(CPyStr_GetItem(s, i) -> char) into CPyStr_GetCharAt (int32 read), eliminating the 1-char PyObject alloc
  • str_buffer_hoist: for function-argument strings, hoists PyUnicode_KIND / PyUnicode_DATA reads out of per-char loops (strings are immutable so it's safe)
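
The idea behind `str_buffer_hoist` can be modeled in pure Python. CPython stores each string with a fixed width of 1, 2, or 4 bytes per character, so the width and the buffer pointer only need to be read once per string. In this sketch, `codepoints_hoisted` is a hypothetical name, and the encoded `bytes` object merely imitates the internal buffer that `PyUnicode_KIND` / `PyUnicode_DATA` expose in C.

```python
def codepoints_hoisted(s: str) -> list[int]:
    """Read every codepoint of `s` from a fixed-width buffer."""
    # Hoisted part: compute the character width and raw buffer once,
    # before the loop. This is safe because strings are immutable.
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        kind, data = 1, s.encode("latin-1")
    elif max_cp < 0x10000:
        kind, data = 2, s.encode("utf-16-le", "surrogatepass")
    else:
        kind, data = 4, s.encode("utf-32-le", "surrogatepass")

    # Per-char part: a plain fixed-width integer read, with no 1-char str.
    out = []
    for i in range(len(s)):
        out.append(int.from_bytes(data[i * kind:(i + 1) * kind], "little"))
    return out
```

Every read inside the loop depends only on `kind`, `data`, and `i`, which is the property that lets the loop-invariant reads move out of the loop.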

Misc

  • Adds str.isalpha() method_op via CPyStr_IsAlpha (used by sqlglot/parser.py)

Perf

sqlglot parse benchmark — char variant vs stock mypyc (this same PR base, compiled without char annotations):

| query | mypyc | char | Δ |
|---|---|---|---|
| tpch | 1.27ms | 0.66ms | +91.6% |
| deep_arithmetic | 4.05ms | 2.24ms | +80.7% |
| many_numbers | 34.3ms | 27.1ms | +26.5% |
| short | 93.6µs | 75.7µs | +23.6% |
| nested_functions | 231µs | 192µs | +20.7% |
| geomean (16 queries) | | | +17.6% |

Also +332% geomean vs pure Python.

Test plan

  • tests/test_tokens.py + tests/test_parser.py — 73 passed
  • sqlglot parse benchmark suite — no regressions outside noise
  • Full mypyc test suite (CI)

@tobymao tobymao changed the base branch from release-1.19 to native-mode April 13, 2026 04:47
@tobymao tobymao changed the base branch from native-mode to release-1.20 April 13, 2026 04:50
Adds a first-class `char` native type to mypyc, modeled on i64: stored
unboxed as int32 codepoint, with -1 as the empty-string sentinel, and
bidirectional str<->char promotion. Unblocks several codepoint-level
fast paths in per-char loops.

Core type plumbing:
- `MYPYC_NATIVE_CHAR_NAMES` alongside `MYPYC_NATIVE_INT_NAMES`
- str <-> char bidirectional `_promote` in semanal_classprop
- str covers char in subtypes.covers_at_runtime + overlap in meet
- `char_rprimitive` (int32, is_native_int, error_overlap=False)
- `mypy_extensions.char` stub with `.is*()`, `.upper()`, `.strip()`

Boxing / unboxing:
- `CPyChar_FromObject` (accepts 0/1-char str, -113 on type error)
- `CPyChar_ToStr` (uses interned empty-str singleton for -1)
- `bool(char)` checks `!= -1`, not `!= 0`, so "\0" stays truthy

Codegen fast paths:
- `char == char` / `char == "x"` / `s[i] == "x"` specializers in
  transform_comparison_expr compile to int compare of the codepoint
- `ord(s[i])` refactored to share the codepoint read path
- `char.isspace/isdigit/isalnum/isalpha/isidentifier/upper` method_ops
  route to codepoint-taking C helpers in str_extra_ops.h

Two new IR transform passes (run after lower_ir, before dep collection):
- char_str_index_fold: folds `Unbox(CPyStr_GetItem(s, i) -> char)` to a
  direct `CPyStr_GetCharAt` int32 read, avoiding the 1-char PyObject alloc
- str_buffer_hoist: for function-arg strings, hoists PyUnicode_KIND/DATA
  reads out of per-char loops (strings are immutable so it's safe)

Also adds `str.isalpha()` method_op via `CPyStr_IsAlpha`.

Perf (sqlglot parse benchmarks, char vs stock mypyc):
- tpch:           +91.6%   (1.27ms -> 0.66ms)
- deep_arithmetic: +80.7%
- many_numbers:   +26.5%
- geomean:        +17.6% across 16 queries
@VaggelisD (Owner) commented

I ended up checking this branch out locally, applying some fixes, adding tests, etc., and pushing it on top of the release-1.20 branch, so I'm closing this PR.

@VaggelisD VaggelisD closed this Apr 21, 2026
VaggelisD added a commit that referenced this pull request Apr 23, 2026
…sqlglot)

This is the minimal set of fixes needed for `separate=True` to build and run
correctly against sqlglot, a ~100-module project with cross-group class
inheritance, generator helper classes, non-ext subclasses with fast methods,
and mutually-dependent compiled modules. Each of the fixes below is a real
bug that was never hit by mypy itself (mypy's setup.py uses multi_file on
Windows only, never separate=True) or by the toy fixtures in mypyc's
TestRunSeparate.

1. Non-extension classes never have vtables -- short-circuit is_method_final
   to True for them so codegen doesn't try to index into a vtable that
   compute_vtable skipped.

2. emit_method_call: under separate=True, a method's FuncIR body may live in
   another group while only its FuncDecl is visible here. Use method_decl(name)
   instead of get_method(name).decl -- the decl is enough to emit a direct C
   call. Split native_function_type to accept a decl too.

3. Cross-group native/Python-wrapper calls weren't routing through the
   exports-table indirection at a dozen sites in emitwrapper / emitfunc /
   emitclass. Added Emitter.native_function_call(decl) and
   Emitter.wrapper_function_call(decl) helpers and migrated all offending
   sites. Also made CPyPy_* wrapper declarations needs_export=True so those
   symbols reach the exports table.

4. Defer cross-group imports to shim load time. The shared lib's exec_
   function used to PyImport_ImportModule sibling groups at PyInit time,
   which re-enters the enclosing package's __init__.py mid-flight and blows
   up on partial-init attribute walks. Split exec_ into a self-contained
   capsule-setup phase (runs in PyInit) and a deferred ensure_deps_<short>()
   (runs from the shim just before per-module init). Shim uses
   PyImport_ImportModuleLevel with a non-empty fromlist so the lookup
   returns the leaf directly via sys.modules, and fetches capsules via
   PyObject_GetAttrString instead of PyCapsule_Import (which itself performs
   the same dotted attribute walk).

5. Fix broken fallback in lib-rt CPyImport_ImportFrom: the code tried
   PyObject_GetItem(module, fullname) where it intended PyImport_GetModule
   (comment says as much). Modules don't implement __getitem__, so the
   fallback always raised TypeError. Also Py_XDECREF the potentially-NULL
   package_path in the error path.

6. Incremental-mode plumbing for separate=True: compile_modules_to_ir now
   syncs freshly built ClassIR/FuncIR into deser_ctx so later cache-loaded
   SCCs can resolve cross-SCC references. load_type_map tolerates mypy's
   synthetic TypeInfo entries (e.g. "<subclass of X and Y>") that have no
   corresponding mypyc ClassIR.
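
The import behaviors behind fixes 4 and 5 can be reproduced from Python, since `__import__` is the Python-level counterpart of `PyImport_ImportModuleLevel` and `sys.modules` is what `PyImport_GetModule` consults. The snippet below is only an illustration using the standard-library `os.path` module and a throwaway module named `pkg`; it does not touch the mypyc shim code.

```python
import os.path
import sys
import types

# Fix 4: with an empty fromlist, importing a dotted name returns the
# top-level package; with a non-empty fromlist it returns the leaf module.
top = __import__("os.path")
leaf = __import__("os.path", fromlist=["_"])

# Fix 5: module objects do not implement __getitem__, so a GetItem-style
# lookup on a module always raises TypeError ...
mod = types.ModuleType("pkg")
try:
    mod["pkg.sub"]
    getitem_error = None
except TypeError as exc:
    getitem_error = exc

# ... whereas a sys.modules lookup by full name finds a loaded module.
found = sys.modules.get("os.path")
```

Here `top` is the `os` package while `leaf` and `found` are both the `os.path` module, which is why the non-empty fromlist avoids the dotted attribute walk.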

Also adds three regression tests targeted to fail on TestRunSeparate
without the fixes above:

- testSeparateCrossGroupEnumMethod exercises fix #1.
- testSeparateCrossGroupGenerator exercises fix #2.
- testSeparateCrossGroupInheritedInit exercises fix #3.