-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Align psuedo ops to CPython 3.14.2 #6846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis PR refactors attribute and super attribute load instructions across the compiler stack. It introduces encoded opargs for LoadAttr/LoadSuperAttr operations, consolidates multiple pseudo-instruction variants into single encoded forms, removes public re-exports of encoding helpers, and updates bytecode instruction definitions accordingly. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Code has been automatically formatted The code in this PR has been formatted using:
git pull origin pseudo-ops |
136d0e1 to
c2ad5eb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@crates/compiler-core/src/bytecode/instruction.rs`:
- Around line 474-483: The Arg<NameIdx> used for the LoadAttr operand currently
holds an encoded value which is confusing and error-prone; introduce a new
OpArgType called EncodedLoadAttrArg and replace uses of Arg<NameIdx> for
LoadAttr so the type signals that decoding is required. Update the emitter sites
(emit_load_attr, emit_load_attr_method) to produce EncodedLoadAttrArg (using
encode_load_attr_arg), update consumers (stack_effect implementation for
LoadAttr, disassembly, and the load_attr runtime) to accept EncodedLoadAttrArg
and call decode_load_attr_arg before extracting the name index, and adjust any
trait impls or type aliases so Arg<EncodedLoadAttrArg> integrates with the
existing OpArgType system.
In `@crates/stdlib/src/_opcode.rs`:
- Around line 154-156: Remove the public exposure of the internal macro by
deleting or disabling the #[pyattr] export for ENABLE_SPECIALIZATION_FT (and any
ENABLE_SPECIALIZATION) so they are not added to the opcode module; keep the
internal constants locally if needed but do not annotate them with #[pyattr] (or
change to private/internal symbol) and ensure only CPython-public attributes
(e.g., cmp_op, stack_effect, hascompare, opname, opmap, HAVE_ARGUMENT,
EXTENDED_ARG, hasarg, hasconst, hasname, hasjump, hasjrel, hasjabs, hasfree,
haslocal, hasexc) remain exported from the module.
🧹 Nitpick comments (1)
crates/compiler-core/src/bytecode/instruction.rs (1)
932-967: Pseudo-op range alignment looks consistent; do a clean build to avoid stale bytecode artifacts.
Based on learnings, please run a clean build and clear any stale.pyc/target/debug/build/rustpython-*artifacts after bytecode instruction changes.
| Self::LoadFromDictOrDeref(_) => 1, | ||
| Self::StoreSubscr => -3, | ||
| Self::DeleteSubscr => -2, | ||
| Self::LoadAttr { .. } => 0, | ||
| Self::LoadAttr { idx } => { | ||
| // Stack effect depends on method flag in encoded oparg | ||
| // method=false: pop obj, push attr → effect = 0 | ||
| // method=true: pop obj, push (method, self_or_null) → effect = +1 | ||
| let (_, is_method) = decode_load_attr_arg(idx.get(arg)); | ||
| if is_method { 1 } else { 0 } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate all LoadAttr construction sites and encode/decode usage
echo "=== LoadAttr construction/usage sites ==="
rg -n -C2 --type=rs 'LoadAttr\s*\{|Instruction::LoadAttr\b' crates/
echo -e "\n=== encode_load_attr_arg and decode_load_attr_arg usage ==="
rg -n -C2 --type=rs 'encode_load_attr_arg|decode_load_attr_arg' crates/Repository: RustPython/RustPython
Length of output: 238
🏁 Script executed:
#!/bin/bash
# Locate all LoadAttr construction sites and encode/decode usage
echo "=== LoadAttr construction/usage sites ==="
rg -n -C2 'LoadAttr\s*\{|Instruction::LoadAttr' crates/
echo -e "\n=== encode_load_attr_arg and decode_load_attr_arg usage ==="
rg -n -C2 'encode_load_attr_arg|decode_load_attr_arg' crates/
echo -e "\n=== Files containing these terms ==="
rg -l 'LoadAttr|encode_load_attr_arg|decode_load_attr_arg' crates/Repository: RustPython/RustPython
Length of output: 7418
🏁 Script executed:
#!/bin/bash
# Search for any direct LoadAttr construction without going through emit functions
echo "=== Search for direct LoadAttr { idx construction (potential bypass) ==="
rg -n 'LoadAttr\s*\{\s*idx\s*:' crates/
echo -e "\n=== Check emit_load_attr calls to confirm all usages go through encoding ==="
rg -n 'emit_load_attr' crates/
echo -e "\n=== Check Arg type definition ==="
rg -n 'pub struct Arg' crates/ -A 5
echo -e "\n=== Check if there are any other places calling Instruction::LoadAttr ==="
rg -n 'Instruction::LoadAttr' crates/ -C 3Repository: RustPython/RustPython
Length of output: 8611
🏁 Script executed:
#!/bin/bash
# Get more context on Arg<T> and OpArgType definition
echo "=== Arg<T> definition and usage ==="
rg -n 'impl.*Arg<' crates/compiler-core/src/bytecode/instruction.rs -A 10 | head -50
echo -e "\n=== OpArgType trait definition ==="
rg -n 'pub trait OpArgType|pub enum OpArgType' crates/ -A 5
echo -e "\n=== NameIdx type definition ==="
rg -n 'type NameIdx|pub.*NameIdx' crates/compiler-core/src/bytecode/instruction.rs | head -10
echo -e "\n=== LoadAttr field type vs actual usage ==="
sed -n '167,169p' crates/compiler-core/src/bytecode/instruction.rs
echo "---"
sed -n '842,846p' crates/compiler-core/src/bytecode/instruction.rsRepository: RustPython/RustPython
Length of output: 1779
🏁 Script executed:
#!/bin/bash
# Search for NameIdx definition
echo "=== NameIdx definition ==="
rg -n 'type NameIdx\s*=|pub.*NameIdx\s*=' crates/ -A 2
echo -e "\n=== Arg.get method implementation ==="
rg -n 'pub fn get\|pub const fn get' crates/compiler-core/src/bytecode/instruction.rs -A 8
echo -e "\n=== How idx.get is used in all LoadAttr contexts ==="
rg -n 'idx\.get\(' crates/ -B 2 -A 1
echo -e "\n=== Check if NameIdx implements OpArgType ==="
rg -n 'impl.*OpArgType.*NameIdx\|NameIdx.*OpArgType' crates/Repository: RustPython/RustPython
Length of output: 13278
🏁 Script executed:
#!/bin/bash
# Find the .get() method on Arg<T>
echo "=== Search for .get method on Arg<T> ==="
rg -n '\.get\s*\(' crates/compiler-core/src/bytecode/instruction.rs -B 5 | head -40
echo -e "\n=== Full Arg implementation ==="
sed -n '1177,1230p' crates/compiler-core/src/bytecode/instruction.rs
echo -e "\n=== Check for OpArgType impl for NameIdx/u32 ==="
rg -n 'impl.*OpArgType' crates/compiler-core/src/bytecode/oparg.rs -A 10 | head -80Repository: RustPython/RustPython
Length of output: 5623
Type safety for encoded LOAD_ATTR args is currently correct but could be improved.
All 2 emission sites (emit_load_attr, emit_load_attr_method) properly use encode_load_attr_arg, and all 3 consumption sites (stack_effect, disassembly, load_attr runtime) properly decode before using the name index. However, the field type Arg<NameIdx> is misleading since it contains an encoded value, not a raw name index. Since Arg<u32>.get() just returns the raw u32 without decoding, a future maintainer could naively use it as a direct index. Consider introducing a dedicated EncodedLoadAttrArg type implementing OpArgType to make the encoding requirement explicit in the type system.
🤖 Prompt for AI Agents
In `@crates/compiler-core/src/bytecode/instruction.rs` around lines 474 - 483, The
Arg<NameIdx> used for the LoadAttr operand currently holds an encoded value
which is confusing and error-prone; introduce a new OpArgType called
EncodedLoadAttrArg and replace uses of Arg<NameIdx> for LoadAttr so the type
signals that decoding is required. Update the emitter sites (emit_load_attr,
emit_load_attr_method) to produce EncodedLoadAttrArg (using
encode_load_attr_arg), update consumers (stack_effect implementation for
LoadAttr, disassembly, and the load_attr runtime) to accept EncodedLoadAttrArg
and call decode_load_attr_arg before extracting the name index, and adjust any
trait impls or type aliases so Arg<EncodedLoadAttrArg> integrates with the
existing OpArgType system.
crates/codegen/src/ir.rs
Outdated
| } | ||
|
|
||
| // Real instructions are already encoded by compile.rs | ||
| if info.instr.real().is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd replace the line 214, no need to check twice for if it's real/pseudo
16cfbe1 to
1c95128
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@crates/stdlib/src/_opcode.rs`:
- Around line 320-326: The current get_special_method_names function only
returns the four context-manager names; replace or extend that array to match
CPython's full _Py_SpecialMethods list so callers receive the complete set of
special method names. Locate or create a static Rust slice/const (e.g.,
SPECIAL_METHODS or _PY_SPECIAL_METHODS) containing all CPython special method
strings, then change get_special_method_names to iterate over that slice (using
the existing .into_iter().map(|x| vm.ctx.new_str(x).into()).collect()) to
produce the Vec<PyObjectRef>; ensure the symbol name get_special_method_names
remains unchanged and the returned type stays Vec<PyObjectRef>.
| #[pyfunction] | ||
| fn get_executor(_code: PyObjectRef, vm: &VirtualMachine) -> PyResult<PyObjectRef> { | ||
| // TODO | ||
| fn get_special_method_names(vm: &VirtualMachine) -> Vec<PyObjectRef> { | ||
| ["__enter__", "__exit__", "__aenter__", "__aexit__"] | ||
| .into_iter() | ||
| .map(|x| vm.ctx.new_str(x).into()) | ||
| .collect() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
cat -n crates/stdlib/src/_opcode.rs | sed -n '310,330p'Repository: RustPython/RustPython
Length of output: 754
🏁 Script executed:
cd crates/stdlib && cargo clippy --all-targets 2>&1 | head -100Repository: RustPython/RustPython
Length of output: 174
🏁 Script executed:
rg "get_special_method_names\|__enter__\|__exit__" crates/stdlib/src/_opcode.rs -A 2 -B 2Repository: RustPython/RustPython
Length of output: 47
🏁 Script executed:
rg "get_intrinsic1_descs|get_intrinsic2_descs" crates/stdlib/src/_opcode.rs -A 6Repository: RustPython/RustPython
Length of output: 639
🏁 Script executed:
rg "class.*Manager|__enter__|__exit__|__aenter__|__aexit__" --type rs | head -30Repository: RustPython/RustPython
Length of output: 91
🏁 Script executed:
cat -n crates/stdlib/src/_opcode.rs | sed -n '1,50p'Repository: RustPython/RustPython
Length of output: 1893
🏁 Script executed:
rg "get_special_method_names" crates/stdlib/Repository: RustPython/RustPython
Length of output: 168
🏁 Script executed:
rg "__enter__|__exit__|__aenter__|__aexit__" crates/stdlib/src/_opcode.rsRepository: RustPython/RustPython
Length of output: 125
🏁 Script executed:
git log --oneline -10 crates/stdlib/src/_opcode.rsRepository: RustPython/RustPython
Length of output: 99
🏁 Script executed:
find crates -name "*test*" -type f | xargs grep -l "special_method\|__enter__\|__exit__" 2>/dev/null | head -5Repository: RustPython/RustPython
Length of output: 159
🏁 Script executed:
rg "def get_special_method_names|special_method" crates/stdlib/src/_opcode.rs -B 5 -A 10Repository: RustPython/RustPython
Length of output: 446
🏁 Script executed:
cat -n crates/stdlib/src/_opcode.rs | sed -n '320,350p'Repository: RustPython/RustPython
Length of output: 776
🌐 Web query:
CPython _opcode module get_special_method_names function
💡 Result:
In CPython, _opcode.get_special_method_names() is a CPython-internal helper that returns a Python list of the interpreter’s “special method” (dunder) names.
Implementation (CPython 3.14): it allocates a list of length SPECIAL_MAX + 1 and fills it with _Py_SpecialMethods[i].name for i = 0..SPECIAL_MAX. The returned list index corresponds to CPython’s internal “special method” id. [1]
API docstring in the generated clinic header: “Return a list of special method names.” [2]
Use:
import _opcode
names = _opcode.get_special_method_names()Sources: [1] [2]
🏁 Script executed:
rg "ASYNC_WITH|WITH_CLEANUP|context.*manager" crates/ --type-list | head -10Repository: RustPython/RustPython
Length of output: 454
🏁 Script executed:
cat -n crates/stdlib/src/_opcode.rs | sed -n '280,330p'Repository: RustPython/RustPython
Length of output: 2069
Incomplete implementation: Returns only context manager special methods, not all special methods.
The function syntactically follows the established pattern (array → .into_iter().map().collect()), but does not match CPython's get_special_method_names(). CPython's implementation returns a comprehensive list of all special method names (indexed from an internal _Py_SpecialMethods array), while this implementation returns only the four context manager methods: __enter__, __exit__, __aenter__, __aexit__. This implementation is incomplete relative to CPython and may cause compatibility issues if callers expect the full set of special method names.
🤖 Prompt for AI Agents
In `@crates/stdlib/src/_opcode.rs` around lines 320 - 326, The current
get_special_method_names function only returns the four context-manager names;
replace or extend that array to match CPython's full _Py_SpecialMethods list so
callers receive the complete set of special method names. Locate or create a
static Rust slice/const (e.g., SPECIAL_METHODS or _PY_SPECIAL_METHODS)
containing all CPython special method strings, then change
get_special_method_names to iterate over that slice (using the existing
.into_iter().map(|x| vm.ctx.new_str(x).into()).collect()) to produce the
Vec<PyObjectRef>; ensure the symbol name get_special_method_names remains
unchanged and the returned type stays Vec<PyObjectRef>.
ShaharNaveh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
576421e to
c0bdb9a
Compare
📦 Library DependenciesThe following Lib/ modules were modified. Here are their dependencies: [+] lib: cpython/Lib/_opcode_metadata.py dependencies:
dependent tests: (13 tests) Legend:
|
Summary by CodeRabbit
Release Notes
New Features
Refactor
✏️ Tip: You can customize this high-level summary in your review settings.