Replace system readelf with pyelftools for multi-arch support #3470
carlpinto25 wants to merge 32 commits into pwndbg:dev
tests/unit_tests/test_readelf.py
Outdated
    # Mock gdb before importing pwndbg
    sys.modules["gdb"] = MagicMock()
    sys.modules["gdb"].VERSION = "12.1"
I believe this is wrong / not needed.
If we do need to mock gdb, I think we already have something for that somewhere?
Yes, there is. I updated the code.
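For reference, a minimal sketch of stubbing `gdb` for a unit test without assigning to `sys.modules` permanently at import time. It relies on `unittest.mock.patch.dict`, which restores `sys.modules` when the context exits. The helper name `fake_gdb` is hypothetical; this is not the existing pwndbg mock mentioned above.

```python
import sys
from contextlib import contextmanager
from unittest.mock import MagicMock, patch


@contextmanager
def fake_gdb(version: str = "12.1"):
    """Temporarily install a stub `gdb` module (hypothetical helper)."""
    stub = MagicMock(name="gdb")
    stub.VERSION = version
    # patch.dict undoes the change on exit, so other tests are unaffected.
    with patch.dict(sys.modules, {"gdb": stub}):
        yield stub


with fake_gdb() as gdb_stub:
    import gdb  # resolves to the stub while the context is active

    assert gdb is gdb_stub
    seen_version = gdb.VERSION
```

Scoping the stub this way keeps the fake module from leaking into unrelated tests that run in the same process.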
tests/unit_tests/test_readelf.py
Outdated
    assert isinstance(item["value"], int)
    assert isinstance(item["name"], str)

    print("test_get_got_entry passed")
My bad, it's removed now.
    # Ensure types for mypy
    assert isinstance(offset, int)
    assert isinstance(value, int)
    assert isinstance(name, str)
Can't we just do something like offset: int = ... instead, or won't that work?
Yes, it's possible. I updated it.
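A small sketch of what the annotated-assignment form could look like. It assumes the right-hand side is untyped (`Any`), as values from an unannotated library would be; with a precisely typed `int | str` value, mypy would still want narrowing. The function name `unpack_relocation` is hypothetical, while `r_offset` and `r_info` are standard ELF relocation fields.

```python
from typing import Any


def unpack_relocation(rel: Any, symbol_name: Any) -> tuple[int, int, str]:
    # `rel` is Any here, so mypy accepts the annotated assignments below
    # without runtime isinstance() asserts.
    offset: int = rel["r_offset"]
    info: int = rel["r_info"]
    name: str = symbol_name
    return offset, info, name


result = unpack_relocation({"r_offset": 0x4018, "r_info": 7}, "puts")
```

Note that annotations are not enforced at runtime, so this trades the runtime check for a static one.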
pwndbg/wrappers/readelf.py
Outdated
    for c in RelocationType:
        if c.name in category:
            entries[c].append(line)
    def get_got_entry(local_path: str) -> Dict[RelocationType, List[Dict[str, int | str]]]:
Why did you rewrite the logic here? Did the old line-parsing logic have any bugs?
I get that we may want to return structured output, and that's fine, but does the code do exactly the same thing now?
    # Check types
    assert isinstance(item["offset"], int)
    assert isinstance(item["value"], int)
    assert isinstance(item["name"], str)
Honestly, the test should check actual values here instead of just checking that the types match.
Added checks for specific symbol names (puts, __libc_start_main) and GLIBC version presence.
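As an illustration of value-level checks of this kind, here is a sketch run against hand-written sample data. The helper `check_got_entries` and the sample offsets and version suffixes are made up for the example; they are not taken from the PR's test file or from a real binary.

```python
from typing import Union

GotRow = dict[str, Union[int, str]]


def check_got_entries(entries: list[GotRow]) -> None:
    names = [str(e["name"]) for e in entries]
    # Check for expected symbols rather than only checking types.
    assert any(n.startswith("puts") for n in names)
    assert any(n.startswith("__libc_start_main") for n in names)
    # Symbol version suffixes should be present.
    assert any("@GLIBC_" in n for n in names)
    for e in entries:
        assert isinstance(e["offset"], int) and e["offset"] > 0


# Illustrative data only.
sample_entries: list[GotRow] = [
    {"offset": 0x3FD8, "value": 0, "name": "__libc_start_main@GLIBC_2.34"},
    {"offset": 0x4018, "value": 0, "name": "puts@GLIBC_2.2.5"},
]
check_got_entries(sample_entries)
```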
disconnect3d left a comment
Please see the comments. I don't quite understand why we changed the get_got_entry parsing. I understand wanting it to return a dict, though a NamedTuple would be much better here, but I don't understand why we changed the parsing logic to something else.
Some questions and notes, apart from what is there in the comments:
- Was there any deficiency or a bug with previous parsing logic?
- The tests/binaries/host/reference-binary must be removed from the PR. Isn't it gitignored anyway?
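To make the NamedTuple suggestion above concrete, a rough sketch of what such a record could look like. The class name `GotEntry` and its field order are assumptions for illustration, based on the `offset`, `value`, and `name` keys used in the tests; the sample values are made up.

```python
from typing import NamedTuple


class GotEntry(NamedTuple):
    """One GOT relocation entry (hypothetical shape)."""

    offset: int
    value: int
    name: str


entry = GotEntry(offset=0x4018, value=0, name="puts@GLIBC_2.2.5")

# Fields are typed individually, so no isinstance() asserts are needed
# to satisfy mypy, and tuple unpacking still works.
offset, value, name = entry
```

Compared with `Dict[str, int | str]`, each field carries its own type, which sidesteps the narrowing asserts discussed earlier in the review.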
- Added test_get_got_entry_direct() that tests pyelftools directly without complex mocks
- Checks for specific symbols: puts, __libc_start_main
- Verifies GLIBC version suffixes are present
- Fixed dbg mock to implement name() method

Addresses reviewer feedback to check actual values, not just types.
@disconnect3d The old logic executed the system's readelf command, which is architecture-specific. When debugging an ARM binary on x86_64, the host's readelf fails to parse ARM-specific structures correctly, causing incorrect/missing GOT entries. This is literally issue #3295.
Agreed it'd be cleaner. Should I change it in this PR?
I removed it.
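For readers following the multi-arch argument, a hedged sketch of reading relocations with pyelftools instead of shelling out to readelf. This is not the PR's implementation: `classify` and `iter_got_relocations` are hypothetical names, and the `RelocationType` members are assumed from the category matching shown in the old snippet. The pyelftools import is deferred so the classification helper can be used without the library installed.

```python
from enum import Enum
from typing import Optional


class RelocationType(Enum):
    # Assumed categories; matched by substring, as in the old line parser.
    JUMP_SLOT = 1
    GLOB_DAT = 2
    IRELATIVE = 3


def classify(reloc_name: str) -> Optional[RelocationType]:
    """Map e.g. 'R_ARM_JUMP_SLOT' or 'R_X86_64_GLOB_DAT' to a category."""
    for category in RelocationType:
        if category.name in reloc_name:
            return category
    return None


def iter_got_relocations(path: str):
    """Yield (category, offset, symbol name) parsed from the file itself."""
    # Deferred import: only needed when a file is actually parsed.
    from elftools.elf.descriptions import describe_reloc_type
    from elftools.elf.elffile import ELFFile
    from elftools.elf.relocation import RelocationSection

    with open(path, "rb") as f:
        elf = ELFFile(f)
        for section in elf.iter_sections():
            if not isinstance(section, RelocationSection):
                continue
            symtab = elf.get_section(section["sh_link"])
            for rel in section.iter_relocations():
                # The type name is resolved from the ELF's own machine field.
                type_name = describe_reloc_type(rel["r_info_type"], elf)
                category = classify(type_name)
                if category is None:
                    continue
                idx = rel["r_info_sym"]
                has_symbols = hasattr(symtab, "get_symbol")
                name = symtab.get_symbol(idx).name if idx and has_symbols else ""
                yield category, rel["r_offset"], name
```

Because the relocation names come from the architecture recorded in the ELF header, the result does not depend on which readelf build happens to be installed on the host.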
These are no longer needed as mypy no longer flags these pyelftools calls.
The CI error about 'unused type: ignore' was incorrect - these comments are required for mypy --strict to pass locally.
@disconnect3d I'm seeing conflicting mypy requirements for the pyelftools calls in
With # type: ignore[no-untyped-call]:
Without the comments:
Please make strict happy, ideally by adding annotations instead of 'unused type ignore' comments. Anyway, if one works and the other fails, don't worry and let us know; we are going to fix this soon (we just noticed this problem too).
@disconnect3d Thanks for the suggestion!
Adding annotations didn't work. Since nothing I tried worked, I added a mypy config override, which seemed cleaner as a temporary fix.
        "pwndbg.wrappers.readelf",
        "pwndbg.commands.got",
    ]
    disable_error_code = ["no-untyped-call"]
What does the tool.mypy.overrides module entry even do? Ugh.
We don't really want it, but I don't know of a better solution, so yeah, it's fine.
Or maybe add # type: ignore[no-untyped-call] at the end of the affected lines rather than for the entire file.
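A sketch of the per-line alternative, for comparison with the module-wide override. `untyped_helper` is a hypothetical stand-in for an unannotated third-party call such as the pyelftools ones; in a real `mypy --strict` run the helper's own definition would also be flagged, since it is written inline here only to keep the example self-contained.

```python
def untyped_helper(data):  # no annotations: stands in for untyped library code
    return len(data)


def count_bytes(data: bytes) -> int:
    # With disallow_untyped_calls (part of --strict), this call is reported
    # as [no-untyped-call]; the ignore applies to this one line only.
    result: int = untyped_helper(data)  # type: ignore[no-untyped-call]
    return result


size = count_bytes(b"\x7fELF")
```

The trade-off discussed in the thread still applies: if mypy is run in a mode where the call is not flagged, `warn_unused_ignores` reports the comment itself as unused.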
Resolved conflicts in:
- pwndbg/commands/got.py
- pwndbg/wrappers/readelf.py
- pyproject.toml
- tests/unit_tests/test_readelf.py

Kept symbol version support from readelf branch while integrating ruff isort configuration and other updates from dev branch.

- Fix ruff formatting in test_readelf.py
- Add type assertions in got.py for mypy
- Regenerate documentation

- Use modern type annotations (dict/list)
- Fix MockDebugger return type
- Remove unused type:ignore comments

CI runs mypy --strict which requires ignores for untyped third-party library calls

Fixes conflict between regular mypy and --strict:
- Strict mode requires type:ignore for untyped libs
- Regular mode warns about unused ignores

Solution: disable the warning for test files
I'd also suggest removing the readelf wrapper file, since it is technically no longer a wrapper. It's only one function anyway and is invoked in just one place, so we should move it into commands/got.py.
This PR replaces the dependency on the system's readelf command with the pyelftools library. This addresses issue #3295 by allowing pwndbg to correctly analyze binaries from foreign architectures (e.g., ARM on x86) without relying on the host's toolchain.