JUMP_ABSOLUTE decompilation error #310

Closed
abmyii opened this issue Apr 2, 2020 · 6 comments
Labels
Control Flow: Problem has to do with bad control-flow detection
insufficient bug report: The instructions given when opening a new issue are not followed
Python 3.8
Volunteer wanted: Volunteer wanted to fix if a bug or to implement if a new feature.

Comments

@abmyii

abmyii commented Apr 2, 2020

Description

I was attempting to decompile a tkinter script extracted from a PyInstaller executable and got this error:

Traceback (most recent call last):
  File "~/.local/bin/uncompyle6", line 10, in <module>
    sys.exit(main_bin())
  File "~/.local/lib/python3.6/site-packages/uncompyle6/bin/uncompile.py", line 194, in main_bin
    **options)
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 327, in main
    do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 225, in decompile_file
    do_fragments=do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 144, in decompile
    co, out, bytecode_version, debug_opts=debug_opts, is_pypy=is_pypy
  File "~/.local/lib/python3.6/site-packages/uncompyle6/semantics/pysource.py", line 2531, in code_deparse
    co, code_objects=code_objects, show_asm=debug_opts["asm"]
  File "~/.local/lib/python3.6/site-packages/uncompyle6/scanners/scanner38.py", line 106, in ingest
    jump_back_index = self.offset2tok_index[jump_target] - 1
KeyError: 4416

It seemed like an interesting and simple-ish issue, so I decided to investigate! By adding the code below under line 101 of scanners/scanner38.py, I found the problem.

# jump target instruction.

if token.attr == 4416:
	print()
	print(vars(token))
	print({i: self.offset2tok_index[i] for i in self.offset2tok_index if '4416' in str(i)})

This was the output:

{'kind': 'JUMP_ABSOLUTE', 'has_arg': True, 'attr': 4416, 'pattr': 4416, 'offset': 2428, 'linestart': None, 'opc': <module 'xdis.opcodes.opcode_38' from '~/.local/lib/python3.6/site-packages/xdis/opcodes/opcode_38.py'>, 'op': 113}
{'4416_0': 2222, '4416_1': 2223, '4416_4418': 2224}

I noticed that there was no plain 4416 key; all of the matching keys had "_..." suffixes. After a bit more digging, I saw that they were being added by these lines:

j = tokens_append(
    j,
    Token(
        come_from_name,
        jump_offset,
        repr(jump_offset),
        offset="%s_%s" % (inst.offset, jump_idx),
        has_arg=True,
        opc=self.opc,
        has_extended_arg=False,
    ),
)

I don't understand why this JUMP doesn't have the "base" 4416 key, but I found a simple solution. I printed some other JUMP values and noticed that in every case, whether there are one or three or more jumps to the same offset, jump_back_index is self.offset2tok_index[last_key] - 1, where last_key is the last key for that offset. In this case the last 4416 key is '4416_4418', so jump_back_index = self.offset2tok_index['4416_4418'] - 1. I don't understand why, however. In short, I changed the code to compute jump_back_index this way:

From:

jump_back_index = self.offset2tok_index[jump_target] - 1

To:

offset_instances = [inst for inst in self.offset2tok_index if str(jump_target) in str(inst)]
jump_back_index = self.offset2tok_index[offset_instances[-1]] - 1

And that fixes this problem.
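One caveat on the proposed change: the substring test str(jump_target) in str(inst) also matches unrelated keys, for example a hypothetical '14416_0'. Below is a minimal sketch of a stricter lookup. It assumes keys are either plain integer offsets or strings of the form "<offset>_<suffix>", as in the debug output above, and it relies on dict insertion order (guaranteed since Python 3.7) to pick the last key. The helper name last_key_for_offset is hypothetical. It only keeps the "take the last matching key" heuristic and is not a fix for the underlying control-flow detection.

```python
from typing import Dict, Union

Key = Union[int, str]


def last_key_for_offset(offset2tok_index: Dict[Key, int], target: int) -> Key:
    """Return the last key whose offset part equals ``target``.

    Keys are assumed to be plain int offsets or strings such as "4416_0"
    or "4416_4418". Comparing only the part before "_" avoids the
    substring false positive of ``str(target) in str(key)``.
    """
    matches = [
        key
        for key in offset2tok_index  # dicts iterate in insertion order
        if str(key).split("_")[0] == str(target)
    ]
    if not matches:
        raise KeyError(target)
    return matches[-1]


# Keys from the debug output above, plus one hypothetical key that a
# substring match would wrongly pick up.
index = {"4416_0": 2222, "4416_1": 2223, "4416_4418": 2224, "14416_0": 9000}
key = last_key_for_offset(index, 4416)
jump_back_index = index[key] - 1
print(key, jump_back_index)  # 4416_4418 2223
```

With the substring test, the last match in this dictionary would be '14416_0' rather than '4416_4418'.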

This issue also applies to https://github.com/rocky/python-decompile3.

How to Reproduce

$ uncompyle6 Main.pyc
Traceback (most recent call last):
  File "~/.local/bin/uncompyle6", line 10, in <module>
    sys.exit(main_bin())
  File "~/.local/lib/python3.6/site-packages/uncompyle6/bin/uncompile.py", line 194, in main_bin
    **options)
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 327, in main
    do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 225, in decompile_file
    do_fragments=do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 144, in decompile
    co, out, bytecode_version, debug_opts=debug_opts, is_pypy=is_pypy
  File "~/.local/lib/python3.6/site-packages/uncompyle6/semantics/pysource.py", line 2531, in code_deparse
    co, code_objects=code_objects, show_asm=debug_opts["asm"]
  File "~/.local/lib/python3.6/site-packages/uncompyle6/scanners/scanner38.py", line 106, in ingest
    jump_back_index = self.offset2tok_index[jump_target] - 1
KeyError: 4416
$

A link to the pyc file: https://gofile.io/?c=MV8jCW

@abmyii abmyii changed the title JUMP_ABSOLUTE decompilation error JUMP_ABSOLUTE decompilation error (INCOMPLETE) Apr 2, 2020
@abmyii abmyii changed the title JUMP_ABSOLUTE decompilation error (INCOMPLETE) JUMP_ABSOLUTE decompilation error Apr 2, 2020
@abmyii
Author

abmyii commented Apr 2, 2020

I'll submit a PR to whichever repo is appropriate if this solution is acceptable. Also, I'd appreciate any insight into the questions I had!

@rocky
Owner

rocky commented Apr 2, 2020

Thanks for looking at, reporting, and investigating this. I am a little short of time right now, but I'll go over this in detail and give detailed information and feedback when I have time, which I hope will be soon.

@abmyii
Author

abmyii commented Apr 2, 2020

No problem, thank you very much for your quick reply and for this awesome program!

@rocky
Owner

rocky commented Apr 3, 2020

I just tried applying the change you suggested, and while it no longer throws a KeyError exception, the instructions still don't parse, so there is no decompilation.

If you are getting a decompilation, then attach the output of running uncompyle6 using options -agT.

Otherwise we can start the discussion here, but let's continue it in decompyle3, because that will be the easier place to fix this; once that's done, the fix can be backported here.

> It seemed like an interesting and simple-ish issue, so I decided to investigate!

That's the spirit! I applaud you. Alas, after looking at this, it is not as simple as we would have liked...

> I don't understand why this JUMP doesn't have the "base" 4416 key,

This program is huge. A disassembly of it is about 2.7K lines, with 2K instructions in the main routine. The disassembly shows that the instruction is:

            2468 JUMP_ABSOLUTE          4416 (to 4416)

And to be able to get the large number 4416 as the operand value, an EXTENDED_ARG instruction needs to precede that instruction. It looks like this:

         >> 2466 EXTENDED_ARG             17 (4352)

The "extended arg" instructions were rare in 2.7, but are now very common in Python 3.6 and above because the instruction size changed from a variable 1 or 3 bytes to a fixed 2 bytes. That leaves one byte for an operand, which is too small, especially in larger programs. The EXTENDED_ARG instruction wreaks havoc on a grammar-based parsing program like uncompyle6 or decompyle3, because now every instruction can have many forms: the one without EXTENDED_ARG and those with one or more of them.
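To make the arithmetic concrete: in the fixed 2-byte format, each EXTENDED_ARG contributes its operand byte shifted left by 8 bits to the next instruction's operand. So EXTENDED_ARG 17 followed by a JUMP_ABSOLUTE whose own operand byte is 64 gives 17 * 256 + 64 = 4352 + 64 = 4416. Below is a minimal decoder sketch. It assumes CPython 3.8 opcode numbers (EXTENDED_ARG = 144, JUMP_ABSOLUTE = 113; the latter matches 'op': 113 in the token dump above). It is an illustration only and is not how uncompyle6 or xdis are implemented.

```python
from typing import Iterator, Tuple

# CPython 3.8 opcode numbers (assumption for this sketch).
EXTENDED_ARG = 144
JUMP_ABSOLUTE = 113


def decode_wordcode(code: bytes) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (first_offset, offset, opcode, arg) with EXTENDED_ARG folded in.

    first_offset is the offset of the first EXTENDED_ARG prefix, or the
    instruction's own offset when there is no prefix.
    """
    extended = 0
    first = None
    for offset in range(0, len(code), 2):
        op, byte_arg = code[offset], code[offset + 1]
        if first is None:
            first = offset
        arg = extended | byte_arg
        if op == EXTENDED_ARG:
            # Carry the accumulated value, shifted by one byte, into the next instruction.
            extended = arg << 8
            continue
        yield first, offset, op, arg
        extended = 0
        first = None


# EXTENDED_ARG 17, then JUMP_ABSOLUTE with operand byte 64 (4416 & 0xFF).
code = bytes([EXTENDED_ARG, 17, JUMP_ABSOLUTE, 64])
print(list(decode_wordcode(code)))  # [(0, 2, 113, 4416)]
```

Several prefixes chain the same way. Each one shifts everything accumulated so far by another 8 bits.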

So what we do is try to fold instructions with EXTENDED_ARG into one instruction. Of course, the internal Python bytecode instruction object is not limited to one byte for jump addresses, so it can easily hold, say, 4416 rather than having to represent this as 4352 in one instruction and 64 in the next. Also, if we did not combine the two numbers, it would wreak havoc on the logic that tries to figure out where something jumps to.

But if we do this, what should we call the offset of the just-combined instruction? The offset is a string made of the offset of the first EXTENDED_ARG, an underscore, and the offset of the non-EXTENDED_ARG instruction. Here the offset value is 2466_2468.
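Below is a sketch of that naming scheme. It assumes folded instructions are given as (first_offset, offset, opname, arg) tuples, which is a hypothetical representation and not uncompyle6's Token class. The debug output earlier in this thread had no plain 4416 key, only strings such as '4416_4418'. The sketch reproduces that pattern and shows why indexing with a plain integer offset can raise KeyError. The POP_TOP at 2464 is a hypothetical filler instruction.

```python
from typing import Dict, List, Tuple, Union

Key = Union[int, str]


def offset_key(first_offset: int, offset: int) -> Key:
    """Plain int for an ordinary instruction, "<first>_<offset>" for a folded one."""
    if first_offset == offset:
        return offset
    return "%s_%s" % (first_offset, offset)


def build_offset_index(insts: List[Tuple[int, int, str, int]]) -> Dict[Key, int]:
    """Map each instruction's offset key to its position in ``insts``."""
    return {
        offset_key(first, offset): i
        for i, (first, offset, _opname, _arg) in enumerate(insts)
    }


insts = [
    (2464, 2464, "POP_TOP", 0),  # hypothetical filler instruction
    (2466, 2468, "JUMP_ABSOLUTE", 4416),  # EXTENDED_ARG at 2466 folded in
]
index = build_offset_index(insts)
print(index)  # {2464: 0, '2466_2468': 1}
print(2466 in index, 2468 in index)  # False False
```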

I hope this answers the questions here. As for what should be done to address this, let's move the discussion to decompyle3, where I'll post the remainder.

@rocky rocky added Python 3.8 Control Flow Problem has to do with bad control-flow detection labels Apr 3, 2020
@rocky rocky added insufficient bug report The instructions given when opening a new issue are not followed Volunteer wanted Volunteer wanted to fix if a bug or to implement if a new feature. labels Jul 4, 2022
@Berbe
Contributor

Berbe commented Sep 18, 2022

Description

It seems I am running into a similar problem with another piece of bytecode, this time Python 2.7 bytecode, using uncompyle6 3.9.0a1 (built from the current master branch on GitHub).

Encountered error
Traceback (most recent call last):
  File "/home/user/venv/uncompyle6/bin/uncompyle6", line 11, in <module>
    load_entry_point('uncompyle6', 'console_scripts', 'uncompyle6')()
  File "/home/user/python-uncompyle6/uncompyle6/bin/uncompile.py", line 197, in main_bin
    result = main(src_base, out_base, pyc_paths, source_paths, outfile,
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 305, in main
    deparsed = decompile_file(
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 216, in decompile_file
    decompile(
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 143, in decompile
    deparsed = deparse_fn(
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1376, in code_deparse
    deparsed.gen_source(
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1164, in gen_source
    self.text = self.traverse(ast, is_lambda=is_lambda)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 451, in traverse
    self.preorder(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 117, in preorder
    self.preorder(kid)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 110, in preorder
    func(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/n_actions.py", line 192, in n_classdef
    self.build_class(subclass_code)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1134, in build_class
    self.gen_source(ast, code.co_name, code._customize)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1164, in gen_source
    self.text = self.traverse(ast, is_lambda=is_lambda)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 451, in traverse
    self.preorder(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 117, in preorder
    self.preorder(kid)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 112, in preorder
    self.default(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 872, in default
    self.template_engine(table[key.kind], node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 770, in template_engine
    self.preorder(node[index])
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 110, in preorder
    func(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/n_actions.py", line 1017, in n_mkfunc
    self.make_function(node, is_lambda=False, code_node=code_node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 543, in make_function
    make_function2(self, node, is_lambda, nested, code_node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/make_function2.py", line 85, in make_function2
    code = Code(code, self.scanner, self.currentclass)
  File "/home/user/python-uncompyle6/uncompyle6/scanner.py", line 101, in __init__
    self._tokens, self._customize = scanner.ingest(co, classname, show_asm=show_asm)
  File "/home/user/python-uncompyle6/uncompyle6/scanners/scanner2.py", line 420, in ingest
    j = self.offset2inst_index[offset]
KeyError: 65587

Investigation

Using @abmyii's trick of editing the uncompyle6 source code to add debug statements, I managed to isolate the problematic instruction in the disassembled bytecode:

65587 JUMP_ABSOLUTE        (to 65540)

The instructions were part of a for loop:

[...]
3796:     >> 65533 SETUP_LOOP           (to 65591)
            65536 LOAD_GLOBAL          (data)
            65539 GET_ITER
         >> 65540 FOR_ITER             (to 65590)
            65543 STORE_FAST           (element)

3797:        65546 LOAD_FAST            (element)
            65549 LOAD_CONST           (1)
            65552 BINARY_SUBSCR
            65553 LOAD_FAST            (self)
            65556 LOAD_ATTR            (marker)
            65559 COMPARE_OP           (==)
            65562 EXTENDED_ARG         (65536)
            65565 POP_JUMP_IF_FALSE    (to 65584)

3798:        65568 LOAD_GLOBAL          (data)
            65571 LOAD_ATTR            (remove)
            65574 LOAD_FAST            (element)
            65577 CALL_FUNCTION        (1 positional, 0 named)
            65580 POP_TOP
            65581 JUMP_FORWARD         (to 65584)
         >> 65584 EXTENDED_ARG         (65536)
            65587 JUMP_ABSOLUTE        (to 65540)
         >> 65590 POP_BLOCK

3799:     >> 65591 SETUP_LOOP           (to 65649)
[...]

The problem shows up at a JUMP_ABSOLUTE instruction that is preceded by an EXTENDED_ARG instruction.
IIUC, per documentation, EXTENDED_ARG's argument is supposed to contain a 2-byte value extending the value of the subsequent instruction, here JUMP_ABSOLUTE.

I was surprised to find that the EXTENDED_ARG's value is exactly one more than the maximum value 2 bytes can hold (65536 = 65535 + 1).
I found a code section in rocky/python-xdis which might be responsible for such a value, but the behaviour eludes me.

Of course, the problem does not appear if there is no need for that EXTENDED_ARG, i.e. if the jump target offset is small enough to fit in 2 bytes.
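Python 2.7 bytecode is encoded differently from the 3.6+ format discussed earlier in the thread. An instruction is 1 byte, or 3 bytes when it takes an argument (opcode >= HAVE_ARGUMENT), and EXTENDED_ARG's 2-byte argument is shifted left by 16 bits into the next instruction's argument. Read that way, the 65536 shown for EXTENDED_ARG is consistent with a raw argument of 1 already shifted (1 << 16), and 65536 + 4 = 65540 is the JUMP_ABSOLUTE target. Below is a minimal sketch. It assumes CPython 2.7 opcode numbers (EXTENDED_ARG = 145, JUMP_ABSOLUTE = 113, HAVE_ARGUMENT = 90) and is not the xdis code referred to above.

```python
from typing import List, Optional, Tuple

# CPython 2.7 opcode numbers (assumption for this sketch).
HAVE_ARGUMENT = 90
EXTENDED_ARG = 145
JUMP_ABSOLUTE = 113


def decode_py27(code: bytes) -> List[Tuple[int, int, int, Optional[int]]]:
    """Return (first_offset, offset, opcode, arg) with EXTENDED_ARG folded in."""
    result = []
    extended = 0
    first = None
    i = 0
    while i < len(code):
        op = code[i]
        if first is None:
            first = i
        if op >= HAVE_ARGUMENT:
            # 2-byte little-endian argument, plus any EXTENDED_ARG prefix.
            arg = code[i + 1] | (code[i + 2] << 8) | extended
            size = 3
        else:
            arg = None
            size = 1
        i += size
        if op == EXTENDED_ARG:
            # EXTENDED_ARG always takes an argument; shift it into the next one.
            extended = arg << 16
            continue
        result.append((first, i - size, op, arg))
        extended = 0
        first = None
    return result


# Offsets rebased to 0: EXTENDED_ARG 1 at 0, JUMP_ABSOLUTE 4 at 3.
code = bytes([EXTENDED_ARG, 1, 0, JUMP_ABSOLUTE, 4, 0])
print(decode_py27(code))  # [(0, 3, 113, 65540)]
print(1 << 16)  # 65536
```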

Reproduction

I was able to put together a few lines focused on that code section:

Code
data = [
    [
        "a",
        "b"
    ],
    [
        "c",
        "d"
    ]
]

class Test:
    marker = "b"
    def test(self):
        # BEGIN: repeat the following loop for padding
        for element in data:
            if element[1] == self.marker:
                data.remove(element)
        # END: repeat for padding

test = Test()
print(data)
test.test()
print(data)
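The padded file can also be generated instead of copying the loop by hand. The sketch below rests on a few assumptions. The helper names and the file name padded.py are hypothetical. The default of 1,300 copies is only an estimate: in the disassembly above, one copy of the loop spans 58 bytes (65533 to 65591), including two 3-byte EXTENDED_ARGs, so a copy without them is about 52 bytes, and roughly 65536 / 52 ≈ 1,260 copies are needed before offsets pass 65535. Increase the count if the bytecode stays smaller. The generated file must then be byte-compiled with a Python 2.7 interpreter to get 2.7 bytecode.

```python
# Hypothetical helpers that write Berbe's reproduction with the loop repeated.
HEADER = '''\
data = [
    ["a", "b"],
    ["c", "d"],
]

class Test:
    marker = "b"
    def test(self):
'''

LOOP = '''\
        for element in data:
            if element[1] == self.marker:
                data.remove(element)
'''

FOOTER = '''
test = Test()
print(data)
test.test()
print(data)
'''


def make_padded_source(repeats: int = 1300) -> str:
    """Return the reproduction program with the for loop repeated."""
    return HEADER + LOOP * repeats + FOOTER


def write_padded(path: str = "padded.py", repeats: int = 1300) -> None:
    """Write the padded reproduction program to ``path``."""
    with open(path, "w") as f:
        f.write(make_padded_source(repeats))
```

For example, call write_padded() from Python 3, run python2.7 -m py_compile padded.py, and then decompile the resulting padded.pyc with uncompyle6.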

@rocky rocky closed this as completed in 62760eb Sep 19, 2022
@rocky
Owner

rocky commented Sep 19, 2022

Thanks - should be fixed in 62760eb

As for the xdis sequence decoding, I don't see anything wrong with it. Instructions abstract out EXTENDED_ARG, which exists because of a limitation of the bytecode format. That code serves this purpose: it computes the offset value of the instruction.

Projects
None yet
Development

No branches or pull requests

3 participants