peports: Add -m flag to print PE machine type #271
Turns out the mingw-w64 headers are actually missing some entries. I'll use the list from Microsoft's documentation: |
d37afc9 to
9b69799
|
Thanks! I like this, and I plan to merge it, but I want to sort out some
details:
1. Should -m behave like -i and -e? That is, disable other output not
explicitly requested?
2. A goal is unambiguous output, hence escaping of non-ASCII bytes, or
characters peports uses for its own delimiters. I'm not entirely happy
with "EXPORTS" in the same position as module names. In theory a module
could be called "EXPORTS" and so the default output is ambiguous: Are
these exports, or are these imports from the module named "EXPORTS"? The
EXPORTS heading should probably be delimited by a character disallowed in
module names, which would be escaped when printed as a module name.
This is relevant because there's a new ambiguity with "MACHINE TYPE", and
so perhaps it and "EXPORTS" should get unambiguous representation. Colon
is practically disallowed in module names, though I'd still like them
unescaped in symbol names. Backslash is disallowed, too, and already
reserved. So maybe open with a backslash like "\exports" "\machine"? Those
are distinct from the \x escapes, too. Angle brackets are already
reserved, too, like "<exports>" or "<machine>". Or even ">machine"?
3. There's no "fat PE" (right?) and so an image can only have a single
machine type. For consistency it's formatted like a list though it can
only have one item. Perhaps it should have a different syntax entirely?
"\machine AMD64 0x8664"?
Drawing on some of this for a hypothetical output:
    $ peports -m example.dll
    \machine I386 0x014c
    $ peports -iem example.dll
    \machine I386 0x014c
    \exports
        example_main
        example_version
    KERNEL32.dll
        ReadFile
        WriteFile
I like "machine" being one word instead of "machine type" so that it
doesn't occupy two "fields" (e.g. in awk).
Ideas? What do you think?
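
To make the ambiguity argument in (2) concrete, here is a minimal sketch in Python (not peports's actual C code). The reserved set and the choice to escape spaces are assumptions made for illustration; `escape_name` and `heading` are hypothetical helpers. The point is only that a heading such as "\exports" can never collide with an escaped module name, because a literal backslash in a name is always printed as "\x5c".

```python
# Hypothetical escaping rule for illustration; peports's real rules may differ.
RESERVED = b"\\<>"  # assumed delimiter characters: backslash and angle brackets


def escape_name(raw: bytes) -> str:
    """Render a PE name (an arbitrary byte string) as printable ASCII."""
    out = []
    for b in raw:
        # Keep visible ASCII except reserved delimiters; escape everything
        # else, including space, so each name stays a single awk field.
        if 0x21 <= b <= 0x7E and b not in RESERVED:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def heading(word: str) -> str:
    """A section heading opens with a raw backslash, e.g. \\exports."""
    return "\\" + word
```

With this rule a module that really is named "\exports" prints as "\x5cexports", which differs from the heading "\exports".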
|
I think yes. I.e.
I think this is overengineering. The parser of We have one raw value with If we want it more human readable, then maybe If we want both under one option, then I think a form of In general, I don't think there's much value in printing both imports and exports together, and it makes the output harder to make unambiguous. It can be convenient, but personally I like explicit and simple. I think it's best to default to |
Yeah, agreed, I think it should do this. I am leaning towards
Yes, an image can only have a single machine type. The Mac approach of bundling multiple executables in one file and selecting one by architecture isn't done on Windows. The closest is the aforementioned ARM64EC |
9b69799 to
f92d448
|
One more clarification: Do you have a preference if -m works by default (with no arguments), or would you prefer if it's only enabled if requested? |
|
Hmm. Is there any reason the number of exports is counted, but the number of imports isn't? How about we allow any combination of -mei and separate them like this: exports would be blank if there aren't any (probably the case for a majority of exes). Or is this becoming too complicated or similar to gendef? |
|
An idea about options usage:
|
I prefer only enabled when requested, and per (1) requesting it puts peports in requested-outputs-only mode. When I casually point peports at an EXE or DLL, the current behavior is pretty much exactly what I want. With a (typical) EXE I get a listing of imports and nothing else, and the machine type is probably already known or unimportant. With a DLL, the exports are most important, and they come first. Thinking about it more, maybe even better default behavior would be: show only exports, but if there are none then show imports. So then you get only the most important information for either "kind" of PE. Don't worry about that in this MR, though.
I don't understand. What do you mean?
While I was initially writing peports, I started with hierarchical printing and I didn't like it.
Alright, let's go with that, and so And with that I think it's ready to merge.
I agree, that's a good point. (It doesn't necessarily need to happen in this MR.)
If the machine type is checked in a separate call from extracting exports/imports, then the PE will be handled twice (i.e. opened and read twice). Better to request everything at once, and if the machine type is wrong, discard the rest. So that means parsing all the output together, even if the machine type is known to list first.
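
A rough sketch of that "request everything once, discard on mismatch" flow, in Python. It assumes the hypothetical "\machine NAME 0xHEX" and "\exports" syntax floated earlier in this thread, with symbols indented by one tab under their parent; none of this is a settled peports format, and `parse_listing` and `deps_if_machine` are made-up helper names.

```python
# Sample in the hypothetical format (tab-indented symbols are an assumption).
SAMPLE = (
    "\\machine I386 0x014c\n"
    "\\exports\n"
    "\texample_main\n"
    "\texample_version\n"
    "KERNEL32.dll\n"
    "\tReadFile\n"
    "\tWriteFile\n"
)


def parse_listing(text: str):
    """Split one combined listing into (machine, exports, imports)."""
    machine, exports, imports = None, [], {}
    current = None  # the list that indented lines are appended to
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("\\machine "):
            _, name, code = line.split()
            machine = (name, int(code, 16))
        elif line == "\\exports":
            current = exports
        elif line.startswith("\t"):
            if current is None:
                raise ValueError("indented line without a parent")
            current.append(line[1:])
        else:
            # Any other unindented line is a module name.
            current = imports.setdefault(line, [])
    return machine, exports, imports


def deps_if_machine(text: str, wanted: int):
    """Return the imported module names, or None if the machine differs."""
    machine, _, imports = parse_listing(text)
    if machine is None or machine[1] != wanted:
        return None
    return list(imports)
```

For example, `deps_if_machine(SAMPLE, 0x014C)` yields the module list, while asking for 0x8664 discards everything.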
The order isn't important to me (hex or name first), at least so long as the name is only a single word, but I really do not like the parentheses. They're extraneous and do not disambiguate output. Parsers that want the name have to do work to remove the parentheses. In listings, That's a good point about
I know you're not actually proposing JSON, but as is more often the case than people appreciate, JSON is limited to unicode text and cannot itself represent byte strings. That's a substantial handicap in systems programming. For example, JSON cannot represent arbitrary file names, either on unix or Windows. In PE images, names and modules are supposed to be ASCII-only, though in practice they're really arbitrary byte strings. So they may not be representable in JSON. Storing PE listings as JSON would require an extra layer of encoding within JSON strings, which is mostly back to square one. |
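
A small Python illustration of that JSON limitation. The module name below is a made-up example of a non-UTF-8 byte string, and base64 is just one possible extra layer; the helper names are hypothetical.

```python
import base64
import json

# Hypothetical module name: Latin-1 encoded bytes, not valid UTF-8.
NAME = b"caf\xe9.dll"


def json_rejects(raw: bytes) -> bool:
    """True if the json module refuses to serialize the byte string as-is."""
    try:
        json.dumps(raw)
    except TypeError:
        return True
    return False


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes could at least be stored as a JSON (Unicode) string."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_json(raw: bytes) -> str:
    """Carry the bytes inside JSON through an extra base64 layer."""
    return json.dumps({"module_b64": base64.b64encode(raw).decode("ascii")})


def from_json(doc: str) -> bytes:
    """Undo the base64 layer and recover the original bytes exactly."""
    return base64.b64decode(json.loads(doc)["module_b64"])
```

The round trip works, but the consumer now has to decode a second format inside the JSON string, which is the "back to square one" problem.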
You don't know the contexts where it's going to be used, or whether the actual arch is important, regardless of the need for deps to have the same arch. There's a reason why "do one thing well" exists. Because it's easier to handle one kind of thing at a time. It might be theoretically more efficient to process all the outputs together, but this necessarily complicates the output handling. It can be preferable for the caller to call it twice, once for each form of output. They may or may not care about performance, or they may, but not enough in this case (the delta of one extra invocation is likely negligible in the grand scheme of things, while more complex code to handle it is not negligible). At least make this exception: If only one output type is requested, then don't decorate it with title and other unrelated outputs. The invoker knows what they requested, and extra decoration only makes it more cumbersome to handle. |
You know it's a single word, but does the user know that too? If it's not documented, then they can't assume what chars it contains, including spaces and parentheses and whatever. Would you make such an assumption about some 3rd party tool which prints the architecture of a binary? But users can assume safely and logically that a hex value is a single word.
Correct, and I also don't know enough about these names, their constraints, and their potential encoding. Keep in mind that if they're arbitrary bytes then a As for JSON, if it was on the table, then it could be encoded, like base64. |
|
By the way, how are file names (modules) encoded? (at the binary itself which needs to load the modules, and at the printout) |
Hmm, that sounds reasonable. So then maybe: (Again, this doesn't have to happen in this MR.)
True, but that's just a couple sentences of documentation from being settled. A motivation for peports was that none of the existing tools of which I'm aware handle weird inputs well. They crash (see #135), or byte strings are decoded in some ambiguous way, or output is simply missing due to imprecise decoding/encoding. (It's also, in general, dangerous to use link.exe or Binutils on untrusted PE images, but safe with peports.) So faced with an edge case, I might wonder, "What exactly did the linker produce for this Unicode function name?" The only way to inspect weird export/import tables was tediously via a hexdump. For instance, the situation where Windows dynamic linking depends on the active code page is essentially only debuggable with peports or a manual hexdump inspection. The unambiguous output was first and foremost for human inspection, without any code page contamination. It just so happens that unambiguous output is also useful for robust machine consumption.
Officially, it's ASCII-only: This has the funny situation that non-ASCII DLL names are disallowed, or at least impractical, e.g. Hence the importance of utilities like peports not decoding it at all! If a program did link a There's of course the general problem with representing control characters (like the aforementioned |
Yeah. No hurry.
Yeah. It's seemingly a tiny thing which is trivial to add, but it makes it much more reassuring to use. As suggested previously, printf format string is used commonly for such things where possible, so this would help for the "single output" mode with all 3 outputs, as well as the non-printable (and non-space?) escapes where applicable (I presume that's
Interesting. Go Windows!
Yes.
Maybe. Does it need the indent? Isn't that inconsistent with the Granted, in my use case I don't care for the numbered parts (I only need the import modules list, and the arch), but what are the numbers good for? I presume not for machine processing, because then it can simply count if it cares about the numeric index of elements... unless the numbers can be non-sequential? Similarly, the same question applies to the imports on their own. I think that except for imports, which are inherently indented internally (symbols per module), the output itself doesn't need the "title indentation" with one output, i.e. like you did with the And also it doesn't need the numbering in such a case IMO, because it doesn't help automated processing as far as I can tell.
Maybe, but to be honest I don't think anyone would use this for automatic processing, because it's just more cumbersome and error-prone to process. So if that's for humans, then it can be anything. I still don't find the numbering very useful, but I don't mind it either. As suggested elsewhere, and assuming the numbers are always sequential, an alternative might be to print only the total per numbered list (on the same line as the list title), though this might make the code more complex if the number of entries is unknown when printing the title (exports) or module (imports). |
|
Re the indentation, I would think, for consistency, that each item should be indented from its parent. So with multiple outputs, I would think each should have a title (MACHINE/EXPORTS/IMPORTS or however you want to name them), then one tab inside is the content - which may itself be indented further (only imports). This also removes the ambiguity, because the first column is always parent or indent, and the line data itself never contains an indent (tab - or space?) because those are escaped. And each single output is without the title, with one level of indentation removed, and possibly without the numbering too. |
On exports they're export ordinals, and on imports they're hints (with a name) or ordinal imports (without a name, If a module imports by ordinal, there's no string comparison nor search. It just links the Nth export from the DLL. This also enables obfuscation within a piece of software: A module can link its private DLLs by ordinal without revealing function names. A few older Win32 functions have stable ordinals, and so permit ordinal imports of those functions. GNU Traditionally MSVC When importing by name, the ordinals listed in an import library are used as hints. The dynamic linker first tries the hint (in practice usually wrong), then reverts to a search. Because these are all zero in w64dk, you'll see zeroes for all these hints in w64dk-linked modules. I like seeing what's going on with hints and ordinals, so they're in the output. In particular, the hints fingerprint a build, revealing what toolchain linked it. If you see all zeroes you know it's w64dk, or at least something unusual (Go). Otherwise you can distinguish what version of Mingw-w64 or Visual Studio is behind a particular image. It's pretty nifty for detective work. |
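
The hint-then-search behavior described above can be sketched roughly like this in Python. It is a simplified model, not the Windows loader: the export name table is modeled as a sorted list of byte strings, the hint as an index into it, and `resolve_import` and `all_hints_zero` are hypothetical helpers.

```python
import bisect


def resolve_import(export_names, hint, name):
    """Resolve an import by name: try the hint first, then fall back to search.

    export_names is the DLL's export name table as a sorted list of bytes.
    Returns (index, hint_was_right), or (None, False) if the name is absent.
    """
    # Fast path: the hint already points at the right entry.
    if 0 <= hint < len(export_names) and export_names[hint] == name:
        return hint, True
    # Slow path: binary search of the sorted name table.
    i = bisect.bisect_left(export_names, name)
    if i < len(export_names) and export_names[i] == name:
        return i, False
    return None, False


def all_hints_zero(hints):
    """Fingerprint check: every hint is zero, as in w64dk-linked modules."""
    return len(hints) > 0 and all(h == 0 for h in hints)
```

A wrong hint costs only the fallback search, which is why zeroed hints are harmless and still make a recognizable fingerprint.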
|
Thanks. I was not aware of any of these. So I guess the numbering can be useful. |
Sorry for the confusion, I didn't actually know what an ordinal was and that w64devkit has a patch to zero them. I figured there was code in peports that was counting the number of exports/imports so you can know if you are close to exceeding the 65535 limit or so. But that's not the case. |
Adds support for displaying the target machine type of PE files with a new -m command line option.
6c15412 to
b88e8b3
|
Thanks! I made a couple of small tweaks (I want the option listing in alphabetical order) and squashed. |
|
Thanks
Any reason for this btw? |
|
Any reason for this btw?
So that it's easier to find an option via eyeball binary search. :-)
|
|
Line 344 in b88e8b3
unknown should correspond only to 0x0. If the machine type is unrecognized (not in the known list, i.e. the default case), then you could return the string of the |
Adds support for displaying the target machine type of PE files with a new -m command line option. I tested this using llvm-mingw for its various target triples:
armv7-w64-mingw32-clang:
MACHINE TYPE
ARMNT (0x01c4)
aarch64-w64-mingw32-clang:
MACHINE TYPE
ARM64 (0xaa64)
i686-w64-mingw32-clang:
MACHINE TYPE
i386 (0x014c)
x86_64-w64-mingw32-clang:
MACHINE TYPE
x64 (0x8664)
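
For readers unfamiliar with where this value lives: the machine type is the 16-bit Machine field of the COFF header, which directly follows the "PE\0\0" signature located via the 32-bit offset at 0x3c in the DOS header. The following is a minimal Python sketch of that lookup, not the C code in this PR. The name table covers only the four values tested above plus 0x0, the names are the IMAGE_FILE_MACHINE_* suffixes, and printing a bare hex value for unrecognized codes is an assumption of this sketch.

```python
import struct

# Subset of IMAGE_FILE_MACHINE_* values, limited to those tested above.
MACHINE_NAMES = {
    0x0000: "UNKNOWN",
    0x014C: "I386",
    0x01C4: "ARMNT",
    0x8664: "AMD64",
    0xAA64: "ARM64",
}


def pe_machine(image: bytes) -> int:
    """Return the raw Machine field from a PE image held in memory."""
    if len(image) < 0x40 or image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    # e_lfanew: little-endian offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    if len(image) < pe_offset + 6:
        raise ValueError("truncated COFF header")
    (machine,) = struct.unpack_from("<H", image, pe_offset + 4)
    return machine


def format_machine(machine: int) -> str:
    """Name plus hex for known values, hex alone for unrecognized ones."""
    name = MACHINE_NAMES.get(machine)
    hexval = f"0x{machine:04x}"
    return f"{name} {hexval}" if name is not None else hexval
```

Keeping the hex value in every case means a caller can always compare machine types even when the name table is out of date.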