peports: Add -m flag to print PE machine type #271

Peter0x44 · 2025-08-14T17:43:02Z

Adds support for displaying the target machine type of PE files with a new -m command line option. I tested this using llvm-mingw for its various target triples:

armv7-w64-mingw32-clang:
MACHINE TYPE
ARMNT (0x01c4)

aarch64-w64-mingw32-clang:
MACHINE TYPE
ARM64 (0xaa64)

i686-w64-mingw32-clang:
MACHINE TYPE
i386 (0x014c)

x86_64-w64-mingw32-clang:
MACHINE TYPE
x64 (0x8664)
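
For readers unfamiliar with where these values come from: per the Microsoft PE format documentation, the 32-bit field at offset 0x3c of the file gives the offset of the "PE\0\0" signature, and the 16-bit Machine field follows it directly. The following is only a minimal sketch of that lookup over an in-memory buffer. The name `pe_machine` and its error convention are hypothetical and are not taken from peports.c.

```c
#include <stddef.h>
#include <stdint.h>

// Read a little-endian 16-bit value; the caller guarantees p[0..1] is valid.
static uint16_t load_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

// Read a little-endian 32-bit value; the caller guarantees p[0..3] is valid.
static uint32_t load_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

// Hypothetical helper: returns the COFF Machine field, or -1 if buf does
// not look like a PE image.
long pe_machine(const unsigned char *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
        return -1;
    }
    uint32_t off = load_u32(buf + 0x3c);  // offset of the "PE\0\0" signature
    if (off > len || len - off < 6) {
        return -1;  // signature (4 bytes) plus Machine (2 bytes) must fit
    }
    if (buf[off] != 'P' || buf[off+1] != 'E' || buf[off+2] || buf[off+3]) {
        return -1;
    }
    return load_u16(buf + off + 4);
}
```

Every offset is bounds-checked before it is read, since the input may be a truncated or hostile image.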

Peter0x44 · 2025-08-14T17:48:54Z

Turns out the mingw-w64 headers are actually missing some entries. I'll use the list from Microsoft's documentation:
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#machine-types

skeeto · 2025-08-14T20:50:13Z

Thanks! I like this, and I plan to merge it, but I want to sort out some details:

1. Should -m behave like -i and -e? That is, disable other output not explicitly requested?

2. A goal is unambiguous output, hence escaping of non-ASCII bytes, or characters peports uses for its own delimiters. I'm not entirely happy with "EXPORTS" in the same position as module names. In theory a module could be called "EXPORTS" and so the default output is ambiguous: Are these exports, or are these imports from the module named "EXPORTS"? The EXPORTS heading should probably be delimited by a character disallowed in module names, which would be escaped when printed as a module name. This is relevant because there's a new ambiguity with "MACHINE TYPE", and so perhaps it and "EXPORTS" should get unambiguous representation. Colon is practically disallowed in module names, though I'd still like them unescaped in symbol names. Backslash is disallowed, too, and already reserved. So maybe open with a backslash like "\exports" "\machine"? Those are distinct from the \x escapes, too. Angle brackets are already reserved, too, like "<exports>" or "<machine>". Or even ">machine"?

3. There's no "fat PE" (right?) and so an image can only have a single machine type. For consistency it's formatted like a list though it can only have one item. Perhaps it should have a different syntax entirely? "\machine AMD64 0x8664"?

Drawing on some of this for a hypothetical output:

$ peports -m example.dll
\machine I386 0x014c

$ peports -iem example.dll
\machine I386 0x014c
\exports
    example_main
    example_version
KERNEL32.dll
    ReadFile
    WriteFile

I like "machine" being one word instead of "machine type" so that it doesn't occupy two "fields" (e.g. in awk).

Ideas? What do you think?

avih · 2025-08-14T21:12:29Z

Should -m behave like -i and -e? That is, disable other output not explicitly requested?

I think yes. I.e. -m would print the machine type and exit.

"\machine AMD64 0x8664"?

I think this is overengineering. The parser of -m will not be the same parser of other outputs from peports. If you want a unified content format, then just use JSON, but I think this would make the parsing more involved than necessary, and less useful with traditional POSIX utilities pipelines (mainly referring to the imports/exports lists).

We have one raw value with -m, so this should be the sole output IMO. Users of this feature can interpret these values according to current or future type maps.

If we want it more human readable, then maybe -M could provide that, with some title, and mapping of raw value to readable name where we can.

If we want both under one option, then I think a form of 0xHHHH (MTYPE) parses more easily by scripts, as the raw value - the only value which is guaranteed to identify the type (unlike UNKNOWN which can print for different values) is first, and can be extracted easily by removing everything starting at the first space.
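
To illustrate the "raw value first" argument, here is a minimal sketch of a consumer that reads only the leading hex field of such a line and ignores whatever follows the first space. The function name `parse_machine_line` is hypothetical, and the line format is the one proposed in this thread, not a documented peports guarantee.

```c
// Hypothetical helper: parse the leading hex field of a line such as
// "0xAA64 ARM64" or "0xAA64 (ARM64)". Everything from the first space
// onward is ignored. Returns the 16-bit value, or -1 on malformed input.
long parse_machine_line(const char *line)
{
    if (line[0] != '0' || line[1] != 'x') {
        return -1;
    }
    long value = 0;
    int ndigits = 0;
    for (const char *p = line + 2; *p && *p != ' ' && *p != '\n'; p++) {
        int d;
        if (*p >= '0' && *p <= '9') {
            d = *p - '0';
        } else if (*p >= 'a' && *p <= 'f') {
            d = *p - 'a' + 10;
        } else if (*p >= 'A' && *p <= 'F') {
            d = *p - 'A' + 10;
        } else {
            return -1;  // not a hex digit
        }
        if (ndigits >= 4) {
            return -1;  // a machine type is at most 16 bits
        }
        value = value << 4 | d;
        ndigits++;
    }
    return ndigits ? value : -1;
}
```

Because the parser never looks at the name, it behaves the same whether the name is parenthesized, contains spaces, or is absent.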

In general, I don't think there's much value in printing both imports and exports together, and it makes the output harder to make unambiguous. It can be convenient, but personally I like explicit and simple.

I think it's best to default to -i (imports). It's also possible to default to exports for dll and imports for exe, but I don't like the guesswork involved, and it can have tricky edge cases.

Peter0x44 · 2025-08-14T21:15:29Z

Should -m behave like -i and -e? That is, disable other output not
explicitly requested?

Yeah, agreed, I think it should do this.

I am leaning towards >machine. No major preference either way.

There's no "fat PE" (right?) and so an image can only have a single
machine type. For consistency it's formatted like a list though it can
only have one item. Perhaps it should have a different syntax entirely?
"\machine AMD64 0x8664"?

Yes. an image can only have a single machine type.

The Mac thing of having multiple executables in the file and selecting the one by architecture isn't done by Windows. The closest is the aforementioned ARM64EC.

Peter0x44 · 2025-08-14T22:28:37Z

One more clarification: Do you have a preference if -m works by default (with no arguments), or would you prefer if it's only enabled if requested?

Peter0x44 · 2025-08-14T23:14:16Z

Hmm. Is there any reason the number of exports is counted, but the number of imports isn't?
I've given it some deeper thought, and here is what feels "good" to me:

How about we allow any combination of -mei and separate them like this:
The default output would be of -mei, but passing any of those individually disables any that aren't explicitly passed

>machine
    0xAA64 (ARM64)
>exports
    whatever.dll
        1 whatever_function
>imports
    msvcrt.dll
        1 printf
        2 malloc
    somelib.dll
        1 somelib_func

exports would be blank if there aren't any (probably the case for a majority of exes).

Or is this becoming too complicated or similar to gendef?

avih · 2025-08-15T07:26:41Z

An idea about options usage:

Each of -i/-e/-m do just one thing and output a simple form which is easy to process [in a pipeline] with traditional tools (the current -i/-e forms are good). The output formats can be documented at the -h page (e.g. using printf format string) so that parsing it doesn't require guesswork.
If -i/-e/-m are mutually exclusive, then peports should probably error if more than one is provided. If they're not mutually exclusive, then the outputs should be in the order of the given options, and can be separated by an empty line, and maybe error out if the same option is used more than once.
Without options, it prints some human-readable summary, which is not intended for machine-parsing, and is relatively small so that it's easy to grasp in a glance. For instance the arch, the import modules (without symbol names, or just with the number of symbols per module), and the number of exported symbols.

skeeto · 2025-08-16T13:41:49Z

would you prefer if it's only enabled if requested?

I prefer only enabled on request, and per (1) requesting it puts peports in requested-outputs-only mode. When I casually point peports at an EXE or DLL, the current behavior is pretty much exactly what I want. With a (typical) EXE I get a listing of imports and nothing else, and the machine type is probably already known or unimportant. With a DLL, the exports are most important, and they come first.

Thinking about it more, maybe even better default behavior would be: show only exports, but if there are none then show imports. So then you get only the most important information for either "kind" of PE. Don't worry about that in this MR, though.

Is there any reason the number of exports is counted, but the number of imports isn't?

I don't understand. What do you mean?

Or is this becoming too complicated or similar to gendef?

While I was initially writing peports, I started with hierarchical printing and I didn't like it.

I am leaning towards >machine.

Alright, let's go with that, and so >exports, too. Other than the extra >imports nesting, and the parentheses, I like your example output. That resolves (2). So that means:

$ peports -m example.dll
>machine
    0xAA64 ARM64

$ peports example.dll
>exports
    1 whatever_function
msvcrt.dll
    1 printf
    2 malloc
somelib.dll
    1 somelib_func

$ peports -mei example.dll
>machine
    0xAA64 ARM64
>exports
    1 whatever_function
msvcrt.dll
    1 printf
    2 malloc
somelib.dll
    1 somelib_func

And with that I think it's ready to merge.

outputs should be at the order of the given options

I agree, that's a good point. (It doesn't necessarily need to happen in this MR.)

The parser of -m will not be the same parser of other outputs from peports.

If the machine type is checked in a separate call from extracting exports/imports, then the PE will be handled twice (i.e. opened and read twice). Better to request everything at once, and if the machine type is wrong, discard the rest. So that means parsing all the output together, even if the machine type is known to list first.

0xHHHH (MTYPE)

The order isn't important to me (hex or name first) at least so long as the name is only a single word, but I really do not like the parentheses. They're extraneous and do not disambiguate output. Parsers that want the name have to do work to remove the parentheses.

In listings, <…> is used for <NONAME> because a name with these bytes would instead print as \x3cNONAME\x3e. The angle brackets have meaning. Same for forwarders naming their target modules. The angle brackets separate the name from the module. Space isn't enough because both names and modules can contain spaces.

That's a good point about UNKNOWN, that machine processing of the machine type (e.g. to match up modules) should use the hex, not the name.

then just use JSON

I know you're not actually proposing JSON, but as is more often the case than people appreciate, JSON is limited to unicode text and cannot itself represent byte strings. That's a substantial handicap in systems programming. For example, JSON cannot represent arbitrary file names, either on unix or Windows. In PE images, names and modules are supposed to be ASCII-only, though in practice they're really arbitrary byte strings. So they may not be representable in JSON. Storing PE listings as JSON would require an extra layer of encoding within JSON strings, which is mostly back to square one.

avih · 2025-08-16T13:57:22Z

If the machine type is checked in a separate call from extracting exports/imports, then the PE will be handled twice (i.e. opened and read twice). Better to request everything at once, and if the machine type is wrong, discard the rest. So that means parsing all the output together, even if the machine type is known to list first.

You don't know the contexts where it's going to be used, or whether the actual arch is important, regardless of the need for deps to have the same arch.

There's a reason why "do one thing well" exists. Because it's easier to handle one kind of thing at a time. It might be theoretically more efficient to process all the outputs together, but this necessarily complicates the output handling.

It can be preferable for the caller to call it twice, once for each form of output. They may or may not care about performance, or they may, but not enough in this case (the delta of one extra invocation is likely negligible in the grand scheme of things, while more complex code to handle it is not negligible).

At least make this exception:

If only one output type is requested, then don't decorate it with title and other unrelated outputs. The invoker knows what they requested, and extra decoration only makes it more cumbersome to handle.

avih · 2025-08-16T14:15:28Z

The order isn't important to me (hex or name first) at least so long as the name is only a single word

You know it's a single word, but does the user know that too? If it's not documented, then they can't assume what chars it contains, including spaces and parentheses and whatever. Would you make such an assumption about some 3rd party tool which prints the architecture of a binary?

But users can assume safely and logically that a hex value is a single word.

I know you're not actually proposing JSON

Correct, and I also don't know enough about these names, their constraints, and their potential encoding. Keep in mind that if they're arbitrary bytes then a \n would likely also throw off any parser, so regardless of whether it's handled or not (it might be escaped?) it should be documented. There's no manpage, and it's not documented at the -h output how non-ALNUM bytes are handled. Source comments don't really count even if they cover everything. Users don't read source code typically to understand what a program does, except if they intend to modify it.

As for JSON, if it was on the table, then it could be encoded, like base64.

avih · 2025-08-16T15:18:12Z

By the way, how are file names (modules) encoded? (at the binary itself which needs to load the modules, and at the printout)

skeeto · 2025-08-16T16:25:29Z

If only one output type is requested, then don't decorate it with title and other unrelated outputs.

Hmm, that sounds reasonable. So then maybe:

$ peports -m example.dll
0xAA64 ARM64

$ peports -e example.dll
    1 whatever_function

$ peports -mei example.dll
>machine
    0xAA64 ARM64
>exports
    1 whatever_function
msvcrt.dll
    1 printf
    2 malloc
somelib.dll
    1 somelib_func

(Again, this doesn't have to happen in this MR.)

There's no manpage, and it's not documented

True, but that's just a couple sentences of documentation from being settled. A motivation for peports was that none of the existing tools of which I'm aware handle weird inputs well. They crash (see #135), or byte strings are decoded in some ambiguous way, or output is simply missing due to imprecise decoding/encoding. (It's also, in general, dangerous to use link.exe or Binutils on untrusted PE images, but safe with peports.) So faced with an edge case, I might wonder, "What exactly did the linker produce for this unicode function name?" The only way to inspect weird export/import tables was tediously via a hexdump.

For instance, the situation where Windows dynamic linking depends on the active code page is essentially only debuggable by peports or a manual hexdump inspection. The unambiguous output was first and foremost for human inspection, without any code page contamination. It just so happens that unambiguous output is also useful for robust machine consumption.

By the way, how are file names (modules) encoded?

Officially, it's ASCII-only:
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-edata-section-image-only

This has the funny situation that non-ASCII DLL names are disallowed, or at least impractical, e.g. cálculo.dll. Like many important parts of Windows' behavior, it's undocumented what happens for non-ASCII bytes. Per my article, in practice Windows translates the byte string to UTF-16 (for file system lookup) using the current code page, and so how it decodes depends on who's looking at it. Any dependency walker must go through a similar procedure to turn these byte strings into concrete paths.

Hence the importance of utilities like peports not decoding it at all! If a program did link a cálculo.dll then it's important to distinguish if the linker (or whatever) encoded using its own code page (ex. CP1252: c\xe1lculo.dll in peports) or, say, UTF-8 (c\xc3\xa1lculo.dll in peports). Again, I was not anticipating programs actually decoding this stuff to recover the byte strings, but intending to communicate clearly for human inspection.

There's of course the general problem with representing control characters (like the aforementioned \n), which peports always renders escaped. This prevents malicious PE images from "log injecting" forged outputs.
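
The escaping described above can be sketched roughly as follows. This is not the code in peports.c: the function name `escape_bytes` is hypothetical, and the exact escaped set (control bytes, bytes of 0x7f and above, backslash, and angle brackets) is an assumption inferred from the examples in this thread, using the lowercase \xHH form shown in them.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: write src[0..len) into dst as printable ASCII,
// escaping everything else as \xHH. Returns the number of bytes written
// (excluding the terminator), or -1 if dst is too small.
// ASSUMPTION: the escaped set below is inferred, not taken from peports.
int escape_bytes(char *dst, size_t cap, const unsigned char *src, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        int plain = c >= 0x20 && c < 0x7f &&
                    c != '\\' && c != '<' && c != '>';
        size_t need = plain ? 1 : 4;
        if (cap - n <= need) {
            return -1;  // keep room for the terminator
        }
        if (plain) {
            dst[n++] = (char)c;
        } else {
            snprintf(dst + n, cap - n, "\\x%02x", (unsigned)c);
            n += 4;
        }
    }
    if (n >= cap) {
        return -1;
    }
    dst[n] = 0;
    return (int)n;
}
```

Since a newline byte comes out as the four characters \x0a, a crafted name cannot start a new output line, and since backslash itself is escaped, the output maps back to exactly one byte string.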

avih · 2025-08-16T19:44:31Z

(Again, this doesn't have to happen in this MR.)

Yeah. No hurry.

There's no manpage, and it's not documented

True, but that's just a couple sentences of documentation from being settled.

Yeah. It's seemingly a tiny thing which is trivial to add, but it makes it much more reassuring to use. As suggested previously, printf format string is used commonly for such things where possible, so this would help for the "single output" mode with all 3 outputs, as well as the non-printable (and non-space?) escapes where applicable (I presume that's "\\x%02x"?).

This has the funny situation that non-ASCII DLL names are disallowed, or at least impractical, e.g. cálculo.dll. Like many important parts of Windows' behavior, it's undocumented what happens for non-ASCII bytes...

Interesting. Go Windows!

$ peports -m example.dll
0xAA64 ARM64

Yes.

$ peports -e example.dll
    1 whatever_function

Maybe. Does it need the indent? isn't that inconsistent with the -m output above - which is unindented?

Granted, in my use case I don't care for the numbered parts (I only need import modules list, and the arch), but what are the numbers good for? I presume not for machine processing, because then it can simply count if it cares about the numeric index of elements... unless the numbers can be non-sequential?

Similarly, both questions apply to the imports on their own.

I think that except for imports which is inherently indented internally (symbols per module), the output itself doesn't need the "title indentation" with one output, i.e. like you did with the -m output is OK, but I'd guess it was accidental in your example rather than intentional?

And also it doesn't need the numbering in such case IMO, because it doesn't help automated processing as far as I can tell.

$ peports -mei example.dll
>machine
    0xAA64 ARM64
>exports
    1 whatever_function
msvcrt.dll
    1 printf
    2 malloc
somelib.dll
    1 somelib_func

Maybe, but to be honest I don't think anyone would use this for automatic processing, because it's just more cumbersome and error-prone to process. So if that's for humans, then it can be anything. I still don't find the numbering very useful, but I don't mind them either.

As suggested elsewhere, and assuming the numbers are always sequential, an alternative might be to print only the total per numbered list (at the same line as the list title), though this might make the code more complex if the number of entries is unknown when printing the title (exports) or module (imports).

avih · 2025-08-17T12:36:21Z

Re the indentation, I would think, for consistency, that each item should be indented from its parent.

So with multiple outputs, I would think each should have a title (MACHINE/EXPORTS/IMPORTS or however you want to name them), then one tab inside is the content - which may itself be indented further (only imports).

This also removes the ambiguity, because first column is always parent or indent, and the line data itself never contains indent (tab - or space?) because those are escaped.

And each single output is without the title, and one level of indentation removed, and possibly without the numbering too.

skeeto · 2025-08-17T14:09:46Z

but what are the numbers good for?

On exports they're export ordinals, and on imports they're hints (with a name) or ordinal imports (without a name, <NONAME>). It's mostly a holdover from the 16-bit era to enable faster dynamic linking, so that not even a binary search is necessary:

https://learn.microsoft.com/en-us/cpp/build/exporting-functions-from-a-dll-by-ordinal-rather-than-by-name

If a module imports by ordinal, there's no string comparison nor search. It just links the Nth export from the DLL. This also enables obfuscation within a piece of software: A module can link its private DLLs by ordinal without revealing function names. A few older Win32 functions have stable ordinals, and so permit ordinal imports of those functions.

GNU ld has --out-implib to produce an import library when building a module (even EXE's can export!). Every export has an ordinal whether or not it's actually stable across builds. Unless you specify ordinals (via a DEF) to the linker, it just numbers them 1-indexed, monotonically. The ordinals in this import library are the true ordinals for the corresponding module.

Traditionally MSVC lib.exe when producing an import library from a DEF, given no ordinal preference, simply assigns ordinals monotonically as a blind guess. Binutils dlltool does the same. These ordinals are virtually never correct. I've patched dlltool to produce zero ordinals ("null") instead of guessing, including when building w64dk itself, which is visible in all the system import libraries. This removes pointless noise from images and improves build reproducibility.

When importing by name, the ordinals listed in an import library are used as hints. The dynamic linker first tries the hint (in practice usually wrong), then reverts to a search. Because these are all zero in w64dk, you'll see zeroes for all these hints in w64dk-linked modules.
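
As a rough model of "try the hint, then search": the PE format documentation describes the hint as an index into the exporting module's sorted name table, with a binary search as the fallback. The sketch below works over a plain sorted array of strings. The name `resolve_by_hint` is hypothetical, and this is an illustration of the lookup order only, not the Windows loader's actual code.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper. names is a module's export name table, sorted
// ascending by byte value, with count entries. Returns the index of want,
// or -1 if the module does not export it.
long resolve_by_hint(const char *const *names, size_t count,
                     size_t hint, const char *want)
{
    // Fast path: the hint points straight at the right entry.
    if (hint < count && strcmp(names[hint], want) == 0) {
        return (long)hint;
    }
    // Slow path: the hint was wrong or out of range, so binary search.
    size_t lo = 0;
    size_t hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(names[mid], want);
        if (cmp == 0) {
            return (long)mid;
        }
        if (cmp < 0) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return -1;
}
```

A wrong hint costs one extra comparison and never changes the result, which is why zeroed hints are harmless.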

I like seeing what's going on with hints and ordinals, so they're in the output. In particular, the hints fingerprint a build revealing what toolchain linked it. If you see all zeroes you know it's w64dk, or at least something unusual (Go). Otherwise you can distinguish what version of Mingw-w64 or Visual Studio are behind a particular image. It's pretty nifty for detective work.

avih · 2025-08-17T14:20:03Z

Thanks. I was not aware of any of these. So I guess the numbering can be useful.

Peter0x44 · 2025-08-17T15:37:39Z

Is there any reason the number of exports is counted, but the number of imports isn't?

I don't understand. What do you mean?

Sorry for the confusion, I didn't actually know what an ordinal was and that w64devkit has a patch to zero them.
https://github.com/skeeto/w64devkit/blob/master/src/binutils-dlltool-zero-ordinals.patch

I figured there was code in peports that was counting the number of exports/imports so you can know if you are close to exceeding the 65535 limit or so. But that's not the case.

skeeto · 2025-08-17T21:29:52Z

Thanks! I made a couple of small tweaks (I want the option listing in alphabetical order) and squashed.

Peter0x44 · 2025-08-17T21:49:29Z

Thanks

(I want the option listing in alphabetical order)

Any reason for this btw?

skeeto · 2025-08-17T22:18:21Z

Any reason for this btw?

So that it's easier to find an option via eyeball binary search. :-)

wesinator · 2025-08-18T00:40:17Z

w64devkit/src/peports.c

Line 344 in b88e8b3

default: return s8("UNKNOWN");

unknown should correspond only to 0x0

if the machine type is unrecognized (not in the known list, i.e. the default case), then you could return the string of the machine u16 hex value.

followup to skeeto#271
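
One way to read this suggestion, as a minimal sketch rather than the actual patch: keep "UNKNOWN" for the value 0x0 only, and fall back to printing the raw hex for any value not in the table. The function name `machine_name` is hypothetical and the table is deliberately abbreviated. The four non-zero values are the ones tested at the top of this thread, named after Microsoft's IMAGE_FILE_MACHINE_* constants.

```c
#include <stddef.h>
#include <stdio.h>

// Hypothetical helper: writes a display name for machine into buf and
// returns buf. Unrecognized values are rendered as their hex code.
const char *machine_name(unsigned machine, char *buf, size_t cap)
{
    const char *known = 0;
    switch (machine) {
    case 0x0000: known = "UNKNOWN"; break;  // IMAGE_FILE_MACHINE_UNKNOWN
    case 0x014c: known = "I386";    break;
    case 0x01c4: known = "ARMNT";   break;
    case 0x8664: known = "AMD64";   break;
    case 0xaa64: known = "ARM64";   break;
    }
    if (known) {
        snprintf(buf, cap, "%s", known);
    } else {
        snprintf(buf, cap, "0x%04x", machine & 0xffffu);
    }
    return buf;
}
```

With the hex fallback, two different unrecognized machine types no longer print the same name.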

Peter0x44 mentioned this pull request Aug 14, 2025

[FEATURE REQUEST] peports: detect arch #270

Open

Peter0x44 force-pushed the peports_arch_detection branch from d37afc9 to 9b69799 Compare August 14, 2025 17:57

Peter0x44 force-pushed the peports_arch_detection branch from 9b69799 to f92d448 Compare August 14, 2025 22:23

peports: Add -m flag to print PE machine type

b88e8b3

Adds support for displaying the target machine type of PE files with a new -m command line option.

skeeto force-pushed the peports_arch_detection branch from 6c15412 to b88e8b3 Compare August 17, 2025 21:28

skeeto merged commit b88e8b3 into skeeto:master Aug 17, 2025

Peter0x44 deleted the peports_arch_detection branch August 17, 2025 21:43

wesinator added a commit to wesinator/w64devkit that referenced this pull request Aug 19, 2025

peports - machine type, return unknown type code in default case

ced49ea

followup to skeeto#271

wesinator mentioned this pull request Aug 19, 2025

peports - specify if machine type is unrecognized #272

Closed
