Reviewed RetDec's compiler detection signatures across all supported platforms (PE x86/x64, ELF x86/x64, Mach-O) to identify gaps in modern compiler version detection.
- ✅ Good Coverage: AutoIt, AutoHotKey, Go compilers, assembly tools (FASM, MASM, TASM)
⚠️ Outdated: GCC signatures only cover versions 4.6.3-4.7.2 (2012-2013 era)- ❌ Missing: No Clang/LLVM detection signatures at all
⚠️ Generic: MSVC detection is generic (no version-specific signatures)
Current State: RetDec can detect very old GCC versions but cannot identify:
- GCC 5.x through 14.x (10+ years of releases)
- Any Clang/LLVM version
- Specific MSVC versions (2015, 2017, 2019, 2022)
Recommendation: LOW PRIORITY - Current signatures work for general detection, but adding modern compiler signatures would improve analysis accuracy for recent binaries.
Detected Compilers:
- GCC 4.7.0 (RHEL) - Entry point signature
- GCC 4.6.3 (Ubuntu) - Entry point signature
- GCC 4.7.2 - Entry point signature
- Tiny C Compiler 0.9.26 - Entry point signature
- Go compiler (gc) - 7 different entry point patterns
Version Coverage:
- Oldest: GCC 4.6.3 (Ubuntu 12.04, April 2012)
- Newest: GCC 4.7.2 (September 2012)
- Gap: GCC 4.8 through 14.x (2013-2024) - Missing
Missing Compilers:
- ❌ Clang/LLVM (any version)
- ❌ GCC 5.x through 14.x
- ❌ Intel ICC
- ❌ Rust compiler (rustc)
Detected Compilers:
- MSVC (generic) - Single entry point signature, no version detection
- AutoIt (Aut2Exe) - Multiple versions (2.64, 3.3.0.0, 3.3.8.x, 3.3.10.x, 3.3.14.x)
- AutoHotKey (AHK2Exe) - Generic detection
- F2KO Bat2Exe - Generic detection
- Go compiler (gc) - Two patterns (native Go, MinGW cross-compiled)
- QB64 - BASIC compiler detection
Version Coverage:
- MSVC: Generic only (no VS2015, VS2017, VS2019, VS2022 distinction)
- AutoIt: Excellent coverage (multiple specific versions)
- Go: Two compilation variants
Missing Compilers:
- ❌ Clang/LLVM (Windows build)
- ❌ GCC/MinGW-w64 specific versions
- ❌ Intel ICL (Windows)
- ❌ Version-specific MSVC (2015-2022)
Detected Compilers:
- FASM - Multiple versions (1.3x, 1.5x)
- MASM/TASM - Multiple generic patterns
- MASM32 - Specific pattern
- AutoIt (Aut2Exe) - Multiple versions (matches x64)
- AutoHotKey (AHK2Exe) - Generic detection
Similar gaps to PE x64
| Version | Release Date | Notable Features |
|---|---|---|
| GCC 14.2 | August 2024 | Latest stable, C++23/C++26 features, Intel APX, AVX10.1 |
| GCC 14.1 | May 7, 2024 | C++23 library features, Intel ISA extensions |
| GCC 13.x | April 2023 | C++23 partial support |
| GCC 12.x | May 2022 | C++20 complete support |
| GCC 11.x | April 2021 | C++17/20 improvements |
| GCC 10.x | May 2020 | C++20 partial support |
| GCC 9.x | May 2019 | C++17 full support |
| GCC 8.x | May 2018 | - |
| GCC 7.x | May 2017 | - |
| GCC 6.x | April 2016 | - |
| GCC 5.x | April 2015 | Major ABI change |
Current RetDec Coverage: GCC 4.6.3-4.7.2 (2012-2013) Gap: GCC 5.x through 14.x (2015-2024) - 12 major versions missing
| Version | Release Date | Notable Features |
|---|---|---|
| Clang 19.1.0 | September 20, 2024 | Latest stable, 18925 commits |
| Clang 18.1.0 | March 2024 | - |
| Clang 17.0.0 | September 2023 | - |
| Clang 16.0.0 | March 2023 | - |
| Clang 15.0.0 | September 2022 | - |
| Clang 14.0.0 | March 2022 | - |
| Clang 13.0.0 | September 2021 | - |
Current RetDec Coverage: None Gap: All Clang versions - Complete absence
| VS Version | MSVC Toolset | Compiler (_MSC_VER) | Release Date |
|---|---|---|---|
| VS 2022 (17.14) | 143 (14.40+) | 1940 | October 2025 |
| VS 2022 (17.10) | 143 (14.40) | 1940 | May 2024 |
| VS 2022 (17.0) | 143 (14.30) | 1930 | November 2021 |
| VS 2019 (16.x) | 142 (14.2x) | 1920-1929 | April 2019 |
| VS 2017 (15.x) | 141 (14.1x) | 1910-1916 | March 2017 |
| VS 2015 (14.0) | 140 (14.0x) | 1900 | July 2015 |
Current RetDec Coverage: Generic MSVC detection only Gap: No version-specific detection for VS2015-2022
Impact: HIGH
- Clang is the default compiler on macOS
- Widely used on Linux
- Growing adoption for cross-platform development
- Used by major projects (Chrome, Firefox, LLVM itself)
Missing Versions: All (Clang 3.x through 19.x)
Impact: MEDIUM-HIGH
- Current signatures: GCC 4.6-4.7 (12 years old)
- Modern distributions ship GCC 11-14
- Ubuntu 22.04: GCC 11.4
- Ubuntu 24.04: GCC 13.2
- RHEL 9: GCC 11.4
- Debian 12: GCC 12.2
Missing Versions: GCC 5.x through 14.x (2015-2024)
Impact: MEDIUM
- Can detect "compiled with MSVC" but not which version
- Versioning matters for:
- C++ standard support (C++11/14/17/20/23)
- ABI compatibility
- Security features (Control Flow Guard, Spectre mitigations)
Missing Granularity: VS2015, VS2017, VS2019, VS2022
Impact: LOW-MEDIUM
- Intel ICC/ICL (x86/x64 optimizing compiler)
- Rust compiler (rustc) - Increasingly common
- Other modern toolchains (Zig, Swift on Windows/Linux)
-
Entry Point Variability
- CRT (C Runtime) code changes between versions
- Optimization levels affect entry point code
- Static vs dynamic linking changes patterns
- LTO (Link-Time Optimization) can drastically alter code
-
Distribution Differences
- GCC 11.4 on Ubuntu differs from GCC 11.4 on RHEL
- Different patches applied by distributors
- Compiler flags affect output
-
Modern Features Break Signatures
- PIE (Position Independent Executables) - now default
- ASLR (Address Space Layout Randomization)
- Stack canaries and security features
- Stripped binaries remove identifying info
-
Clang Compatibility
- Clang intentionally mimics GCC behavior
- Can be hard to distinguish from GCC in binary form
- Uses similar CRT initialization code
Instead of entry point signatures, modern detection could use:
-
Rich Header (PE files)
- Embeds toolchain information
- RetDec already parses this (
retdec-fileinfo) - More reliable than byte patterns
-
Build ID (ELF files)
.note.gnu.build-idsection- Can indicate compiler and version
-
String Patterns
- Compiler identification strings
- Exception handling strings
- Debug info remnants
-
ABI Analysis
- Name mangling schemes
- Exception handling mechanisms
- Calling conventions
Action: Update README.md or create COMPILER_DETECTION.md Effort: 1 hour Rationale: Users should know detection limitations
Content:
## Compiler Detection Limitations
RetDec's compiler detection uses YARA signatures for entry point patterns.
Current coverage:
**Well-Supported:**
- GCC 4.6-4.7 (Linux)
- AutoIt, AutoHotKey compilers
- Go compiler
- FASM, MASM, TASM assemblers
**Limited Support:**
- MSVC: Generic detection (no version)
- GCC: Older versions only
**Not Detected:**
- Clang/LLVM (any version)
- GCC 5.x through 14.x
- Intel ICC/ICL
- Rust compiler
For modern compilers, RetDec may report "unknown compiler"
but will still perform decompilation successfully.Action: Create signatures for GCC 9, 11, 13, 14 Effort: 4-8 hours (research + testing) Benefit: Detect most Linux binaries from 2020-2024
Approach:
- Compile test programs with GCC 9, 11, 13, 14 on Ubuntu/Debian/RHEL
- Extract entry point patterns for each
- Test against variety of optimization levels (-O0, -O2, -O3, -Os)
- Add to
support/yara_patterns/tools/elf/x64/compilers.yara
Example Signature Pattern (research needed):
rule gcc_140_ubuntu
{
meta:
tool = "C"
name = "GCC"
version = "14.0"
extra = "Ubuntu 24.04"
source = "Made by RetDec Team"
strings:
$1 = { [pattern to be researched] }
condition:
$1 at elf.entry_point
}Action: Research Clang entry point patterns Effort: 6-10 hours (Clang mimics GCC, harder to distinguish) Benefit: Detect macOS and Clang-built Linux binaries
Challenges:
- Clang often looks identical to GCC in binary form
- May need alternative detection (strings, ABI analysis)
- Worth investigating if signatures are even feasible
Action: Research Rich Header parsing enhancement Effort: 2-4 hours Benefit: Accurate MSVC version detection without signatures
Current State: RetDec already parses PE Rich Header (PELib library) Enhancement: Map Rich Header values to Visual Studio versions
Rich Header contains:
- Product ID (e.g., 0x011B = MSVC 19.30 = VS2022 17.0)
- Build count per tool
- More reliable than entry point signatures
Instead of maintaining fragile binary signatures, leverage existing structures:
Current: Generic MSVC detection via entry point Proposed: Parse Rich Header for exact version
Implementation:
// In cpdetect/cpdetect.cpp or similar
if (peFile->hasRichHeader()) {
auto richHeader = peFile->getRichHeader();
for (auto& entry : richHeader.getEntries()) {
// Map Product ID to compiler version
// 0x011B (283) = MSVC 19.30 (VS2022 17.0)
// 0x0118 (280) = MSVC 19.20 (VS2019)
// etc.
if (entry.productId == 0x011B) {
return ToolInfo{"MSVC", "19.30", "Visual Studio 2022"};
}
}
}Advantages:
- No fragile byte patterns
- Works with any optimization level
- Survives obfuscation/packing
- Already present in PE files
Current: Limited GCC signatures Proposed: Multi-faceted detection
Strategy 1: GCC/Clang identification strings
strings binary | grep -E "(GCC|clang version)"
# GCC: (Ubuntu 13.2.0-23ubuntu4) 13.2.0
# clang version 19.1.0Strategy 2: .comment section
readelf -p .comment binary
# Contains: "GCC: (Ubuntu 13.2.0-23ubuntu4) 13.2.0"Strategy 3: DWARF debug info (if not stripped)
readelf --debug-dump=info binary | grep DW_AT_producer
# DW_AT_producer: GNU C17 13.2.0 -mtune=generic -march=x86-64Implementation in RetDec:
// In fileformat or cpdetect
std::string compiler = elfFile->getCommentSection();
if (compiler.find("GCC") != std::string::npos) {
// Parse version: "GCC: (Ubuntu 14.2.0-1) 14.2.0"
// Extract: GCC 14.2.0
}Costs:
- High Maintenance: Every compiler update breaks signatures
- Fragile: Optimization levels, patches, distributions vary
- Limited Coverage: Can't cover all configurations
- Testing Burden: Must test each compiler/version/distribution combo
Benefits:
- Fast Detection: Entry point check is immediate
- No Dependencies: Works on stripped binaries
- Historical Value: Can identify very old compilers
Hybrid Strategy:
- Keep existing signatures for historical compilers (GCC 4.x, etc.)
- Add string-based detection for modern compilers (parse
.comment, Rich Header) - Document limitations clearly for users
- Low-priority enhancement: Add GCC 11/13/14 signatures if time permits
Rationale:
- Modern binaries often retain identification strings
- Rich Header is standardized and reliable
- Effort better spent on decompilation quality than detection
- Detection is informational; decompilation works regardless
File: docs/COMPILER_DETECTION.md
Content:
- Current capabilities and limitations
- Explanation of detection methods
- Why modern compilers may not be detected
- Impact on decompilation (none - detection is informational)
Deliverable: Clear documentation for users
Files:
- Enhance
src/cpdetect/cpdetect.cpp - Add mapping table for Rich Header Product IDs
Benefit: Accurate MSVC version detection
Files:
- Enhance
src/cpdetect/cpdetect.cpp - Parse ELF
.commentsection - Parse DWARF
DW_AT_producerif available
Benefit: Detect modern GCC/Clang without fragile signatures
Files:
support/yara_patterns/tools/elf/x64/compilers.yara- Add GCC 11, 13, 14 signatures
- Research Clang signatures (may not be feasible)
Benefit: Traditional signature detection for modern compilers
If signature updates are pursued:
| Compiler | Versions to Test | Platforms | Optimization Levels |
|---|---|---|---|
| GCC | 9.x, 11.x, 13.x, 14.x | Ubuntu 20.04, 22.04, 24.04 | -O0, -O2, -O3, -Os |
| GCC | Same | Debian 11, 12 | Same |
| GCC | Same | RHEL 8, 9 | Same |
| Clang | 14.x, 16.x, 18.x, 19.x | Ubuntu 22.04, 24.04 | -O0, -O2, -O3, -Os |
| MSVC | VS2019, VS2022 | Windows 10/11 | /Od, /O1, /O2, /Ox |
Simple C Program:
#include <stdio.h>
int main() {
printf("Hello, World!\n");
return 0;
}With Dependencies:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char** argv) {
char* buf = malloc(100);
strcpy(buf, "test");
printf("%s\n", buf);
free(buf);
return 0;
}Compilation Matrix: Test all compiler/version/optimization combinations
- Functional: RetDec can decompile binaries from any compiler
- Limited Detection: Compiler identification is outdated (GCC 4.x, generic MSVC)
- Not Critical: Detection is informational; doesn't affect decompilation
Rationale:
- Decompilation Works: Compiler detection doesn't affect core functionality
- Maintenance Burden: Signature-based detection is fragile and high-maintenance
- Better Alternatives: Rich Header and string parsing are more reliable
- Limited ROI: Effort better spent on decompilation quality improvements
Suggested Actions (in priority order):
- ✅ Document current limitations (1 hour) - DO THIS
⚠️ Enhance Rich Header detection for MSVC (2-4 hours) - OPTIONAL⚠️ Add string-based detection for GCC/Clang (4-6 hours) - OPTIONAL- ❌ Research and add modern signatures (8-12 hours) - LOW PRIORITY
Final Verdict: Accept current limitations and focus modernization effort elsewhere. Compiler detection is a "nice to have" but not critical for RetDec's core mission of decompilation.
Document Version: 1.0 Date: 2025-11-03 Author: RetDec Modernization Team Status: Analysis Complete - Recommendations Provided