A cross-engine SEO poisoning detector for software downloads.
SEO poisoning is a real and underappreciated attack vector. Threat actors register convincing-looking domains, stuff them with the right keywords, and buy or manipulate their way into the top results on Google, Bing, or Brave. An unsuspecting user searches for "Siemens TIA Portal V17 download", clicks the third result, and downloads a trojanised installer.
The key insight behind Arkoi: it's hard to poison every search engine simultaneously at scale.
A malicious domain might crack the top 3 on Google. But if it's absent on Bing, Brave, DuckDuckGo, and Yandex for the same query — that inconsistency is itself a signal. Arkoi cross-references results across six engines and asks a different question than any URL scanner:
Given that I searched for X, does this result actually belong here — or was it placed here to deceive me?
This is a personal experiment and an ongoing demo. It is not production security software.
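The cross-engine idea can be sketched in a few lines. This is an illustration, not Arkoi's actual code: the `consensus` function, the `Consensus` dataclass, the input shape, and the `top_n` threshold are all assumptions. The ranks are borrowed from the sample output further down, and the empty `yandex` entry is invented to show a non-responding engine.

```python
from dataclasses import dataclass


@dataclass
class Consensus:
    domain: str
    seen_on: int             # engines that returned the domain
    responded: int           # engines that returned any results at all
    single_engine_top: bool  # ranked high on exactly one engine, absent elsewhere


def consensus(domain: str, rankings: dict[str, dict[str, int]], top_n: int = 3) -> Consensus:
    """rankings maps engine name -> {domain: best rank, 1-based}."""
    # Only engines that actually responded count toward the denominator.
    responded = {engine: ranks for engine, ranks in rankings.items() if ranks}
    hits = {engine: ranks[domain] for engine, ranks in responded.items() if domain in ranks}
    single_engine_top = (
        len(responded) > 1
        and len(hits) == 1
        and min(hits.values()) <= top_n
    )
    return Consensus(domain, len(hits), len(responded), single_engine_top)


rankings = {
    "google": {"sieportal.siemens.com": 2, "plc4me.com": 3},
    "bing": {"sieportal.siemens.com": 8},
    "brave": {"sieportal.siemens.com": 1},
    "yandex": {},  # hypothetical: engine did not respond
}
print(consensus("plc4me.com", rankings))
print(consensus("sieportal.siemens.com", rankings))
```

A domain that ranks in the top few results on one engine and is missing from every other responding engine gets flagged. A domain present everywhere does not, however its rank varies.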
Query → Parse intent (vendor, software, version)
→ Fetch all engines in parallel (async)
→ For each result, run signals concurrently:
① Vendor domain verification
② Cross-engine consensus scoring
③ Rank anomaly detection
④ Query-result relevance + path analysis
⑤ URLhaus threat intel lookup
⑥ Domain age (WHOIS)
→ Assemble verdict from signals
→ Render results + summary
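The two concurrency steps in the pipeline above could be structured with `asyncio.gather` roughly as follows. This is a sketch under stated assumptions: `fetch_engine`, `check_signal`, `run_pipeline`, and the canned `FAKE_RESULTS` data are hypothetical stand-ins, and the real fetcher and signal checks do network I/O where this one only yields to the event loop.

```python
import asyncio

ENGINES = ["google", "bing", "brave", "duckduckgo", "yahoo", "yandex"]
SIGNALS = ["vendor", "consensus", "rank", "relevance", "urlhaus", "age"]

# Canned data standing in for live search results (hypothetical).
FAKE_RESULTS = {
    "google": ["sieportal.siemens.com", "plc4me.com"],
    "bing": ["sieportal.siemens.com"],
    "brave": ["sieportal.siemens.com"],
}


async def fetch_engine(engine: str, query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for the network round trip
    return FAKE_RESULTS.get(engine, [])


async def check_signal(name: str, domain: str) -> tuple[str, str]:
    await asyncio.sleep(0)  # placeholder for WHOIS, threat intel, etc.
    return name, "ok"


async def run_pipeline(query: str) -> dict[str, dict[str, str]]:
    # Step 1: fetch every engine in parallel.
    per_engine = await asyncio.gather(*(fetch_engine(e, query) for e in ENGINES))
    domains = sorted({d for results in per_engine for d in results})

    # Step 2: for each domain, run all signal checks concurrently.
    async def check_domain(domain: str) -> dict[str, str]:
        pairs = await asyncio.gather(*(check_signal(s, domain) for s in SIGNALS))
        return dict(pairs)

    reports = await asyncio.gather(*(check_domain(d) for d in domains))
    return dict(zip(domains, reports))


report = asyncio.run(run_pipeline("Siemens TIA Portal V17 download"))
print(report)
```

Because `asyncio.gather` returns results in the order the awaitables were passed, zipping `domains` with `reports` keeps each report attached to the right domain.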
No numeric risk scores. Verdicts are categorical with explicit, human-readable reasoning:
| Verdict | Meaning |
|---|---|
| ✓ TRUSTED | Official vendor domain or trusted partner, consistent across engines |
| ? UNVERIFIED | No red flags found, but no relationship to queried vendor confirmed |
| ⚠ SUSPICIOUS | One or more moderate signals: new domain, SEO anomaly, suspicious path |
| ✗ DECEPTIVE | Strong indicators of deceptive placement: impersonation, piracy signals, single-engine promotion |
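One way to turn signals into a categorical verdict with reasons is a simple precedence rule: strong signals win, then moderate ones, then vendor confirmation. The sketch below assumes that ordering, and the signal keys and reason strings are made up for illustration. It is not taken from `verdict.py`.

```python
from enum import Enum


class Verdict(Enum):
    TRUSTED = "✓ TRUSTED"
    UNVERIFIED = "? UNVERIFIED"
    SUSPICIOUS = "⚠ SUSPICIOUS"
    DECEPTIVE = "✗ DECEPTIVE"


# Hypothetical signal keys mapped to human-readable reasons.
STRONG = {
    "impersonation": "Domain imitates the queried vendor",
    "piracy_path": "URL path contains piracy/bypass signals",
    "single_engine": "Promoted on a single engine only",
}
MODERATE = {
    "new_domain": "Domain was registered recently",
    "rank_anomaly": "Rank differs sharply between engines",
    "suspicious_path": "URL path looks unrelated to the query",
}


def assemble(signals: dict[str, bool]) -> tuple[Verdict, list[str]]:
    """Map boolean signal flags to a verdict plus the reasons behind it."""
    reasons = [text for key, text in STRONG.items() if signals.get(key)]
    if reasons:
        return Verdict.DECEPTIVE, reasons
    reasons = [text for key, text in MODERATE.items() if signals.get(key)]
    if reasons:
        return Verdict.SUSPICIOUS, reasons
    if signals.get("vendor_match"):
        return Verdict.TRUSTED, ["Official vendor domain"]
    return Verdict.UNVERIFIED, ["No red flags, but no vendor relationship confirmed"]


verdict, reasons = assemble({"piracy_path": True, "new_domain": True})
print(verdict.value, reasons)
```

Returning the reasons alongside the verdict is what keeps the output explainable without a numeric score.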
- Python 3.10+
- A running SearXNG instance on http://127.0.0.1:8080
SearXNG is a self-hosted meta search engine. Arkoi queries it for Google, Bing, Brave, DuckDuckGo, Yahoo, and Yandex results simultaneously. You need to run your own instance — this keeps queries private and avoids API rate limits.
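For orientation, here is a hedged sketch of talking to SearXNG's search endpoint. It assumes the instance has the JSON output format enabled (this usually has to be switched on in `settings.yml`) and that each result carries `url` and `engines` fields. The helper names `build_search_url` and `engines_by_domain` are hypothetical and do not come from `fetcher.py`.

```python
from collections import defaultdict
from urllib.parse import urlencode, urlsplit

SEARXNG_URL = "http://127.0.0.1:8080"
ENGINES = ("google", "bing", "brave", "duckduckgo", "yahoo", "yandex")


def build_search_url(query: str, engines: tuple[str, ...] = ENGINES) -> str:
    """Build a /search URL asking for JSON output from the given engines."""
    params = {"q": query, "format": "json", "engines": ",".join(engines)}
    return f"{SEARXNG_URL}/search?{urlencode(params)}"


def engines_by_domain(payload: dict) -> dict[str, set[str]]:
    """Group results by hostname and record which engines returned each one."""
    seen: dict[str, set[str]] = defaultdict(set)
    for result in payload.get("results", []):
        host = urlsplit(result["url"]).hostname
        if host:
            # "engines" is assumed to list every engine that returned this URL.
            seen[host].update(result.get("engines", []))
    return dict(seen)


print(build_search_url("Wireshark install", ("google", "bing")))
```

The URL from `build_search_url` can be fetched with any HTTP client, and the decoded JSON body passed to `engines_by_domain`.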
Quick SearXNG setup with Docker:
docker run -d -p 8080:8080 searxng/searxng

git clone https://github.com/404saint/arkoi.git
cd arkoi
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt

# Interactive
python arkoi.py
# Direct query
python arkoi.py "Siemens TIA Portal V17 download"
python arkoi.py "AutoCAD 2025 download"
python arkoi.py "Wireshark install"
python arkoi.py "PyCharm professional"

══════════════════════════════════════════════════════════════════════════════
ARKOI — SEO Poisoning Detector
══════════════════════════════════════════════════════════════════════════════
Query : Siemens TIA Portal V17 download
Vendor : Siemens
Version: V17
Fetching results from all engines... done in 3.0s (21 unique domains, 3/6 engines responded)
Running signal checks... done in 6.0s
[✓ TRUSTED ] sieportal.siemens.com
Engines: bing #8 · google #2 · brave #1 (3/3 engines)
Vendor : VENDOR_MATCH
├─ Official vendor domain
└─ Consistent across 3 search engine(s)
↳ This is a verified source.
[✗ DECEPTIVE ] plc4me.com
Engines: google #3 (1/3 engines)
Vendor : UNRELATED
└─ URL path contains piracy/bypass signals on an unverified domain
↳ Avoid this result. Do not download anything from this domain.
✓ Safest result : sieportal.siemens.com
✗ Avoid : plc4me.com, plcshare.com
Total runtime : 9.1s
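The `VENDOR_MATCH` / `UNRELATED` labels and the piracy-path reason in the output above suggest two checks that can be sketched as below. Everything here is an assumption for illustration: the one-entry `VENDOR_DOMAINS` registry, the `PIRACY_TOKENS` list, and both function names. The real signals may use more labels and different rules.

```python
from urllib.parse import urlsplit

# Hypothetical registry entry; the real vendor list lives in query_parser.py.
VENDOR_DOMAINS = {"Siemens": {"siemens.com"}}
# Hypothetical path tokens that hint at cracked or bypassed software.
PIRACY_TOKENS = {"crack", "keygen", "activator", "serial-key"}


def vendor_status(url: str, vendor: str) -> str:
    host = (urlsplit(url).hostname or "").lower()
    for official in VENDOR_DOMAINS.get(vendor, set()):
        # Match the registered domain or a true subdomain, never a substring,
        # so "siemens.com.evil.example" does not pass as official.
        if host == official or host.endswith("." + official):
            return "VENDOR_MATCH"
    return "UNRELATED"


def has_piracy_path(url: str) -> bool:
    path = urlsplit(url).path.lower()
    return any(token in path for token in PIRACY_TOKENS)


print(vendor_status("https://sieportal.siemens.com/en", "Siemens"))
print(has_piracy_path("https://downloads.example/tia-portal-v17-crack/"))
```

The suffix check with a leading dot is the important detail: lookalike hosts that merely contain the vendor name are a common impersonation trick.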
arkoi/
├── arkoi.py # Entry point and pipeline orchestrator
├── query_parser.py # Extracts vendor, version, tokens from query
│ # Contains the vendor registry and product aliases
├── fetcher.py # Async multi-engine search via SearXNG
├── signals.py # All signal checks (vendor, consensus, age, malware...)
├── verdict.py # Assembles signals into verdict + reasons
├── renderer.py # Terminal output formatting
└── requirements.txt
Arkoi currently recognises vendors and products across:
- Industrial / PLC: Siemens, Rockwell, Schneider, Honeywell, Beckhoff, Omron, ABB
- CAD / CAE / Simulation: Autodesk, ANSYS, PTC, Dassault, SolidWorks, Altair
- Developer tools: Microsoft, JetBrains, HashiCorp, Docker, Atlassian, GitHub, GitLab
- Creative: Adobe, Affinity, Blender, Blackmagic, Foundry, Maxon
- Scientific / Data: MathWorks, NI, Wolfram, ESRI, Anaconda
- Networking / Security: Cisco, Palo Alto, Fortinet, Wireshark, Nmap, PuTTY
- Remote access: AnyDesk, TeamViewer, Zoom, Slack
- Virtualisation / OS: VMware, VirtualBox, Ubuntu, Red Hat
- Cloud: AWS, Google Cloud, Apple
- Databases: Oracle, MySQL, PostgreSQL, MongoDB, Elastic
Product aliases are supported — searching for "autocad" automatically resolves to the Autodesk vendor profile. See query_parser.py for the full registry.
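As a rough picture of how alias resolution and version extraction might work, here is a minimal sketch. The three-entry tables, the `Intent` dataclass, the regex, and `parse_query` are assumptions for illustration, not the contents of `query_parser.py`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Tiny illustrative subset; the real registry is much larger.
VENDORS = {"siemens": "Siemens", "autodesk": "Autodesk", "jetbrains": "JetBrains"}
PRODUCT_ALIASES = {"autocad": "Autodesk", "pycharm": "JetBrains", "tia portal": "Siemens"}
# Matches "V17", "v2.3" or a four-digit year such as "2025".
VERSION_RE = re.compile(r"\b(v\d+(?:\.\d+)*|(?:19|20)\d{2})\b", re.IGNORECASE)


@dataclass
class Intent:
    vendor: Optional[str]
    version: Optional[str]
    tokens: list[str]


def parse_query(query: str) -> Intent:
    lowered = query.lower()
    tokens = lowered.split()
    # Prefer an explicit vendor name in the query.
    vendor = next((VENDORS[t] for t in tokens if t in VENDORS), None)
    if vendor is None:
        # Fall back to product aliases, which may span several words.
        vendor = next((v for alias, v in PRODUCT_ALIASES.items() if alias in lowered), None)
    match = VERSION_RE.search(query)
    return Intent(vendor, match.group(1) if match else None, tokens)


print(parse_query("AutoCAD 2025 download"))
print(parse_query("Siemens TIA Portal V17 download"))
```

When neither table matches, `vendor` stays `None`, which corresponds to the "no vendor in query" fallback described under the limitations.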
- SearXNG engine availability: Not all six engines respond on every query. Consensus scoring adapts to however many responded, but results vary based on your SearXNG configuration.
- WHOIS coverage: Many domains show UNKNOWN age due to privacy protection or WHOIS rate limiting. Age is a supporting signal, not a primary one.
- No vendor in query: If the query doesn't match any known vendor or product alias, vendor verification is skipped and all results fall back to consensus + anomaly scoring only.
- Not a replacement for VirusTotal: Arkoi is specifically designed to catch SEO poisoning through search result analysis. For deep malware analysis of a specific file or URL, use dedicated tools.
See Issues for known bugs and planned improvements. Contributions are welcome — please read CONTRIBUTING.md first.
MIT