A comprehensive Open Source Intelligence (OSINT) gathering and analysis platform for conducting investigations across multiple data sources. The OSINT Investigation Framework (OIF) provides automated data collection, pattern recognition, entity extraction, timeline reconstruction, and detailed reporting capabilities.
- Overview
- Features
- Requirements
- Installation
- Quick Start
- Usage
- Investigation Types
- Data Sources
- Analysis Modules
- Report Formats
- Configuration
- Examples
- Project Structure
- Contributing
- License
The OSINT Investigation Framework is designed to help security researchers, investigators, and analysts gather, correlate, and analyze open-source intelligence data. It supports multiple investigation types and can process various data formats including log files, CSV, JSON, and network captures.
| Feature | Description |
|---|---|
| Multi-Source Data Ingestion | Process logs, CSV, JSON, network captures, and more |
| Pattern Recognition | Automated regex-based entity extraction |
| Entity Extraction | Identify emails, IPs, URLs, domains, hashes, crypto wallets, etc. |
| Timeline Reconstruction | Chronologically organize events from multiple sources |
| Relationship Mapping | Discover connections between extracted entities |
| Anomaly Detection | Identify unusual patterns in network and log data |
| Automated Reporting | Generate reports in Markdown, JSON, or CSV formats |
| SQLite Database Storage | Persist investigation data for future reference |
| Caching System | Improve performance with intelligent data caching |
| LLM Integration | Local Ollama-powered AI analysis for enhanced insights |
- 📧 Email Addresses
- 🌐 IP Addresses (IPv4)
- 🔗 URLs & Domains
- 📞 Phone Numbers
- 🔐 Hashes (MD5, SHA1, SHA256)
- 💰 Cryptocurrency Wallets (Bitcoin, Ethereum)
- 📱 Social Media Handles
- 🖥️ MAC Addresses
- 📁 File Paths
- ⏰ Timestamps
- Python 3.8 or higher
- Core packages for full file format support (see requirements.txt)
- Ollama (optional, for LLM-enhanced analysis) - https://ollama.ai
- Tesseract (optional, for OCR text extraction from images)
git clone https://github.com/yourusername/OSINT-Investigative-Framework.git
cd OSINT-Investigative-Framework
Windows (PowerShell):
python -m venv oifENV
.\oifENV\Scripts\Activate.ps1
Windows (Command Prompt):
python -m venv oifENV
oifENV\Scripts\activate.bat
Linux/macOS:
python3 -m venv oifENV
source oifENV/bin/activate
The framework uses only Python standard library modules. To install the optional dependencies for enhanced functionality:
pip install -r requirements.txt
python oif-v1.py --help
Simply run the script without arguments to enter interactive mode:
python oif-v1.py
Analyze a specific file:
python oif-v1.py analyze --source ./logs/access.log --output ./results
python oif-v1.py extract --source ./document.txt --format json
The framework provides several commands for different operations:
python oif-v1.py init --name "Case 001" --type person --output ./case001
Options:
- `--name`, `-n`: Investigation name (required)
- `--type`, `-t`: Investigation type (default: incident)
- `--targets`: Comma-separated list of targets
- `--sources`: Comma-separated list of data source paths
- `--output`, `-o`: Output directory (default: ./investigation)
python oif-v1.py run --config ./case001/config.json
python oif-v1.py analyze --source ./data/logfile.log --output ./analysis
# Text output
python oif-v1.py extract --source ./document.txt
# JSON output
python oif-v1.py extract --source ./document.txt --format json
python oif-v1.py search --database ./case001/investigation.db --query "192.168"
Launch interactive mode for a guided investigation experience:
python oif-v1.py
Available Interactive Commands:
| Command | Description |
|---|---|
| `help` | Display help message |
| `new <type> <name>` | Create a new investigation of specified type |
| `load <config>` | Load an existing investigation |
| `add target <t>` | Add an investigation target |
| `add source <s>` | Add a data source |
| `run` | Execute the investigation |
| `findings` | Display all findings |
| `entities` | Display extracted entities |
| `export <format>` | Export report (markdown/json/csv) |
| `status` | Show investigation status |
| `exit` or `quit` | Exit interactive mode |
The framework supports the following investigation types:
| Type | Description |
|---|---|
| `PERSON` | Individual person investigations |
| `ORGANIZATION` | Company or organization research |
| `DOMAIN` | Domain name investigations |
| `IP_ADDRESS` | IP address analysis |
| `EMAIL` | Email address investigations |
| `PHONE` | Phone number lookups |
| `SOCIAL_MEDIA` | Social media account research |
| `CRYPTOCURRENCY` | Cryptocurrency wallet tracking |
| `VEHICLE` | Vehicle-related investigations |
| `LOCATION` | Geographic location research |
| `INCIDENT` | Security incident analysis |
| `NETWORK` | Network traffic analysis |
| `MALWARE` | Malware analysis investigations |
| Category | Extensions | Description | Required Package |
|---|---|---|---|
| Documents | | | |
| PDF Files | `.pdf` | Adobe PDF documents with text & table extraction | pdfplumber or PyPDF2 |
| Word Documents | `.docx` | Microsoft Word documents | python-docx |
| Text Files | `.txt`, `.text`, `.md`, `.rst` | Plain text and markdown | Built-in |
| Spreadsheets | | | |
| Excel (modern) | `.xlsx` | Microsoft Excel 2007+ | openpyxl |
| Excel (legacy) | `.xls` | Microsoft Excel 97-2003 | xlrd |
| CSV Files | `.csv` | Comma-separated values | Built-in |
| Data Formats | | | |
| JSON Files | `.json` | JSON-formatted data | Built-in |
| XML Files | `.xml` | XML documents | Built-in |
| YAML Files | `.yaml`, `.yml` | YAML configuration files | pyyaml |
| Email Files | `.eml`, `.msg` | Email messages with headers, body & attachments | Built-in |
| Images | | | |
| Common Formats | `.jpg`, `.jpeg`, `.png`, `.gif`, `.bmp` | Standard image formats | Pillow |
| High Quality | `.tiff`, `.tif`, `.webp` | Professional image formats | Pillow |
| RAW Formats | `.raw`, `.cr2`, `.nef`, `.arw` | Camera RAW files | Pillow |
| Other | `.ico`, `.heic`, `.heif` | Icons and Apple formats | Pillow |
| Logs & Network | | | |
| Log Files | `.log` | Application and system logs | Built-in |
| Network Captures | `.pcap`, `.netflow`, `.conn` | Network traffic data | Built-in |
| Config Files | `.ini`, `.cfg`, `.conf` | Configuration files | Built-in |
| Archives | | | |
| Compressed | `.bz2` | BZ2 compressed files | Built-in |
When processing images, the framework extracts:
- EXIF Metadata: Camera info, date taken, software used
- GPS Coordinates: Location data with lat/long conversion
- Image Properties: Dimensions, color mode, format
- OCR Text: Text extraction from images (requires Tesseract)
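The "lat/long conversion" mentioned above usually means turning the degrees/minutes/seconds values stored in EXIF GPS tags into signed decimal degrees. The following is a minimal sketch of that arithmetic; the function name `dms_to_decimal` is hypothetical and is not taken from the framework's code.

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus a hemisphere letter
    ("N", "S", "E", "W") into signed decimal degrees."""
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    return -decimal if ref.upper() in ("S", "W") else decimal


# Example: coordinates close to New York City.
latitude = dms_to_decimal(40, 42, 46.08, "N")
longitude = dms_to_decimal(74, 0, 21.6, "W")
print(round(latitude, 4), round(longitude, 4))
```
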
Email files are parsed to extract:
- Headers: From, To, Cc, Subject, Date, Message-ID, X-Originating-IP
- Body Content: Plain text and HTML versions
- Attachments: Filename, type, and size information
- Recipient Lists: All recipients across To, Cc, Bcc fields
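As an illustration of the fields listed above, here is a sketch that parses a raw `.eml` message with Python's standard `email` package. The function `parse_email` and the shape of its return value are assumptions made for this example, not the framework's actual interface, and Outlook `.msg` files are not covered by it.

```python
from email import message_from_string, policy
from email.utils import getaddresses


def parse_email(raw):
    """Extract headers, recipients, body, and attachment names from raw .eml text."""
    msg = message_from_string(raw, policy=policy.default)
    # Collect every recipient across the To, Cc, and Bcc headers.
    recipient_headers = (
        msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    )
    recipients = [addr for _, addr in getaddresses(recipient_headers)]
    body = msg.get_body(preferencelist=("plain",))
    return {
        "from": msg["From"],
        "subject": msg["Subject"],
        "originating_ip": msg["X-Originating-IP"],
        "recipients": recipients,
        "body": body.get_content() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


sample = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Cc: dave@example.com\n"
    "Subject: Quarterly report\n"
    "X-Originating-IP: 10.0.0.5\n"
    "\n"
    "Hello\n"
)
print(parse_email(sample)["recipients"])
```
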
- LOG_FILE: Application and system logs
- CSV_FILE: Structured tabular data
- JSON_FILE: API responses and structured data
- NETWORK_CAPTURE: Packet captures and flow data
- WEATHER_DATA: Location-based weather information
- PUBLIC_RECORDS: Public records databases
Automatically extracts entities from all collected data using regex patterns:
- Emails, IPs, URLs, domains
- Phone numbers, hashes
- Cryptocurrency wallets
- Social media handles
- MAC addresses, file paths
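A rough sketch of regex-based extraction for a few of these entity types is shown below. The patterns and the `extract_entities` helper are illustrative assumptions; the patterns the framework actually ships may be stricter or cover more cases.

```python
import re
from collections import defaultdict

# Illustrative patterns only; not the framework's own definitions.
PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "ipv4": re.compile(
        r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
    ),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "mac": re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b"),
}


def extract_entities(text):
    """Return {entity_type: sorted unique matches} for the given text."""
    found = defaultdict(set)
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found[kind].add(match.group(0))
    return {kind: sorted(values) for kind, values in found.items()}


line = "Login from 192.168.1.100 by alice@example.com via 00:1A:2B:3C:4D:5E"
print(extract_entities(line))
```

Entity types with no matches are simply absent from the result, which keeps the output compact when scanning large files.
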
Builds a chronological timeline of events by:
- Extracting timestamps from all records
- Sorting events chronologically
- Identifying event sequences and patterns
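The first two steps above can be sketched as follows. The list of timestamp formats, the record layout, and the function names are assumptions for illustration; real data sources will likely need additional formats.

```python
from datetime import datetime

# Assumed timestamp layouts; extend as needed for other sources.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%dT%H:%M:%S")


def parse_timestamp(value):
    """Try each known format; return None when nothing matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def build_timeline(records):
    """Sort (timestamp_string, source, message) records chronologically."""
    events = []
    for raw_ts, source, message in records:
        ts = parse_timestamp(raw_ts)
        if ts is not None:  # skip records without a usable timestamp
            events.append({"time": ts, "source": source, "message": message})
    return sorted(events, key=lambda event: event["time"])


records = [
    ("2024-12-09 14:30:00", "access.log", "login"),
    ("2024-12-09T08:00:00", "auth.json", "token issued"),
    ("not a date", "notes.txt", "ignored"),
]
for event in build_timeline(records):
    print(event["time"].isoformat(), event["source"], event["message"])
```
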
Identifies suspicious patterns:
- Unusual port activity in network connections
- High error rates in log files
- Abnormal traffic patterns
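To make the error-rate check concrete, here is a small sketch that flags a log when the share of error lines exceeds a threshold. The 10% threshold, the keyword list, and the finding dictionary are assumptions chosen for the example, not values documented by the framework.

```python
import re

# Assumed markers for an error line.
ERROR_RE = re.compile(r"\b(ERROR|CRITICAL|FATAL)\b")


def error_rate(lines):
    """Fraction of non-empty lines that contain an error marker."""
    total = 0
    errors = 0
    for line in lines:
        if not line.strip():
            continue
        total += 1
        if ERROR_RE.search(line):
            errors += 1
    return errors / total if total else 0.0


def check_error_rate(lines, threshold=0.10):
    """Return a finding dict when the error rate exceeds the threshold, else None."""
    rate = error_rate(lines)
    if rate > threshold:
        return {
            "severity": "HIGH",
            "title": "High error rate detected in logs",
            "detail": f"Error rate: {rate:.1%}",
        }
    return None


log_lines = ["INFO request ok"] * 8 + ["ERROR upstream timeout"] * 2
print(check_error_rate(log_lines))
```
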
Maps connections between entities:
- Co-occurrence analysis
- Source correlation
- Entity relationship graphs
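Co-occurrence analysis can be sketched as counting how often two entities appear in the same record; pairs with high counts become candidate edges in a relationship graph. The `cooccurrence` function and the input shape below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Count how often each unordered pair of entities shares a record.

    `records` is an iterable of entity collections, one per log line,
    document, or other unit of analysis.
    """
    pairs = Counter()
    for entities in records:
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(set(entities)), 2):
            pairs[(a, b)] += 1
    return pairs


records = [
    ["1.2.3.4", "a@x.com"],
    ["a@x.com", "1.2.3.4", "evil.net"],
    ["evil.net"],
]
for pair, count in cooccurrence(records).most_common():
    print(pair, count)
```
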
Leverages local Large Language Models for advanced analysis:
- Enhanced Entity Extraction: Uses LLM for contextual entity identification
- Log Anomaly Detection: AI-powered detection of suspicious patterns
- Entity Correlation: Intelligent relationship discovery between entities
- Threat Classification: Automated threat indicator classification
- Investigation Summaries: AI-generated executive summaries
Supported Models (in order of preference):
- `wizardlm2:latest` - Default - Optimized for GTX 1070, strong reasoning
- `llama3.1:latest` - Best overall performance
- `phi3:3.8b` - Fastest inference
- `mistral:7b-instruct` - Best instruction following
- `gemma3:4b` - Good alternative
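One way to honor this preference order is to pick the first preferred model that is installed locally and send the prompt to Ollama's `/api/generate` endpoint. The sketch below assumes Ollama's default address `http://localhost:11434`; the helper names and the prompt wording are hypothetical and not part of the framework.

```python
import json
import urllib.request

PREFERRED_MODELS = [
    "wizardlm2:latest",
    "llama3.1:latest",
    "phi3:3.8b",
    "mistral:7b-instruct",
    "gemma3:4b",
]


def choose_model(installed):
    """Return the first preferred model that is installed, or None."""
    for name in PREFERRED_MODELS:
        if name in installed:
            return name
    return None


def build_request(model, prompt, host="http://localhost:11434"):
    """Build a non-streaming POST request for Ollama's generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def summarize(model, text, timeout=30):
    """Ask the model for a summary; requires a running Ollama server."""
    request = build_request(model, f"Summarize these findings:\n{text}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

Only `summarize` touches the network, so model selection and request construction can be checked without a running server.
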
Setup Ollama:
# Install Ollama (https://ollama.ai)
# Start Ollama server
ollama serve
# Pull a recommended model
ollama pull wizardlm2:latest
Markdown: human-readable report with:
- Executive summary
- Findings by severity
- Detailed finding descriptions
- Evidence and recommendations
JSON: machine-readable format ideal for:
- Integration with other tools
- Data interchange
- Automated processing
CSV: spreadsheet-compatible format for:
- Data analysis in Excel/Google Sheets
- Bulk data review
- Custom filtering
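To show how the same findings could be serialized for machine and spreadsheet consumers, here is a sketch that uses only the standard library. The field names (`severity`, `title`, `description`) and both function names are assumptions; the framework's real report schema may differ.

```python
import csv
import io
import json

# Assumed column set for exported findings.
FIELDS = ["severity", "title", "description"]


def to_json(findings):
    """Serialize findings as an indented JSON document."""
    return json.dumps({"findings": findings}, indent=2)


def to_csv(findings):
    """Serialize findings as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(findings)
    return buffer.getvalue()


findings = [
    {
        "severity": "HIGH",
        "title": "High error rate detected in logs",
        "description": "Error rate: 15.2%",
    }
]
print(to_csv(findings))
```
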
{
"name": "Investigation Name",
"type": "INCIDENT",
"targets": ["target1", "target2"],
"data_sources": ["./path/to/data"],
"output_dir": "./output",
"api_keys": {},
"custom_patterns": {},
"max_depth": 3,
"timeout": 30,
"parallel_workers": 4,
"enable_caching": true,
"cache_ttl": 3600,
"report_format": "markdown"
}
| Option | Type | Default | Description |
|---|---|---|---|
| `name` | string | - | Investigation name |
| `type` | string | INCIDENT | Investigation type |
| `targets` | array | [] | List of investigation targets |
| `data_sources` | array | [] | Paths to data sources |
| `output_dir` | string | ./output | Output directory |
| `api_keys` | object | {} | API keys for external services |
| `custom_patterns` | object | {} | Custom regex patterns |
| `max_depth` | int | 3 | Maximum recursion depth |
| `timeout` | int | 30 | Request timeout in seconds |
| `parallel_workers` | int | 4 | Number of parallel workers |
| `enable_caching` | bool | true | Enable data caching |
| `cache_ttl` | int | 3600 | Cache time-to-live (seconds) |
| `report_format` | string | markdown | Report format |
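The defaults in the table can be applied when loading a configuration along these lines. This is a sketch under stated assumptions: the `load_config` function is hypothetical, and treating `name` as required and rejecting unknown keys are choices made for the example rather than documented behavior.

```python
import copy
import json

# Defaults copied from the options table above.
DEFAULTS = {
    "type": "INCIDENT",
    "targets": [],
    "data_sources": [],
    "output_dir": "./output",
    "api_keys": {},
    "custom_patterns": {},
    "max_depth": 3,
    "timeout": 30,
    "parallel_workers": 4,
    "enable_caching": True,
    "cache_ttl": 3600,
    "report_format": "markdown",
}


def load_config(text):
    """Parse JSON config text and fill in defaults for missing options."""
    user = json.loads(text)
    if not user.get("name"):
        raise ValueError("config requires a non-empty 'name'")
    unknown = set(user) - set(DEFAULTS) - {"name"}
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    # Deep-copy so that mutable defaults are never shared between configs.
    return {**copy.deepcopy(DEFAULTS), **user}


config = load_config('{"name": "Case 001", "timeout": 60}')
print(config["timeout"], config["cache_ttl"], config["report_format"])
```
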
# Initialize investigation
python oif-v1.py init --name "Security Audit 2024" --type incident --output ./audit
# Add data sources to config.json, then run
python oif-v1.py run --config ./audit/config.json
python oif-v1.py extract --source ./suspicious_email.txt --format json > entities.json
python oif-v1.py
osint> new NETWORK "Network Investigation"
Created NETWORK investigation: Network Investigation
osint> add source ./network_logs/
Added source: ./network_logs/
osint> add target 192.168.1.100
Added target: 192.168.1.100
osint> run
[INFO] Starting investigation: Network Investigation
[INFO] Phase 1: Collecting data...
[INFO] Phase 2: Analyzing data...
[INFO] Phase 3: Generating reports...
[INFO] Investigation complete. Found 5 findings.
osint> findings
[HIGH] High error rate detected in logs
Error rate: 15.2%
osint> export json
Report exported to ./INVESTIGATIONS/network_investigation/
osint> exit
Goodbye!
The framework supports various investigation types. Here are examples of how to create and conduct investigations for each type:
python oif-v1.py
osint> new PERSON "John Doe"
Created PERSON investigation: John Doe
osint> add target "[email protected]"
osint> add target "@johndoe_twitter"
osint> add source ./john_doe_emails/
osint> run
osint> new ORGANIZATION "Acme Corporation"
Created ORGANIZATION investigation: Acme Corporation
osint> add target "acme.com"
osint> add target "192.168.1.0/24"
osint> add source ./acme_logs/
osint> run
osint> new DOMAIN "suspicious-site.net"
Created DOMAIN investigation: suspicious-site.net
osint> add target "suspicious-site.net"
osint> add source ./domain_logs/
osint> run
osint> new IP_ADDRESS "192.168.1.100"
Created IP_ADDRESS investigation: 192.168.1.100
osint> add target "192.168.1.100"
osint> add source ./network_logs/
osint> run
osint> new EMAIL "[email protected]"
Created EMAIL investigation: [email protected]
osint> add target "[email protected]"
osint> add source ./email_logs/
osint> run
osint> new PHONE "+1-555-123-4567"
Created PHONE investigation: +1-555-123-4567
osint> add target "+1-555-123-4567"
osint> add source ./phone_records/
osint> run
osint> new SOCIAL_MEDIA "@johndoe_twitter"
Created SOCIAL_MEDIA investigation: @johndoe_twitter
osint> add target "@johndoe_twitter"
osint> add target "facebook.com/john.doe"
osint> add source ./social_media_data/
osint> run
osint> new CRYPTOCURRENCY "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
Created CRYPTOCURRENCY investigation: 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
osint> add target "1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa"
osint> add source ./blockchain_data/
osint> run
osint> new VEHICLE "VIN: 1HGCM82633A123456"
Created VEHICLE investigation: VIN: 1HGCM82633A123456
osint> add target "1HGCM82633A123456"
osint> add source ./vehicle_records/
osint> run
osint> new LOCATION "New York City, NY"
Created LOCATION investigation: New York City, NY
osint> add target "40.7128,-74.0060"
osint> add source ./location_data/
osint> run
osint> new INCIDENT "Data Breach 2024"
Created INCIDENT investigation: Data Breach 2024
osint> add target "breach_logs"
osint> add source ./incident_logs/
osint> run
osint> new NETWORK "Corporate LAN Analysis"
Created NETWORK investigation: Corporate LAN Analysis
osint> add target "10.0.0.0/8"
osint> add source ./network_traffic/
osint> run
osint> new MALWARE "Trojan.Downloader"
Created MALWARE investigation: Trojan.Downloader
osint> add target "trojan_hash"
osint> add source ./malware_samples/
osint> run
# List available investigations
osint> load
Available investigations:
- john_doe (PERSON)
- acme_corp (ORGANIZATION)
- security_breach (INCIDENT)
# Load a specific investigation
osint> load john_doe
Loaded investigation: john_doe
# Check status
osint> status
Investigation: john_doe (PERSON)
Status: Active
Sources: 3 directories
Findings: 15
Last run: 2024-12-09 14:30:00
# View findings
osint> findings
[HIGH] Suspicious email activity detected
From: [email protected]
To: [email protected]
# Export report
osint> export markdown
Report exported to ./INVESTIGATIONS/john_doe/report.md
OSINT-Investigative-Framework/
├── oif-v1.py # Main application file
├── oifENV/ # Python virtual environment
├── requirements.txt # Python dependencies
├── README.md # This documentation
├── .gitignore # Git ignore rules
└── INVESTIGATIONS/ # Investigation output directory (created automatically)
├── investigation_name/
│ ├── config.json # Investigation configuration
│ ├── report.md # Markdown report
│ ├── report.json # JSON report
│ ├── investigation.db # SQLite database
│ └── .cache/ # Cached data
└── ...
Findings are categorized by severity:
| Level | Value | Description |
|---|---|---|
| 🔴 CRITICAL | 5 | Immediate action required |
| 🟠 HIGH | 4 | Significant concern |
| 🟡 MEDIUM | 3 | Moderate concern |
| 🟢 LOW | 2 | Minor concern |
| 🔵 INFO | 1 | Informational |
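Because each level has a numeric value, the scale maps naturally onto an integer enumeration, which makes sorting and threshold filtering straightforward. The `Severity` class and `sort_findings` helper below are an illustrative sketch, not the framework's actual definitions.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels with the numeric values from the table above."""
    INFO = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


def sort_findings(findings):
    """Order findings from most to least severe."""
    return sorted(findings, key=lambda finding: finding["severity"], reverse=True)


findings = [
    {"title": "Open port scan", "severity": Severity.MEDIUM},
    {"title": "Credential leak", "severity": Severity.CRITICAL},
    {"title": "New device seen", "severity": Severity.INFO},
]
for finding in sort_findings(findings):
    print(f"[{finding['severity'].name}] {finding['title']}")
```
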
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is intended for legal and ethical use only. Users are responsible for ensuring compliance with all applicable laws and regulations when conducting OSINT investigations. The authors are not responsible for any misuse of this software.
For questions, issues, or feature requests, please open an issue on the GitHub repository.
OSINT Investigation Framework - Version 1.0.0