Machine learning-based DNS classifier for detecting Domain Generation Algorithms (DGAs), tunneling, and data exfiltration by malicious actors.
Explore the docs »
View Demo · Report Bug · Request Feature
> [!CAUTION]
> The project is under active development right now. Everything might change, break, or move around quickly.
If you want to use heiDGAF, just use the provided Docker Compose file to quickly bootstrap your environment:

```sh
docker compose -f docker/docker-compose.yml up
```
The following table lists the most important configuration parameters with their default values. The configuration options can be set in `config.yaml` in the root directory.
| Path | Description | Default Value |
|---|---|---|
| `logging` | Global and module-specific logging configurations. | |
| `logging.base.debug` | Default debug logging level for all modules if not overridden. | `false` |
| `logging.modules.<module_name>.debug` | Specific debug logging level for a given module (e.g., `log_storage.logserver`). | `false` (for all listed modules) |
| `pipeline` | Configuration for the data processing pipeline stages. | |
| `pipeline.log_storage.logserver.input_file` | Path to the input file for the log server. | `"/opt/file.txt"` |
| `pipeline.log_collection.collector.logline_format` | Defines the format of incoming log lines, specifying field name, type, and parsing rules/values. | Array of field definitions (e.g., `["timestamp", Timestamp, "%Y-%m-%dT%H:%M:%S.%fZ"]`) |
| `pipeline.log_collection.batch_handler.batch_size` | Number of log lines to collect before sending a batch. | `10000` |
| `pipeline.log_collection.batch_handler.batch_timeout` | Maximum time (in seconds) to wait before sending a partially filled batch. | `30.0` |
| `pipeline.log_collection.batch_handler.subnet_id.ipv4_prefix_length` | IPv4 prefix length for subnet identification. | `24` |
| `pipeline.log_collection.batch_handler.subnet_id.ipv6_prefix_length` | IPv6 prefix length for subnet identification. | `64` |
| `pipeline.data_inspection.inspector.mode` | Mode of operation for the data inspector. | `univariate` (options: `multivariate`, `ensemble`) |
| `pipeline.data_inspection.inspector.ensemble.model` | Model to use when the inspector mode is `ensemble`. | `WeightEnsemble` |
| `pipeline.data_inspection.inspector.ensemble.module` | Python module for the ensemble model. | `streamad.process` |
| `pipeline.data_inspection.inspector.ensemble.model_args` | Arguments for the ensemble model. | (empty by default) |
| `pipeline.data_inspection.inspector.models` | List of models to use for data inspection (e.g., anomaly detection). | Array of model definitions (e.g., `{"model": "ZScoreDetector", "module": "streamad.model", "model_args": {"is_global": false}}`) |
| `pipeline.data_inspection.inspector.anomaly_threshold` | Threshold for classifying an observation as an anomaly. | `0.01` |
| `pipeline.data_inspection.inspector.score_threshold` | Threshold for the anomaly score. | `0.5` |
| `pipeline.data_inspection.inspector.time_type` | Unit of time used in time range calculations. | `ms` |
| `pipeline.data_inspection.inspector.time_range` | Time range for inspection. | `20` |
| `pipeline.data_analysis.detector.model` | Model to use for data analysis (e.g., DGA detection). | `rf` (Random Forest); alternative: `XGBoost` |
| `pipeline.data_analysis.detector.checksum` | Checksum of the model file, used to verify its integrity. | `ba1f718179191348fe2abd51644d76191d42a5d967c6844feb3371b6f798bf06` |
| `pipeline.data_analysis.detector.base_url` | Base URL for downloading the model if it is not present locally. | `https://heibox.uni-heidelberg.de/d/0d5cbcbe16cd46a58021/` |
| `pipeline.data_analysis.detector.threshold` | Threshold for the detector's classification. | `0.5` |
| `pipeline.monitoring.clickhouse_connector.batch_size` | Batch size for sending data to ClickHouse. | `50` |
| `pipeline.monitoring.clickhouse_connector.batch_timeout` | Batch timeout (in seconds) for sending data to ClickHouse. | `2.0` |
| `environment` | Configuration for external services and infrastructure. | |
| `environment.kafka_brokers` | List of Kafka broker hostnames and ports. | `[{"hostname": "kafka1", "port": 8097}, {"hostname": "kafka2", "port": 8098}, {"hostname": "kafka3", "port": 8099}]` |
| `environment.kafka_topics.pipeline.<topic_name>` | Kafka topic names for the various pipeline stages. | e.g., `logserver_in: "pipeline-logserver_in"` |
| `environment.monitoring.clickhouse_server.hostname` | Hostname of the ClickHouse server for monitoring data. | `clickhouse-server` |
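As an illustration, a minimal `config.yaml` overriding a few of these defaults might look as follows. This is a sketch inferred from the dotted paths above, not an exhaustive or authoritative configuration:

```yaml
logging:
  base:
    debug: true   # enable verbose logging for all modules unless overridden

pipeline:
  log_collection:
    batch_handler:
      batch_size: 5000     # send smaller batches than the default 10000
      batch_timeout: 15.0  # flush partially filled batches after 15 s
  data_analysis:
    detector:
      model: rf            # Random Forest; XGBoost is also available

environment:
  kafka_brokers:
    - hostname: kafka1
      port: 8097
```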
> [!IMPORTANT]
> More information will be added soon! Go and watch the repository for updates.
Install all Python requirements:
```sh
python -m venv .venv
source .venv/bin/activate
sh install_requirements.sh
```

Alternatively, you can use `pip install` and add all needed requirements individually with `-r requirements.*.txt`.
Now, you can start each stage, e.g. the inspector:
```sh
python src/inspector/main.py
```

> [!IMPORTANT]
> More information will be added soon! Go and watch the repository for updates.
Currently, we provide two trained models, namely XGBoost and RandomForest.
```sh
python -m venv .venv
source .venv/bin/activate
pip install -r requirements/requirements.train.txt
```

For training our models, we rely on the following data sets:
- CICBellDNS2021
- DGTA Benchmark
- DNS Tunneling Queries for Binary Classification
- UMUDGA - University of Murcia Domain Generation Algorithm Dataset
- Real-CyberSecurity-Datasets
However, we compute all features separately and rely only on the domain and its class.
Currently, we are only interested in binary classification; thus, the class is either benign or malicious.
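To make the setup concrete, here is a minimal sketch of such a binary domain classifier. The toy feature set (length, character entropy, digit ratio) and the `domains.csv` file are illustrative assumptions; the project's actual features are computed separately, as noted above.

```python
# Illustrative sketch only: toy features and a hypothetical domains.csv
# (columns: domain, class); not the project's actual training pipeline.
import math
from collections import Counter

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def entropy(s: str) -> float:
    """Shannon entropy of the character distribution of a domain."""
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())


def features(domain: str) -> list[float]:
    # Toy features: length, character entropy, and digit ratio.
    return [
        len(domain),
        entropy(domain),
        sum(ch.isdigit() for ch in domain) / len(domain),
    ]


df = pd.read_csv("domains.csv")  # hypothetical file with columns: domain, class
X = [features(d) for d in df["domain"]]
y = (df["class"] == "malicious").astype(int)  # binary target: benign vs. malicious

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```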
> [!IMPORTANT]
> We support custom schemes.

For example, the following scheme defines the expected fields of a log line:
```yaml
loglines:
  fields:
    - [ "timestamp", RegEx, '^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$' ]
    - [ "status_code", ListItem, [ "NOERROR", "NXDOMAIN" ], [ "NXDOMAIN" ] ]
    - [ "client_ip", IpAddress ]
    - [ "dns_server_ip", IpAddress ]
    - [ "domain_name", RegEx, '^(?=.{1,253}$)((?!-)[A-Za-z0-9-]{1,63}(?<!-)\.)+[A-Za-z]{2,63}$' ]
    - [ "record_type", ListItem, [ "A", "AAAA" ] ]
    - [ "response_ip", IpAddress ]
    - [ "size", RegEx, '^\d+b$' ]
```

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
Distributed under the EUPL License. See LICENSE.txt for more information.