feat(iac): Add Ansible role for Netdata deployment #21668
Draft: ktsaou wants to merge 1 commit into netdata:master from ktsaou:provisioning-systems
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,130 @@ | ||
| # IaC Provisioning - Agent Guidelines | ||
|
|
||
| This document defines the rules and principles that **all** Netdata provisioning systems (Ansible, Terraform, Puppet, Chef, Salt) must follow. | ||
|
|
||
| ## Core Principles | ||
|
|
||
| ### 1. Installation Method | ||
| - All systems use `kickstart.sh` as the installation baseline | ||
| - The kickstart script auto-selects native packages when available | ||
| - Support both `stable` and `nightly` release channels | ||
| - Installation should be idempotent (re-running produces the same result) | ||
|
|
||
| ### 2. Cloud Claiming | ||
| - Claiming to Netdata Cloud is enabled by default | ||
| - Claiming is done by writing `claim.conf` directly and running `netdatacli reload-claiming-state` | ||
| - Do NOT use the `netdata-claim.sh` script | ||
| - Support proxy configuration (`env`, `none`, or explicit URL) | ||
| - Support `insecure` option for environments with TLS inspection | ||
| - Disabling claiming removes only `claim.conf` and preserves `/var/lib/netdata/cloud.d` | ||
|
|
||
| ### 3. Profile System (File-Level Isolation) | ||
| - Profiles are **file-level bundles** of configuration | ||
| - Each profile defines a list of managed files | ||
| - A host can use multiple profiles **only if they touch different files** | ||
| - File collisions (same destination from multiple profiles) must be rejected | ||
| - Profiles do NOT merge configuration - they provide complete files | ||
|
|
||
| Standard profile types: | ||
| - `standalone` - Single agent, no streaming | ||
| - `child` - Streams to parent, may have local storage | ||
| - `child_minimal` - Streams to parent, minimal local resources | ||
| - `parent` - Receives streams from children | ||
|
|
||
| ### 4. Managed Files | ||
| - Managed files have `src` (source path) and `dest` (destination relative to config dir) | ||
| - Files can be static or templates | ||
| - File types: `netdata_conf`, `stream_conf`, `health`, `plugin_conf`, `generic` | ||
| - Changes to managed files trigger Netdata restart | ||
| - No change = no restart (idempotent) | ||
| - Do NOT delete unmanaged files - only manage explicit files | ||
|
|
||
| ### 5. Configuration Approach | ||
| - Full config file ownership (not partial patching) | ||
| - Users provide complete `netdata.conf`, `stream.conf`, etc. | ||
| - Optional ini-style tweaks can be applied after full files | ||
| - If providing full config files, leave per-setting variables empty | ||
|
|
||
| ### 6. Standard Variables (Cross-Tool Consistency) | ||
|
|
||
| All provisioning tools must support these variable names: | ||
|
|
||
| **Installation:** | ||
| - `netdata_release_channel` - `stable` or `nightly` | ||
| - `netdata_install_only_if_missing` - Skip if already installed | ||
|
|
||
| **Claiming:** | ||
| - `netdata_claim_enabled` - Enable/disable claiming | ||
| - `netdata_claim_token` - Cloud claim token | ||
| - `netdata_claim_rooms` - Cloud room IDs | ||
| - `netdata_claim_url` - Cloud API URL | ||
| - `netdata_claim_proxy` - Proxy setting | ||
| - `netdata_claim_insecure` - Allow insecure TLS | ||
| - `netdata_reclaim` - Force reclaim if already claimed | ||
|
|
||
| **Database:** | ||
| - `netdata_db_mode` - `dbengine`, `ram`, or `none` | ||
| - `netdata_db_retention` - Retention in seconds | ||
| - `netdata_dbengine_multihost_disk_space` - Disk space in MB | ||
|
|
||
| **Streaming:** | ||
| - `netdata_stream_enabled` - Enable streaming | ||
| - `netdata_stream_destinations` - List of parent addresses | ||
| - `netdata_stream_api_key` - API key for streaming | ||
| - `netdata_stream_parent_keys` - Parent key definitions (for parents) | ||
|
|
||
| **Other:** | ||
| - `netdata_web_mode` - Web server mode | ||
| - `netdata_ml_enabled` - Machine learning on/off | ||
| - `netdata_health_enabled` - Alerts on/off | ||
| - `netdata_hostname` - Override hostname | ||
|
|
||
| ### 7. Directory Structure | ||
|
|
||
| Each provisioning system follows this layout: | ||
| ``` | ||
| src/IaC/<system>/ | ||
| README.md # End-user documentation | ||
| AGENTS.md # Technical details for AI agents | ||
| roles/ or modules/ # Reusable components | ||
| playbooks/ or manifests/ | ||
| inventories/example/ # Example configuration | ||
| inventory.* # Host definitions | ||
| group_vars/ # Variable files | ||
| files/ # Configuration files | ||
| global/ # Files for all hosts | ||
| profiles/ # Per-profile file bundles | ||
| e2e/ # End-to-end tests | ||
| README.md | ||
| run.sh | ||
| ``` | ||
|
|
||
| ### 8. Testing Requirements | ||
|
|
||
| Each system must have E2E tests covering: | ||
| - Minimum OS matrix: Ubuntu + Debian + RHEL family | ||
| - Service starts and runs | ||
| - Claiming works (when enabled) | ||
| - Streaming connects (parent/child scenarios) | ||
| - Profile application works correctly | ||
|
|
||
| E2E environment: libvirt VMs + Docker containers | ||
|
|
||
| ### 9. Documentation Requirements | ||
|
|
||
| Each system must document: | ||
| - Prerequisites (what users need installed) | ||
| - Quickstart (copy, configure, run) | ||
| - Variable reference (all supported variables) | ||
| - Profile system (how to define and use) | ||
| - Enterprise integration (AWX, Terraform Cloud, etc.) | ||
| - Validation steps (how to verify success) | ||
|
|
||
| ## Implementation Order | ||
|
|
||
| Systems are implemented one at a time in this order: | ||
| 1. Ansible (DONE) | ||
| 2. Terraform | ||
| 3. Puppet | ||
| 4. Chef | ||
| 5. Salt |
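
The file-level isolation rule in section 3 of the guidelines (a host may combine profiles only if they touch different files, and collisions must be rejected) can be illustrated with a small check. This is a minimal Python sketch under stated assumptions, not the role's actual implementation: the function names `find_collisions` and `validate_profiles`, and the data shape (profile name mapped to a list of `src`/`dest` entries), are hypothetical and chosen only to mirror the wording of sections 3 and 4.

```python
from collections import defaultdict


def find_collisions(profiles):
    """Return {dest: [profile names]} for each dest managed by more than one profile.

    `profiles` maps a profile name to its list of managed files, each a dict
    with `src` and `dest` keys (hypothetical shape, mirroring section 4).
    """
    owners = defaultdict(set)
    for name, files in profiles.items():
        for entry in files:
            owners[entry["dest"]].add(name)
    # Keep only destinations that two or more distinct profiles want to own.
    return {dest: sorted(names) for dest, names in owners.items() if len(names) > 1}


def validate_profiles(profiles):
    """Raise ValueError when two profiles manage the same destination file."""
    collisions = find_collisions(profiles)
    if collisions:
        details = "; ".join(
            f"{dest} <- {', '.join(names)}"
            for dest, names in sorted(collisions.items())
        )
        raise ValueError(f"profile file collisions: {details}")
```

Because profiles provide complete files rather than merged settings, rejecting the combination up front is the only safe outcome: there is no meaningful way to pick a winner between two full copies of the same file.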
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,99 @@ | ||
| # Netdata Infrastructure as Code (IaC) | ||
|
|
||
| Deploy and configure Netdata agents at scale using your preferred configuration management tool. | ||
|
|
||
| ## Supported Tools | ||
|
|
||
| | Tool | Status | Directory | | ||
| |------|--------|-----------| | ||
| | Ansible | Ready | [ansible/](ansible/) | | ||
| | Terraform | Planned | - | | ||
| | Puppet | Planned | - | | ||
| | Chef | Planned | - | | ||
| | Salt | Planned | - | | ||
|
|
||
| ## Key Concepts | ||
|
|
||
| ### Profiles | ||
|
|
||
| Profiles let you define different Netdata configurations for different roles in your infrastructure. | ||
|
|
||
| **Common profiles:** | ||
| - **standalone** - Single agent, full local storage, no streaming | ||
| - **parent** - Receives metrics from children, larger retention | ||
| - **child** - Streams metrics to parent, moderate local storage | ||
| - **child_minimal** - Streams metrics to parent, minimal local resources | ||
|
|
||
| Each profile is a bundle of configuration files. You assign profiles to hosts, and the provisioning tool deploys the right files. | ||
|
|
||
| ``` | ||
| Host: web-server-01 -> Profile: child | ||
| Host: web-server-02 -> Profile: child | ||
| Host: metrics-parent -> Profile: parent | ||
| ``` | ||
|
|
||
| ### File-Based Configuration | ||
|
|
||
| Configuration is managed at the **file level**, not by patching individual settings: | ||
| - You provide complete configuration files (`netdata.conf`, `stream.conf`, etc.) | ||
| - The tool deploys your files to the right locations | ||
| - Changes trigger a Netdata restart automatically | ||
|
|
||
| This approach is simpler and more predictable than merging partial configurations. | ||
|
|
||
| ### Cloud Claiming | ||
|
|
||
| Netdata Cloud claiming is supported and enabled by default: | ||
| - Provide your claim token and room ID | ||
| - The tool writes `claim.conf` and reloads the claiming state | ||
| - Proxy support is included for restricted networks | ||
|
|
||
| To disable claiming, set `netdata_claim_enabled: false`. | ||
|
|
||
| ## Quick Start | ||
|
|
||
| 1. **Choose your tool** - Pick the provisioning system you already use | ||
| 2. **Copy the example** - Each tool has an example inventory/configuration | ||
| 3. **Set your values** - Claim token, profiles, hosts | ||
| 4. **Run** - Apply the configuration to your infrastructure | ||
|
|
||
| See the README in each tool's directory for specific instructions. | ||
|
|
||
| ## Directory Structure | ||
|
|
||
| ``` | ||
| src/IaC/ | ||
| README.md # This file | ||
| AGENTS.md # Internal guidelines for AI agents | ||
| ansible/ # Ansible role and playbooks | ||
| terraform/ # (planned) | ||
| puppet/ # (planned) | ||
| chef/ # (planned) | ||
| salt/ # (planned) | ||
| ``` | ||
|
|
||
| ## What Gets Configured | ||
|
|
||
| The IaC tools can manage: | ||
| - Netdata installation (via kickstart.sh) | ||
| - Cloud claiming | ||
| - Streaming (parent/child relationships) | ||
| - Database retention settings | ||
| - Web server mode | ||
| - Machine learning on/off | ||
| - Health alerts on/off | ||
| - Custom configuration files | ||
| - Plugin configurations | ||
|
|
||
| ## Requirements | ||
|
|
||
| - Target nodes accessible via SSH (or equivalent for your tool) | ||
| - Sudo/root access on target nodes | ||
| - Internet access to download Netdata (or local mirror) | ||
| - Claim token from Netdata Cloud (if claiming) | ||
|
|
||
| ## Security Notes | ||
|
|
||
| - Store claim tokens securely (Ansible Vault, Terraform secrets, etc.) | ||
| - The tools do not delete unmanaged files | ||
| - Configuration files may contain sensitive data - set appropriate permissions | ||
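
The "changes trigger a restart, no change means no restart" behaviour described for managed files can be sketched as follows. This is an illustrative Python sketch only, assuming a hypothetical `{dest: content}` mapping and the hypothetical helper names `deploy_file` and `apply_managed_files`; the Ansible role in this PR may implement the same idea quite differently (for example, with handlers).

```python
import os
import tempfile


def deploy_file(dest_path, content):
    """Write `content` to `dest_path` only if it differs; return True if changed."""
    try:
        with open(dest_path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already up to date: nothing to do
    except FileNotFoundError:
        pass  # file does not exist yet, so it must be written

    directory = os.path.dirname(dest_path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file, then rename, so readers never see a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp_path, dest_path)
    return True


def apply_managed_files(config_dir, managed):
    """Deploy each {dest: content} pair; return True when a restart is needed.

    Only the listed files are touched; anything else in `config_dir` is left alone.
    """
    changed = [
        deploy_file(os.path.join(config_dir, dest), text)
        for dest, text in managed.items()
    ]
    return any(changed)
```

A caller would restart the service only when `apply_managed_files` returns `True`. Note that `tempfile.mkstemp` creates files readable and writable only by the creating user; a real deployment would set ownership and mode explicitly to match the permissions advice above.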
P3: Use the product naming convention (“Netdata Agent”) in documentation.