feat(iac): Add Ansible role for Netdata deployment #21668

Draft
ktsaou wants to merge 1 commit into netdata:master from ktsaou:provisioning-systems

Conversation

ktsaou (Member) commented Jan 29, 2026

Summary

  • Add Infrastructure as Code (IaC) support for deploying Netdata with Ansible
  • Reusable role with profile-based configuration (standalone, parent, child, child_minimal)
  • Cloud claiming support via claim.conf
  • Streaming configuration for parent/child architectures
  • Managed files with automatic restart on changes
  • Example inventory and group variables
  • E2E testing framework using libvirt VMs and Docker containers

What's Included

Role (src/IaC/ansible/roles/netdata/)

  • Installation via kickstart.sh
  • Config directory auto-detection
  • Profile resolution with collision detection
  • Managed file deployment (static and templates)
  • Streaming setup for both children and parents
  • Cloud claiming with proxy support

Documentation

  • src/IaC/README.md - End-user overview of IaC provisioning
  • src/IaC/AGENTS.md - Internal guidelines for all provisioning systems
  • src/IaC/ansible/README.md - User guide for Ansible deployment
  • src/IaC/ansible/AGENTS.md - Technical reference for the implementation

Example Inventory (src/IaC/ansible/inventories/example/)

  • Sample hosts with profile assignments
  • Group variables for claiming and profiles
  • Configuration files for each profile

E2E Testing (src/IaC/ansible/e2e/)

  • Test script for Ubuntu 22.04, Debian 12, Rocky 9
  • libvirt VM provisioning with cloud-init
  • Docker container for fast smoke tests

Test plan

  • Run E2E tests on Ubuntu 22.04, Debian 12, Rocky 9
  • Verify standalone profile works
  • Verify parent/child streaming works
  • Verify Cloud claiming works
  • Test with AWX/Automation Controller

Summary by cubic

Adds an Ansible role to install and configure Netdata with profiles (standalone, parent, child, child_minimal), Cloud claiming, and streaming. Includes example inventory, documentation, and an E2E test harness for Ubuntu 22.04, Debian 12, and Rocky 9.

  • New Features

    • Reusable Ansible role using kickstart.sh with config dir auto-detection.
    • Profile-based, file-level configuration with collision detection.
    • Cloud claiming via claim.conf with proxy/insecure options and reclaim support.
    • Streaming for parent/child setups, including parent API key entries.
    • Managed files (static or templates) with automatic service restart.
    • Example inventory/group vars, plus end-user and technical docs; libvirt + Docker E2E tests.
  • Migration

    • Copy inventories/example, set claim token/rooms, assign profiles per host, then run playbooks/netdata.yml.
    • Store secrets in src/IaC/.netdata-iac-claim.env; .gitignore updated to ignore e2e artifacts.
    • No breaking changes for existing setups.

Written for commit 1adc8f1. Summary will update on new commits.

Add Infrastructure as Code support for deploying Netdata with Ansible:

- Reusable role with profile-based configuration
- Support for standalone, parent, child, and child_minimal profiles
- Cloud claiming via claim.conf
- Streaming configuration for parent/child architectures
- Managed files with automatic restart on changes
- Example inventory with group variables
- E2E testing framework (libvirt + Docker)
- Documentation for end-users and internal guidelines

cubic-dev-ai bot (Contributor) left a comment

5 issues found across 34 files

Confidence score: 3/5

  • Missing checksum verification for the Netdata kickstart download in src/IaC/ansible/roles/netdata/tasks/install.yml could allow a tampered script to be executed, which is a concrete security risk.
  • netdata_claim_enabled defaulting to true and the undefined netdata_managed_files_dest_profiles access in src/IaC/ansible/roles/netdata/defaults/main.yml and src/IaC/ansible/roles/netdata/tasks/managed-files.yml can cause the role to fail out of the box.
  • Score reflects multiple medium-to-high severity reliability/security issues, so there is some merge risk despite the changes being localized.
  • Pay close attention to src/IaC/ansible/roles/netdata/tasks/install.yml, src/IaC/ansible/roles/netdata/defaults/main.yml, src/IaC/ansible/roles/netdata/tasks/managed-files.yml - checksum verification and default/undefined variable handling.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/IaC/ansible/roles/netdata/tasks/install.yml">

<violation number="1" location="src/IaC/ansible/roles/netdata/tasks/install.yml:28">
P1: Add checksum verification for the kickstart script download to prevent executing a tampered artifact.</violation>
</file>

<file name="src/IaC/ansible/roles/netdata/defaults/main.yml">

<violation number="1" location="src/IaC/ansible/roles/netdata/defaults/main.yml:5">
P2: Defaulting `netdata_claim_enabled` to true causes the role to fail unless callers always set a claim token. Set the default to false so the role works out of the box when no token is provided.</violation>
</file>

<file name="src/IaC/ansible/e2e/run.sh">

<violation number="1" location="src/IaC/ansible/e2e/run.sh:16">
P2: The `run()` wrapper never prints its error block when a command fails because `set -e` exits immediately on `"$@"`. Wrap the command in an `if` so failures are handled inside the function.</violation>
</file>

<file name="src/IaC/README.md">

<violation number="1" location="src/IaC/README.md:3">
P3: Use the product naming convention (“Netdata Agent”) in documentation.</violation>
</file>

<file name="src/IaC/ansible/roles/netdata/tasks/managed-files.yml">

<violation number="1" location="src/IaC/ansible/roles/netdata/tasks/managed-files.yml:32">
P2: Avoid indexing netdata_managed_files_dest_profiles before it exists; the first iteration will fail because the variable is undefined. Default the dictionary before indexing it.</violation>
</file>
Architecture diagram
sequenceDiagram
    participant Ctrl as Ansible Controller
    participant Host as Target Host (Agent)
    participant Parent as Parent Agent (Optional)
    participant Cloud as Netdata Cloud

    Note over Ctrl,Host: NEW: Provisioning & Configuration Flow

    Ctrl->>Host: NEW: Run kickstart.sh (non-interactive)
    Host->>Host: Install Netdata binaries & packages
    
    Ctrl->>Ctrl: NEW: Resolve profiles (Standalone, Parent, Child)
    Note over Ctrl: Check for file collisions between profiles

    Ctrl->>Host: NEW: Deploy managed files (netdata.conf, stream.conf)
    Note over Host: Automatic service restart on file changes

    alt Profile: child / child_minimal
        Host->>Parent: NEW: Initiate streaming (Port 19999)
        Parent-->>Host: Accept metrics (API Key validation)
    else Profile: parent
        Host->>Host: NEW: Configure [web] static-threaded
        Host->>Host: NEW: Define allowed Child API Keys
    end

    Note over Ctrl,Cloud: NEW: Cloud Claiming Flow (State-based)

    opt netdata_claim_enabled: true
        Ctrl->>Host: NEW: Write claim.conf (Token, Rooms, Proxy)
        
        alt netdata_reclaim: true OR not claimed
            Host->>Host: NEW: netdatacli reload-claiming-state
            Host->>Cloud: NEW: Authenticate & Claim node
            Cloud-->>Host: Return claimed_id marker
        else Already Claimed
            Host->>Host: Skip reload (idempotent)
        end
    end

    Ctrl->>Host: Verify netdata.service state (Started/Enabled)
    Host-->>Ctrl: Provisioning Complete

    - ansible_facts.os_family == "Debian"

- name: Download kickstart script
  get_url:

cubic-dev-ai bot (Contributor) commented Jan 29, 2026

P1: Add checksum verification for the kickstart script download to prevent executing a tampered artifact.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/IaC/ansible/roles/netdata/tasks/install.yml, line 28:

<comment>Add checksum verification for the kickstart script download to prevent executing a tampered artifact.</comment>

<file context>
@@ -0,0 +1,61 @@
+    - ansible_facts.os_family == "Debian"
+
+- name: Download kickstart script
+  get_url:
+    url: "{{ netdata_kickstart_url }}"
+    dest: /tmp/netdata-kickstart.sh
</file context>
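
One way to act on this finding, sketched in shell rather than as the role's actual task: pin the SHA-256 digest of a vetted copy of the script and refuse to run anything that does not match. `verify_sha256` and `PINNED_SHA256` are hypothetical names, not part of the PR. Inside the role itself, the `checksum` parameter of Ansible's `get_url` module (for example `checksum: "sha256:<digest>"`) would be the more direct equivalent.

```shell
#!/usr/bin/env bash
# Sketch only: verify_sha256 is a hypothetical helper, not code from the PR.
# Returns 0 when the file's SHA-256 digest equals the expected hex digest,
# and non-zero when it differs or the file cannot be read.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}

# Intended use (PINNED_SHA256 is a placeholder the operator supplies):
#   verify_sha256 /tmp/netdata-kickstart.sh "$PINNED_SHA256" \
#     && sh /tmp/netdata-kickstart.sh
```

A pinned digest has to be refreshed whenever the vetted script is replaced, so this approach assumes the operator controls when a new kickstart version is adopted.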

netdata_kickstart_url: "https://get.netdata.cloud/kickstart.sh"
netdata_release_channel: "stable"

netdata_claim_enabled: true

cubic-dev-ai bot (Contributor) commented Jan 29, 2026

P2: Defaulting netdata_claim_enabled to true causes the role to fail unless callers always set a claim token. Set the default to false so the role works out of the box when no token is provided.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/IaC/ansible/roles/netdata/defaults/main.yml, line 5:

<comment>Defaulting `netdata_claim_enabled` to true causes the role to fail unless callers always set a claim token. Set the default to false so the role works out of the box when no token is provided.</comment>

<file context>
@@ -0,0 +1,51 @@
+netdata_kickstart_url: "https://get.netdata.cloud/kickstart.sh"
+netdata_release_channel: "stable"
+
+netdata_claim_enabled: true
+netdata_claim_token: ""
+netdata_claim_rooms: ""
</file context>

  printf >&2 "${GRAY}$(pwd) >${NC} ${YELLOW}"
  printf >&2 "%q " "$@"
  printf >&2 "${NC}\n"
  "$@"

cubic-dev-ai bot (Contributor) commented Jan 29, 2026

P2: The run() wrapper never prints its error block when a command fails because set -e exits immediately on "$@". Wrap the command in an if so failures are handled inside the function.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/IaC/ansible/e2e/run.sh, line 16:

<comment>The `run()` wrapper never prints its error block when a command fails because `set -e` exits immediately on `"$@"`. Wrap the command in an `if` so failures are handled inside the function.</comment>

<file context>
@@ -0,0 +1,522 @@
+  printf >&2 "${GRAY}$(pwd) >${NC} ${YELLOW}"
+  printf >&2 "%q " "$@"
+  printf >&2 "${NC}\n"
+  "$@"
+  local exit_code=$?
+  if [[ ${exit_code} -ne 0 ]]; then
</file context>

      {{
        netdata_managed_files_dest_profiles | default({})
        | combine({
            item.dest_resolved: (netdata_managed_files_dest_profiles[item.dest_resolved] | default([])) + [ item.__profile | default('host') ]

cubic-dev-ai bot (Contributor) commented Jan 29, 2026

P2: Avoid indexing netdata_managed_files_dest_profiles before it exists; the first iteration will fail because the variable is undefined. Default the dictionary before indexing it.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/IaC/ansible/roles/netdata/tasks/managed-files.yml, line 32:

<comment>Avoid indexing netdata_managed_files_dest_profiles before it exists; the first iteration will fail because the variable is undefined. Default the dictionary before indexing it.</comment>

<file context>
@@ -0,0 +1,83 @@
+      {{
+        netdata_managed_files_dest_profiles | default({})
+        | combine({
+            item.dest_resolved: (netdata_managed_files_dest_profiles[item.dest_resolved] | default([])) + [ item.__profile | default('host') ]
+          }, recursive=True)
+      }}
</file context>

@@ -0,0 +1,99 @@
# Netdata Infrastructure as Code (IaC)

Deploy and configure Netdata agents at scale using your preferred configuration management tool.

cubic-dev-ai bot (Contributor) commented Jan 29, 2026

P3: Use the product naming convention (“Netdata Agent”) in documentation.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/IaC/README.md, line 3:

<comment>Use the product naming convention (“Netdata Agent”) in documentation.</comment>

<file context>
@@ -0,0 +1,99 @@
+# Netdata Infrastructure as Code (IaC)
+
+Deploy and configure Netdata agents at scale using your preferred configuration management tool.
+
+## Supported Tools
</file context>

Copilot AI (Contributor) left a comment

Pull request overview

Adds an Ansible-based IaC workflow for installing and configuring Netdata with profile-based configuration, Cloud claiming, streaming support, and an accompanying E2E test harness.

Changes:

  • Introduces a reusable Ansible role (roles/netdata) to install Netdata via kickstart, manage config files, configure streaming, and handle Cloud claiming.
  • Adds example inventories/group vars and profile-based file bundles to demonstrate common topologies (standalone/parent/child).
  • Adds libvirt + Docker E2E test harness plus documentation, and updates .gitignore for secrets and test artifacts.

Reviewed changes

Copilot reviewed 33 out of 34 changed files in this pull request and generated 4 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/IaC/ansible/roles/netdata/templates/claim.conf.j2 | Jinja template for Netdata claim.conf generation. |
| src/IaC/ansible/roles/netdata/tasks/stream.yml | Streaming configuration via stream.conf ini updates (child + parent patterns). |
| src/IaC/ansible/roles/netdata/tasks/stream-parent-options.yml | Helper task include for per-parent-key stream options. |
| src/IaC/ansible/roles/netdata/tasks/service.yml | Ensures netdata service is enabled and started. |
| src/IaC/ansible/roles/netdata/tasks/profiles.yml | Resolves profile definitions into an effective managed-file list. |
| src/IaC/ansible/roles/netdata/tasks/profile-files.yml | Collects managed files for each selected profile. |
| src/IaC/ansible/roles/netdata/tasks/managed-files.yml | Copies/templates managed files, detects destination collisions, and triggers restart. |
| src/IaC/ansible/roles/netdata/tasks/main.yml | Orchestrates role execution order (install → config-dir → profiles → files → config → stream → service → claim). |
| src/IaC/ansible/roles/netdata/tasks/install.yml | Installs Netdata via downloaded kickstart script with configurable args. |
| src/IaC/ansible/roles/netdata/tasks/configure.yml | Optional ini-style tweaks for netdata.conf. |
| src/IaC/ansible/roles/netdata/tasks/config-dir.yml | Auto-detects/configures Netdata config directory and ensures it exists. |
| src/IaC/ansible/roles/netdata/tasks/claim.yml | Cloud claiming workflow (template claim.conf + netdatacli reload + marker waits). |
| src/IaC/ansible/roles/netdata/handlers/main.yml | Defines Netdata restart handler. |
| src/IaC/ansible/roles/netdata/defaults/main.yml | Provides default role variables (claiming, streaming, managed files, etc.). |
| src/IaC/ansible/playbooks/netdata.yml | Playbook entrypoint applying the netdata role to hosts. |
| src/IaC/ansible/inventories/example/inventory.yml | Example inventory demonstrating profile assignment per host. |
| src/IaC/ansible/inventories/example/group_vars/netdata_standalone.yml | Deprecated placeholder group vars for older layout. |
| src/IaC/ansible/inventories/example/group_vars/netdata_parent.yml | Deprecated placeholder group vars for older layout. |
| src/IaC/ansible/inventories/example/group_vars/netdata_child_minimal.yml | Deprecated placeholder group vars for older layout. |
| src/IaC/ansible/inventories/example/group_vars/netdata_child.yml | Deprecated placeholder group vars for older layout. |
| src/IaC/ansible/inventories/example/group_vars/all.yml | Example end-user configuration (claiming + managed files + profile definitions). |
| src/IaC/ansible/inventories/example/files/profiles/parent/stream.conf | Example parent streaming config file (profile-managed). |
| src/IaC/ansible/inventories/example/files/profiles/parent/netdata.conf | Example parent Netdata config file (profile-managed). |
| src/IaC/ansible/inventories/example/files/profiles/child_minimal/stream.conf | Example minimal child streaming config file (profile-managed). |
| src/IaC/ansible/inventories/example/files/profiles/child_minimal/netdata.conf | Example minimal child Netdata config file (profile-managed). |
| src/IaC/ansible/inventories/example/files/profiles/child/stream.conf | Example child streaming config file (profile-managed). |
| src/IaC/ansible/inventories/example/files/global/health.d/custom.conf | Example global health override file (managed file example). |
| src/IaC/ansible/e2e/run.sh | E2E automation to provision targets (libvirt + Docker) and validate claiming/streaming. |
| src/IaC/ansible/e2e/README.md | E2E usage/prerequisites documentation. |
| src/IaC/ansible/README.md | End-user guide for deploying Netdata with Ansible (profiles, claiming, streaming). |
| src/IaC/ansible/AGENTS.md | Technical reference for the Ansible implementation details. |
| src/IaC/README.md | Top-level IaC overview and concepts. |
| src/IaC/AGENTS.md | Cross-tool IaC guidelines and conventions. |
| .gitignore | Ignores IaC claim env file and E2E artifacts (plus additional generated artifacts). |


Comment on lines +12 to +26
run() {
  printf >&2 "${GRAY}$(pwd) >${NC} ${YELLOW}"
  printf >&2 "%q " "$@"
  printf >&2 "${NC}\n"
  "$@"
  local exit_code=$?
  if [[ ${exit_code} -ne 0 ]]; then
    echo -e >&2 "${RED}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
    echo -e >&2 "${RED}[ERROR]${NC} Command failed with exit code ${exit_code}: ${YELLOW}$1${NC}"
    echo -e >&2 "${RED} Full command:${NC} $*"
    echo -e >&2 "${RED} Working dir:${NC} $(pwd)"
    echo -e >&2 "${RED}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
    return $exit_code
  fi
}

Copilot AI commented Jan 29, 2026

run() attempts to capture the failing command’s exit code and print a custom error, but with set -e enabled the script will exit immediately on a failing "$@" before reaching the error-handling block. Update run() to prevent set -e from aborting (e.g., use if ! "$@"; then ... fi or temporarily set +e around the command) so errors are reported as intended.
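
A minimal sketch of one way to restructure `run()` so the error block is reached under `set -e`; it is not the change actually made in the PR. It uses `"$@" || exit_code=$?` rather than `if ! "$@"`, because inside the `then` branch of a negated command `$?` holds the negated status (0), not the command's own exit code. The colour variables and `%q` quoting from the original are left out for brevity.

```shell
#!/usr/bin/env bash
set -e

# Sketch of a set -e safe wrapper; output formatting is simplified.
run() {
  local exit_code=0
  printf >&2 '%s > %s\n' "$(pwd)" "$*"
  # A command on the left of || is exempt from set -e, and $? still holds
  # its real exit status when the right-hand side runs.
  "$@" || exit_code=$?
  if [ "${exit_code}" -ne 0 ]; then
    echo >&2 "[ERROR] Command failed with exit code ${exit_code}: $1"
    echo >&2 "  Full command: $*"
    echo >&2 "  Working dir: $(pwd)"
    return "${exit_code}"
  fi
}
```

With this shape, a failing command prints the error block first, and the non-zero return then stops the script through `set -e` at the call site unless the caller handles it.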

  file:
    path: "{{ netdata_lib_dir }}/cloud.d/claimed_id"
    state: absent
  when: netdata_reclaim | bool

Copilot AI commented Jan 29, 2026

netdata_reclaim currently removes {{ netdata_lib_dir }}/cloud.d/claimed_id even when claiming is disabled. This contradicts the intended behavior of preserving Cloud state when netdata_claim_enabled: false. Gate the reclaim deletion behind netdata_claim_enabled | bool as well (or otherwise ensure disabling claim never removes existing claim markers).

Suggested change
-  when: netdata_reclaim | bool
+  when:
+    - netdata_reclaim | bool
+    - netdata_claim_enabled | bool

Comment on lines +5 to +6
      - not netdata_claim_enabled or (netdata_claim_token | length > 0)
    fail_msg: "netdata_claim_token is required when netdata_claim_enabled=true"

Copilot AI commented Jan 29, 2026

Claim validation only asserts netdata_claim_token when claiming is enabled, but the docs/examples treat netdata_claim_rooms as required too. Either update the validation to require rooms when claiming is enabled, or update the docs to clearly mark rooms as optional (so users don’t get a harder-to-debug claim failure later).

Suggested change
-      - not netdata_claim_enabled or (netdata_claim_token | length > 0)
-    fail_msg: "netdata_claim_token is required when netdata_claim_enabled=true"
+      - not netdata_claim_enabled or ((netdata_claim_token | length > 0) and (netdata_claim_rooms | length > 0))
+    fail_msg: "netdata_claim_token and netdata_claim_rooms are required when netdata_claim_enabled=true"

Comment on lines +162 to +168
  - Managed file changes
  - Configuration changes via `ini_file`
  - Service state changes

- `Reload Netdata health` - Triggered by:
  - Health config file changes (type: health)

Copilot AI commented Jan 29, 2026

This doc section references a Reload Netdata health handler and describes when it triggers, but roles/netdata/handlers/main.yml only defines Restart Netdata. Update the documentation to match the actual handlers, or add the missing handler + wiring if health reload is intended.

Suggested change
-  - Managed file changes
-  - Configuration changes via `ini_file`
-  - Service state changes
-- `Reload Netdata health` - Triggered by:
-  - Health config file changes (type: health)
+  - Managed file changes (including health config files of type `health`)
+  - Configuration changes via `ini_file`
+  - Service state changes
