MENU: **[GETTING STARTED](#getting-started)** | **[HOW IT WORKS](#how-it-works)** | **[FAQ](#faq)** | **[DOCS](#book-documentation)** | **[COMMUNITY](#tada-community)** | **[CONTRIBUTE](#pray-contribute)** | **[LICENSE](#license)**

> [!WARNING]
> People **get addicted to Netdata.**
> Once you use it on your systems, **there's no going back!**

## Most Energy-Efficient Monitoring Tool

According to the [University of Amsterdam study](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf), Netdata is "the most energy-efficient tool" for monitoring Docker-based systems. The study also shows Netdata excels in CPU usage, RAM usage, and execution time compared to other monitoring solutions, while collecting significantly more metrics than the other tools tested.

---

## Who We Are

Netdata is an open-source, real-time infrastructure monitoring platform. Monitor, detect, and act across your entire infrastructure.

**Core Advantages:**

* **Instant Insights** – With Netdata you can access per-second metrics and visualizations.
* **Zero Configuration** – You can deploy immediately without complex setup.
* **ML-Powered** – You can detect anomalies, predict issues, and automate analysis.
* **Efficient** – You can monitor with minimal resource usage and maximum scalability.
* **Secure & Distributed** – You can keep your data local, with no central collection needed.

With Netdata, you get real-time, per-second updates: clear **insights at a glance**, with no complexity.

---

## Key Features

| Feature                    | Description                                | What Makes It Unique                                      |
|----------------------------|--------------------------------------------|-----------------------------------------------------------|
| **Real-Time**              | Per-second data collection and processing  | Works in a beat – click and see results instantly         |
| **Zero-Configuration**     | Automatic detection and discovery          | Auto-discovers everything on the nodes it runs on         |
| **ML-Powered**             | Unsupervised anomaly detection             | Trains multiple ML models per metric at the edge          |
| **Long-Term Retention**    | High-performance storage                   | ~0.5 bytes per sample with tiered storage for archiving   |
| **Advanced Visualization** | Rich, interactive dashboards               | Slice and dice data without a query language              |
| **Extreme Scalability**    | Native horizontal scaling                  | Parent-Child centralization with multi-million samples/s  |
| **Complete Visibility**    | From infrastructure to applications        | Simplifies operations and eliminates silos                |
| **Edge-Based**             | Processing at your premises                | Distributes code instead of centralizing data             |

> [!NOTE]
> Want to put Netdata to the test against Prometheus?
> Explore the [full comparison](https://www.netdata.cloud/blog/netdata-vs-prometheus-2025/).

---

## Netdata Ecosystem

This three-part architecture enables you to scale from single nodes to complex multi-cloud environments:

| Component         | Description                                                                                                                                      | License                                          |
|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------|
| **Netdata Agent** | • Core monitoring engine • Handles collection, storage, ML, alerts, exports • Runs on servers, cloud, K8s, IoT • Zero production impact | [GPL v3+](https://www.gnu.org/licenses/gpl-3.0)  |
| **Netdata Cloud** | • Enterprise features • User management, RBAC, horizontal scaling • Centralized alerts • Free community tier • No metric storage centralization | |
| **Netdata UI**    | • Dashboards and visualizations • Free to use • Included in standard packages • Latest version via CDN                                   | [NCUL1](https://app.netdata.cloud/LICENSE.txt)   |

## What You Can Monitor

With Netdata you can monitor all these components across platforms:

| Component                                                                             | Linux                 | FreeBSD | macOS | Windows                        |
|--------------------------------------------------------------------------------------:|:---------------------:|:-------:|:-----:|:------------------------------:|
| **System Resources** CPU, Memory and system shared resources                          | Full                  | Yes     | Yes   | Yes                            |
| **Storage** Disks, Mount points, Filesystems, RAID arrays                             | Full                  | Yes     | Yes   | Yes                            |
| **Network** Network Interfaces, Protocols, Firewall, etc.                             | Full                  | Yes     | Yes   | Yes                            |
| **Hardware & Sensors** Fans, Temperatures, Controllers, GPUs, etc.                    | Full                  | Some    | Some  | Some                           |
| **O/S Services** Resources, Performance and Status                                    | Yes `systemd`         | -       | -     | -                              |
| **Processes** Resources, Performance, OOM, and more                                   | Yes                   | Yes     | Yes   | Yes                            |
| System and Application **Logs**                                                       | Yes `systemd`-journal | -       | -     | Yes `Windows Event Log`, `ETW` |
| **Network Connections** Live TCP and UDP sockets per PID                              | Yes                   | -       | -     | -                              |
| **Containers** Docker/containerd, LXC/LXD, Kubernetes, etc.                           | Yes                   | -       | -     | -                              |
| **VMs** (from the host) KVM, qemu, libvirt, Proxmox, etc.                             | Yes `cgroups`         | -       | -     | Yes `Hyper-V`                  |
| **Synthetic Checks** Test APIs, TCP ports, Ping, Certificates, etc.                   | Yes                   | Yes     | Yes   | Yes                            |
| **Packaged Applications** nginx, apache, postgres, redis, mongodb, and hundreds more  | Yes                   | Yes     | Yes   | Yes                            |
| **Cloud Provider Infrastructure** AWS, GCP, Azure, and more                           | Yes                   | Yes     | Yes   | Yes                            |
| **Custom Applications** OpenMetrics, StatsD and soon OpenTelemetry                    | Yes                   | Yes     | Yes   | Yes                            |

On Linux, you can continuously monitor all kernel features and hardware sensors for errors, including Intel/AMD/Nvidia GPUs, PCI AER, RAM EDAC, IPMI, S.M.A.R.T., Intel RAPL, NVMe, fans, power supplies, and voltage readings.

---

## Getting Started

You can install Netdata on all major operating systems. To begin:

### 1. Install Netdata

Choose your platform and follow the installation guide:

* [Linux Installation](https://learn.netdata.cloud/docs/installing/one-line-installer-for-all-linux-systems)
* [macOS](https://learn.netdata.cloud/docs/installing/macos)
* [FreeBSD](https://learn.netdata.cloud/docs/installing/freebsd)
* [Windows](https://learn.netdata.cloud/docs/netdata-agent/installation/windows)
* [Docker Guide](/packaging/docker/README.md)
* [Kubernetes Setup](https://learn.netdata.cloud/docs/installation/install-on-specific-environments/kubernetes)
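
On Linux, the quickest route is the one-line kickstart installer. A minimal sketch (verify the script before running it, as with any downloaded installer; flags are documented in the Linux guide above):

```bash
# Download and run Netdata's kickstart installer.
# --stable-channel pins the install to stable releases (nightly is the default).
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh --stable-channel
```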

> [!NOTE]
> You can access the Netdata UI at `http://localhost:19999` (or `http://NODE:19999` for a remote node).

### 2. Configure Collectors

Netdata auto-discovers most metrics, but some collectors need manual configuration:

* [All collectors](https://learn.netdata.cloud/docs/data-collection/)
* [SNMP monitoring](https://learn.netdata.cloud/docs/data-collection/monitor-anything/networking/snmp)
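
As a hedged sketch of what manual collector configuration looks like, here is how you might enable the go.d Nginx collector; the `stub_status` URL is an assumption about your local Nginx setup:

```bash
# Open the go.d nginx collector config with Netdata's bundled helper script.
sudo /etc/netdata/edit-config go.d/nginx.conf

# A minimal job definition (YAML) to put in that file; the URL assumes a
# local Nginx exposing stub_status at this path:
#
#   jobs:
#     - name: local
#       url: http://127.0.0.1/stub_status
#
# Restart the agent so the new job is picked up.
sudo systemctl restart netdata
```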

### 3. Configure Alerts

You can use hundreds of built-in alerts and integrate notifications with:

`email`, `Slack`, `Telegram`, `PagerDuty`, `Discord`, `Microsoft Teams`, and more.

> [!NOTE]
> Email alerts work by default if the system has a configured MTA.
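
Custom alerts live in `health.d`. A minimal sketch of a hypothetical CPU alert (the name and thresholds are illustrative, not recommendations):

```bash
# Write a custom health configuration (thresholds here are illustrative).
sudo tee /etc/netdata/health.d/cpu_usage.conf >/dev/null <<'EOF'
 alarm: cpu_usage_high
    on: system.cpu
lookup: average -1m unaligned of user,system
 every: 10s
  warn: $this > 80
  crit: $this > 95
  info: average CPU utilization over the last minute
EOF

# Reload health configuration without restarting the agent.
sudo netdatacli reload-health
```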

### 4. Configure Parents

You can centralize dashboards, alerts, and storage with Netdata Parents:

* [Streaming Reference](https://learn.netdata.cloud/docs/streaming/streaming-configuration-reference)

> [!NOTE]
> You can use Netdata Parents for central dashboards, longer retention, and alert configuration.
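
A minimal sketch of the streaming setup from the reference above; the API key is any UUID you generate, and `PARENT_IP` is a placeholder:

```bash
# On the CHILD, edit stream.conf to push metrics to the parent:
sudo /etc/netdata/edit-config stream.conf
#
#   [stream]
#       enabled = yes
#       destination = PARENT_IP:19999
#       api key = 11111111-2222-3333-4444-555555555555
#
# On the PARENT, accept that key in the same file:
#
#   [11111111-2222-3333-4444-555555555555]
#       enabled = yes
#
# Restart both agents afterwards.
sudo systemctl restart netdata
```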

### 5. Connect to Netdata Cloud

[Sign in to Netdata Cloud](https://app.netdata.cloud/sign-in) and connect your nodes for:

* Access from anywhere
* Horizontal scalability and multi-node dashboards
* UI configuration for alerts and data collection
* Role-based access control
* A free tier

> [!NOTE]
> Netdata Cloud is optional. Your data stays in your infrastructure.
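
Connecting (claiming) a node is done by re-running the kickstart script with claiming flags. A sketch, where the token and room values are placeholders from your Space's settings in the Cloud UI:

```bash
# Claim this node to your Netdata Cloud Space (YOUR_CLAIM_TOKEN and
# YOUR_ROOM_ID are placeholders copied from the Cloud UI).
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh
sh /tmp/netdata-kickstart.sh \
   --claim-token YOUR_CLAIM_TOKEN \
   --claim-rooms YOUR_ROOM_ID \
   --claim-url https://app.netdata.cloud
```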

## Live Demo Sites

See Netdata in action: FRANKFURT | NEWYORK | ATLANTA | SANFRANCISCO | TORONTO | SINGAPORE | BANGALORE

These demo clusters run with the default configuration and show real monitoring data. Choose the instance closest to you for the best performance.

---

## How It Works

Netdata runs a modular pipeline for metrics collection, processing, and visualization.

```mermaid
flowchart TB
    A[Netdata Agent]
    A1(Collect):::green --> A
    A2(Store):::green --> A
    A3(Learn):::green --> A
    A4(Detect):::green --> A
    A5(Check):::green --> A
    A6(Stream):::green --> A
    A7(Archive):::green --> A
    A8(Query):::green --> A
    A9(Score):::green --> A

    classDef green fill:#bbf3bb,stroke:#333,stroke-width:1px
```

With each Agent you can:

1. **Collect** – Gather metrics from systems, containers, apps, logs, APIs, and synthetic checks.
2. **Store** – Save metrics to a high-efficiency, tiered time-series database.
3. **Learn** – Train ML models per metric using recent behavior.
4. **Detect** – Identify anomalies using the trained ML models.
5. **Check** – Evaluate metrics against pre-set or custom alert rules.
6. **Stream** – Send metrics to Netdata Parents in real time.
7. **Archive** – Export metrics to Prometheus, InfluxDB, OpenTSDB, Graphite, and others.
8. **Query** – Access metrics via an API for dashboards or third-party tools (see the sketch after this list).
9. **Score** – Use a scoring engine to find patterns and correlations across metrics.

> [!NOTE]
> Learn more: [Netdata's architecture](https://learn.netdata.cloud/docs/netdata-agent/architecture-overview/)
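
For the **Query** step, each Agent exposes a REST API on port 19999. A small sketch (the chart name `system.cpu` exists on most Linux nodes):

```bash
# Fetch the last 60 seconds of CPU metrics from the local agent as JSON
# (v1 data API; "after=-60" means "from 60 seconds ago until now").
curl -s 'http://localhost:19999/api/v1/data?chart=system.cpu&after=-60&format=json'
```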

## Agent Capabilities

With the Netdata Agent, you can use these core capabilities out-of-the-box:

| Capability                   | Description                                                                                                                  |
|------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| **Comprehensive Collection** | • 800+ integrations • Systems, containers, VMs, hardware sensors • OpenMetrics, StatsD, and logs • OpenTelemetry support coming soon |
| **Performance & Precision**  | • Per-second collection • Real-time visualization with 1-second latency • High-resolution metrics                        |
| **Edge-Based ML**            | • ML models trained at the edge • Automatic anomaly detection per metric • Pattern recognition based on historical behavior |
| **Advanced Log Management**  | • Direct systemd-journald and Windows Event Log integration • Process logs at the edge • Rich log visualization          |
| **Observability Pipeline**   | • Parent-Child relationships • Flexible centralization • Multi-level replication and retention                           |
| **Automated Visualization**  | • NIDL data model • Auto-generated dashboards • No query language needed                                                 |
| **Smart Alerting**           | • Pre-configured alerts • Multiple notification methods • Proactive detection                                            |
| **Low Maintenance**          | • Auto-detection • Zero-touch ML • Easy scalability • CI/CD friendly                                                    |
| **Open & Extensible**        | • Modular architecture • Easy to customize • Integrates with existing tools                                              |

---

## CNCF Membership

Netdata actively supports and is a member of the Cloud Native Computing Foundation (CNCF).
It is one of the most starred projects in the CNCF landscape.

---

## FAQ

### Is Netdata secure?

Yes. Netdata follows [OpenSSF best practices](https://bestpractices.coreinfrastructure.org/en/projects/2231), has a security-first design, and is regularly audited by the community.

* [Security design](https://learn.netdata.cloud/docs/security-and-privacy-design)
* [Security policies and advisories](https://github.com/netdata/netdata/security)

### Does Netdata use a lot of resources?

No. Even with ML and per-second metrics enabled, Netdata uses minimal resources:

* ~5% CPU of a single core and ~150 MiB RAM by default on production systems
* <1% CPU and ~100 MiB RAM when ML and alerts are disabled and ephemeral storage is used
* Parents scale to millions of metrics per second with appropriate hardware

> [!NOTE]
> You can use the **Netdata Monitoring** section of the dashboard to inspect its resource usage.
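
The second bullet corresponds to trimming a child node that streams to a Parent. A hedged sketch of the relevant `netdata.conf` switches (verify option names against your Agent version):

```bash
# Edit the main configuration with the bundled helper.
sudo /etc/netdata/edit-config netdata.conf
#
# Settings for a lightweight child that delegates work to a Parent:
#
#   [ml]
#       enabled = no      # the parent trains models for this node
#   [health]
#       enabled = no      # the parent runs the alerts
#   [db]
#       mode = alloc      # ephemeral, in-memory storage only
#
sudo systemctl restart netdata
```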

### How much data retention is possible?

As much as your disk allows.

With Netdata you can use tiered retention:

* Tier 0: per-second resolution
* Tier 1: per-minute resolution
* Tier 2: per-hour resolution

All tiers are updated in parallel during collection, and queries pick the right tier automatically based on the zoom level.
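
Retention is controlled from the `[db]` section of `netdata.conf`. A minimal sketch (the tier count option is shown; per-tier retention limits live in the same section, with names that vary by Agent version):

```bash
sudo /etc/netdata/edit-config netdata.conf
#
#   [db]
#       mode = dbengine
#       storage tiers = 3   # tier 0 per-second, tier 1 per-minute, tier 2 per-hour
#
# Give Netdata more disk via the per-tier retention options in the same
# section to extend history.
```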

### Can Netdata scale to many servers?

Yes. With Netdata you can:

* Scale horizontally with many Agents
* Scale vertically with powerful Parents
* Scale infinitely via Netdata Cloud

> [!NOTE]
> You can use Netdata Cloud to merge many independent infrastructures into one logical view.

### Is disk I/O a concern?

No. Netdata minimizes disk usage:

* Metric writes are spread evenly across time (each metric is flushed roughly every 17 minutes)
* Uses direct I/O and ZSTD compression
* Can run entirely in RAM or stream to a Parent

> [!NOTE]
> You can use the `alloc` or `ram` database mode for no disk writes.
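
A hedged sketch of a fully diskless node, combining the `ram` mode with streaming to a Parent (see the streaming sketch in Getting Started):

```bash
sudo /etc/netdata/edit-config netdata.conf
#
#   [db]
#       mode = ram    # keep metrics in memory; no metric writes to disk
#
# Pair this with [stream] in stream.conf so a Parent keeps the history
# on its own disks.
sudo systemctl restart netdata
```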

### How is Netdata different from Prometheus + Grafana?

With Netdata you get a complete monitoring solution, not just tools to build one.

* No manual setup or dashboard building needed
* Built-in ML, alerts, dashboards, and correlations
* More efficient and easier to deploy

> [!NOTE]
> See the [performance comparison](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/).
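
The two also integrate: Netdata exposes everything it collects in Prometheus exposition format, so an existing Prometheus can scrape an Agent directly. A quick check:

```bash
# Preview the Prometheus-format export of the local agent's metrics;
# point a Prometheus scrape job at this same endpoint.
curl -s 'http://localhost:19999/api/v1/allmetrics?format=prometheus' | head -n 20
```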

### How is Netdata different from commercial SaaS tools?

With Netdata you can keep all metrics on your own infrastructure: no sampling, no aggregation, no loss.

* High-resolution metrics by default
* ML models per metric, not shared models
* Unlimited scalability without skyrocketing cost

### Can Netdata run alongside Nagios, Zabbix, etc.?

Yes. You can use Netdata together with traditional monitoring tools.

With Netdata you get:

* Real-time, high-resolution monitoring
* Zero configuration and auto-generated dashboards
* Anomaly detection and advanced visualization

### What if I feel overwhelmed?

You can start small:

* Use the dashboard's table of contents and search
* Explore anomaly scoring (the "AR" toggle above the table of contents)
* Create custom dashboards in Netdata Cloud

> [!NOTE]
> See the [docs and guides](https://learn.netdata.cloud/guides) for more.
-These steps will disable the anonymous telemetry for your Netdata installation.
-
-Please note, even with telemetry disabled, Netdata still requires a [Netdata Registry](https://learn.netdata.cloud/docs/configuring/securing-netdata-agents/registry) for alert notifications' Call To Action (CTA) functionality. When you click an alert notification, it redirects you to the Netdata Registry, which then directs your web browser to the specific Netdata Agent that issued the alert for further troubleshooting. The Netdata Registry learns the URLs of your Agents when you visit their dashboards.
-
-Any Netdata Agent can act as a Netdata Registry. Designate one Netdata Agent as your Registry, read more [here](https://learn.netdata.cloud/docs/netdata-agent/configuration/registry).
-
-
-### :smirk: Who uses Netdata?
-
-Netdata is a widely adopted project...
-
-Click to see detailed answer ...
-
-
-Browse the [Netdata stargazers on GitHub](https://github.com/netdata/netdata/stargazers) to discover users from renowned companies and enterprises, such as ABN AMRO Bank, AMD, Amazon, Baidu, Booking.com, Cisco, Delta, Facebook, Google, IBM, Intel, Logitech, Netflix, Nokia, Qualcomm, Realtek Semiconductor Corp, Redhat, Riot Games, SAP, Samsung, Unity, Valve, and many others.
-
-Netdata also enjoys significant usage in academia, with notable institutions including New York University, Columbia University, New Jersey University, Seoul National University, University College London, among several others.
+
+Do I have to use Netdata Cloud?
-And, Netdata is also used by many governmental organizations worldwide.
+No. Netdata Cloud is optional.
-In a nutshell, Netdata proves invaluable for:
+Netdata works without it, but with Cloud you can:
-- **Infrastructure intensive organizations**
- Such as hosting/cloud providers and companies with hundreds or thousands of nodes, who require a high-resolution, real-time monitoring solution for a comprehensive view of all their components and applications.
+* Access remotely with SSO
+* Save dashboard customizations
+* Configure alerts centrally
+* Collaborate with role-based access
-- **Technology operators**
- Those in need of a standardized, comprehensive solution for round-the-clock operations. Netdata not only facilitates operational automation and provides controlled access for their operations engineers, but also enhances skill development over time.
-
-- **Technology startups**
- Who seek a feature-rich monitoring solution from the get-go.
-
-- **Freelancers**
- Who seek a simple, efficient and straightforward solution without sacrificing performance and outcomes.
-
-- **Professional SysAdmins and DevOps**
- Who appreciate the fine details and understand the value of holistic monitoring from the ground up.
-
-- **Everyone else**
- All of us, who are tired of the inefficiency in the monitoring industry and would love a refreshing change and a breath of fresh air. :slightly_smiling_face:
-
-
-### :globe_with_meridians: Is Netdata open-source?
-
-The **Netdata Agent** is open-source, but the **overall Netdata ecosystem** is a hybrid solution, combining open-source and closed-source components.
-
-Click to see detailed answer ...
-
-
-Open-source is about sharing intellectual property with the world, and at Netdata, we embrace this philosophy wholeheartedly.
-
-The **Netdata Agent**, the core of our ecosystem and the engine behind all our observability features, is fully open-source. Licensed under GPLv3+, the Netdata Agent represents our commitment to open-sourcing innovation in a wide array of observability technologies, including data collection, database design, query engines, observability data modeling, machine learning and unsupervised anomaly detection, high-performance edge computing, real-time monitoring, and more.
-
-**The Netdata Agent is our gift to the world**, ensuring that the cutting-edge advancements we've developed are freely accessible to everyone.
-
-However, as a privately funded company, we also need to monetize our open-source software to demonstrate product-market fit and sustain our growth.
-
-Traditionally, open-source projects have often used the open-core model, where a basic version of the software is open-source, and additional features are reserved for a commercial, closed-source version. This approach can limit access to advanced innovations, as most of these remain closed-source.
-
-At Netdata, we take a slightly different path. We don't create a separate enterprise version of our product. Instead, all users - both commercial and non-commercial - use the same Netdata Agent, ensuring that all of our observability innovations are always open source.
-
-To experience the full capabilities of the Netdata ecosystem, users need to combine the open-source components with our closed-source offerings. The complete product still remains free to use.
+
+What telemetry does Netdata collect?
-The closed-source components include:
+Anonymous telemetry helps improve the product. You can disable it:
-- **Netdata UI**: This is closed-source but free to use with the Netdata Agents and Netdata Cloud. It’s also publicly available via a CDN.
-- **Netdata Cloud**: A commercial product available both as an on-premises installation and as a SaaS solution, with a free community tier.
+* Add `--disable-telemetry` to the installer, or
+* Create `/etc/netdata/.opt-out-from-anonymous-statistics` and restart Netdata
-By balancing open-source and closed-source components, we ensure that all users have access to our innovations while sustaining our ability to grow and innovate as a company.
+> [!NOTE]
+> Telemetry helps us understand usage, not track users. No private data is collected.
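For illustration, here are the two opt-out paths above as a minimal shell sketch (not part of the patch; the kickstart.sh download URL and the systemd service name are assumptions, so adjust them to your install method):

```bash
# Option 1: disable telemetry at install time via the kickstart.sh installer
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh  # download URL assumed
sh /tmp/netdata-kickstart.sh --disable-telemetry

# Option 2: opt out on an existing installation, then restart the agent
sudo touch /etc/netdata/.opt-out-from-anonymous-statistics
sudo systemctl restart netdata  # service name assumed; use your init system's equivalent
```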
-
-### :moneybag: What is your monetization strategy?
+
+Who uses Netdata?
-Netdata generates revenue through subscriptions to advanced features of Netdata Cloud and sales of on-premise and private versions of Netdata Cloud.
+You'll join users including:
-Click to see detailed answer ...
-
+* Major companies (Amazon, ABN AMRO Bank, Facebook, Google, IBM, Intel, Netflix, Samsung)
+* Universities (NYU, Columbia, Seoul National, UCL)
+* Government organizations worldwide
+* Infrastructure-intensive organizations
+* Technology operators
+* Startups and freelancers
+* SysAdmins and DevOps professionals
-Netdata generates revenue from these activities:
-
-1. **Netdata Cloud Subscriptions**
- Direct funding for our project's vision comes from users subscribing to Netdata Cloud's advanced features.
-
-2. **Netdata Cloud On-Prem or Private**
- Purchasing the on-premises or private versions of Netdata Cloud supports our financial growth.
-
-Our Open-Source Community and the free access to Netdata Cloud, contribute to Netdata in the following ways:
-
-- **Netdata Cloud Community Use**
- The free usage of Netdata Cloud demonstrates its market relevance. While this doesn't generate revenue, it reinforces trust among new users and aids in securing appropriate project funding.
-
-- **User Feedback**
- Feedback, especially issues and bug reports, is invaluable. It steers us towards a more resilient and efficient product. This, too, isn't a revenue source but is pivotal for our project's evolution.
-
-- **Anonymous Telemetry Insights**
- Users who keep anonymous telemetry enabled, help us make data informed decisions on refining and enhancing Netdata. This isn't a revenue stream, but knowing which features are used and how, contributes in building a better product for everyone.
-
-We don't monetize, directly or indirectly, users' or "device heuristics" data. Any data collected from community members is exclusively used for the purposes stated above.
-
-Netdata grows financially when technology intensive organizations and operators need - due to regulatory or business requirements - the entire Netdata suite on-prem or private, bundled with top-tier support. It is a win-win case for all parties involved: these companies get a battle tested, robust and reliable solution, while the broader community that helps us build this product enjoys it at no cost.
-
-
-## :book: Documentation
+---
-Netdata's documentation is available at [**Netdata Learn**](https://learn.netdata.cloud).
+## \:book: Documentation
-This site also hosts a number of [guides](https://learn.netdata.cloud/guides) to help newer users better understand how
-to collect metrics, troubleshoot via charts, export to external databases, and more.
+Visit [Netdata Learn](https://learn.netdata.cloud) for full documentation and guides.
-## :tada: Community
+> [!NOTE]
+> Includes deployment, configuration, alerting, exporting, troubleshooting, and more.
-
-
-
-
-
+---
-Netdata is an inclusive open-source project and community. Please read our [Code of Conduct](https://github.com/netdata/.github/blob/main/CODE_OF_CONDUCT.md).
+## \:tada: Community
Join the Netdata community:
-- Chat with us and other community members on [Discord](https://discord.com/invite/2mEmfW735j).
-- Start a discussion on [GitHub discussions](https://github.com/netdata/netdata/discussions).
-- Open a topic to our [community forums](https://community.netdata.cloud).
-
-> **Meet Up** :people_holding_hands::people_holding_hands::people_holding_hands:
-> The Netdata team and community members have regular online meetups.
-> **You are welcome to join us!**
-> [Click here for the schedule](https://www.meetup.com/netdata/events/).
-
-You can also find Netdata on:
-[Twitter](https://twitter.com/netdatahq) | [YouTube](https://www.youtube.com/c/Netdata) | [Reddit](https://www.reddit.com/r/netdata/) | [LinkedIn](https://www.linkedin.com/company/netdata-cloud/) | [StackShare](https://stackshare.io/netdata) | [Product Hunt](https://www.producthunt.com/posts/netdata-monitoring-agent/) | [Repology](https://repology.org/metapackage/netdata/versions) | [Facebook](https://www.facebook.com/linuxnetdata/)
-
-## :pray: Contribute
-
-
-
-
-
-Contributions are essential to the success of open-source projects. In other words, we need your help to keep Netdata great!
-
-What is a contribution? All the following are highly valuable to Netdata:
-
-1. **Let us know of the best practices you believe should be standardized**
- Netdata should out-of-the-box detect as many infrastructure issues as possible. By sharing your knowledge and experiences, you help us build a monitoring solution that has baked into it all the best-practices about infrastructure monitoring.
-
-2. **Let us know if Netdata is not perfect for your use case**
- We aim to support as many use cases as possible and your feedback can be invaluable. Open a GitHub issue, or start a GitHub discussion about it, to discuss how you want to use Netdata and what you need.
+* [Discord](https://discord.com/invite/2mEmfW735j)
+* [Forum](https://community.netdata.cloud)
+* [GitHub Discussions](https://github.com/netdata/netdata/discussions)
- Although we can't implement everything imaginable, we try to prioritize development on use-cases that are common to our community, are in the same direction we want Netdata to evolve and are aligned with our roadmap.
+> [!NOTE]
+> [Code of Conduct](https://github.com/netdata/.github/blob/main/CODE_OF_CONDUCT.md)
-3. **Support other community members**
- Join our community on GitHub, Discord, and Reddit. Generally, Netdata is relatively easy to set up and configure, but still people may need a little push in the right direction to use it effectively. Supporting other members is a great contribution by itself!
+Follow us on:
+[Twitter](https://twitter.com/netdatahq) | [Reddit](https://www.reddit.com/r/netdata/) | [YouTube](https://www.youtube.com/c/Netdata) | [LinkedIn](https://www.linkedin.com/company/netdata-cloud/)
-4. **Add or improve integrations you need**
- Integrations tend to be easier and simpler to develop. If you would like to contribute your code to Netdata, we suggest that you start with the integrations you need, which Netdata doesn’t currently support.
-
-General information about contributions:
+---
-- Check our [Security Policy](https://github.com/netdata/netdata/security/policy).
-- Found a bug? Open a [GitHub issue](https://github.com/netdata/netdata/issues/new?assignees=&labels=bug%2Cneeds+triage&template=BUG_REPORT.yml&title=%5BBug%5D%3A+).
-- Read our [Contributing Guide](https://github.com/netdata/.github/blob/main/CONTRIBUTING.md), which contains all the information you need to contribute to Netdata, such as improving our documentation, engaging in the community, and developing new features. We've made it as frictionless as possible, but if you need help, just ping us on our community forums!
+## \:pray: Contribute
-Package maintainers should read the guide on [building Netdata from source](/packaging/installer/methods/source.md) for
-instructions on building each Netdata component from the source and preparing a package.
+We welcome your contributions.
-## License
+Ways you can help us stay sharp:
-The Netdata ecosystem consists of three key parts:
+* Share best practices and monitoring insights
+* Report issues or missing features
+* Improve documentation
+* Develop new integrations or collectors
+* Help users in forums and chats
-- **Netdata Agent**: The heart of the Netdata ecosystem, the Netdata Agent is an open-source tool that must be installed on all systems monitored by Netdata. It offers a wide range of essential features, including data collection via various plugins, an embedded high-performance time-series database (dbengine), unsupervised anomaly detection powered by edge-trained machine learning, alerting and notifications, as well as query and scoring engines with associated APIs. Additionally, it supports exporting data to third-party monitoring systems, among other capabilities.
+> [!NOTE]
+> [Contribution guide](https://github.com/netdata/.github/blob/main/CONTRIBUTING.md)
- The Netdata Agent is released under the [GPLv3+ license](https://github.com/netdata/netdata/blob/master/LICENSE) and redistributes several other open-source tools and libraries, which are listed in the [Netdata Agent third-party licenses](https://github.com/netdata/netdata/blob/master/REDISTRIBUTED.md).
+---
-- **Netdata Cloud**: A commercial, closed-source component, Netdata Cloud enhances the capabilities of the open-source Netdata Agent by providing horizontal scalability, centralized alert notification dispatch (including a mobile app), user management, role-based access control, and other enterprise-grade features. It is available both as a SaaS solution and for on-premises deployment, with a free-to-use community tier also offered.
+## \:scroll: License
-- **Netdata UI**: The Netdata UI is closed-source, and handles all visualization and dashboard functionalities related to metrics, logs and other collected data, as well as the central configuration and management of the Netdata ecosystem. It serves both the Netdata Agent and Netdata Cloud. The Netdata UI is distributed in binary form with the Netdata Agent and is publicly accessible via a CDN, licensed under the [Netdata Cloud UI License 1 (NCUL1)](https://app.netdata.cloud/LICENSE.txt). It integrates third-party open-source components, detailed in the [Netdata UI third-party licenses](https://app.netdata.cloud/3D_PARTY_LICENSES.txt).
+The Netdata ecosystem includes:
-The binary installation packages provided by Netdata include the Netdata Agent and the Netdata UI. Since the Netdata Agent is open-source, it is frequently packaged by third parties (e.g., Linux Distributions) excluding the closed-source components (Netdata UI is not included). While their packages can still be useful in providing the necessary back-ends and the APIs of a fully functional monitoring solution, we recommend using the installation packages we provide to experience the full feature set of Netdata.
+* **Netdata Agent** – Open-source core (GPLv3+). **Includes** data collection, storage, ML, alerting, and APIs, and **redistributes** several other open-source tools and libraries.
+ * [Netdata Agent License](https://github.com/netdata/netdata/blob/master/LICENSE)
+ * [Netdata Agent Redistributed](https://github.com/netdata/netdata/blob/master/REDISTRIBUTED.md)
+* **Netdata UI** – Closed-source but free to use with Netdata Agent and Cloud. Delivered via CDN. It integrates third-party open-source components.
+ * [Netdata Cloud UI License](https://app.netdata.cloud/LICENSE.txt)
+ * [Netdata UI third-party licenses](https://app.netdata.cloud/3D_PARTY_LICENSES.txt)
+* **Netdata Cloud** – Closed-source, with free and paid tiers. Adds remote access, SSO, scalability.
From eb4b3f3d32ffcfdfe168db87d9ed1c5415f2c913 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Wed, 7 May 2025 22:24:09 +0300
Subject: [PATCH 12/51] docs: fix license link and remove GH alerts syntax from
FAQ (#20252)
---
README.md | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/README.md b/README.md
index 21cedd16046db4..e02e985688724e 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@
-MENU: **[GETTING STARTED](#getting-started)** | **[HOW IT WORKS](#how-it-works)** | **[FAQ](#faq)** | **[DOCS](#book-documentation)** | **[COMMUNITY](#tada-community)** | **[CONTRIBUTE](#pray-contribute)** | **[LICENSE](#license)**
+MENU: **[GETTING STARTED](#getting-started)** | **[HOW IT WORKS](#how-it-works)** | **[FAQ](#faq)** | **[DOCS](#book-documentation)** | **[COMMUNITY](#tada-community)** | **[CONTRIBUTE](#pray-contribute)** | **[LICENSE](#scroll-license)**
> [!WARNING]
> People **get addicted to Netdata.**
@@ -282,7 +282,6 @@ No. Even with ML and per-second metrics, Netdata uses minimal resources.
* <1% CPU and \~100MiB RAM when ML and alerts are disabled and using ephemeral storage
* Parents scale to millions of metrics per second with appropriate hardware
-> [!NOTE]
> You can use the **Netdata Monitoring** section in the dashboard to inspect its resource usage.
@@ -310,7 +309,6 @@ Yes. With Netdata you can:
* Scale vertically with powerful Parents
* Scale infinitely via Netdata Cloud
-> [!NOTE]
> You can use Netdata Cloud to merge many independent infrastructures into one logical view.
@@ -324,7 +322,6 @@ No. Netdata minimizes disk usage:
* Uses direct I/O and compression (ZSTD)
* Can run entirely in RAM or stream to a Parent
-> [!NOTE]
> You can use `alloc` or `ram` mode for no disk writes.
@@ -338,7 +335,6 @@ With Netdata you get a complete monitoring solution—not just tools.
* Built-in ML, alerts, dashboards, and correlations
* More efficient and easier to deploy
-> [!NOTE]
> [Performance comparison](https://blog.netdata.cloud/netdata-vs-prometheus-performance-analysis/)
@@ -376,7 +372,6 @@ You can start small:
* Explore anomaly scoring ("AR" toggle)
* Create custom dashboards in Netdata Cloud
-> [!NOTE]
> [Docs and guides](https://learn.netdata.cloud/guides)
@@ -403,7 +398,6 @@ Anonymous telemetry helps improve the product. You can disable it:
* Add `--disable-telemetry` to the installer, or
* Create `/etc/netdata/.opt-out-from-anonymous-statistics` and restart Netdata
-> [!NOTE]
> Telemetry helps us understand usage, not track users. No private data is collected.
From ef4c8d9fbc77f69ca7efb51692d4f53732bb7285 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Thu, 8 May 2025 00:23:20 +0000
Subject: [PATCH 13/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-13-nightly.
---
CHANGELOG.md | 14 ++++++--------
packaging/version | 2 +-
2 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 80531d18e2148f..b39bcb699b0886 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,7 +6,13 @@
**Merged pull requests:**
+- docs: fix license link and remove GH alerts syntax from FAQ [\#20252](https://github.com/netdata/netdata/pull/20252) ([ilyam8](https://github.com/ilyam8))
+- Update Netdata README [\#20251](https://github.com/netdata/netdata/pull/20251) ([kanelatechnical](https://github.com/kanelatechnical))
+- fix\(go.d/snmp\): use 32bit counters if 64 aren't available [\#20249](https://github.com/netdata/netdata/pull/20249) ([ilyam8](https://github.com/ilyam8))
+- fix\(go.d/snmp\): use ifDescr for interface name if ifName is empty [\#20248](https://github.com/netdata/netdata/pull/20248) ([ilyam8](https://github.com/ilyam8))
+- fix\(go.d/sd/snmp\): fix snmpv3 credentials [\#20247](https://github.com/netdata/netdata/pull/20247) ([ilyam8](https://github.com/ilyam8))
- Fix build issue on old distros [\#20243](https://github.com/netdata/netdata/pull/20243) ([stelfrag](https://github.com/stelfrag))
+- Session claim id in docker [\#20240](https://github.com/netdata/netdata/pull/20240) ([stelfrag](https://github.com/stelfrag))
- Let the user override the default stack size [\#20236](https://github.com/netdata/netdata/pull/20236) ([stelfrag](https://github.com/stelfrag))
- Revert "Revert "fix\(go.d/couchdb\): correct db size charts unit"" [\#20235](https://github.com/netdata/netdata/pull/20235) ([ilyam8](https://github.com/ilyam8))
- Make all threads joinable and join on agent shutdown [\#20228](https://github.com/netdata/netdata/pull/20228) ([stelfrag](https://github.com/stelfrag))
@@ -468,14 +474,6 @@
- 4 malloc arenas for parents, not IoT [\#19711](https://github.com/netdata/netdata/pull/19711) ([ktsaou](https://github.com/ktsaou))
- Fix Fresh Installation on Microsoft [\#19710](https://github.com/netdata/netdata/pull/19710) ([thiagoftsm](https://github.com/thiagoftsm))
- Avoid post initialization errors repeateadly [\#19709](https://github.com/netdata/netdata/pull/19709) ([ktsaou](https://github.com/ktsaou))
-- Check for final step [\#19708](https://github.com/netdata/netdata/pull/19708) ([stelfrag](https://github.com/stelfrag))
-- daemon status improvements 3 [\#19707](https://github.com/netdata/netdata/pull/19707) ([ktsaou](https://github.com/ktsaou))
-- fix runtime directory; annotate daemon status file [\#19706](https://github.com/netdata/netdata/pull/19706) ([ktsaou](https://github.com/ktsaou))
-- Add repository priority configuration for DEB package repositories. [\#19705](https://github.com/netdata/netdata/pull/19705) ([Ferroin](https://github.com/Ferroin))
-- add host/os fields to status file [\#19704](https://github.com/netdata/netdata/pull/19704) ([ktsaou](https://github.com/ktsaou))
-- under MSYS2 use stat [\#19703](https://github.com/netdata/netdata/pull/19703) ([ktsaou](https://github.com/ktsaou))
-- Integrate OpenTelemetry collector build into build system. [\#19702](https://github.com/netdata/netdata/pull/19702) ([Ferroin](https://github.com/Ferroin))
-- Document journal v2 index file format. [\#19701](https://github.com/netdata/netdata/pull/19701) ([vkalintiris](https://github.com/vkalintiris))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index 6a7fccf041b02b..dbe2d6e05f6c1e 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-6-nightly
+v2.5.0-13-nightly
From 07d801dd0a8b69bb2d71a1d66cdd32ac5cdabff8 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Thu, 8 May 2025 11:27:32 +0300
Subject: [PATCH 14/51] fix obsolete chart cleanup to properly handle vnodes
(#20254)
---
src/daemon/service.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/daemon/service.c b/src/daemon/service.c
index a94478c6eccfdf..c834239c57c04a 100644
--- a/src/daemon/service.c
+++ b/src/daemon/service.c
@@ -185,7 +185,7 @@ static void svc_rrd_cleanup_obsolete_charts_from_all_hosts() {
svc_rrdhost_cleanup_charts_marked_obsolete(host);
- if (host == localhost)
+ if (rrdhost_is_local(host))
continue;
rrdhost_receiver_lock(host);
From 0e228626084a08dd6766a86dce45e7535e976c03 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Thu, 8 May 2025 12:57:51 +0300
Subject: [PATCH 15/51] chore(go.d/snmp): make enable_profiles configurable
(needed for dev) (#20255)
---
src/go/plugin/go.d/collector/snmp/collect.go | 2 +-
src/go/plugin/go.d/collector/snmp/collector.go | 2 --
.../go.d/collector/snmp/collector_test.go | 18 +++++++++---------
src/go/plugin/go.d/collector/snmp/config.go | 1 +
.../go.d/collector/snmp/testdata/config.json | 3 ++-
.../go.d/collector/snmp/testdata/config.yaml | 1 +
6 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/src/go/plugin/go.d/collector/snmp/collect.go b/src/go/plugin/go.d/collector/snmp/collect.go
index 785c3eb1404195..372d039c1e5fcc 100644
--- a/src/go/plugin/go.d/collector/snmp/collect.go
+++ b/src/go/plugin/go.d/collector/snmp/collect.go
@@ -20,7 +20,7 @@ func (c *Collector) collect() (map[string]int64, error) {
mx := make(map[string]int64)
- if c.enableProfiles {
+ if c.EnableProfiles {
sysObjectID, err := c.getSysObjectID(snmpsd.OidSysObject)
if err != nil {
return nil, err
diff --git a/src/go/plugin/go.d/collector/snmp/collector.go b/src/go/plugin/go.d/collector/snmp/collector.go
index de2942d42f79b0..353835359b1450 100644
--- a/src/go/plugin/go.d/collector/snmp/collector.go
+++ b/src/go/plugin/go.d/collector/snmp/collector.go
@@ -66,8 +66,6 @@ type Collector struct {
module.Base
Config `yaml:",inline" json:""`
- enableProfiles bool
-
vnode *vnodes.VirtualNode
charts *module.Charts
diff --git a/src/go/plugin/go.d/collector/snmp/collector_test.go b/src/go/plugin/go.d/collector/snmp/collector_test.go
index 3b1324cad36db2..8a9ebdb2d409c1 100644
--- a/src/go/plugin/go.d/collector/snmp/collector_test.go
+++ b/src/go/plugin/go.d/collector/snmp/collector_test.go
@@ -156,7 +156,7 @@ func TestCollector_Charts(t *testing.T) {
prepareSNMP: func(t *testing.T, m *snmpmock.MockHandler) *Collector {
collr := New()
collr.Config = prepareV2Config()
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
setMockClientSysExpect(m)
@@ -208,7 +208,7 @@ func TestCollector_Check(t *testing.T) {
prepareSNMP: func(m *snmpmock.MockHandler) *Collector {
collr := New()
collr.Config = prepareV2Config()
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
setMockClientIfMibExpect(m)
@@ -223,7 +223,7 @@ func TestCollector_Check(t *testing.T) {
collr.Config = prepareConfigWithUserCharts(prepareV2Config(), 0, 3)
collr.collectIfMib = false
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
@@ -249,7 +249,7 @@ func TestCollector_Check(t *testing.T) {
collr := New()
collr.Config = prepareConfigWithUserCharts(prepareV2Config(), 0, 3)
collr.collectIfMib = false
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
m.EXPECT().Get(gomock.Any()).Return(nil, errors.New("mock Get() error")).Times(1)
@@ -291,7 +291,7 @@ func TestCollector_Collect(t *testing.T) {
collr := New()
collr.Config = prepareV2Config()
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
setMockClientIfMibExpect(m)
@@ -397,7 +397,7 @@ func TestCollector_Collect(t *testing.T) {
collr.Config = prepareConfigWithUserCharts(prepareV2Config(), 0, 3)
collr.collectIfMib = false
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
@@ -435,7 +435,7 @@ func TestCollector_Collect(t *testing.T) {
collr.Config = prepareConfigWithUserCharts(prepareV2Config(), 0, 2)
collr.collectIfMib = false
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
@@ -466,7 +466,7 @@ func TestCollector_Collect(t *testing.T) {
collr.Config = prepareConfigWithUserCharts(prepareV2Config(), 0, 2)
collr.collectIfMib = false
- if collr.enableProfiles {
+ if collr.EnableProfiles {
setMockClientSysObjectidExpect(m)
}
@@ -505,7 +505,7 @@ func TestCollector_Collect(t *testing.T) {
mx := collr.Collect(context.Background())
- if collr.enableProfiles {
+ if collr.EnableProfiles {
mx["TestMetric"] = 1
}
diff --git a/src/go/plugin/go.d/collector/snmp/config.go b/src/go/plugin/go.d/collector/snmp/config.go
index 740244320b0387..106a1a7811db6d 100644
--- a/src/go/plugin/go.d/collector/snmp/config.go
+++ b/src/go/plugin/go.d/collector/snmp/config.go
@@ -15,6 +15,7 @@ type (
Options Options `yaml:"options,omitempty" json:"options"`
ChartsInput []ChartConfig `yaml:"charts,omitempty" json:"charts"`
NetworkInterfaceFilter NetworkInterfaceFilter `yaml:"network_interface_filter,omitempty" json:"network_interface_filter"`
+ EnableProfiles bool `yaml:"enable_profiles,omitempty" json:"enable_profiles"`
}
NetworkInterfaceFilter struct {
ByName string `yaml:"by_name,omitempty" json:"by_name"`
diff --git a/src/go/plugin/go.d/collector/snmp/testdata/config.json b/src/go/plugin/go.d/collector/snmp/testdata/config.json
index ba6c7797765cdf..9aac29603ffcd4 100644
--- a/src/go/plugin/go.d/collector/snmp/testdata/config.json
+++ b/src/go/plugin/go.d/collector/snmp/testdata/config.json
@@ -52,5 +52,6 @@
}
]
}
- ]
+ ],
+ "enable_profiles": true
}
diff --git a/src/go/plugin/go.d/collector/snmp/testdata/config.yaml b/src/go/plugin/go.d/collector/snmp/testdata/config.yaml
index 17a03354380870..0c73314b289ddd 100644
--- a/src/go/plugin/go.d/collector/snmp/testdata/config.yaml
+++ b/src/go/plugin/go.d/collector/snmp/testdata/config.yaml
@@ -1,6 +1,7 @@
update_every: 123
hostname: "ok"
create_vnode: yes
+enable_profiles: yes
vnode:
name: "ok"
guid: "ok"
From 9562586917d049b060b86003568d0212992125db Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Thu, 8 May 2025 14:41:15 +0300
Subject: [PATCH 16/51] fix(go.d/sd/snmp): fix snmnpv3 again (#20256)
---
.../discovery/sd/discoverer/snmpsd/config.go | 12 +++----
.../sd/discoverer/snmpsd/discoverer_test.go | 32 +++++++++----------
src/go/plugin/go.d/config/go.d/sd/snmp.conf | 14 ++++----
3 files changed, 29 insertions(+), 29 deletions(-)
diff --git a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/config.go b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/config.go
index 82bb8248ec8727..3b3cbf1ff9424b 100644
--- a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/config.go
+++ b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/config.go
@@ -49,12 +49,12 @@ type (
SecurityLevel string `yaml:"security_level"`
// AuthProtocol must be one of: "md5", "sha", "sha224", "sha256", "sha384", "sha512" (for SNMPv3)
AuthProtocol string `yaml:"auth_protocol"`
- // AuthPassword is the authentication passphrase (for SNMPv3)
- AuthPassword string `yaml:"auth_password"`
+ // AuthPassphrase is the authentication passphrase (for SNMPv3)
+ AuthPassphrase string `yaml:"auth_password"`
// PrivacyProtocol must be one of: "des", "aes", "aes192", "aes256", "aes192C", "aes256C" (for SNMPv3)
PrivacyProtocol string `yaml:"priv_protocol"`
- // PrivacyPassword is the privacy passphrase (for SNMPv3)
- PrivacyPassword string `yaml:"priv_password"`
+ // PrivacyPassphrase is the privacy passphrase (for SNMPv3)
+ PrivacyPassphrase string `yaml:"priv_password"`
}
)
@@ -136,9 +136,9 @@ func setCredential(client gosnmp.Handler, cred CredentialConfig) {
client.SetSecurityParameters(&gosnmp.UsmSecurityParameters{
UserName: cred.UserName,
AuthenticationProtocol: parseSNMPv3AuthProtocol(cred),
- AuthenticationPassphrase: cred.AuthPassword,
+ AuthenticationPassphrase: cred.AuthPassphrase,
PrivacyProtocol: parseSNMPv3PrivProtocol(cred),
- PrivacyPassphrase: cred.PrivacyPassword,
+ PrivacyPassphrase: cred.PrivacyPassphrase,
})
}
}
diff --git a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
index 15b08b4bbd4a5e..14c42ed863c01c 100644
--- a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
+++ b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
@@ -45,14 +45,14 @@ func TestNewDiscoverer(t *testing.T) {
cfg: Config{
Credentials: []CredentialConfig{
{
- Name: "v3cred",
- Version: "3",
- UserName: "user",
- SecurityLevel: "authPriv",
- AuthProtocol: "sha",
- AuthPassword: "authpass",
- PrivacyProtocol: "aes",
- PrivacyPassword: "privpass",
+ Name: "v3cred",
+ Version: "3",
+ UserName: "user",
+ SecurityLevel: "authPriv",
+ AuthProtocol: "sha",
+ AuthPassphrase: "authpass",
+ PrivacyProtocol: "aes",
+ PrivacyPassphrase: "privpass",
},
},
Networks: []NetworkConfig{
@@ -66,14 +66,14 @@ func TestNewDiscoverer(t *testing.T) {
{Name: "v1cred", Version: "1", Community: "public"},
{Name: "v2cred", Version: "2c", Community: "private"},
{
- Name: "v3cred",
- Version: "3",
- UserName: "user",
- SecurityLevel: "authPriv",
- AuthProtocol: "sha",
- AuthPassword: "authpass",
- PrivacyProtocol: "aes",
- PrivacyPassword: "privpass",
+ Name: "v3cred",
+ Version: "3",
+ UserName: "user",
+ SecurityLevel: "authPriv",
+ AuthProtocol: "sha",
+ AuthPassphrase: "authpass",
+ PrivacyProtocol: "aes",
+ PrivacyPassphrase: "privpass",
},
},
Networks: []NetworkConfig{
diff --git a/src/go/plugin/go.d/config/go.d/sd/snmp.conf b/src/go/plugin/go.d/config/go.d/sd/snmp.conf
index 0ff933400c9c1a..18e3b6b3be48c0 100644
--- a/src/go/plugin/go.d/config/go.d/sd/snmp.conf
+++ b/src/go/plugin/go.d/config/go.d/sd/snmp.conf
@@ -90,11 +90,11 @@ compose:
{{- if eq .Credential.Version "1" "2" }}
community: {{ .Credential.Community }}
{{- else }}
- user:
- name: {{ .Credential.UserName }}
- level: {{ .Credential.SecurityLevel }}
- auth_proto: {{ .Credential.AuthProtocol }}
- auth_key: {{ .Credential.AuthPassphrase }}
- priv_proto: {{ .Credential.PrivacyProtocol }}
- priv_key: {{ .Credential.PrivacyPassphrase }}
+ user:
+ name: {{ .Credential.UserName }}
+ level: {{ .Credential.SecurityLevel }}
+ auth_proto: {{ .Credential.AuthProtocol }}
+ auth_key: {{ .Credential.AuthPassphrase }}
+ priv_proto: {{ .Credential.PrivacyProtocol }}
+ priv_key: {{ .Credential.PrivacyPassphrase }}
{{- end }}
From f9794d06bfde59b63980fce42a45756ad4c05ee9 Mon Sep 17 00:00:00 2001
From: n0099
Date: Fri, 9 May 2025 00:44:29 +0800
Subject: [PATCH 17/51] Clearify the path of `plugins.d/go.d.plugin` in docs
(#20258)
Update README.md
---
src/go/plugin/go.d/README.md | 15 ++++++++++-----
1 file changed, 10 insertions(+), 5 deletions(-)
diff --git a/src/go/plugin/go.d/README.md b/src/go/plugin/go.d/README.md
index f8f32b40f4a708..c8e87392daf684 100644
--- a/src/go/plugin/go.d/README.md
+++ b/src/go/plugin/go.d/README.md
@@ -190,7 +190,7 @@ Then [restart netdata](/docs/netdata-agent/start-stop-restart.md) for the change
## Troubleshooting
-Plugin CLI:
+### Plugin CLI:
```sh
Usage:
@@ -207,14 +207,19 @@ Help Options:
-h, --help Show this help message
```
-To debug specific module:
+### To debug specific module:
-```sh
+```bash
# become user netdata
sudo su -s /bin/bash netdata
+```
-# run plugin in debug mode
-./go.d.plugin -d -m <module>
+Depending on where Netdata was installed, execute one of the following commands to trace the execution of a python module:
+
+```bash
+# execute the plugin in debug mode, for a specific module
+/opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
+/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
```
Change `<module>` to the [module name](#available-modules) you want to debug.
From bd64b5d599542afb100736ead4558fa6c8708526 Mon Sep 17 00:00:00 2001
From: kanelatechnical
Date: Thu, 8 May 2025 20:06:52 +0300
Subject: [PATCH 18/51] Update documentation for native DEB/RPM packages
(#20257)
Co-authored-by: Austin S. Hemmelgarn
---
packaging/installer/methods/packages.md | 214 +++++++++++++++++-------
1 file changed, 153 insertions(+), 61 deletions(-)
diff --git a/packaging/installer/methods/packages.md b/packaging/installer/methods/packages.md
index 585331ab6c5c67..f67606236244d8 100644
--- a/packaging/installer/methods/packages.md
+++ b/packaging/installer/methods/packages.md
@@ -1,24 +1,54 @@
# Install Netdata Using Native DEB/RPM Packages
+:::note
+
Netdata provides pre-built native packages for most DEB- and RPM-based Linux distributions, following our [platform support policy](/docs/netdata-agent/versions-and-platforms.md).
-Our [kickstart.sh installer](/packaging/installer/methods/kickstart.md) uses these packages by default on supported platforms.
+:::
+
+Install Netdata using our [kickstart.sh installer](/packaging/installer/methods/kickstart.md), which automatically uses native packages on supported platforms.
-Add `--native-only` when running `kickstart.sh` to force native packages. The script will fail if native packages aren’t available.
+To ensure the installer only uses native packages, add the `--native-only` option when running `kickstart.sh`.
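As a sketch, forcing native packages at install time looks like this (the kickstart.sh download URL is an assumption; `--native-only` is the option described above):

```bash
# run the installer, allowing only native DEB/RPM packages
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh  # download URL assumed
sh /tmp/netdata-kickstart.sh --native-only  # fails if no native package exists for this platform
```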
:::note
-Until late 2024, Netdata packages were hosted on Package Cloud. All packages are now provided exclusively from our own repositories.
+Our previous PackageCloud repositories are no longer updated. All packages are now available exclusively from our own repositories.
:::
----
+## Repository Structure Overview
+
+Our repository system follows a structured organization:
+
+```
+repository.netdata.cloud/repos/
+├── stable/ # Stable Netdata Agent releases
+│ ├── debian/ # For Debian-based distributions
+│ │ ├── bullseye/ # Distribution codename directories
+│ │ ├── bookworm/
+│ │ └── ...
+│ ├── ubuntu/ # For Ubuntu-based distributions
+│ │ ├── focal/
+│ │ ├── jammy/
+│ │ └── ...
+│ ├── el/ # For RHEL-based distributions
+│ │ ├── 8/ # Version directories
+│ │ │ ├── x86_64/ # Architecture directories
+│ │ │ ├── aarch64/
+│ │ │ └── ...
+│ │ ├── 9/
+│ │ └── ...
+│ └── other distros...
+├── edge/ # Nightly builds (same structure)
+├── repoconfig/ # Configuration packages
+└── devel/ # Development builds (ignore)
+```
## Manual Setup of RPM Packages
-Repositories: [https://repository.netdata.cloud/repos/index.html](https://repository.netdata.cloud/repos/index.html)
+You can find our RPM repositories at: [https://repository.netdata.cloud/repos/index.html](https://repository.netdata.cloud/repos/index.html)
-Available groups:
+### Available Repository Groups
| Repo | Purpose |
|--------------|-------------------------------|
@@ -27,26 +57,38 @@ Available groups:
| `repoconfig` | Configuration packages |
| `devel` | Dev builds (ignore) |
-Supported distributions:
+### Supported Distributions
+
+Within each repository group, you'll find directories for specific distributions:
-- `amazonlinux`
-- `el` (RHEL, CentOS, AlmaLinux, Rocky Linux)
-- `fedora`
-- `ol` (Oracle Linux)
-- `opensuse`
+| Repository Directory | Primary Distribution | Compatible Distributions |
+|---------------------|----------------------|--------------------------|
+| `amazonlinux` | Amazon Linux | Binary-compatible Amazon Linux based distros |
+| `el` | Red Hat Enterprise Linux | CentOS, AlmaLinux, Rocky Linux, and other binary-compatible distros |
+| `fedora` | Fedora | Binary-compatible Fedora-based distros |
+| `ol` | Oracle Linux | Binary-compatible Oracle Linux based distros |
+| `opensuse` | openSUSE | Binary-compatible SUSE-based distros |
-Example repository for RHEL 9 x86_64:
+### Repository Structure
+
+Each distribution has:
+1. Directories for each supported release version
+2. Subdirectories for each supported CPU architecture containing the actual packages
+
+**Example:** For RHEL 9 on 64-bit x86, you'll find the stable repository at:
[https://repository.netdata.cloud/repos/stable/el/9/x86_64/](https://repository.netdata.cloud/repos/stable/el/9/x86_64/)
-GPG Key fingerprint:
+### Package Signing
+
+Our RPM packages and repository metadata are signed with a GPG key with a username of `Netdatabot` and the fingerprint:
`6E155DC153906B73765A74A99DD4A74CECFA8F4F`
-Public key:
+Download the public key from:
[https://repository.netdata.cloud/netdatabot.gpg.key](https://repository.netdata.cloud/netdatabot.gpg.key)
-### Steps
+### Installation Steps
-1. Download config package:
+1. Download the appropriate config package for your distribution:
[https://repository.netdata.cloud/repos/repoconfig/index.html](https://repository.netdata.cloud/repos/repoconfig/index.html)
2. Install it with your package manager:
@@ -57,17 +99,17 @@ Public key:
sudo dnf install netdata
```
- > **Note**
- > On RHEL systems, EPEL repository is required.
- > Our config packages handle this automatically — if not, install epel-release manually.
-
----
+ :::note
+
+ On RHEL and other `el` repository distributions, some Netdata dependencies are in the EPEL repository. Our config packages typically handle this automatically, but if you encounter issues, install `epel-release` manually.
+
+ :::
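To make these two steps concrete, a sketch for RHEL 9 on x86_64 (the repoconfig package file name below is an assumption; take the real one from the repoconfig index linked above):

```bash
# 1. install the repository configuration package (file name assumed; check the index)
sudo dnf install \
  https://repository.netdata.cloud/repos/repoconfig/el/9/x86_64/netdata-repo-latest.el9.noarch.rpm

# 2. install Netdata from the newly configured repository
sudo dnf install netdata
```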
## Manual Setup of DEB Packages
-Repositories: [https://repository.netdata.cloud/repos/index.html](https://repository.netdata.cloud/repos/index.html)
+You can find our DEB repositories at: [https://repository.netdata.cloud/repos/index.html](https://repository.netdata.cloud/repos/index.html)
-Available groups:
+### Available Repository Groups
| Repo | Purpose |
|--------------|-------------------------------|
@@ -76,18 +118,46 @@ Available groups:
| `repoconfig` | Configuration packages |
| `devel` | Dev builds (ignore) |
-Supported distributions:
+### Supported Distributions
+
+Within each repository group, you'll find directories for specific distributions:
+
+- `debian`: For Debian Linux and binary-compatible distributions
+- `ubuntu`: For Ubuntu Linux and binary-compatible distributions
+
+### Repository Structure
+
+Our DEB repositories use a **flat repository structure** (per Debian standards) and support **by-hash** metadata retrieval for improved reliability.
+
+Each directory contains subdirectories for supported releases, named by codename (e.g., `bullseye/`, `jammy/`).
-- `debian`
-- `ubuntu`
+:::important
-APT source for Debian 11 (Bullseye):
+When configuring repository URLs, include the trailing slash (`/`) after the codename. This is required for the repository to be processed correctly.
+
+:::
+
+### Package Signing
+
+Our DEB packages and repository metadata are signed with a GPG key with a username of `Netdatabot` and the fingerprint:
+`6E155DC153906B73765A74A99DD4A74CECFA8F4F`
+
+Download the public key from:
+[https://repository.netdata.cloud/netdatabot.gpg.key](https://repository.netdata.cloud/netdatabot.gpg.key)
+
+### Example Configuration
+
+<details>
+<summary>Click to view example APT configuration</summary>
+
+
+Here's an example APT sources entry for Debian 11 (Bullseye) stable releases:
```
deb by-hash=yes http://repository.netdata.cloud/repos/stable/debian/ bullseye/
```
-Deb822 format:
+And the equivalent Deb822 format:
```
Types: deb
@@ -96,16 +166,11 @@ Suites: bullseye/
By-Hash: Yes
Enabled: Yes
```
+</details>
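A minimal shell sketch of applying the entry above by hand (the sources file name and keyring path are assumptions, and the one-line APT format wants options in square brackets; on a real system the repoconfig package sets all of this up for you):

```bash
# store the Netdatabot signing key where APT can find it (paths assumed)
wget -qO- https://repository.netdata.cloud/netdatabot.gpg.key | \
  gpg --dearmor | sudo tee /usr/share/keyrings/netdata.gpg >/dev/null

# add the flat repository for Debian 11 (Bullseye), stable channel
echo 'deb [by-hash=yes signed-by=/usr/share/keyrings/netdata.gpg] http://repository.netdata.cloud/repos/stable/debian/ bullseye/' | \
  sudo tee /etc/apt/sources.list.d/netdata.list

sudo apt update
```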
-GPG Key fingerprint:
-`6E155DC153906B73765A74A99DD4A74CECFA8F4F`
-
-Public key:
-[https://repository.netdata.cloud/netdatabot.gpg.key](https://repository.netdata.cloud/netdatabot.gpg.key)
-
-### Steps
+### Installation Steps
-1. Download config package:
+1. Download the appropriate config package for your distribution:
[https://repository.netdata.cloud/repos/repoconfig/index.html](https://repository.netdata.cloud/repos/repoconfig/index.html)
2. Install it using your package manager:
@@ -117,46 +182,73 @@ Public key:
sudo apt install netdata
```
----
+## Example: Complete Installation on Ubuntu 22.04 (Jammy)
+
+<details>
+<summary>Click to view complete installation example</summary>
+
+
+Here's a complete example of installing Netdata on Ubuntu 22.04 using native packages:
+
+```bash
+# Step 1: Download the repository configuration package
+wget https://repository.netdata.cloud/repos/repoconfig/ubuntu/jammy/netdata-repo_latest.jammy_all.deb
+
+# Step 2: Install the repository configuration
+sudo apt install ./netdata-repo_latest.jammy_all.deb
+
+# Step 3: Update package lists
+sudo apt update
+
+# Step 4: Install Netdata
+sudo apt install netdata
+
+# Step 5: Start and enable Netdata service
+sudo systemctl enable --now netdata
+
+# Step 6: Verify installation
+curl localhost:19999/api/v1/info
+```
+
+After installation, you can access the Netdata dashboard at `http://localhost:19999`.
+</details>
## Local Mirrors of the Official Netdata Repositories
-You can mirror Netdata’s repositories:
+You can create local mirrors of our repositories using two main approaches:
-### Recommended Methods:
+### Recommended Mirroring Methods
-| Method | Use case |
-|------------------|---------------------------------------|
-| Standard tools | e.g., Aptly (APT) or `reposync` (RPM) |
-| Simple mirroring | Use `wget --mirror` or similar tools |
+| Method | Use case | Example |
+|------------------|---------------------------------------|---------|
+| Standard tools | For formal repository mirroring | `aptly mirror create netdata-stable http://repository.netdata.cloud/repos/stable/debian/ bullseye/` |
+| Simple mirroring | For basic HTTP mirroring | `wget --mirror https://repository.netdata.cloud/repos/` |
+
+### Mirror Root URL
-Mirror root URL:
[https://repository.netdata.cloud/repos/](https://repository.netdata.cloud/repos/)
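For instance, a minimal partial mirror with `wget`, in line with the "mirror only what you need" tip below (destination path assumed):

```bash
# mirror just the stable Debian Bullseye tree rather than the full ~100 GB repository
wget --mirror --no-parent --no-host-directories \
  --directory-prefix=/srv/mirrors/netdata \
  https://repository.netdata.cloud/repos/stable/debian/bullseye/
```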
----
+### Important Mirroring Tips
-### Mirror Tips:
+:::important
-- Config packages don’t support custom mirrors — configure mirrors manually.
-- Packages are built in stages by architecture.
-- Metadata updates up to six times/hour.
-- Full mirror can require up to **100 GB**.
-- Ideal sync window: **05:00–08:00 UTC**.
-- Fetch a GPG key from:
+* **Repository config packages:** These don't support custom mirrors (except caching proxies like `apt-cacher-ng`). Configure mirrors manually.
+* **Build process:** Packages are built in stages by architecture (64-bit x86 first, then others). Full publishing takes several hours.
+* **Update frequency:** Metadata updates up to six times per hour, but syncing hourly is sufficient.
+* **Storage requirements:** A full mirror can require up to **100 GB** of space. Mirror only what you need.
+* **Recommended sync time:** For daily syncing, **05:00–08:00 UTC** is ideal, as nightly packages are typically published by then.
+* **GPG verification:** If using our GPG signatures, download our public key:
[https://repository.netdata.cloud/netdatabot.gpg.key](https://repository.netdata.cloud/netdatabot.gpg.key)
----
+:::
## Public Mirrors of the Official Netdata Repositories
-:::note
+There are currently no official public mirrors of our repositories. If you wish to provide a public mirror of our repositories, you are welcome to do so.
-**There are no official public mirrors**.
+:::important
-:::
+Please clearly inform your users that your mirror is not officially supported by Netdata. We recommend following industry best practices for repository mirroring and security.
-If you wish to provide a public mirror of Netdata repositories:
+:::
-- You’re free to do so.
-- Please clearly state to your users that it is *not* an official mirror.
-- Follow best practices for repository mirroring and security.
\ No newline at end of file
From cf5cd7c25753bd6932bc32a3e665133d5cf620b4 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Thu, 8 May 2025 20:08:15 +0300
Subject: [PATCH 19/51] docs: reword go.d Troubleshooting section for clarity
(#20259)
---
src/go/plugin/go.d/README.md | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/src/go/plugin/go.d/README.md b/src/go/plugin/go.d/README.md
index c8e87392daf684..61a698617bb7ae 100644
--- a/src/go/plugin/go.d/README.md
+++ b/src/go/plugin/go.d/README.md
@@ -190,7 +190,7 @@ Then [restart netdata](/docs/netdata-agent/start-stop-restart.md) for the change
## Troubleshooting
-### Plugin CLI:
+### Plugin CLI
```sh
Usage:
@@ -207,19 +207,22 @@ Help Options:
-h, --help Show this help message
```
-### To debug specific module:
+### Debugging a Specific Module
+
+To debug a particular module, first switch to the Netdata user:
```bash
-# become user netdata
sudo su -s /bin/bash netdata
```
-Depending on where Netdata was installed, execute one of the following commands to trace the execution of a python module:
+Then run the plugin in debug mode, specifying your target module:
```bash
-# execute the plugin in debug mode, for a specific module
-/opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
-/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
+# For standard installations
+/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
+
+# For static installations (e.g., in /opt)
+/opt/netdata/usr/libexec/netdata/plugins.d/go.d.plugin -d -m <module>
```
-Change `<module>` to the [module name](#available-modules) you want to debug.
+Replace `<module>` with the [specific module](#available-modules) you wish to debug.
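Putting the two steps together, a sketch of a one-shot debugging session (`mysql` is just a hypothetical target module; pick any name from the available modules list):

```bash
# run the plugin in debug mode as the netdata user, for a single module, trimming the output
sudo su -s /bin/bash -c '/usr/libexec/netdata/plugins.d/go.d.plugin -d -m mysql' netdata 2>&1 | head -n 50
```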
From 89e869b5ad9d24c564d8ba800eb0b2698f227881 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Fri, 9 May 2025 00:23:07 +0000
Subject: [PATCH 20/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-20-nightly.
---
CHANGELOG.md | 21 +++++++++++----------
packaging/version | 2 +-
2 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index b39bcb699b0886..97764db4da9cb7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,10 +2,16 @@
## [**Next release**](https://github.com/netdata/netdata/tree/HEAD)
-[Full Changelog](https://github.com/netdata/netdata/compare/v2.5.0...HEAD)
+[Full Changelog](https://github.com/netdata/netdata/compare/v2.5.1...HEAD)
**Merged pull requests:**
+- docs: reword go.d Troubleshooting section for clarity [\#20259](https://github.com/netdata/netdata/pull/20259) ([ilyam8](https://github.com/ilyam8))
+- Clearify the path of `plugins.d/go.d.plugin` in docs [\#20258](https://github.com/netdata/netdata/pull/20258) ([n0099](https://github.com/n0099))
+- Update documentation for native DEB/RPM packages [\#20257](https://github.com/netdata/netdata/pull/20257) ([kanelatechnical](https://github.com/kanelatechnical))
+- fix\(go.d/sd/snmp\): fix snmnpv3 again [\#20256](https://github.com/netdata/netdata/pull/20256) ([ilyam8](https://github.com/ilyam8))
+- chore\(go.d/snmp\): make enable\_profiles configurable \(needed for dev\) [\#20255](https://github.com/netdata/netdata/pull/20255) ([ilyam8](https://github.com/ilyam8))
+- fix obsolete chart cleanup to properly handle vnodes [\#20254](https://github.com/netdata/netdata/pull/20254) ([ilyam8](https://github.com/ilyam8))
- docs: fix license link and remove GH alerts syntax from FAQ [\#20252](https://github.com/netdata/netdata/pull/20252) ([ilyam8](https://github.com/ilyam8))
- Update Netdata README [\#20251](https://github.com/netdata/netdata/pull/20251) ([kanelatechnical](https://github.com/kanelatechnical))
- fix\(go.d/snmp\): use 32bit counters if 64 aren't available [\#20249](https://github.com/netdata/netdata/pull/20249) ([ilyam8](https://github.com/ilyam8))
@@ -17,6 +23,10 @@
- Revert "Revert "fix\(go.d/couchdb\): correct db size charts unit"" [\#20235](https://github.com/netdata/netdata/pull/20235) ([ilyam8](https://github.com/ilyam8))
- Make all threads joinable and join on agent shutdown [\#20228](https://github.com/netdata/netdata/pull/20228) ([stelfrag](https://github.com/stelfrag))
+## [v2.5.1](https://github.com/netdata/netdata/tree/v2.5.1) (2025-05-08)
+
+[Full Changelog](https://github.com/netdata/netdata/compare/v2.5.0...v2.5.1)
+
## [v2.5.0](https://github.com/netdata/netdata/tree/v2.5.0) (2025-05-05)
[Full Changelog](https://github.com/netdata/netdata/compare/v2.4.0...v2.5.0)
@@ -465,15 +475,6 @@
- feat\(go.d\): add snmp devices discovery [\#19720](https://github.com/netdata/netdata/pull/19720) ([ilyam8](https://github.com/ilyam8))
- save status on out of memory event [\#19719](https://github.com/netdata/netdata/pull/19719) ([ktsaou](https://github.com/ktsaou))
- attempt to save status file from the signal handler [\#19718](https://github.com/netdata/netdata/pull/19718) ([ktsaou](https://github.com/ktsaou))
-- unified out of memory handling [\#19717](https://github.com/netdata/netdata/pull/19717) ([ktsaou](https://github.com/ktsaou))
-- chore\(go.d\): add file persister [\#19716](https://github.com/netdata/netdata/pull/19716) ([ilyam8](https://github.com/ilyam8))
-- do not call cleanup and exit on fatal conditions during startup [\#19715](https://github.com/netdata/netdata/pull/19715) ([ktsaou](https://github.com/ktsaou))
-- do not use mmap when the mmap limit is too low [\#19714](https://github.com/netdata/netdata/pull/19714) ([ktsaou](https://github.com/ktsaou))
-- systemd-journal: allow almost all fields to be facets [\#19713](https://github.com/netdata/netdata/pull/19713) ([ktsaou](https://github.com/ktsaou))
-- deduplicate all crash reports [\#19712](https://github.com/netdata/netdata/pull/19712) ([ktsaou](https://github.com/ktsaou))
-- 4 malloc arenas for parents, not IoT [\#19711](https://github.com/netdata/netdata/pull/19711) ([ktsaou](https://github.com/ktsaou))
-- Fix Fresh Installation on Microsoft [\#19710](https://github.com/netdata/netdata/pull/19710) ([thiagoftsm](https://github.com/thiagoftsm))
-- Avoid post initialization errors repeateadly [\#19709](https://github.com/netdata/netdata/pull/19709) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index dbe2d6e05f6c1e..b7e831214ee891 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-13-nightly
+v2.5.0-20-nightly
From aef0d226ff0e7ba6dc3b3b4d9b6b9bb8804ef69e Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Fri, 9 May 2025 09:34:29 +0300
Subject: [PATCH 21/51] fix(go.d/mysql): fix MariaDB User CPU Time (#20262)
---
.../go.d/collector/mysql/collect_user_statistics.go | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/src/go/plugin/go.d/collector/mysql/collect_user_statistics.go b/src/go/plugin/go.d/collector/mysql/collect_user_statistics.go
index 3f15a27761195f..d1f46b7f368e7f 100644
--- a/src/go/plugin/go.d/collector/mysql/collect_user_statistics.go
+++ b/src/go/plugin/go.d/collector/mysql/collect_user_statistics.go
@@ -27,15 +27,13 @@ func (c *Collector) collectUserStatistics(mx map[string]int64) error {
c.addUserStatisticsCharts(user)
}
case "Cpu_time":
+ // https://jira.mariadb.org/browse/MDEV-36586
needsDivision := c.isMariaDB &&
- ((c.version.Major == 10 && c.version.GTE(semver.Version{Major: 10, Minor: 11, Patch: 11})) ||
- c.version.GTE(semver.Version{Major: 11, Minor: 4, Patch: 5}))
+				(c.version.EQ(semver.Version{Major: 10, Minor: 11, Patch: 11}) ||
+					c.version.EQ(semver.Version{Major: 11, Minor: 4, Patch: 5}))
key := strings.ToLower(prefix + column)
if needsDivision {
- // TODO: theoretically should divide by 1e6 to convert to seconds,
- // but empirically need 1e7 to match pre-11.4.5 values.
- // Needs investigation - possible unit reporting inconsistency in MariaDB
mx[key] = int64(parseFloat(value) / 1e7 * 1000)
} else {
mx[key] = int64(parseFloat(value) * 1000)
From f59e75c9e7ab69be89910a5775b991bccae5c3ed Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Fri, 9 May 2025 16:59:36 +0300
Subject: [PATCH 22/51] Minor fixes (#20263)
* Fix message
* Avoid division by zero (silence coverity)
* Switch to info for shutdown messages, reformat the shutdown_timing
---
src/daemon/daemon-shutdown-watcher.c | 8 +++++---
src/database/rrdset-collection.c | 2 +-
src/database/sqlite/sqlite_metadata.c | 2 ++
3 files changed, 8 insertions(+), 4 deletions(-)
diff --git a/src/daemon/daemon-shutdown-watcher.c b/src/daemon/daemon-shutdown-watcher.c
index bfa6662db1eca8..82f789cd21674a 100644
--- a/src/daemon/daemon-shutdown-watcher.c
+++ b/src/daemon/daemon-shutdown-watcher.c
@@ -120,7 +120,7 @@ void *watcher_main(void *arg)
// wait until the agent starts the shutdown process
completion_wait_for(&shutdown_begin_completion);
- netdata_log_error("Shutdown process started");
+ netdata_log_info("Shutdown process started");
usec_t shutdown_start_time = now_monotonic_usec();
@@ -150,8 +150,10 @@ void *watcher_main(void *arg)
usec_t shutdown_end_time = now_monotonic_usec();
usec_t shutdown_duration = shutdown_end_time - shutdown_start_time;
- netdata_log_error("Shutdown process ended in %llu milliseconds",
- shutdown_duration / USEC_PER_MS);
+
+ char shutdown_timing[64];
+ duration_snprintf(shutdown_timing, sizeof(shutdown_timing), (int64_t)shutdown_duration, "us", 1);
+ netdata_log_info("Shutdown process ended in %s", shutdown_timing);
daemon_status_file_shutdown_step(NULL, buffer_tostring(steps_timings));
daemon_status_file_update_status(DAEMON_STATUS_EXITED);
diff --git a/src/database/rrdset-collection.c b/src/database/rrdset-collection.c
index 7b1e045e2dc2c3..fad9e5187d89e6 100644
--- a/src/database/rrdset-collection.c
+++ b/src/database/rrdset-collection.c
@@ -682,7 +682,7 @@ void rrdset_timed_done(RRDSET *st, struct timeval now, bool pending_rrdset_next)
collected_total += rd->collector.collected_value;
if(unlikely(rrddim_flag_check(rd, RRDDIM_FLAG_OBSOLETE))) {
- netdata_log_error("Dimension %s in chart '%s' has the OBSOLETE or ARCHIVED flag set, but it is collected.", rrddim_name(rd), rrdset_id(st));
+ netdata_log_error("Dimension %s in chart '%s' has the OBSOLETE flag set, but it is collected.", rrddim_name(rd), rrdset_id(st));
if(!spinlock_trylock(&rd->destroy_lock))
fatal("RRDSET: dimension '%s' of chart '%s' of host '%s' is being collected while is being destroyed.", rrddim_id(rd), rrdset_id(st), rrdhost_hostname(st->rrdhost));
diff --git a/src/database/sqlite/sqlite_metadata.c b/src/database/sqlite/sqlite_metadata.c
index cdc06cb44b4b94..668cdc17d432de 100644
--- a/src/database/sqlite/sqlite_metadata.c
+++ b/src/database/sqlite/sqlite_metadata.c
@@ -2405,6 +2405,8 @@ static void store_hosts_metadata(BUFFER *work_buffer, bool is_worker)
host_count++;
}
dfe_done(host);
+ if (!host_count)
+ host_count = 1; // avoid division by zero
}
size_t count = 0;
From 5e6b8b26a28e1e24dfc21f9ed8ffe7dc5fcbfd8a Mon Sep 17 00:00:00 2001
From: kanelatechnical
Date: Fri, 9 May 2025 18:46:25 +0300
Subject: [PATCH 23/51] Update Netdata README with improved structure (#20265)
Co-authored-by: ilyam8
---
README.md | 66 +++++++++++++++++++++++++++++++++++++++++++------------
1 file changed, 52 insertions(+), 14 deletions(-)
diff --git a/README.md b/README.md
index e02e985688724e..047d6f49453d24 100644
--- a/README.md
+++ b/README.md
@@ -34,22 +34,15 @@
-MENU: **[GETTING STARTED](#getting-started)** | **[HOW IT WORKS](#how-it-works)** | **[FAQ](#faq)** | **[DOCS](#book-documentation)** | **[COMMUNITY](#tada-community)** | **[CONTRIBUTE](#pray-contribute)** | **[LICENSE](#scroll-license)**
+MENU: **[WHO WE ARE](#who-we-are)** | **[KEY FEATURES](#key-features)** | **[GETTING STARTED](#getting-started)** | **[HOW IT WORKS](#how-it-works)** | **[FAQ](#faq)** | **[DOCS](#book-documentation)** | **[COMMUNITY](#tada-community)** | **[CONTRIBUTE](#pray-contribute)** | **[LICENSE](#scroll-license)**
+
> [!WARNING]
> People **get addicted to Netdata.**
-> Once you use it on your systems, **there's no going back!**
+> Once you use it on your systems, *there's no going back.*
[]()
-## Most Energy-Efficient Monitoring Tool
-
-
-
-
-
-According to the [University of Amsterdam study](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf), Netdata is "the most energy-efficient tool" for monitoring Docker-based systems. The study also shows Netdata excels in CPU usage, RAM usage, and execution time compared to other monitoring solutions.
-
---
## WHO WE ARE
@@ -66,6 +59,34 @@ Netdata is an open-source, real-time infrastructure monitoring platform. Monitor
With Netdata, you get real-time, per-second updates. Clear **insights at a glance**, no complexity.
+
+<details>
+<summary>All heroes have a great origin story. Click to discover ours.</summary>
+
+In 2013, at the company where Costa Tsaousis was COO, a significant percentage of their cloud-based transactions failed silently, severely impacting business performance.
+
+Costa and his team tried every troubleshooting tool available at the time. None could identify the root cause. As Costa later wrote:
+
+“*I couldn’t believe that monitoring systems provide so few metrics and with such low resolution, scale so badly, and cost so much to run.*”
+
+Frustrated, he decided to build his own monitoring tool, starting from scratch.
+
+That decision led to countless late nights and weekends. It also sparked a fundamental shift in how infrastructure monitoring and troubleshooting are approached, both in method and in cost.
+
+
+</details>
+
+### Most Energy-Efficient Monitoring Tool
+
+
+
+According to the [University of Amsterdam study](https://www.ivanomalavolta.com/files/papers/ICSOC_2023.pdf), Netdata is the most energy-efficient tool for monitoring Docker-based systems. The study also shows Netdata excels in CPU usage, RAM usage, and execution time compared to other monitoring solutions.
+
---
## Key Features
@@ -203,7 +224,7 @@ With Netdata you can run a modular pipeline for metrics collection, processing,
```mermaid
flowchart TB
- A[Netdata Agent]
+ A[Netdata Agent]:::mainNode
A1(Collect):::green --> A
A2(Store):::green --> A
A3(Learn):::green --> A
@@ -214,8 +235,9 @@ flowchart TB
A8(Query):::green --> A
A9(Score):::green --> A
- classDef green fill:#bbf3bb,stroke:#333,stroke-width:1px
- ```
+ classDef green fill:#bbf3bb,stroke:#333,stroke-width:1px,color:#000
+ classDef mainNode fill:#f0f0f0,stroke:#333,stroke-width:1px,color:#333
+```
With each Agent you can:
@@ -253,7 +275,11 @@ With the Netdata Agent, you can use these core capabilities out-of-the-box:
## CNCF Membership
-
+
+
+
+
+
Netdata actively supports and is a member of the Cloud Native Computing Foundation (CNCF).
It is one of the most starred projects in the CNCF landscape.
@@ -265,6 +291,7 @@ With the Netdata Agent, you can use these core capabilities out-of-the-box:
Is Netdata secure?
+
Yes. Netdata follows [OpenSSF best practices](https://bestpractices.coreinfrastructure.org/en/projects/2231), has a security-first design, and is regularly audited by the community.
@@ -275,6 +302,7 @@ Yes. Netdata follows [OpenSSF best practices](https://bestpractices.coreinfrastr
Does Netdata use a lot of resources?
+
No. Even with ML and per-second metrics, Netdata uses minimal resources.
@@ -288,6 +316,7 @@ No. Even with ML and per-second metrics, Netdata uses minimal resources.
How much data retention is possible?
+
As much as your disk allows.
@@ -302,6 +331,7 @@ These are queried automatically based on the zoom level.
Can Netdata scale to many servers?
+
Yes. With Netdata you can:
@@ -315,6 +345,7 @@ Yes. With Netdata you can:
Is disk I/O a concern?
+
No. Netdata minimizes disk usage:
@@ -328,6 +359,7 @@ No. Netdata minimizes disk usage:
How is Netdata different from Prometheus + Grafana?
+
With Netdata you get a complete monitoring solution—not just tools.
@@ -341,6 +373,7 @@ With Netdata you get a complete monitoring solution—not just tools.
How is Netdata different from commercial SaaS tools?
+
With Netdata you can store all metrics on your infrastructure—no sampling, no aggregation, no loss.
@@ -352,6 +385,7 @@ With Netdata you can store all metrics on your infrastructure—no sampling, no
Can Netdata run alongside Nagios, Zabbix, etc.?
+
Yes. You can use Netdata together with traditional tools.
@@ -365,6 +399,7 @@ With Netdata you get:
What if I feel overwhelmed?
+
You can start small:
@@ -378,6 +413,7 @@ You can start small:
Do I have to use Netdata Cloud?
+
No. Netdata Cloud is optional.
@@ -392,6 +428,7 @@ Netdata works without it, but with Cloud you can:
What telemetry does Netdata collect?
+
Anonymous telemetry helps improve the product. You can disable it:
@@ -404,6 +441,7 @@ Anonymous telemetry helps improve the product. You can disable it:
Who uses Netdata?
+
You'll join users including:
From 5af2aba6d1523ca1db34ffa31d5080215310e2b9 Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Fri, 9 May 2025 20:05:47 +0300
Subject: [PATCH 24/51] Schedule journal file indexing after database file
rotation (#20264)
* Add indexing flag to track and enqueue journal indexing after database rotation
* Remove unused job
---
src/daemon/libuv_workers.c | 1 -
src/daemon/libuv_workers.h | 1 -
src/database/engine/rrdengine.c | 6 +++++-
src/database/engine/rrdengine.h | 1 +
4 files changed, 6 insertions(+), 3 deletions(-)
diff --git a/src/daemon/libuv_workers.c b/src/daemon/libuv_workers.c
index 079325e99d8c6d..5288a111316852 100644
--- a/src/daemon/libuv_workers.c
+++ b/src/daemon/libuv_workers.c
@@ -26,7 +26,6 @@ static void register_libuv_worker_jobs_internal(void) {
worker_register_job_name(UV_EVENT_DBENGINE_FLUSHED_TO_OPEN, "flushed to open");
// datafile full
- worker_register_job_name(UV_EVENT_DBENGINE_JOURNAL_INDEX_WAIT, "jv2 index wait");
worker_register_job_name(UV_EVENT_DBENGINE_JOURNAL_INDEX, "jv2 indexing");
// db rotation related
diff --git a/src/daemon/libuv_workers.h b/src/daemon/libuv_workers.h
index 695696dd92f0c4..970e52d5de8790 100644
--- a/src/daemon/libuv_workers.h
+++ b/src/daemon/libuv_workers.h
@@ -24,7 +24,6 @@ enum event_loop_job {
UV_EVENT_DBENGINE_FLUSHED_TO_OPEN,
// datafile full
- UV_EVENT_DBENGINE_JOURNAL_INDEX_WAIT,
UV_EVENT_DBENGINE_JOURNAL_INDEX,
// db rotation related
diff --git a/src/database/engine/rrdengine.c b/src/database/engine/rrdengine.c
index 7a9a9ddb57f602..0ad8127e79d4fe 100644
--- a/src/database/engine/rrdengine.c
+++ b/src/database/engine/rrdengine.c
@@ -919,6 +919,8 @@ static void *extent_write_tp_worker(
static void after_database_rotate(struct rrdengine_instance *ctx __maybe_unused, void *data __maybe_unused, struct completion *completion __maybe_unused, uv_work_t* req __maybe_unused, int status __maybe_unused) {
__atomic_store_n(&ctx->atomic.now_deleting_files, false, __ATOMIC_RELAXED);
+ if (__atomic_load_n(&ctx->atomic.needs_indexing, __ATOMIC_RELAXED))
+ rrdeng_enq_cmd(ctx, RRDENG_OPCODE_JOURNAL_INDEX, NULL, NULL, STORAGE_PRIORITY_INTERNAL_DBENGINE, NULL, NULL);
}
struct uuid_first_time_s {
@@ -1675,7 +1677,6 @@ NOT_INLINE_HOT void pdc_route_synchronously_first(struct rrdengine_instance *ctx
static void *journal_v2_indexing_tp_worker(struct rrdengine_instance *ctx __maybe_unused, void *data __maybe_unused, struct completion *completion __maybe_unused, uv_work_t *uv_work_req __maybe_unused) {
unsigned count = 0;
- worker_is_busy(UV_EVENT_DBENGINE_JOURNAL_INDEX_WAIT);
struct rrdengine_datafile *datafile = ctx->datafiles.first;
worker_is_busy(UV_EVENT_DBENGINE_JOURNAL_INDEX);
@@ -2180,8 +2181,11 @@ void dbengine_event_loop(void* arg) {
struct rrdengine_datafile *datafile = cmd.data;
if (NOT_INDEXING_OR_DELETING_FILES(ctx) && ctx_is_available_for_queries(ctx)) {
__atomic_store_n(&ctx->atomic.migration_to_v2_running, true, __ATOMIC_RELAXED);
+ __atomic_store_n(&ctx->atomic.needs_indexing, false, __ATOMIC_RELAXED);
work_dispatch(ctx, datafile, NULL, opcode, journal_v2_indexing_tp_worker, after_journal_v2_indexing);
}
+ else
+ __atomic_store_n(&ctx->atomic.needs_indexing, true, __ATOMIC_RELAXED);
break;
}
diff --git a/src/database/engine/rrdengine.h b/src/database/engine/rrdengine.h
index b73d712291b65c..6f81568f19f5df 100644
--- a/src/database/engine/rrdengine.h
+++ b/src/database/engine/rrdengine.h
@@ -396,6 +396,7 @@ struct rrdengine_instance {
PAD64(bool) migration_to_v2_running;
PAD64(bool) now_deleting_files;
+ PAD64(bool) needs_indexing;
PAD64(unsigned) extents_currently_being_flushed; // non-zero until we commit data to disk (both datafile and journal file)
PAD64(time_t) first_time_s;
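[Editor's note: the patch above implements, in C with `__atomic` builtins, a defer-and-replay pattern: if a journal-indexing request arrives while datafile rotation is in flight, it is remembered in the new `needs_indexing` flag and re-enqueued from `after_database_rotate` instead of being dropped. Below is a toy Go sketch of the same state machine in isolation; the `engine` type and method names are hypothetical stand-ins, not netdata code.]

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// engine is a hypothetical stand-in for rrdengine_instance.
type engine struct {
	rotating      atomic.Bool // mirrors atomic.now_deleting_files
	needsIndexing atomic.Bool // mirrors the new atomic.needs_indexing
}

// requestIndexing mirrors the RRDENG_OPCODE_JOURNAL_INDEX branch: run now
// if nothing blocks us, otherwise just record that indexing is pending.
func (e *engine) requestIndexing(index func()) {
	if e.rotating.Load() {
		e.needsIndexing.Store(true)
		return
	}
	e.needsIndexing.Store(false)
	index()
}

// finishRotation mirrors after_database_rotate: clear the busy flag and
// replay a deferred indexing request, if one was recorded.
func (e *engine) finishRotation(index func()) {
	e.rotating.Store(false)
	if e.needsIndexing.Load() {
		e.requestIndexing(index)
	}
}

func main() {
	var e engine
	e.rotating.Store(true)
	e.requestIndexing(func() { fmt.Println("indexing") }) // deferred: rotation in flight
	e.finishRotation(func() { fmt.Println("indexing") })  // replayed: runs now
}
```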
From 990ec09795a2dd060af3b09a4c3c7b20dc3971e4 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Sat, 10 May 2025 00:22:06 +0000
Subject: [PATCH 25/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-25-nightly.
---
CHANGELOG.md | 9 ++++-----
packaging/version | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 97764db4da9cb7..6ff935c219d138 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,10 @@
**Merged pull requests:**
+- Update Netdata README with improved structure [\#20265](https://github.com/netdata/netdata/pull/20265) ([kanelatechnical](https://github.com/kanelatechnical))
+- Schedule journal file indexing after database file rotation [\#20264](https://github.com/netdata/netdata/pull/20264) ([stelfrag](https://github.com/stelfrag))
+- Minor fixes [\#20263](https://github.com/netdata/netdata/pull/20263) ([stelfrag](https://github.com/stelfrag))
+- fix\(go.d/mysql\): fix MariaDB User CPU Time [\#20262](https://github.com/netdata/netdata/pull/20262) ([ilyam8](https://github.com/ilyam8))
- docs: reword go.d Troubleshooting section for clarity [\#20259](https://github.com/netdata/netdata/pull/20259) ([ilyam8](https://github.com/ilyam8))
- Clearify the path of `plugins.d/go.d.plugin` in docs [\#20258](https://github.com/netdata/netdata/pull/20258) ([n0099](https://github.com/n0099))
- Update documentation for native DEB/RPM packages [\#20257](https://github.com/netdata/netdata/pull/20257) ([kanelatechnical](https://github.com/kanelatechnical))
@@ -470,11 +474,6 @@
- handle flushing state during exit [\#19725](https://github.com/netdata/netdata/pull/19725) ([ktsaou](https://github.com/ktsaou))
- allow configuring journal v2 unmount time; turn it off for parents [\#19724](https://github.com/netdata/netdata/pull/19724) ([ktsaou](https://github.com/ktsaou))
- minor status file annotation fixes [\#19723](https://github.com/netdata/netdata/pull/19723) ([ktsaou](https://github.com/ktsaou))
-- status has install type [\#19722](https://github.com/netdata/netdata/pull/19722) ([ktsaou](https://github.com/ktsaou))
-- more status file annotations [\#19721](https://github.com/netdata/netdata/pull/19721) ([ktsaou](https://github.com/ktsaou))
-- feat\(go.d\): add snmp devices discovery [\#19720](https://github.com/netdata/netdata/pull/19720) ([ilyam8](https://github.com/ilyam8))
-- save status on out of memory event [\#19719](https://github.com/netdata/netdata/pull/19719) ([ktsaou](https://github.com/ktsaou))
-- attempt to save status file from the signal handler [\#19718](https://github.com/netdata/netdata/pull/19718) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index b7e831214ee891..c5bfaf8b595c90 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-20-nightly
+v2.5.0-25-nightly
From 4f32427bf3a24469420d33096b540ca1e75392d7 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 12 May 2025 08:16:21 +0300
Subject: [PATCH 26/51] build(deps): bump golang.org/x/net from 0.39.0 to
0.40.0 in /src/go (#20270)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
src/go/go.mod | 12 ++++++------
src/go/go.sum | 24 ++++++++++++------------
2 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/src/go/go.mod b/src/go/go.mod
index c96c8297af8172..2e6c87841d1ddd 100644
--- a/src/go/go.mod
+++ b/src/go/go.mod
@@ -52,8 +52,8 @@ require (
github.com/vmware/govmomi v0.50.0
go.mongodb.org/mongo-driver v1.17.3
go.uber.org/automaxprocs v1.6.0
- golang.org/x/net v0.39.0
- golang.org/x/text v0.24.0
+ golang.org/x/net v0.40.0
+ golang.org/x/text v0.25.0
golang.zx2c4.com/wireguard/wgctrl v0.0.0-20220504211119-3d4a969bb56b
gopkg.in/ini.v1 v1.67.0
gopkg.in/rethinkdb/rethinkdb-go.v6 v6.2.2
@@ -151,12 +151,12 @@ require (
go.opentelemetry.io/otel/metric v1.34.0 // indirect
go.opentelemetry.io/otel/trace v1.34.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
- golang.org/x/crypto v0.37.0 // indirect
+ golang.org/x/crypto v0.38.0 // indirect
golang.org/x/mod v0.23.0 // indirect
golang.org/x/oauth2 v0.27.0 // indirect
- golang.org/x/sync v0.13.0 // indirect
- golang.org/x/sys v0.32.0 // indirect
- golang.org/x/term v0.31.0 // indirect
+ golang.org/x/sync v0.14.0 // indirect
+ golang.org/x/sys v0.33.0 // indirect
+ golang.org/x/term v0.32.0 // indirect
golang.org/x/time v0.9.0 // indirect
golang.org/x/tools v0.30.0 // indirect
golang.zx2c4.com/wireguard v0.0.0-20230325221338-052af4a8072b // indirect
diff --git a/src/go/go.sum b/src/go/go.sum
index 5183729bd1b760..3e089e83c2212b 100644
--- a/src/go/go.sum
+++ b/src/go/go.sum
@@ -508,8 +508,8 @@ golang.org/x/crypto v0.0.0-20201203163018-be400aefbc4c/go.mod h1:jdWPYTVW3xRLrWP
golang.org/x/crypto v0.0.0-20210616213533-5ff15b29337e/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20210711020723-a769d52b0f97/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
golang.org/x/crypto v0.0.0-20210921155107-089bfa567519/go.mod h1:GvvjBRRGRdwPK5ydBHafDWAxML/pGHZbMvKqRZ5+Abc=
-golang.org/x/crypto v0.37.0 h1:kJNSjF/Xp7kU0iB2Z+9viTPMW4EqqsrywMXLJOOsXSE=
-golang.org/x/crypto v0.37.0/go.mod h1:vg+k43peMZ0pUMhYmVAWysMK35e6ioLh3wB8ZCAfbVc=
+golang.org/x/crypto v0.38.0 h1:jt+WWG8IZlBnVbomuhg2Mdq0+BBQaHbtqHEFEigjUV8=
+golang.org/x/crypto v0.38.0/go.mod h1:MvrbAqul58NNYPKnOra203SB9vpuZW0e+RRZV+Ggqjw=
golang.org/x/lint v0.0.0-20190930215403-16217165b5de/go.mod h1:6SW0HCj/g11FgYtHlgUYUwCkIfeOF89ocIRzGO/8vkc=
golang.org/x/mod v0.0.0-20190513183733-4bf6d317e70e/go.mod h1:mXi4GBBbnImb6dmsKGUJ2LatrhH/nqhxcFungHvyanc=
golang.org/x/mod v0.1.1-0.20191105210325-c90efee705ee/go.mod h1:QqPTAvyqsEbceGzBzNggFXnrqF1CaUcvgkdR5Ot7KZg=
@@ -529,8 +529,8 @@ golang.org/x/net v0.0.0-20201021035429-f5854403a974/go.mod h1:sp8m0HH+o8qH0wwXwY
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96bSt6lcn1PtDYWL6XObtHCRCNQM=
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
-golang.org/x/net v0.39.0 h1:ZCu7HMWDxpXpaiKdhzIfaltL9Lp31x/3fCP11bc6/fY=
-golang.org/x/net v0.39.0/go.mod h1:X7NRbYVEA+ewNkCNyJ513WmMdQ3BineSwVtN2zD/d+E=
+golang.org/x/net v0.40.0 h1:79Xs7wF06Gbdcg4kdCCIQArK11Z1hr5POQ6+fIYHNuY=
+golang.org/x/net v0.40.0/go.mod h1:y0hY0exeL2Pku80/zKK7tpntoX23cqL3Oa6njdgRtds=
golang.org/x/oauth2 v0.27.0 h1:da9Vo7/tDv5RH/7nZDz1eMGS/q1Vv1N/7FCrBhI9I3M=
golang.org/x/oauth2 v0.27.0/go.mod h1:onh5ek6nERTohokkhCD/y2cV4Do3fxFHFuAejCkRWT8=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -539,8 +539,8 @@ golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJ
golang.org/x/sync v0.0.0-20201020160332-67f06af15bc9/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
-golang.org/x/sync v0.13.0 h1:AauUjRAJ9OSnvULf/ARrrVywoJDy0YS2AwQ98I37610=
-golang.org/x/sync v0.13.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
+golang.org/x/sync v0.14.0 h1:woo0S4Yywslg6hp4eUFjTVOyKt0RookbpAHG4c1HmhQ=
+golang.org/x/sync v0.14.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
golang.org/x/sys v0.0.0-20180905080454-ebe1bf3edb33/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20180909124046-d0be0721c37e/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
@@ -563,13 +563,13 @@ golang.org/x/sys v0.0.0-20220520151302-bc2c85ada10a/go.mod h1:oPkhp1MJrh7nUepCBc
golang.org/x/sys v0.0.0-20220715151400-c0bba94af5f8/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.32.0 h1:s77OFDvIQeibCmezSnk/q6iAfkdiQaJi4VzroCFrN20=
-golang.org/x/sys v0.32.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
+golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
+golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
golang.org/x/term v0.0.0-20201117132131-f5c789dd3221/go.mod h1:Nr5EML6q2oocZ2LXRh80K7BxOlk5/8JxuGnuhpl+muw=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.0.0-20210927222741-03fcf44c2211/go.mod h1:jbD1KX2456YbFQfuXm/mYQcufACuNUgVhRMnK/tPxf8=
-golang.org/x/term v0.31.0 h1:erwDkOK1Msy6offm1mOgvspSkslFnIGsFnxOKoufg3o=
-golang.org/x/term v0.31.0/go.mod h1:R4BeIy7D95HzImkxGkTW1UQTtP54tio2RyHz7PwK0aw=
+golang.org/x/term v0.32.0 h1:DR4lr0TjUs3epypdhTOkMmuF5CDFJ/8pOnbzMZPQ7bg=
+golang.org/x/term v0.32.0/go.mod h1:uZG1FhGx848Sqfsq4/DlJr3xGGsYMu/L5GW4abiaEPQ=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.2/go.mod h1:bEr9sfX3Q8Zfm5fL9x+3itogRgK3+ptLWKqgva+5dAk=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
@@ -577,8 +577,8 @@ golang.org/x/text v0.3.4/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.6/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.3.7/go.mod h1:u+2+/6zg+i71rQMx5EYifcz6MCKuco9NR6JIITiCfzQ=
golang.org/x/text v0.3.8/go.mod h1:E6s5w1FMmriuDzIBO73fBruAKo1PCIq6d2Q6DHfQ8WQ=
-golang.org/x/text v0.24.0 h1:dd5Bzh4yt5KYA8f9CJHCP4FB4D51c2c6JvN37xJJkJ0=
-golang.org/x/text v0.24.0/go.mod h1:L8rBsPeo2pSS+xqN0d5u2ikmjtmoJbDBT1b7nHvFCdU=
+golang.org/x/text v0.25.0 h1:qVyWApTSYLk/drJRO5mDlNYskwQznZmkpV2c8q9zls4=
+golang.org/x/text v0.25.0/go.mod h1:WEdwpYrmk1qmdHvhkSTNPm3app7v4rsT8F2UD6+VHIA=
golang.org/x/time v0.9.0 h1:EsRrnYcQiGH+5FfbgvV4AP7qEZstoyrHB0DzarOQ4ZY=
golang.org/x/time v0.9.0/go.mod h1:3BpzKBy/shNhVucY/MWOyx10tF3SFh9QdLuxbVysPQM=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
From 75f7064ae04c05d3d06d59101095a218a08efaec Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Mon, 12 May 2025 08:20:34 +0300
Subject: [PATCH 27/51] build(deps): bump github.com/miekg/dns from 1.1.65 to
1.1.66 in /src/go (#20268)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
src/go/go.mod | 6 +++---
src/go/go.sum | 12 ++++++------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/go/go.mod b/src/go/go.mod
index 2e6c87841d1ddd..5443fe924f961d 100644
--- a/src/go/go.mod
+++ b/src/go/go.mod
@@ -38,7 +38,7 @@ require (
github.com/lmittmann/tint v1.0.7
github.com/mattn/go-isatty v0.0.20
github.com/mattn/go-xmlrpc v0.0.3
- github.com/miekg/dns v1.1.65
+ github.com/miekg/dns v1.1.66
github.com/mitchellh/go-homedir v1.1.0
github.com/prometheus-community/pro-bing v0.7.0
github.com/prometheus/common v0.63.0
@@ -152,13 +152,13 @@ require (
go.opentelemetry.io/otel/trace v1.34.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/crypto v0.38.0 // indirect
- golang.org/x/mod v0.23.0 // indirect
+ golang.org/x/mod v0.24.0 // indirect
golang.org/x/oauth2 v0.27.0 // indirect
golang.org/x/sync v0.14.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/term v0.32.0 // indirect
golang.org/x/time v0.9.0 // indirect
- golang.org/x/tools v0.30.0 // indirect
+ golang.org/x/tools v0.32.0 // indirect
golang.zx2c4.com/wireguard v0.0.0-20230325221338-052af4a8072b // indirect
google.golang.org/protobuf v1.36.5 // indirect
gopkg.in/cenkalti/backoff.v2 v2.2.1 // indirect
diff --git a/src/go/go.sum b/src/go/go.sum
index 3e089e83c2212b..ad79aed7fc7ec4 100644
--- a/src/go/go.sum
+++ b/src/go/go.sum
@@ -320,8 +320,8 @@ github.com/mdlayher/netlink v1.7.2 h1:/UtM3ofJap7Vl4QWCPDGXY8d3GIY2UGSDbK+QWmY8/
github.com/mdlayher/netlink v1.7.2/go.mod h1:xraEF7uJbxLhc5fpHL4cPe221LI2bdttWlU+ZGLfQSw=
github.com/mdlayher/socket v0.4.1 h1:eM9y2/jlbs1M615oshPQOHZzj6R6wMT7bX5NPiQvn2U=
github.com/mdlayher/socket v0.4.1/go.mod h1:cAqeGjoufqdxWkD7DkpyS+wcefOtmu5OQ8KuoJGIReA=
-github.com/miekg/dns v1.1.65 h1:0+tIPHzUW0GCge7IiK3guGP57VAw7hoPDfApjkMD1Fc=
-github.com/miekg/dns v1.1.65/go.mod h1:Dzw9769uoKVaLuODMDZz9M6ynFU6Em65csPuoi8G0ck=
+github.com/miekg/dns v1.1.66 h1:FeZXOS3VCVsKnEAd+wBkjMC3D2K+ww66Cq3VnCINuJE=
+github.com/miekg/dns v1.1.66/go.mod h1:jGFzBsSNbJw6z1HYut1RKBKHA9PBdxeHrZG8J+gC2WE=
github.com/mikioh/ipaddr v0.0.0-20190404000644-d465c8ab6721 h1:RlZweED6sbSArvlE924+mUcZuXKLBHA35U7LN621Bws=
github.com/mikioh/ipaddr v0.0.0-20190404000644-d465c8ab6721/go.mod h1:Ickgr2WtCLZ2MDGd4Gr0geeCH5HybhRJbonOgQpvSxc=
github.com/mitchellh/copystructure v1.2.0 h1:vpKXTN4ewci03Vljg/q9QvCGUDttBOGBIa15WveJJGw=
@@ -517,8 +517,8 @@ golang.org/x/mod v0.2.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.3.0/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.4.2/go.mod h1:s0Qsj1ACt9ePp/hMypM3fl4fZqREWJwdYDEqhRiZZUA=
golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4/go.mod h1:jJ57K6gSWd91VN4djpZkiMVwK6gcyfeH4XE8wZrZaV4=
-golang.org/x/mod v0.23.0 h1:Zb7khfcRGKk+kqfxFaP5tZqCnDZMjC5VtUBs87Hr6QM=
-golang.org/x/mod v0.23.0/go.mod h1:6SkKJ3Xj0I0BrPOZoBy3bdMptDDU9oJrpohJ3eWZ1fY=
+golang.org/x/mod v0.24.0 h1:ZfthKaKaT4NrhGVZHO1/WDTwGES4De8KtWO0SIbNJMU=
+golang.org/x/mod v0.24.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
golang.org/x/net v0.0.0-20180906233101-161cd47e91fd/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4=
golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
@@ -594,8 +594,8 @@ golang.org/x/tools v0.0.0-20200619180055-7c47624df98f/go.mod h1:EkVYQZoAsY45+roY
golang.org/x/tools v0.0.0-20210106214847-113979e3529a/go.mod h1:emZCQorbCU4vsT4fOWvOPXz4eW1wZW4PmDk9uLelYpA=
golang.org/x/tools v0.1.1/go.mod h1:o0xws9oXOQQZyjljx8fwUC0k7L1pTE6eaCbjGeHmOkk=
golang.org/x/tools v0.1.12/go.mod h1:hNGJHUnrk76NpqgfD5Aqm5Crs+Hm0VOH/i9J2+nxYbc=
-golang.org/x/tools v0.30.0 h1:BgcpHewrV5AUp2G9MebG4XPFI1E2W41zU1SaqVA9vJY=
-golang.org/x/tools v0.30.0/go.mod h1:c347cR/OJfw5TI+GfX7RUPNMdDRRbjvYTS0jPyvsVtY=
+golang.org/x/tools v0.32.0 h1:Q7N1vhpkQv7ybVzLFtTjvQya2ewbwNDZzUgfXGqtMWU=
+golang.org/x/tools v0.32.0/go.mod h1:ZxrU41P/wAbZD8EDa6dDCa6XfpkhJ7HFMjHJXfBDu8s=
golang.org/x/xerrors v0.0.0-20190410155217-1f06c39b4373/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20190513163551-3ee3066db522/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20190717185122-a985d3407aa7/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
From 208131592c3547f3a7cf2d7d4fcc26bee5f6576a Mon Sep 17 00:00:00 2001
From: Fotis Voutsas
Date: Mon, 12 May 2025 14:44:56 +0300
Subject: [PATCH 28/51] SNMP first cisco yaml file pass (#20246)
* first cisco yaml pass
* changes and another file
* changes to families
* changes to families
* basic changes requested
* basic changes requested
* review
* review
* review
* review
---
.../default/_base_cisco_voice.yaml | 5 -----
.../go.d/snmp.profiles/default/_cisco-asa.yaml | 17 +++++++++++++----
.../snmp.profiles/default/_cisco-catalyst.yaml | 7 ++++++-
.../default/_cisco-cpu-memory.yaml | 3 +++
4 files changed, 22 insertions(+), 10 deletions(-)
delete mode 100644 src/go/plugin/go.d/config/go.d/snmp.profiles/default/_base_cisco_voice.yaml
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_base_cisco_voice.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_base_cisco_voice.yaml
deleted file mode 100644
index 552683a57faf2f..00000000000000
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_base_cisco_voice.yaml
+++ /dev/null
@@ -1,5 +0,0 @@
-# Backward compatibility shim, in case users referenced this in their own profiles before the rename.
-extends:
- - _base.yaml
- - _cisco-generic.yaml
- - _cisco-voice.yaml
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-asa.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-asa.yaml
index 9f38acc3ca4647..ff379c37eb4040 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-asa.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-asa.yaml
@@ -1,5 +1,4 @@
# Profile for Cisco ASA devices
-
extends:
- _cisco-generic.yaml
@@ -12,7 +11,9 @@ metrics:
- OID: 1.3.6.1.4.1.9.9.147.1.2.2.2.1.5
name: cfwConnectionStatValue
description: Current status of the resource statistic
- unit: "{statistic}"
+ unit: "{status}"
+ dims_for_status: true
+ family: "Firewall/Statistics"
metric_tags:
- index: 1
tag: service_type
@@ -26,6 +27,7 @@ metrics:
name: crasNumDeclinedSessions
description: Number of session setup attempts declined due to authentication or authorization failure
unit: "{session}"
+ family: "Firewall/Sessions"
- MIB: CISCO-REMOTE-ACCESS-MONITOR-MIB
symbol:
# num sessions
@@ -33,6 +35,7 @@ metrics:
name: crasNumSessions
description: Number of currently active sessions
unit: "{session}"
+ family: "Firewall/Sessions"
- MIB: CISCO-REMOTE-ACCESS-MONITOR-MIB
symbol:
# num users
@@ -40,6 +43,7 @@ metrics:
name: crasNumUsers
description: Number of users who have active sessions
unit: "{user}"
+ family: "Firewall/Users"
- MIB: CISCO-REMOTE-ACCESS-MONITOR-MIB
metric_type: monotonic_count
symbol:
@@ -47,13 +51,15 @@ metrics:
OID: 1.3.6.1.4.1.9.9.392.1.4.1.3.0
name: crasNumSetupFailInsufResources
description: Number of session setup attempts failed due to insufficient resources
- unit: "{session}"
+ unit: "{failure}"
+ family: "Firewall/Sessions"
- MIB: CISCO-IPSEC-FLOW-MONITOR-MIB
symbol:
OID: 1.3.6.1.4.1.9.9.171.1.3.1.1.0
name: cipSecGlobalActiveTunnels
description: Number of currently active IPsec Phase-2 Tunnels
unit: "{tunnel}"
+ family: "Firewall/IPsec"
- MIB: CISCO-IPSEC-FLOW-MONITOR-MIB
metric_type: monotonic_count
symbol:
@@ -61,6 +67,7 @@ metrics:
name: cipSecGlobalHcInOctets
description: High capacity count of total octets received by all current and previous IPsec Phase-2 Tunnels
unit: "By"
+ family: "Firewall/IPsec"
- MIB: CISCO-IPSEC-FLOW-MONITOR-MIB
metric_type: monotonic_count
symbol:
@@ -68,6 +75,7 @@ metrics:
name: cipSecGlobalHcOutOctets
description: High capacity count of total octets sent by all current and previous IPsec Phase-2 Tunnels
unit: "By"
+ family: "Firewall/IPsec"
- MIB: ENTITY-SENSOR-MIB
table:
OID: 1.3.6.1.2.1.99.1.1
@@ -76,7 +84,8 @@ metrics:
- OID: 1.3.6.1.2.1.99.1.1.1.4
name: entPhySensorValue
description: Most recent measurement obtained by the agent for this sensor
- unit: "TBD"
+ unit: "1"
+ family: "Hardware/Sensors"
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.99.1.1.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-catalyst.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-catalyst.yaml
index 38ecf89e1c26c9..705c2f5ea46ca4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-catalyst.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-catalyst.yaml
@@ -28,7 +28,8 @@ metrics:
- OID: 1.3.6.1.4.1.9.9.91.1.1.1.1.4
name: entSensorValue
description: "The most recent measurement seen by the sensor"
- unit: "TBD"
+ unit: "1"
+ family: "Switches/Sensors"
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.91.1.1.1.1.1
@@ -46,18 +47,22 @@ metrics:
name: cieIfLastInTime
description: "Elapsed time in milliseconds since last protocol input packet was received"
unit: "ms"
+ family: "Network/Interfaces/Packets"
- OID: 1.3.6.1.4.1.9.9.276.1.1.1.1.2
name: cieIfLastOutTime
description: "Elapsed time in milliseconds since last protocol output packet was transmitted"
unit: "ms"
+ family: "Network/Interfaces/Packets"
- OID: 1.3.6.1.4.1.9.9.276.1.1.1.1.10
name: cieIfInputQueueDrops
description: "Number of input packets which were dropped"
unit: "{packet}"
+ family: "Network/Interfaces/Packets"
- OID: 1.3.6.1.4.1.9.9.276.1.1.1.1.11
name: cieIfOutputQueueDrops
description: "Number of output packets dropped by the interface"
unit: "{packet}"
+ family: "Network/Interfaces/Packets"
metric_tags:
- MIB: IF-MIB
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-cpu-memory.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-cpu-memory.yaml
index a3458c914d79c7..1cb4df4d1527cb 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-cpu-memory.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-cpu-memory.yaml
@@ -10,6 +10,7 @@ metrics:
name: cpu.usage
description: The overall CPU busy percentage in the last 1 minute period
unit: "%"
+ family: "CPU"
metric_tags:
- index: 1 # cpmCPUTotalIndex
tag: cpu
@@ -23,10 +24,12 @@ metrics:
name: memory.used
description: Indicates the number of bytes from the memory pool that are currently in use by applications on the managed device
unit: "By"
+ family: "Memory"
- OID: 1.3.6.1.4.1.9.9.48.1.1.1.6 # ciscoMemoryPoolFree
name: memory.free
description: Indicates the number of bytes from the memory pool that are currently unused on the managed device. Note that the sum of ciscoMemoryPoolUsed and ciscoMemoryPoolFree is the total amount of memory in the pool
unit: "By"
+ family: "Memory"
metric_tags:
- index: 1 # ciscoMemoryPoolType
tag: mem
From 897c2ba9d1ab7585f27bb19f27095c1344490ea3 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Mon, 12 May 2025 16:20:00 +0300
Subject: [PATCH 29/51] chore(go.d/snmp): small cleanup snmp profiles code
(#20274)
---
.../sd/discoverer/snmpsd/discoverer_test.go | 1 +
.../discovery/sd/discoverer/snmpsd/sysinfo.go | 2 +
src/go/plugin/go.d/collector/snmp/collect.go | 91 ++--------
.../go.d/collector/snmp/collect_if_mib.go | 7 +-
.../go.d/collector/snmp/collect_profiles.go | 165 ++++++++++++++++++
.../plugin/go.d/collector/snmp/collector.go | 3 +
.../go.d/collector/snmp/command_func.go | 42 -----
src/go/plugin/go.d/collector/snmp/helpers.go | 73 --------
.../snmp/{parsing.go => parse_profiles.go} | 136 ++++++++++++---
src/go/plugin/go.d/collector/snmp/profile.go | 88 ----------
src/go/plugin/go.d/collector/snmp/types.go | 155 ----------------
11 files changed, 293 insertions(+), 470 deletions(-)
create mode 100644 src/go/plugin/go.d/collector/snmp/collect_profiles.go
delete mode 100644 src/go/plugin/go.d/collector/snmp/command_func.go
delete mode 100644 src/go/plugin/go.d/collector/snmp/helpers.go
rename src/go/plugin/go.d/collector/snmp/{parsing.go => parse_profiles.go} (67%)
delete mode 100644 src/go/plugin/go.d/collector/snmp/profile.go
delete mode 100644 src/go/plugin/go.d/collector/snmp/types.go
diff --git a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
index 14c42ed863c01c..908e099800456f 100644
--- a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
+++ b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/discoverer_test.go
@@ -274,6 +274,7 @@ func prepareNewTarget(sub subnet, ip string) *target {
Contact: mockSysContact,
Name: mockSysName,
Location: mockSysLocation,
+ SysObjectID: mockSysObject[1:],
Organization: "net-snmp",
})
}
diff --git a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/sysinfo.go b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/sysinfo.go
index 0917e7299194e0..88cb4b244dad23 100644
--- a/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/sysinfo.go
+++ b/src/go/plugin/go.d/agent/discovery/sd/discoverer/snmpsd/sysinfo.go
@@ -25,6 +25,7 @@ type SysInfo struct {
Name string `json:"name"`
Location string `json:"location"`
Organization string `json:"organization"`
+ SysObjectID string `json:"-"`
}
func GetSysInfo(client gosnmp.Handler) (*SysInfo, error) {
@@ -52,6 +53,7 @@ func GetSysInfo(client gosnmp.Handler) (*SysInfo, error) {
var sysObj string
if sysObj, err = PduToString(pdu); err == nil {
si.Organization = LookupBySysObject(sysObj)
+ si.SysObjectID = sysObj
}
case OidSysContact:
si.Contact, err = PduToString(pdu)
diff --git a/src/go/plugin/go.d/collector/snmp/collect.go b/src/go/plugin/go.d/collector/snmp/collect.go
index 372d039c1e5fcc..ac338e9d66f069 100644
--- a/src/go/plugin/go.d/collector/snmp/collect.go
+++ b/src/go/plugin/go.d/collector/snmp/collect.go
@@ -6,7 +6,6 @@ import (
"errors"
"fmt"
"slices"
- "strings"
"github.com/google/uuid"
"github.com/gosnmp/gosnmp"
@@ -17,25 +16,6 @@ import (
)
func (c *Collector) collect() (map[string]int64, error) {
-
- mx := make(map[string]int64)
-
- if c.EnableProfiles {
- sysObjectID, err := c.getSysObjectID(snmpsd.OidSysObject)
- if err != nil {
- return nil, err
- }
-
- matchingProfiles := ddsnmp.Find(sysObjectID)
-
- metricMap, err := c.parseMetricsFromProfiles(matchingProfiles)
- if err != nil {
- return nil, err
- }
- seen := make(map[string]bool)
- c.makeChartsFromMetricMap(mx, metricMap, seen)
- }
-
if c.sysInfo == nil {
si, err := snmpsd.GetSysInfo(c.snmpClient)
if err != nil {
@@ -48,6 +28,16 @@ func (c *Collector) collect() (map[string]int64, error) {
if c.CreateVnode {
c.vnode = c.setupVnode(si)
}
+
+ if c.EnableProfiles {
+ c.snmpProfiles = ddsnmp.Find(c.sysInfo.SysObjectID)
+ }
+ }
+
+ mx := make(map[string]int64)
+
+ if err := c.collectProfiles(mx); err != nil {
+ return nil, err
}
if err := c.collectSysUptime(mx); err != nil {
@@ -69,45 +59,6 @@ func (c *Collector) collect() (map[string]int64, error) {
return mx, nil
}
-func (c *Collector) getSysObjectID(oid string) (string, error) {
- resp, err := c.snmpClient.Get([]string{oid})
- if err != nil {
- return "", err
- }
- return strings.Replace(resp.Variables[0].Value.(string), ".", "", 1), nil
-}
-
-func (c *Collector) makeChartsFromMetricMap(mx map[string]int64, metricMap map[string]processedMetric, seen map[string]bool) error {
- for _, metric := range metricMap {
- if metric.tableName == "" {
- switch s := metric.value.(type) {
- case int:
- name := metric.name
- if name == "" {
- continue
- }
-
- seen[name] = true
-
- if !c.seenMetrics[name] {
- c.seenMetrics[name] = true
- c.addSNMPChart(metric)
- }
-
- mx[metric.name] = int64(s)
- }
- }
-
- }
- for name := range c.seenMetrics {
- if !seen[name] {
- delete(c.seenMetrics, name)
- c.removeSNMPChart(name)
- }
- }
- return nil
-}
-
func (c *Collector) collectSysUptime(mx map[string]int64) error {
resp, err := c.snmpClient.Get([]string{snmpsd.OidSysUptime})
if err != nil {
@@ -167,28 +118,6 @@ func (c *Collector) setupVnode(si *snmpsd.SysInfo) *vnodes.VirtualNode {
}
}
-func pduToString(pdu gosnmp.SnmpPDU) (string, error) {
- switch pdu.Type {
- case gosnmp.OctetString:
- // TODO: this isn't reliable (e.g. physAddress we need hex.EncodeToString())
- bs, ok := pdu.Value.([]byte)
- if !ok {
- return "", fmt.Errorf("OctetString is not a []byte but %T", pdu.Value)
- }
- return strings.ToValidUTF8(string(bs), "�"), nil
- case gosnmp.Counter32, gosnmp.Counter64, gosnmp.Integer, gosnmp.Gauge32:
- return gosnmp.ToBigInt(pdu.Value).String(), nil
- case gosnmp.ObjectIdentifier:
- v, ok := pdu.Value.(string)
- if !ok {
- return "", fmt.Errorf("ObjectIdentifier is not a string but %T", pdu.Value)
- }
- return strings.TrimPrefix(v, "."), nil
- default:
- return "", fmt.Errorf("unussported type: '%v'", pdu.Type)
- }
-}
-
func pduToInt(pdu gosnmp.SnmpPDU) (int64, error) {
switch pdu.Type {
case gosnmp.Counter32, gosnmp.Counter64, gosnmp.Integer, gosnmp.Gauge32, gosnmp.TimeTicks:
diff --git a/src/go/plugin/go.d/collector/snmp/collect_if_mib.go b/src/go/plugin/go.d/collector/snmp/collect_if_mib.go
index ed04386d9a9be1..2c75834f9d2325 100644
--- a/src/go/plugin/go.d/collector/snmp/collect_if_mib.go
+++ b/src/go/plugin/go.d/collector/snmp/collect_if_mib.go
@@ -10,6 +10,7 @@ import (
"strings"
"github.com/netdata/netdata/go/plugins/logger"
+ "github.com/netdata/netdata/go/plugins/plugin/go.d/agent/discovery/sd/discoverer/snmpsd"
"github.com/gosnmp/gosnmp"
)
@@ -82,7 +83,7 @@ func (c *Collector) collectNetworkInterfaces(mx map[string]int64) error {
case oidIfIndex:
iface.ifIndex, err = pduToInt(pdu)
case oidIfDescr:
- iface.ifDescr, err = pduToString(pdu)
+ iface.ifDescr, err = snmpsd.PduToString(pdu)
case oidIfType:
iface.ifType, err = pduToInt(pdu)
case oidIfMtu:
@@ -116,7 +117,7 @@ func (c *Collector) collectNetworkInterfaces(mx map[string]int64) error {
case oidIfOutErrors:
iface.ifOutErrors, err = pduToInt(pdu)
case oidIfName:
- iface.ifName, err = pduToString(pdu)
+ iface.ifName, err = snmpsd.PduToString(pdu)
case oidIfInMulticastPkts:
iface.ifInMulticastPkts, err = pduToInt(pdu)
case oidIfInBroadcastPkts:
@@ -144,7 +145,7 @@ func (c *Collector) collectNetworkInterfaces(mx map[string]int64) error {
case oidIfHighSpeed:
iface.ifHighSpeed, err = pduToInt(pdu)
case oidIfAlias:
- iface.ifAlias, err = pduToString(pdu)
+ iface.ifAlias, err = snmpsd.PduToString(pdu)
default:
continue
}
diff --git a/src/go/plugin/go.d/collector/snmp/collect_profiles.go b/src/go/plugin/go.d/collector/snmp/collect_profiles.go
new file mode 100644
index 00000000000000..7562e9db374974
--- /dev/null
+++ b/src/go/plugin/go.d/collector/snmp/collect_profiles.go
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+package snmp
+
+import (
+ "fmt"
+ "log"
+ "strings"
+
+ "github.com/gosnmp/gosnmp"
+)
+
+type processedMetric struct {
+ oid string
+ name string
+ value interface{}
+ metricType gosnmp.Asn1BER
+ tableName string
+ unit string
+ description string
+}
+
+func (c *Collector) collectProfiles(mx map[string]int64) error {
+ if len(c.snmpProfiles) == 0 {
+ return nil
+ }
+
+ metricMap := map[string]processedMetric{}
+
+ for _, prof := range c.snmpProfiles {
+ results, err := parseMetrics(prof.Definition.Metrics)
+ if err != nil {
+ return err
+ }
+
+ for _, oid := range results.OIDs {
+ response, err := c.snmpClient.Get([]string{oid})
+ if err != nil {
+ return err
+ }
+ for _, metric := range results.parsedMetrics {
+ switch s := metric.(type) {
+ case parsedSymbolMetric:
+ if s.baseoid == oid {
+ metricMap[oid] = processedMetric{
+ oid: oid,
+ name: s.name,
+ value: response.Variables[0].Value,
+ metricType: response.Variables[0].Type,
+ unit: s.unit,
+ description: s.description,
+ }
+ }
+ }
+ }
+ }
+
+ for _, oid := range results.nextOIDs {
+ if len(oid) == 0 {
+ continue
+ }
+
+ tableRows, err := c.walkOIDTree(oid)
+ if err != nil {
+ return fmt.Errorf("error walking OID tree: %v, oid %s", err, oid)
+ }
+
+ for _, metric := range results.parsedMetrics {
+ switch s := metric.(type) {
+ case parsedTableMetric:
+ if s.rowOID == oid {
+ for key, value := range tableRows {
+ value.name = s.name
+ value.tableName = s.tableName
+ tableRows[key] = value
+ }
+ metricMap = mergeProcessedMetricMaps(metricMap, tableRows)
+ }
+ }
+ }
+ }
+ }
+
+ c.makeChartsFromMetricMap(mx, metricMap)
+
+ return nil
+}
+
+func (c *Collector) walkOIDTree(baseOID string) (map[string]processedMetric, error) {
+ tableRows := make(map[string]processedMetric)
+
+ currentOID := baseOID
+ for {
+ result, err := c.snmpClient.GetNext([]string{currentOID})
+ if err != nil {
+ return tableRows, fmt.Errorf("snmpgetnext failed: %v", err)
+ }
+ if len(result.Variables) == 0 {
+ log.Println("No OID returned, ending walk.")
+ return tableRows, nil
+ }
+ pdu := result.Variables[0]
+
+ nextOID := strings.Replace(pdu.Name, ".", "", 1) //remove dot at the start of the OID
+
+ // If the next OID does not start with the base OID, we've reached the end of the subtree.
+ if !strings.HasPrefix(nextOID, baseOID) {
+ return tableRows, nil
+ }
+
+ metricType := pdu.Type
+ value := fmt.Sprintf("%v", pdu.Value)
+
+ tableRows[nextOID] = processedMetric{
+ oid: nextOID,
+ value: value,
+ metricType: metricType,
+ }
+
+ currentOID = nextOID
+ }
+}
+
+func (c *Collector) makeChartsFromMetricMap(mx map[string]int64, metricMap map[string]processedMetric) {
+ seen := make(map[string]bool)
+
+ for _, metric := range metricMap {
+ if metric.tableName == "" {
+ switch s := metric.value.(type) {
+ case int:
+ name := metric.name
+ if name == "" {
+ continue
+ }
+
+ seen[name] = true
+
+ if !c.seenMetrics[name] {
+ c.seenMetrics[name] = true
+ c.addSNMPChart(metric)
+ }
+
+ mx[metric.name] = int64(s)
+ }
+ }
+
+ }
+ for name := range c.seenMetrics {
+ if !seen[name] {
+ delete(c.seenMetrics, name)
+ c.removeSNMPChart(name)
+ }
+ }
+}
+
+func mergeProcessedMetricMaps(m1 map[string]processedMetric, m2 map[string]processedMetric) map[string]processedMetric {
+ merged := make(map[string]processedMetric)
+ for k, v := range m1 {
+ merged[k] = v
+ }
+ for key, value := range m2 {
+ merged[key] = value
+ }
+ return merged
+}
diff --git a/src/go/plugin/go.d/collector/snmp/collector.go b/src/go/plugin/go.d/collector/snmp/collector.go
index 353835359b1450..a8acd2c48cc0d7 100644
--- a/src/go/plugin/go.d/collector/snmp/collector.go
+++ b/src/go/plugin/go.d/collector/snmp/collector.go
@@ -12,6 +12,7 @@ import (
"github.com/netdata/netdata/go/plugins/plugin/go.d/agent/discovery/sd/discoverer/snmpsd"
"github.com/netdata/netdata/go/plugins/plugin/go.d/agent/module"
"github.com/netdata/netdata/go/plugins/plugin/go.d/agent/vnodes"
+ "github.com/netdata/netdata/go/plugins/plugin/go.d/collector/snmp/ddsnmp"
"github.com/gosnmp/gosnmp"
)
@@ -85,6 +86,8 @@ type Collector struct {
sysInfo *snmpsd.SysInfo
customOids []string
+
+ snmpProfiles []*ddsnmp.Profile
}
func (c *Collector) Configuration() any {
diff --git a/src/go/plugin/go.d/collector/snmp/command_func.go b/src/go/plugin/go.d/collector/snmp/command_func.go
deleted file mode 100644
index b73e1e17d4cd8a..00000000000000
--- a/src/go/plugin/go.d/collector/snmp/command_func.go
+++ /dev/null
@@ -1,42 +0,0 @@
-package snmp
-
-import (
- "fmt"
- "log"
- "strings"
-)
-
-func (c *Collector) walkOIDTree(baseOID string) (map[string]processedMetric, error) {
- tableRows := make(map[string]processedMetric)
-
- currentOID := baseOID
- for {
- result, err := c.snmpClient.GetNext([]string{currentOID})
- if err != nil {
- return tableRows, fmt.Errorf("snmpgetnext failed: %v", err)
- }
- if len(result.Variables) == 0 {
- log.Println("No OID returned, ending walk.")
- return tableRows, nil
- }
- pdu := result.Variables[0]
-
- nextOID := strings.Replace(pdu.Name, ".", "", 1) //remove dot at the start of the OID
-
- // If the next OID does not start with the base OID, we've reached the end of the subtree.
- if !strings.HasPrefix(nextOID, baseOID) {
- return tableRows, nil
- }
-
- metricType := pdu.Type
- value := fmt.Sprintf("%v", pdu.Value)
-
- tableRows[nextOID] = processedMetric{
- oid: nextOID,
- value: value,
- metricType: metricType,
- }
-
- currentOID = nextOID
- }
-}
diff --git a/src/go/plugin/go.d/collector/snmp/helpers.go b/src/go/plugin/go.d/collector/snmp/helpers.go
deleted file mode 100644
index 803e18f7ba5a4b..00000000000000
--- a/src/go/plugin/go.d/collector/snmp/helpers.go
+++ /dev/null
@@ -1,73 +0,0 @@
-package snmp
-
-func sliceToStrings(items []interface{}) []string {
- var strs []string
- for _, v := range items {
- s, ok := v.(string)
- if !ok {
- // Handle error if an element is not a string.
- continue
- }
- strs = append(strs, s)
- }
- return strs
-}
-
-// func sliceToTableMetricTags(items []interface{}) []TableMetricTag {
-// var metricTag []TableMetricTag
-// for _, v := range items {
-// s, ok := v.(TableMetricTag)
-// if !ok {
-// // Handle error if an element is not a string.
-// continue
-// }
-// metricTag = append(metricTag, s)
-// }
-// return metricTag
-// }
-
-func mergeTableBatches(target tableBatches, source tableBatches) tableBatches {
- merged := tableBatches{}
-
- // Extend batches in `target` with OIDs from `source` that share the same key.
- for key, batch := range target {
-
- if srcBatch, ok := source[key]; ok {
- mergedOids := append(batch.oids, srcBatch.oids...)
- merged[key] = tableBatch{
- tableOID: batch.tableOID,
- oids: mergedOids,
- }
- }
- }
-
- for key := range source {
- if _, ok := target[key]; !ok {
- merged[key] = source[key]
- }
- }
-
- return merged
-}
-
-func mergeStringMaps(m1 map[string]string, m2 map[string]string) map[string]string {
- merged := make(map[string]string)
- for k, v := range m1 {
- merged[k] = v
- }
- for key, value := range m2 {
- merged[key] = value
- }
- return merged
-}
-
-func mergeProcessedMetricMaps(m1 map[string]processedMetric, m2 map[string]processedMetric) map[string]processedMetric {
- merged := make(map[string]processedMetric)
- for k, v := range m1 {
- merged[k] = v
- }
- for key, value := range m2 {
- merged[key] = value
- }
- return merged
-}
diff --git a/src/go/plugin/go.d/collector/snmp/parsing.go b/src/go/plugin/go.d/collector/snmp/parse_profiles.go
similarity index 67%
rename from src/go/plugin/go.d/collector/snmp/parsing.go
rename to src/go/plugin/go.d/collector/snmp/parse_profiles.go
index 42b5ef4123c2f4..bca468c9547815 100644
--- a/src/go/plugin/go.d/collector/snmp/parsing.go
+++ b/src/go/plugin/go.d/collector/snmp/parse_profiles.go
@@ -8,6 +8,99 @@ import (
"github.com/netdata/netdata/go/plugins/plugin/go.d/collector/snmp/ddsnmp/ddprofiledefinition"
)
+type (
+ parsedResult struct {
+ OIDs []string
+ nextOIDs []string
+ bulkOIDs []string
+ parsedMetrics []parsedMetric
+ }
+ parsedMetric any
+)
+
+type (
+ tableBatches map[tableBatchKey]tableBatch
+ tableBatchKey struct {
+ mib string
+ table string
+ }
+ tableBatch struct {
+ tableOID string
+ oids []string
+ }
+)
+
+type indexTag struct {
+ parsedMetricTag parsedMetricTag
+ index int
+}
+
+type columnTag struct {
+ parsedMetricTag parsedMetricTag
+ column string
+ indexSlices []indexSlice
+}
+
+type indexMapping struct {
+ tag string
+ index int
+ mapping map[int]string
+}
+
+type parsedSymbol struct {
+ name string
+ oid string
+ extractValuePattern *regexp.Regexp
+ oidsToResolve map[string]string
+}
+
+type parsedSymbolMetric struct {
+ name string
+ tags []string
+ forcedType string
+ enforceScalar bool
+ options map[string]string
+ extractValuePattern *regexp.Regexp
+ baseoid string //TODO consider changing this to OID, it will not have nested OIDs as it is a symbol
+ unit string
+ description string
+}
+
+type parsedTableMetric struct {
+ name string
+ indexTags []indexTag
+ columnTags []columnTag
+ forcedType string
+ options map[string]string
+ extractValuePattern *regexp.Regexp
+ rowOID string
+ tableName string
+ tableOID string
+}
+
+// union of two above
+
+type parsedMetricTag struct {
+ name string
+
+ tags []string
+ pattern *regexp.Regexp
+ // symbol Symbol not used yet
+}
+
+type metricParseResult struct {
+ oidsToFetch []string
+ oidsToResolve map[string]string
+ indexMappings []indexMapping
+ tableBatches tableBatches
+ parsedMetrics []parsedMetric
+}
+
+type indexSlice struct {
+ Start int
+ End int
+}
+
func parseMetrics(metrics []ddprofiledefinition.MetricsConfig) (parsedResult, error) {
var (
OIDs, nextOIDs, bulkOIDs []string
@@ -93,16 +186,17 @@ func parseMetric(metric ddprofiledefinition.MetricsConfig) (metricParseResult, e
// TODO investigate if this exists in the yamls
// return (parseOIDMetric(oidMetric{name: metric.Name, oid: metric.OID, metricTags: castedStringMetricTags, forcedType: string(metric.MetricType), options: metric.Options})), nil
return metricParseResult{}, nil
-
- } else if len(metric.MIB) == 0 {
+ }
+ if len(metric.MIB) == 0 {
return metricParseResult{}, fmt.Errorf("unsupported metric {%v}", metric)
- } else if metric.Symbol != (ddprofiledefinition.SymbolConfig{}) {
+ }
+ if metric.Symbol != (ddprofiledefinition.SymbolConfig{}) {
// Single Metric
- return (parseSymbolMetric(metric.Symbol, metric.MIB)) // TODO metric tags might be needed here.
+ return parseSymbolMetric(metric.Symbol, metric.MIB) // TODO metric tags might be needed here.
//Can't support tables at the moment
- } else {
- return metricParseResult{}, nil
}
+ return metricParseResult{}, nil
+
}
// TODO error outs on functions
@@ -169,28 +263,17 @@ func parseSymbol(symbol interface{}) (parsedSymbol, error) {
switch s := symbol.(type) {
case ddprofiledefinition.SymbolConfig:
- oid := s.OID
- name := s.Name
+ ps := parsedSymbol{
+ name: s.Name,
+ oid: s.OID,
+ oidsToResolve: map[string]string{s.Name: s.OID},
+ }
if s.ExtractValue != "" {
- extractValuePattern, err := regexp.Compile(s.ExtractValue)
- if err != nil {
-
- return parsedSymbol{}, err
+ if v, err := regexp.Compile(s.ExtractValue); err == nil {
+ ps.extractValuePattern = v
}
- return parsedSymbol{
- name,
- oid,
- extractValuePattern,
- map[string]string{name: oid},
- }, nil
- } else {
- return parsedSymbol{
- name,
- oid,
- nil,
- map[string]string{name: oid},
- }, nil
}
+ return ps, nil
case string:
return parsedSymbol{}, errors.New("string only symbol, can't support yet")
case map[string]interface{}:
@@ -198,7 +281,6 @@ func parseSymbol(symbol interface{}) (parsedSymbol, error) {
name, okName := s["name"].(string)
if !okOID || !okName {
-
return parsedSymbol{}, fmt.Errorf("invalid symbol format: %+v", s)
}
@@ -208,9 +290,7 @@ func parseSymbol(symbol interface{}) (parsedSymbol, error) {
extractValuePattern: nil,
oidsToResolve: map[string]string{name: oid},
}, nil
-
default:
return parsedSymbol{}, fmt.Errorf("unsupported symbol type: %T", symbol)
}
-
}
diff --git a/src/go/plugin/go.d/collector/snmp/profile.go b/src/go/plugin/go.d/collector/snmp/profile.go
deleted file mode 100644
index 124b8e75e85957..00000000000000
--- a/src/go/plugin/go.d/collector/snmp/profile.go
+++ /dev/null
@@ -1,88 +0,0 @@
-package snmp
-
-import (
- "fmt"
-
- "github.com/gosnmp/gosnmp"
-
- "github.com/netdata/netdata/go/plugins/plugin/go.d/collector/snmp/ddsnmp"
-)
-
-func (s *SysObjectIDs) UnmarshalYAML(unmarshal func(any) error) error {
- var single string
- if err := unmarshal(&single); err == nil {
- *s = []string{single}
- return nil
- }
-
- var multiple []string
- if err := unmarshal(&multiple); err == nil {
- *s = multiple
- return nil
- }
-
- return fmt.Errorf("invalid sysobjectid format")
-}
-
-func (c *Collector) parseMetricsFromProfiles(matchingProfiles []*ddsnmp.Profile) (map[string]processedMetric, error) {
- metricMap := map[string]processedMetric{}
- for _, profile := range matchingProfiles {
- profileDef := profile.Definition
- results, err := parseMetrics(profileDef.Metrics)
- if err != nil {
- return nil, err
- }
-
- for _, oid := range results.OIDs {
- response, err := c.snmpClient.Get([]string{oid})
- if err != nil {
- return nil, err
- }
- if (response != &gosnmp.SnmpPacket{}) {
- for _, metric := range results.parsedMetrics {
- switch s := metric.(type) {
- case parsedSymbolMetric:
- // find a matching metric
- if s.baseoid == oid {
- metricName := s.name
- metricType := response.Variables[0].Type
- metricValue := response.Variables[0].Value
- metricUnit := s.unit
- metricDescription := s.description
-
- metricMap[oid] = processedMetric{oid: oid, name: metricName, value: metricValue, metricType: metricType, unit: metricUnit, description: metricDescription}
- }
- }
- }
-
- }
- }
-
- for _, oid := range results.nextOIDs {
- if len(oid) < 1 {
- continue
- }
- if tableRows, err := c.walkOIDTree(oid); err != nil {
- return nil, fmt.Errorf("error walking OID tree: %v, oid %s", err, oid)
- } else {
- for _, metric := range results.parsedMetrics {
- switch s := metric.(type) {
- case parsedTableMetric:
- // find a matching metric
- if s.rowOID == oid {
- for key, value := range tableRows {
- value.name = s.name
- value.tableName = s.tableName
- tableRows[key] = value
- }
- metricMap = mergeProcessedMetricMaps(metricMap, tableRows)
- }
- }
- }
- }
-
- }
-
- }
- return metricMap, nil
-}
diff --git a/src/go/plugin/go.d/collector/snmp/types.go b/src/go/plugin/go.d/collector/snmp/types.go
deleted file mode 100644
index e1c30b2b73773c..00000000000000
--- a/src/go/plugin/go.d/collector/snmp/types.go
+++ /dev/null
@@ -1,155 +0,0 @@
-package snmp
-
-import (
- "regexp"
-
- "github.com/gosnmp/gosnmp"
-)
-
-type snmpPDU struct {
- value interface{}
- oid string
- metricType gosnmp.Asn1BER
-}
-
-type SysObjectIDs []string
-
-type parsedResult struct {
- OIDs []string
- nextOIDs []string
- bulkOIDs []string
- parsedMetrics []parsedMetric
-}
-
-type tableBatchKey struct {
- mib string
- table string
-}
-
-type tableBatch struct {
- tableOID string
- oids []string
-}
-
-type tableBatches map[tableBatchKey]tableBatch
-
-type indexTag struct {
- parsedMetricTag parsedMetricTag
- index int
-}
-
-type columnTag struct {
- parsedMetricTag parsedMetricTag
- column string
- indexSlices []IndexSlice
-}
-
-type indexMapping struct {
- tag string
- index int
- mapping map[int]string
-}
-
-type parsedSymbol struct {
- name string
- oid string
- extractValuePattern *regexp.Regexp
- oidsToResolve map[string]string
-}
-
-type parsedColumnMetricTag struct {
- oidsToResolve map[string]string
- tableBatches tableBatches
- columnTags []columnTag
-}
-type parsedIndexMetricTag struct {
- indexTags []indexTag
- indexMappings map[int]map[string]string
-}
-
-type parsedTableMetricTag struct {
- oidsToResolve map[string]string
- tableBatches tableBatches
- columnTags []columnTag
- indexTags []indexTag
- indexMappings map[int]map[int]string
-}
-
-type parsedSymbolMetric struct {
- name string
- tags []string
- forcedType string
- enforceScalar bool
- options map[string]string
- extractValuePattern *regexp.Regexp
- baseoid string //TODO consider changing this to OID, it will not have nested OIDs as it is a symbol
- unit string
- description string
-}
-
-type parsedTableMetric struct {
- name string
- indexTags []indexTag
- columnTags []columnTag
- forcedType string
- options map[string]string
- extractValuePattern *regexp.Regexp
- rowOID string
- tableName string
- tableOID string
-}
-
-// union of two above
-type parsedMetric any
-
-// Not supported yet
-/*type parsedSimpleMetricTag struct {
- name string
- }
-
-type parsedMatchMetricTag struct {
-tags []string
-symbol Symbol
-pattern *regexp.Regexp
-}
-
- type symbolTag struct {
- parsedMetricTag parsedMetricTag
- symbol string
- }
-
- type parsedSymbolTagsResult struct {
- oids []string
- parsedSymbolTags []symbolTag
- }
-*/
-type parsedMetricTag struct {
- name string
-
- tags []string
- pattern *regexp.Regexp
- // symbol Symbol not used yet
-}
-
-type metricParseResult struct {
- oidsToFetch []string
- oidsToResolve map[string]string
- indexMappings []indexMapping
- tableBatches tableBatches
- parsedMetrics []parsedMetric
-}
-
-type IndexSlice struct {
- Start int
- End int
-}
-
-type processedMetric struct {
- oid string
- name string
- value interface{}
- metricType gosnmp.Asn1BER
- tableName string
- unit string
- description string
-}
From 36a520c9dcede946af0c3fc8681abb7082fe0f75 Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Mon, 12 May 2025 19:26:27 +0300
Subject: [PATCH 30/51] Switch to poll from epoll (#20273)
Disable epoll for now (random high CPU usage observed on kernels 6.14.4+)
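For context, disabling the epoll path makes nd-poll fall back to its portable poll(2) implementation. A minimal sketch of that readiness model, assuming a single socket fd (illustrative only; wait_readable() and the timeout are not part of nd-poll.c):

```c
#include <poll.h>

#ifndef POLLRDHUP
#define POLLRDHUP 0 /* same fallback nd-poll.c defines on platforms without it */
#endif

/* Wait until fd is readable or the peer hangs up; illustrative helper. */
static int wait_readable(int fd, int timeout_ms) {
    struct pollfd pfd = {
        .fd     = fd,
        .events = POLLIN | POLLRDHUP,
    };

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0)
        return rc; /* 0: timeout, -1: error (inspect errno) */

    return (pfd.revents & (POLLIN | POLLRDHUP | POLLHUP | POLLERR)) ? 1 : 0;
}
```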
---
src/libnetdata/socket/nd-poll.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/src/libnetdata/socket/nd-poll.c b/src/libnetdata/socket/nd-poll.c
index 72bb10abb23f9a..401d3098ec1480 100644
--- a/src/libnetdata/socket/nd-poll.c
+++ b/src/libnetdata/socket/nd-poll.c
@@ -6,7 +6,7 @@
#define POLLRDHUP 0
#endif
-#if defined(OS_LINUX)
+#if defined(OS_LINUX_DISABLE_EPOLL_DUE_TO_BUG)
#include <sys/epoll.h>
struct fd_info {
From 584d7d52215f386a933efbd7b9f48ec41c33775d Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Mon, 12 May 2025 19:26:55 +0300
Subject: [PATCH 31/51] Switch to uv threads (#20250)
* Use nd_thread (switch to uv_thread_create); see the sketch after this list
* Remove the NETDATA_THREAD_OPTION_JOINABLE option
* Remove unused and commented-out code
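A hedged sketch of the call-site migration this applies across the tree (the tag and worker routine are illustrative; nd_thread_create() and nd_thread_join() are the APIs changed below):

```c
#include "libnetdata/threads/threads.h" /* assumed include path */

static void *worker_main(void *arg) {
    (void) arg;
    /* ... per-thread work ... */
    return NULL;
}

static int start_and_join_worker(void) {
    /* Before: call sites passed NETDATA_THREAD_OPTION_JOINABLE here.
     * After: every nd_thread is a uv thread and always joinable, so
     * NETDATA_THREAD_OPTION_DEFAULT takes its place. */
    ND_THREAD *t = nd_thread_create("EXAMPLE", NETDATA_THREAD_OPTION_DEFAULT, worker_main, NULL);
    if (!t)
        return 1; /* creation failed even after the internal UV_EAGAIN retries */

    return nd_thread_join(t); /* 0 on success, mirroring uv_thread_join() */
}
```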
---
.../cgroups.plugin/cgroup-discovery.c | 5 +-
.../cgroups.plugin/cgroup-internals.h | 4 +-
src/collectors/cgroups.plugin/sys_fs_cgroup.c | 7 ++-
.../debugfs.plugin/module-libsensors.c | 2 +-
.../diskspace.plugin/plugin_diskspace.c | 2 +-
src/collectors/ebpf.plugin/ebpf.c | 6 +-
src/collectors/ebpf.plugin/ebpf_cachestat.c | 2 +-
src/collectors/ebpf.plugin/ebpf_dcstat.c | 2 +-
src/collectors/ebpf.plugin/ebpf_fd.c | 2 +-
src/collectors/ebpf.plugin/ebpf_functions.c | 2 +-
src/collectors/ebpf.plugin/ebpf_shm.c | 2 +-
src/collectors/ebpf.plugin/ebpf_socket.c | 2 +-
src/collectors/ebpf.plugin/ebpf_swap.c | 2 +-
src/collectors/ebpf.plugin/ebpf_vfs.c | 2 +-
.../freeipmi.plugin/freeipmi_plugin.c | 6 +-
src/collectors/proc.plugin/plugin_proc.c | 2 +-
.../profile.plugin/plugin_profile.cc | 2 +-
src/collectors/statsd.plugin/statsd.c | 2 +-
.../systemd-journal.plugin/systemd-main.c | 2 +-
src/daemon/config/netdata-conf-db.c | 2 +-
src/daemon/daemon-service.c | 61 +++----------------
src/daemon/daemon-service.h | 8 +--
src/daemon/daemon-shutdown-watcher.c | 2 +-
src/daemon/daemon-shutdown.c | 2 +-
src/daemon/daemon-systemd-watcher.c | 2 +-
src/daemon/dyncfg/dyncfg-unittest.c | 2 +-
src/daemon/libuv_workers.c | 18 ------
src/daemon/libuv_workers.h | 1 -
src/daemon/main.c | 4 +-
src/daemon/winsvc.cc | 2 +-
src/database/engine/cache.c | 8 +--
src/database/engine/mrg-unittest.c | 2 +-
src/database/engine/rrdengine.c | 35 ++++++-----
src/database/engine/rrdengine.h | 2 +-
src/database/sqlite/sqlite_aclk.c | 22 +++----
src/database/sqlite/sqlite_metadata.c | 44 ++++++-------
src/exporting/init_connectors.c | 2 +-
src/libnetdata/aral/aral.c | 6 +-
.../dictionary/dictionary-unittest.c | 20 ++----
.../functions_evloop/functions_evloop.c | 4 +-
src/libnetdata/local-sockets/local-sockets.h | 2 +-
src/libnetdata/locks/benchmark-rw.c | 14 ++---
src/libnetdata/locks/benchmark.c | 2 +-
src/libnetdata/locks/waitq.c | 5 +-
src/libnetdata/spawn_server/log-forwarder.c | 2 +-
src/libnetdata/string/string.c | 2 +-
src/libnetdata/threads/threads.c | 60 +++++++++---------
src/libnetdata/threads/threads.h | 13 ++--
src/libnetdata/uuid/uuidmap.c | 2 +-
src/ml/ml_public.cc | 6 +-
src/plugins.d/plugins_d.c | 2 +-
src/streaming/stream-connector.c | 2 +-
src/streaming/stream-replication-sender.c | 2 +-
src/streaming/stream-thread.c | 2 +-
src/web/api/queries/backfill.c | 2 +-
src/web/server/static/static-threaded.c | 2 +-
56 files changed, 161 insertions(+), 264 deletions(-)
diff --git a/src/collectors/cgroups.plugin/cgroup-discovery.c b/src/collectors/cgroups.plugin/cgroup-discovery.c
index 6de9800292fc40..0c6a7b07c21f65 100644
--- a/src/collectors/cgroups.plugin/cgroup-discovery.c
+++ b/src/collectors/cgroups.plugin/cgroup-discovery.c
@@ -1287,7 +1287,7 @@ static inline void discovery_find_all_cgroups() {
netdata_log_debug(D_CGROUP, "done searching for cgroups");
}
-void cgroup_discovery_worker(void *ptr)
+void *cgroup_discovery_worker(void *ptr)
{
UNUSED(ptr);
uv_thread_set_name_np("P[cgroupsdisc]");
@@ -1311,7 +1311,7 @@ void cgroup_discovery_worker(void *ptr)
NULL,
SIMPLE_PATTERN_EXACT, true);
- service_register(SERVICE_THREAD_TYPE_LIBUV, NULL, NULL, NULL, false);
+ service_register(NULL, NULL, NULL);
netdata_cgroup_ebpf_initialize_shm();
@@ -1342,4 +1342,5 @@ void cgroup_discovery_worker(void *ptr)
worker_unregister();
service_exits();
__atomic_store_n(&discovery_thread.exited,1,__ATOMIC_RELAXED);
+ return NULL;
}
diff --git a/src/collectors/cgroups.plugin/cgroup-internals.h b/src/collectors/cgroups.plugin/cgroup-internals.h
index 39584678dab20a..3e029d9bf08d93 100644
--- a/src/collectors/cgroups.plugin/cgroup-internals.h
+++ b/src/collectors/cgroups.plugin/cgroup-internals.h
@@ -261,7 +261,7 @@ struct cgroup {
};
struct discovery_thread {
- uv_thread_t thread;
+ ND_THREAD *thread;
uv_mutex_t mutex;
uv_cond_t cond_var;
int exited;
@@ -274,7 +274,7 @@ extern char cgroup_chart_id_prefix[];
extern char services_chart_id_prefix[];
extern uv_mutex_t cgroup_root_mutex;
-void cgroup_discovery_worker(void *ptr);
+void *cgroup_discovery_worker(void *ptr);
extern bool is_inside_k8s;
extern long system_page_size;
diff --git a/src/collectors/cgroups.plugin/sys_fs_cgroup.c b/src/collectors/cgroups.plugin/sys_fs_cgroup.c
index 5de53d71c03b42..7b1f13056ffc46 100644
--- a/src/collectors/cgroups.plugin/sys_fs_cgroup.c
+++ b/src/collectors/cgroups.plugin/sys_fs_cgroup.c
@@ -1398,9 +1398,10 @@ void *cgroups_main(void *ptr) {
goto exit;
}
- int error = uv_thread_create(&discovery_thread.thread, cgroup_discovery_worker, NULL);
- if (error) {
- collector_error("CGROUP: cannot create thread worker. uv_thread_create(): %s", uv_strerror(error));
+ discovery_thread.thread = nd_thread_create("CGDISCOVER", NETDATA_THREAD_OPTION_DEFAULT, cgroup_discovery_worker, NULL);
+
+ if (!discovery_thread.thread) {
+ collector_error("CGROUP: cannot create thread worker");
goto exit;
}
diff --git a/src/collectors/debugfs.plugin/module-libsensors.c b/src/collectors/debugfs.plugin/module-libsensors.c
index 72f438bafe92cc..ce29dfc39d3f05 100644
--- a/src/collectors/debugfs.plugin/module-libsensors.c
+++ b/src/collectors/debugfs.plugin/module-libsensors.c
@@ -1321,7 +1321,7 @@ int do_module_libsensors(int update_every, const char *name __maybe_unused) {
if(!libsensors) {
libsensors_update_every = update_every;
libsensors_running = true;
- libsensors = nd_thread_create("LIBSENSORS", NETDATA_THREAD_OPTION_JOINABLE, libsensors_thread, NULL);
+ libsensors = nd_thread_create("LIBSENSORS", NETDATA_THREAD_OPTION_DEFAULT, libsensors_thread, NULL);
}
return libsensors && libsensors_running ? 0 : 1;
diff --git a/src/collectors/diskspace.plugin/plugin_diskspace.c b/src/collectors/diskspace.plugin/plugin_diskspace.c
index a987992c8e0e42..fbd84e2c7032a9 100644
--- a/src/collectors/diskspace.plugin/plugin_diskspace.c
+++ b/src/collectors/diskspace.plugin/plugin_diskspace.c
@@ -873,7 +873,7 @@ void *diskspace_main(void *ptr) {
diskspace_slow_thread = nd_thread_create(
"P[diskspace slow]",
- NETDATA_THREAD_OPTION_JOINABLE,
+ NETDATA_THREAD_OPTION_DEFAULT,
diskspace_slow_worker,
&slow_worker_data);
diff --git a/src/collectors/ebpf.plugin/ebpf.c b/src/collectors/ebpf.plugin/ebpf.c
index 57c4aa64a30657..e6ee8308ae146c 100644
--- a/src/collectors/ebpf.plugin/ebpf.c
+++ b/src/collectors/ebpf.plugin/ebpf.c
@@ -4244,7 +4244,7 @@ static void ebpf_initialize_data_sharing()
switch (integration_with_collectors) {
case NETDATA_EBPF_INTEGRATION_SOCKET: {
socket_ipc =
- nd_thread_create("ebpf_socket_ipc", NETDATA_THREAD_OPTION_JOINABLE, ebpf_socket_thread_ipc, NULL);
+ nd_thread_create("ebpf_socket_ipc", NETDATA_THREAD_OPTION_DEFAULT, ebpf_socket_thread_ipc, NULL);
break;
}
case NETDATA_EBPF_INTEGRATION_SHM:
@@ -4389,7 +4389,7 @@ int main(int argc, char **argv)
cgroup_integration_thread.start_routine = ebpf_cgroup_integration;
cgroup_integration_thread.thread =
- nd_thread_create(cgroup_integration_thread.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_cgroup_integration, NULL);
+ nd_thread_create(cgroup_integration_thread.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_cgroup_integration, NULL);
ebpf_initialize_data_sharing();
@@ -4407,7 +4407,7 @@ int main(int argc, char **argv)
if (em->functions.apps_routine && (em->apps_charts || em->cgroup_charts)) {
collect_pids |= 1 << i;
}
- st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_JOINABLE, st->start_routine, em);
+ st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, em);
} else {
em->lifetime = EBPF_DEFAULT_LIFETIME;
}
diff --git a/src/collectors/ebpf.plugin/ebpf_cachestat.c b/src/collectors/ebpf.plugin/ebpf_cachestat.c
index e694c8aa8e3714..4e8ff3937ce3fa 100644
--- a/src/collectors/ebpf.plugin/ebpf_cachestat.c
+++ b/src/collectors/ebpf.plugin/ebpf_cachestat.c
@@ -1760,7 +1760,7 @@ void *ebpf_cachestat_thread(void *ptr)
pthread_mutex_unlock(&lock);
ebpf_read_cachestat.thread =
- nd_thread_create(ebpf_read_cachestat.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_cachestat_thread, em);
+ nd_thread_create(ebpf_read_cachestat.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_cachestat_thread, em);
cachestat_collector(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_dcstat.c b/src/collectors/ebpf.plugin/ebpf_dcstat.c
index 6509507948b419..8ae133474b463a 100644
--- a/src/collectors/ebpf.plugin/ebpf_dcstat.c
+++ b/src/collectors/ebpf.plugin/ebpf_dcstat.c
@@ -1532,7 +1532,7 @@ void *ebpf_dcstat_thread(void *ptr)
pthread_mutex_unlock(&lock);
ebpf_read_dcstat.thread =
- nd_thread_create(ebpf_read_dcstat.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_dcstat_thread, em);
+ nd_thread_create(ebpf_read_dcstat.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_dcstat_thread, em);
dcstat_collector(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_fd.c b/src/collectors/ebpf.plugin/ebpf_fd.c
index 609757abd62351..823773a67ef606 100644
--- a/src/collectors/ebpf.plugin/ebpf_fd.c
+++ b/src/collectors/ebpf.plugin/ebpf_fd.c
@@ -1558,7 +1558,7 @@ void *ebpf_fd_thread(void *ptr)
pthread_mutex_unlock(&lock);
- ebpf_read_fd.thread = nd_thread_create(ebpf_read_fd.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_fd_thread, em);
+ ebpf_read_fd.thread = nd_thread_create(ebpf_read_fd.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_fd_thread, em);
fd_collector(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_functions.c b/src/collectors/ebpf.plugin/ebpf_functions.c
index a16f9784c76d2b..27c1bb91db15b6 100644
--- a/src/collectors/ebpf.plugin/ebpf_functions.c
+++ b/src/collectors/ebpf.plugin/ebpf_functions.c
@@ -31,7 +31,7 @@ static int ebpf_function_start_thread(ebpf_module_t *em, int period)
netdata_log_info("Starting thread %s with lifetime = %d", em->info.thread_name, period);
#endif
- st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_JOINABLE, st->start_routine, em);
+ st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, em);
return st->thread ? 0 : 1;
}
diff --git a/src/collectors/ebpf.plugin/ebpf_shm.c b/src/collectors/ebpf.plugin/ebpf_shm.c
index 6d439aed082ab5..f437842f7e7555 100644
--- a/src/collectors/ebpf.plugin/ebpf_shm.c
+++ b/src/collectors/ebpf.plugin/ebpf_shm.c
@@ -1395,7 +1395,7 @@ void *ebpf_shm_thread(void *ptr)
pthread_mutex_unlock(&lock);
ebpf_read_shm.thread =
- nd_thread_create(ebpf_read_shm.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_shm_thread, em);
+ nd_thread_create(ebpf_read_shm.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_shm_thread, em);
shm_collector(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_socket.c b/src/collectors/ebpf.plugin/ebpf_socket.c
index a670f68e273a7a..80e37c9fb9c220 100644
--- a/src/collectors/ebpf.plugin/ebpf_socket.c
+++ b/src/collectors/ebpf.plugin/ebpf_socket.c
@@ -3051,7 +3051,7 @@ void *ebpf_socket_thread(void *ptr)
NETDATA_MAX_SOCKET_VECTOR);
ebpf_read_socket.thread =
- nd_thread_create(ebpf_read_socket.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_socket_thread, em);
+ nd_thread_create(ebpf_read_socket.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_socket_thread, em);
pthread_mutex_lock(&lock);
ebpf_socket_create_global_charts(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_swap.c b/src/collectors/ebpf.plugin/ebpf_swap.c
index 25d18ce1f95f8c..d569cae95f6cea 100644
--- a/src/collectors/ebpf.plugin/ebpf_swap.c
+++ b/src/collectors/ebpf.plugin/ebpf_swap.c
@@ -1221,7 +1221,7 @@ void *ebpf_swap_thread(void *ptr)
pthread_mutex_unlock(&lock);
ebpf_read_swap.thread =
- nd_thread_create(ebpf_read_swap.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_swap_thread, em);
+ nd_thread_create(ebpf_read_swap.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_swap_thread, em);
swap_collector(em);
diff --git a/src/collectors/ebpf.plugin/ebpf_vfs.c b/src/collectors/ebpf.plugin/ebpf_vfs.c
index afcab371dc0317..14eff08aa02547 100644
--- a/src/collectors/ebpf.plugin/ebpf_vfs.c
+++ b/src/collectors/ebpf.plugin/ebpf_vfs.c
@@ -2958,7 +2958,7 @@ void *ebpf_vfs_thread(void *ptr)
pthread_mutex_unlock(&lock);
ebpf_read_vfs.thread =
- nd_thread_create(ebpf_read_vfs.name, NETDATA_THREAD_OPTION_JOINABLE, ebpf_read_vfs_thread, em);
+ nd_thread_create(ebpf_read_vfs.name, NETDATA_THREAD_OPTION_DEFAULT, ebpf_read_vfs_thread, em);
vfs_collector(em);
diff --git a/src/collectors/freeipmi.plugin/freeipmi_plugin.c b/src/collectors/freeipmi.plugin/freeipmi_plugin.c
index fffb4c6ae32cb1..8be6e4fa0aecfe 100644
--- a/src/collectors/freeipmi.plugin/freeipmi_plugin.c
+++ b/src/collectors/freeipmi.plugin/freeipmi_plugin.c
@@ -1982,9 +1982,9 @@ int main (int argc, char **argv) {
},
};
- nd_thread_create("IPMI[sensors]", NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE, netdata_ipmi_collection_thread, &sensors_data);
- if(netdata_do_sel)
- nd_thread_create("IPMI[sel]", NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE, netdata_ipmi_collection_thread, &sel_data);
+ nd_thread_create("IPMI[sensors]", NETDATA_THREAD_OPTION_DONT_LOG, netdata_ipmi_collection_thread, &sensors_data);
+ if (netdata_do_sel)
+ nd_thread_create("IPMI[sel]", NETDATA_THREAD_OPTION_DONT_LOG, netdata_ipmi_collection_thread, &sel_data);
// ------------------------------------------------------------------------
// the main loop
diff --git a/src/collectors/proc.plugin/plugin_proc.c b/src/collectors/proc.plugin/plugin_proc.c
index 9015aee58908e2..6ee06f9b1769d8 100644
--- a/src/collectors/proc.plugin/plugin_proc.c
+++ b/src/collectors/proc.plugin/plugin_proc.c
@@ -217,7 +217,7 @@ void *proc_main(void *ptr)
if (inicfg_get_boolean(&netdata_config, "plugin:proc", "/proc/net/dev", CONFIG_BOOLEAN_YES)) {
netdata_log_debug(D_SYSTEM, "Starting thread %s.", THREAD_NETDEV_NAME);
- netdev_thread = nd_thread_create(THREAD_NETDEV_NAME, NETDATA_THREAD_OPTION_JOINABLE, netdev_main, NULL);
+ netdev_thread = nd_thread_create(THREAD_NETDEV_NAME, NETDATA_THREAD_OPTION_DEFAULT, netdev_main, NULL);
}
inicfg_get_boolean(&netdata_config, "plugin:proc", "/proc/pagetypeinfo", CONFIG_BOOLEAN_NO);
diff --git a/src/collectors/profile.plugin/plugin_profile.cc b/src/collectors/profile.plugin/plugin_profile.cc
index ffbcbbdf2d7e57..94e55bee0ecb07 100644
--- a/src/collectors/profile.plugin/plugin_profile.cc
+++ b/src/collectors/profile.plugin/plugin_profile.cc
@@ -217,7 +217,7 @@ extern "C" void *profile_main(void *ptr) {
char Tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(Tag, NETDATA_THREAD_TAG_MAX, "PROFILER[%zu]", Idx);
- Threads[Idx] = nd_thread_create(Tag, NETDATA_THREAD_OPTION_JOINABLE,
+ Threads[Idx] = nd_thread_create(Tag, NETDATA_THREAD_OPTION_DEFAULT,
subprofile_main, static_cast<void *>(&Profilers[Idx]));
}
diff --git a/src/collectors/statsd.plugin/statsd.c b/src/collectors/statsd.plugin/statsd.c
index 4d69572f3eaf08..4a80f67fd6739e 100644
--- a/src/collectors/statsd.plugin/statsd.c
+++ b/src/collectors/statsd.plugin/statsd.c
@@ -2613,7 +2613,7 @@ void *statsd_main(void *ptr) {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "STATSD_IN[%d]", i + 1);
spinlock_init(&statsd.collection_threads_status[i].spinlock);
- statsd.collection_threads_status[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
+ statsd.collection_threads_status[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
statsd_collector_thread, &statsd.collection_threads_status[i]);
}
diff --git a/src/collectors/systemd-journal.plugin/systemd-main.c b/src/collectors/systemd-journal.plugin/systemd-main.c
index 56389e5a35b444..2e90ec97986ed3 100644
--- a/src/collectors/systemd-journal.plugin/systemd-main.c
+++ b/src/collectors/systemd-journal.plugin/systemd-main.c
@@ -67,7 +67,7 @@ int main(int argc __maybe_unused, char **argv __maybe_unused)
// ------------------------------------------------------------------------
// watcher thread
- nd_thread_create("SDWATCH", NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE, nd_journal_watcher_main, NULL);
+ nd_thread_create("SDWATCH", NETDATA_THREAD_OPTION_DONT_LOG, nd_journal_watcher_main, NULL);
// ------------------------------------------------------------------------
// the event loop for functions
diff --git a/src/daemon/config/netdata-conf-db.c b/src/daemon/config/netdata-conf-db.c
index 6e85496afc73c5..ed8ded3aa9d5d1 100644
--- a/src/daemon/config/netdata-conf-db.c
+++ b/src/daemon/config/netdata-conf-db.c
@@ -290,7 +290,7 @@ void netdata_conf_dbengine_init(const char *hostname) {
if(parallel_initialization) {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "DBENGINIT[%zu]", tier);
- tiers_init[tier].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE, dbengine_tier_init, &tiers_init[tier]);
+ tiers_init[tier].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, dbengine_tier_init, &tiers_init[tier]);
}
else
dbengine_tier_init(&tiers_init[tier]);
diff --git a/src/daemon/daemon-service.c b/src/daemon/daemon-service.c
index 8e0c68b2916b04..77bf3247140f94 100644
--- a/src/daemon/daemon-service.c
+++ b/src/daemon/daemon-service.c
@@ -4,16 +4,11 @@
typedef struct service_thread {
pid_t tid;
- SERVICE_THREAD_TYPE type;
SERVICE_TYPE services;
char name[ND_THREAD_TAG_MAX + 1];
- bool stop_immediately;
bool cancelled;
- union {
- ND_THREAD *netdata_thread;
- uv_thread_t uv_thread;
- };
+ ND_THREAD *netdata_thread;
force_quit_t force_quit_callback;
request_quit_t request_quit_callback;
@@ -27,7 +22,8 @@ struct service_globals {
.pid_judy = NULL,
};
-SERVICE_THREAD *service_register(SERVICE_THREAD_TYPE thread_type, request_quit_t request_quit_callback, force_quit_t force_quit_callback, void *data, bool update __maybe_unused) {
+SERVICE_THREAD *service_register(request_quit_t request_quit_callback, force_quit_t force_quit_callback, void *data)
+{
SERVICE_THREAD *sth = NULL;
pid_t tid = gettid_cached();
@@ -36,23 +32,11 @@ SERVICE_THREAD *service_register(SERVICE_THREAD_TYPE thread_type, request_quit_t
if(!*PValue) {
sth = callocz(1, sizeof(SERVICE_THREAD));
sth->tid = tid;
- sth->type = thread_type;
sth->request_quit_callback = request_quit_callback;
sth->force_quit_callback = force_quit_callback;
sth->data = data;
*PValue = sth;
-
- switch(thread_type) {
- default:
- case SERVICE_THREAD_TYPE_NETDATA:
- sth->netdata_thread = nd_thread_self();
- break;
-
- case SERVICE_THREAD_TYPE_EVENT_LOOP:
- case SERVICE_THREAD_TYPE_LIBUV:
- sth->uv_thread = uv_thread_self();
- break;
- }
+ sth->netdata_thread = nd_thread_self();
const char *name = nd_thread_tag();
if(!name) name = "";
@@ -82,15 +66,11 @@ bool service_running(SERVICE_TYPE service) {
static __thread SERVICE_THREAD *sth = NULL;
if(unlikely(!sth))
- sth = service_register(SERVICE_THREAD_TYPE_NETDATA, NULL, NULL, NULL, false);
+ sth = service_register(NULL, NULL, NULL);
sth->services |= service;
- bool cancelled = false;
- if (sth->type == SERVICE_THREAD_TYPE_NETDATA)
- cancelled = nd_thread_signaled_to_cancel();
-
- return !sth->stop_immediately && !exit_initiated_get() && !cancelled;
+ return !nd_thread_signaled_to_cancel() && !exit_initiated_get();
}
void service_signal_exit(SERVICE_TYPE service) {
@@ -103,19 +83,8 @@ void service_signal_exit(SERVICE_TYPE service) {
SERVICE_THREAD *sth = *PValue;
if((sth->services & service)) {
- sth->stop_immediately = true;
-
- switch(sth->type) {
- default:
- case SERVICE_THREAD_TYPE_NETDATA:
- nd_thread_signal_cancel(sth->netdata_thread);
- break;
-
- case SERVICE_THREAD_TYPE_EVENT_LOOP:
- case SERVICE_THREAD_TYPE_LIBUV:
- break;
- }
-
+ nd_thread_signal_cancel(sth->netdata_thread);
+ nd_log_daemon(NDLP_DEBUG, "SERVICE: Signal to stop : %s", sth->name);
if(sth->request_quit_callback) {
spinlock_unlock(&service_globals.lock);
sth->request_quit_callback(sth->data);
@@ -123,7 +92,6 @@ void service_signal_exit(SERVICE_TYPE service) {
}
}
}
-
spinlock_unlock(&service_globals.lock);
}
@@ -180,18 +148,7 @@ bool service_wait_exit(SERVICE_TYPE service, usec_t timeout_ut) {
SERVICE_THREAD *sth = *PValue;
if(sth->services & service && sth->tid != gettid_cached() && !sth->cancelled) {
sth->cancelled = true;
-
- switch(sth->type) {
- default:
- case SERVICE_THREAD_TYPE_NETDATA:
- nd_thread_signal_cancel(sth->netdata_thread);
- break;
-
- case SERVICE_THREAD_TYPE_EVENT_LOOP:
- case SERVICE_THREAD_TYPE_LIBUV:
- break;
- }
-
+ nd_thread_signal_cancel(sth->netdata_thread);
if(running)
buffer_strcat(thread_list, ", ");
diff --git a/src/daemon/daemon-service.h b/src/daemon/daemon-service.h
index c82a40ea5f2266..f7809524a2b864 100644
--- a/src/daemon/daemon-service.h
+++ b/src/daemon/daemon-service.h
@@ -24,18 +24,12 @@ typedef enum {
SERVICE_SYSTEMD = (1 << 15),
} SERVICE_TYPE;
-typedef enum {
- SERVICE_THREAD_TYPE_NETDATA,
- SERVICE_THREAD_TYPE_LIBUV,
- SERVICE_THREAD_TYPE_EVENT_LOOP,
-} SERVICE_THREAD_TYPE;
-
typedef void (*force_quit_t)(void *data);
typedef void (*request_quit_t)(void *data);
void service_exits(void);
bool service_running(SERVICE_TYPE service);
-struct service_thread *service_register(SERVICE_THREAD_TYPE thread_type, request_quit_t request_quit_callback, force_quit_t force_quit_callback, void *data, bool update __maybe_unused);
+struct service_thread *service_register(request_quit_t request_quit_callback, force_quit_t force_quit_callback, void *data);
void service_signal_exit(SERVICE_TYPE service);
bool service_wait_exit(SERVICE_TYPE service, usec_t timeout_ut);
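A hedged sketch of a service thread using the narrowed API above (the SERVICE_TYPE value and loop body are illustrative; service_register(), service_running() and service_exits() are the functions this patch touches):

```c
#include "daemon/daemon-service.h" /* assumed include path */

static void *example_service_main(void *arg) {
    (void) arg;

    /* No thread-type argument anymore: every caller is an nd_thread. */
    service_register(NULL, NULL, NULL);

    /* service_running() now keys off nd_thread cancellation plus the global
     * exit flag, instead of a per-thread stop_immediately bit. */
    while (service_running(SERVICE_SYSTEMD)) { /* SERVICE_SYSTEMD: illustrative pick */
        /* ... periodic work ... */
    }

    service_exits();
    return NULL;
}
```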
diff --git a/src/daemon/daemon-shutdown-watcher.c b/src/daemon/daemon-shutdown-watcher.c
index 82f789cd21674a..c3f3cb6d6442b0 100644
--- a/src/daemon/daemon-shutdown-watcher.c
+++ b/src/daemon/daemon-shutdown-watcher.c
@@ -195,7 +195,7 @@ void watcher_thread_start() {
completion_init(&shutdown_begin_completion);
completion_init(&shutdown_end_completion);
- watcher_thread = nd_thread_create("EXIT_WATCHER", NETDATA_THREAD_OPTION_JOINABLE, watcher_main, NULL);
+ watcher_thread = nd_thread_create("EXIT_WATCHER", NETDATA_THREAD_OPTION_DEFAULT, watcher_main, NULL);
}
void watcher_thread_stop() {
diff --git a/src/daemon/daemon-shutdown.c b/src/daemon/daemon-shutdown.c
index ac3352341bd24f..7403e3335d5ed4 100644
--- a/src/daemon/daemon-shutdown.c
+++ b/src/daemon/daemon-shutdown.c
@@ -264,7 +264,7 @@ static void netdata_cleanup_and_exit(EXIT_REASON reason, bool abnormal, bool exi
ND_THREAD *th[nd_profile.storage_tiers];
for (size_t tier = 0; tier < nd_profile.storage_tiers; tier++)
- th[tier] = nd_thread_create("rrdeng-exit", NETDATA_THREAD_OPTION_JOINABLE, rrdeng_exit_background, multidb_ctx[tier]);
+ th[tier] = nd_thread_create("rrdeng-exit", NETDATA_THREAD_OPTION_DEFAULT, rrdeng_exit_background, multidb_ctx[tier]);
// flush anything remaining again - just in case
rrdeng_flush_everything_and_wait(true, true, false);
diff --git a/src/daemon/daemon-systemd-watcher.c b/src/daemon/daemon-systemd-watcher.c
index c375b97d24c296..ed4d66cc634966 100644
--- a/src/daemon/daemon-systemd-watcher.c
+++ b/src/daemon/daemon-systemd-watcher.c
@@ -138,7 +138,7 @@ static void listen_for_systemd_dbus_events(void) {
void *systemd_watcher_thread(void *arg) {
struct netdata_static_thread *static_thread = arg;
- service_register(SERVICE_THREAD_TYPE_NETDATA, NULL, NULL, NULL, false);
+ service_register(NULL, NULL, NULL);
listen_for_systemd_dbus_events();
diff --git a/src/daemon/dyncfg/dyncfg-unittest.c b/src/daemon/dyncfg/dyncfg-unittest.c
index 763451501e5e42..14544acb7aa320 100644
--- a/src/daemon/dyncfg/dyncfg-unittest.c
+++ b/src/daemon/dyncfg/dyncfg-unittest.c
@@ -580,7 +580,7 @@ int dyncfg_unittest(void) {
// ------------------------------------------------------------------------
// create the thread for testing async communication
- ND_THREAD *thread = nd_thread_create("unittest", NETDATA_THREAD_OPTION_JOINABLE, dyncfg_unittest_thread_action, NULL);
+ ND_THREAD *thread = nd_thread_create("unittest", NETDATA_THREAD_OPTION_DEFAULT, dyncfg_unittest_thread_action, NULL);
// ------------------------------------------------------------------------
// single
diff --git a/src/daemon/libuv_workers.c b/src/daemon/libuv_workers.c
index 5288a111316852..a9c87d48de3911 100644
--- a/src/daemon/libuv_workers.c
+++ b/src/daemon/libuv_workers.c
@@ -109,24 +109,6 @@ void register_libuv_worker_jobs() {
register_libuv_worker_jobs_internal();
}
-// utils
-#define MAX_THREAD_CREATE_RETRIES (10)
-#define MAX_THREAD_CREATE_WAIT_MS (1000)
-
-int create_uv_thread(uv_thread_t *thread, uv_thread_cb thread_func, void *arg, int *retries)
-{
- int err;
-
- do {
- err = uv_thread_create(thread, thread_func, arg);
- if (err == 0)
- break;
- sleep_usec(MAX_THREAD_CREATE_WAIT_MS * USEC_PER_MS);
- } while (err == UV_EAGAIN && ++(*retries) < MAX_THREAD_CREATE_RETRIES);
-
- return err;
-}
-
void libuv_close_callback(uv_handle_t *handle, void *data __maybe_unused)
{
// Only close handles that aren't already closing
diff --git a/src/daemon/libuv_workers.h b/src/daemon/libuv_workers.h
index 970e52d5de8790..364f19ae565e9b 100644
--- a/src/daemon/libuv_workers.h
+++ b/src/daemon/libuv_workers.h
@@ -87,7 +87,6 @@ enum event_loop_job {
};
void register_libuv_worker_jobs();
-int create_uv_thread(uv_thread_t *thread, uv_thread_cb thread_func, void *arg, int *retries);
void libuv_close_callback(uv_handle_t *handle, void *data __maybe_unused);
#endif //NETDATA_EVENT_LOOP_H
diff --git a/src/daemon/main.c b/src/daemon/main.c
index f7e61101a86e80..82349d9aa6a960 100644
--- a/src/daemon/main.c
+++ b/src/daemon/main.c
@@ -1069,7 +1069,7 @@ int netdata_main(int argc, char **argv) {
if(st->enabled) {
netdata_log_debug(D_SYSTEM, "Starting thread %s.", st->name);
- st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_JOINABLE, st->start_routine, st);
+ st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, st);
}
else
netdata_log_debug(D_SYSTEM, "Not starting thread %s.", st->name);
@@ -1120,7 +1120,7 @@ int netdata_main(int argc, char **argv) {
struct netdata_static_thread *st = &static_threads[i];
st->enabled = 1;
netdata_log_debug(D_SYSTEM, "Starting thread %s.", st->name);
- st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_JOINABLE, st->start_routine, st);
+ st->thread = nd_thread_create(st->name, NETDATA_THREAD_OPTION_DEFAULT, st->start_routine, st);
}
}
}
diff --git a/src/daemon/winsvc.cc b/src/daemon/winsvc.cc
index d963de299bdb02..8eef26ea13389a 100644
--- a/src/daemon/winsvc.cc
+++ b/src/daemon/winsvc.cc
@@ -141,7 +141,7 @@ static void WINAPI ServiceControlHandler(DWORD controlCode)
netdata_service_log("Creating cleanup thread...");
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "%s", "CLEANUP");
- cleanup_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE, call_netdata_cleanup, &controlCode);
+ cleanup_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, call_netdata_cleanup, &controlCode);
// Signal the stop request
netdata_service_log("Signalling the cleanup thread...");
diff --git a/src/database/engine/cache.c b/src/database/engine/cache.c
index b16cb57ea74e11..6ad579dd43aef2 100644
--- a/src/database/engine/cache.c
+++ b/src/database/engine/cache.c
@@ -2085,7 +2085,7 @@ PGC *pgc_create(const char *name,
// last create the eviction thread
{
completion_init(&cache->evictor.completion);
- cache->evictor.thread = nd_thread_create(name, NETDATA_THREAD_OPTION_JOINABLE, pgc_evict_thread, cache);
+ cache->evictor.thread = nd_thread_create(name, NETDATA_THREAD_OPTION_DEFAULT, pgc_evict_thread, cache);
}
return cache;
@@ -2872,7 +2872,7 @@ void unittest_stress_test(void) {
pthread_t service_thread;
nd_thread_create(&service_thread, "SERVICE",
- NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG,
+ NETDATA_THREAD_OPTION_DONT_LOG,
unittest_stress_test_service, NULL);
pthread_t collect_threads[pgc_uts.collect_threads];
@@ -2882,7 +2882,7 @@ void unittest_stress_test(void) {
char buffer[100 + 1];
snprintfz(buffer, sizeof(buffer) - 1, "COLLECT_%zu", i);
nd_thread_create(&collect_threads[i], buffer,
- NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG,
+ NETDATA_THREAD_OPTION_DONT_LOG,
unittest_stress_test_collector, &collect_thread_ids[i]);
}
@@ -2895,7 +2895,7 @@ void unittest_stress_test(void) {
snprintfz(buffer, sizeof(buffer) - 1, "QUERY_%zu", i);
initstate_r(1, pgc_uts.rand_statebufs, 1024, &pgc_uts.random_data[i]);
nd_thread_create(&queries_threads[i], buffer,
- NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG,
+ NETDATA_THREAD_OPTION_DONT_LOG,
unittest_stress_test_queries, &query_thread_ids[i]);
}
diff --git a/src/database/engine/mrg-unittest.c b/src/database/engine/mrg-unittest.c
index 55b633f9d4bb29..57a7701742b129 100644
--- a/src/database/engine/mrg-unittest.c
+++ b/src/database/engine/mrg-unittest.c
@@ -173,7 +173,7 @@ int mrg_unittest(void) {
for(size_t i = 0; i < threads ; i++) {
char buf[15 + 1];
snprintfz(buf, sizeof(buf) - 1, "TH[%zu]", i);
- th[i] = nd_thread_create(buf, NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG, mrg_stress, &t);
+ th[i] = nd_thread_create(buf, NETDATA_THREAD_OPTION_DONT_LOG, mrg_stress, &t);
}
sleep_usec(run_for_secs * USEC_PER_SEC);
diff --git a/src/database/engine/rrdengine.c b/src/database/engine/rrdengine.c
index 0ad8127e79d4fe..68ef0b2d33bbf0 100644
--- a/src/database/engine/rrdengine.c
+++ b/src/database/engine/rrdengine.c
@@ -31,7 +31,7 @@ static inline void worker_dispatch_extent_read(struct rrdeng_cmd cmd, bool from_
static inline void worker_dispatch_query_prep(struct rrdeng_cmd cmd, bool from_worker);
struct rrdeng_main {
- uv_thread_t thread;
+ ND_THREAD *thread;
uv_loop_t loop;
uv_async_t async;
uv_timer_t timer;
@@ -1339,7 +1339,7 @@ static void *flush_dirty_pages_of_section_tp_worker(struct rrdengine_instance *c
struct mrg_load_thread {
int max_threads;
- uv_thread_t thread;
+ ND_THREAD *thread;
uv_sem_t *sem;
int tier;
struct rrdengine_datafile *datafile;
@@ -1350,7 +1350,7 @@ struct mrg_load_thread {
size_t max_running_threads = 0;
size_t running_threads = 0;
-void journalfile_v2_populate_retention_to_mrg_worker(void *arg)
+void *journalfile_v2_populate_retention_to_mrg_worker(void *arg)
{
struct mrg_load_thread *mlt = arg;
uv_sem_wait(mlt->sem);
@@ -1374,6 +1374,7 @@ void journalfile_v2_populate_retention_to_mrg_worker(void *arg)
// Signal completion - this needs to be last
__atomic_store_n(&mlt->finished, true, __ATOMIC_RELEASE);
+ return NULL;
}
static void after_populate_mrg(struct rrdengine_instance *ctx __maybe_unused, void *data __maybe_unused, struct completion *completion __maybe_unused, uv_work_t* req __maybe_unused, int status __maybe_unused) {
@@ -1444,7 +1445,7 @@ static void *populate_mrg_tp_worker(
if (__atomic_load_n(&mlt[index].finished, __ATOMIC_RELAXED) &&
__atomic_load_n(&mlt[index].tier, __ATOMIC_ACQUIRE) == tier) {
- rc = uv_thread_join(&(mlt[index].thread));
+ rc = nd_thread_join(mlt[index].thread);
if (rc)
nd_log_daemon(NDLP_WARNING, "Failed to join thread, rc = %d", rc);
@@ -1479,11 +1480,10 @@ static void *populate_mrg_tp_worker(
__atomic_store_n(&mlt[thread_index].tier, tier, __ATOMIC_RELAXED);
mlt[thread_index].datafile = datafile;
- rc = uv_thread_create(&mlt[thread_index].thread,
- journalfile_v2_populate_retention_to_mrg_worker,
- &mlt[thread_index]);
+ mlt[thread_index].thread = nd_thread_create("MRGLOAD", NETDATA_THREAD_OPTION_DEFAULT, journalfile_v2_populate_retention_to_mrg_worker,
+ &mlt[thread_index]);
- if (rc) {
+ if (!mlt[thread_index].thread) {
nd_log_daemon(NDLP_WARNING, "Failed to create thread, rc = %d", rc);
__atomic_store_n(&mlt[thread_index].busy, false, __ATOMIC_RELEASE);
spinlock_unlock(&datafile->populate_mrg.spinlock);
@@ -1504,7 +1504,7 @@ static void *populate_mrg_tp_worker(
if (__atomic_load_n(&mlt[index].finished, __ATOMIC_RELAXED)) {
// Thread is finished, join it
- rc = uv_thread_join(&(mlt[index].thread));
+ rc = nd_thread_join((mlt[index].thread));
if (rc)
nd_log_daemon(NDLP_WARNING, "Failed to join thread, rc = %d", rc);
@@ -1958,11 +1958,13 @@ bool rrdeng_dbengine_spawn(struct rrdengine_instance *ctx __maybe_unused) {
dbengine_initialize_structures();
int retries = 0;
- int create_uv_thread_rc = create_uv_thread(&rrdeng_main.thread, dbengine_event_loop, &rrdeng_main, &retries);
- if (create_uv_thread_rc)
- nd_log_daemon(NDLP_ERR, "Failed to create DBENGINE thread, error %s, after %d retries", uv_err_name(create_uv_thread_rc), retries);
+// int create_uv_thread_rc = create_uv_thread(&rrdeng_main.thread, dbengine_event_loop, &rrdeng_main, &retries);
+ rrdeng_main.thread = nd_thread_create("DBEV", NETDATA_THREAD_OPTION_DEFAULT, dbengine_event_loop, &rrdeng_main);
+
+// if (!rrdeng_main.thread)
+// nd_log_daemon(NDLP_ERR, "Failed to create DBENGINE thread, error %s, after %d retries", uv_err_name(create_uv_thread_rc), retries);
- fatal_assert(0 == create_uv_thread_rc);
+ fatal_assert(0 != rrdeng_main.thread);
if (retries)
nd_log_daemon(NDLP_WARNING, "DBENGINE thread was created after %d attempts", retries);
@@ -2036,10 +2038,10 @@ void rrdeng_calculate_tier_disk_space_percentage(void)
(!__atomic_load_n(&(ctx)->atomic.migration_to_v2_running, __ATOMIC_RELAXED) && \
!__atomic_load_n(&(ctx)->atomic.now_deleting_files, __ATOMIC_RELAXED))
-void dbengine_event_loop(void* arg) {
+void *dbengine_event_loop(void* arg) {
sanity_check();
uv_thread_set_name_np("DBENGINE");
- service_register(SERVICE_THREAD_TYPE_EVENT_LOOP, NULL, NULL, NULL, true);
+ service_register(NULL, NULL, NULL);
worker_register("DBENGINE");
@@ -2265,13 +2267,14 @@ void dbengine_event_loop(void* arg) {
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Shutting down dbengine thread");
(void) uv_loop_close(&main->loop);
worker_unregister();
+ return NULL;
}
void dbengine_shutdown()
{
rrdeng_enq_cmd(NULL, RRDENG_OPCODE_SHUTDOWN_EVLOOP, NULL, NULL, STORAGE_PRIORITY_INTERNAL_DBENGINE, NULL, NULL);
- int rc = uv_thread_join(&rrdeng_main.thread);
+ int rc = nd_thread_join(rrdeng_main.thread);
if (rc)
nd_log_daemon(NDLP_ERR, "DBENGINE: Failed to join thread, error %s", uv_err_name(rc));
else
diff --git a/src/database/engine/rrdengine.h b/src/database/engine/rrdengine.h
index 6f81568f19f5df..9046ee9d2374e6 100644
--- a/src/database/engine/rrdengine.h
+++ b/src/database/engine/rrdengine.h
@@ -465,7 +465,7 @@ bool rrdeng_ctx_tier_cap_exceeded(struct rrdengine_instance *ctx);
int init_rrd_files(struct rrdengine_instance *ctx);
void finalize_rrd_files(struct rrdengine_instance *ctx);
bool rrdeng_dbengine_spawn(struct rrdengine_instance *ctx);
-void dbengine_event_loop(void *arg);
+void *dbengine_event_loop(void *arg);
typedef void (*enqueue_callback_t)(struct rrdeng_cmd *cmd);
typedef void (*dequeue_callback_t)(struct rrdeng_cmd *cmd);
diff --git a/src/database/sqlite/sqlite_aclk.c b/src/database/sqlite/sqlite_aclk.c
index beff86a77756bf..a00f7bca9433f8 100644
--- a/src/database/sqlite/sqlite_aclk.c
+++ b/src/database/sqlite/sqlite_aclk.c
@@ -37,7 +37,7 @@ static void create_node_instance_result_job(const char *machine_guid, const char
}
struct aclk_sync_config_s {
- uv_thread_t thread;
+ ND_THREAD *thread;
uv_loop_t loop;
uv_timer_t timer_req;
uv_async_t async;
@@ -617,14 +617,14 @@ static void free_query_list(Pvoid_t JudyL)
(!shutdown_requested || config->aclk_queries_running || config->alert_push_running || \
config->aclk_batch_job_is_running)
-static void aclk_synchronization_event_loop(void *arg)
+static void *aclk_synchronization_event_loop(void *arg)
{
struct aclk_sync_config_s *config = arg;
uv_thread_set_name_np("ACLKSYNC");
config->ar = aral_by_size_acquire(sizeof(struct aclk_database_cmd));
worker_register("ACLKSYNC");
- service_register(SERVICE_THREAD_TYPE_EVENT_LOOP, NULL, NULL, NULL, true);
+ service_register(NULL, NULL, NULL);
worker_register_job_name(ACLK_DATABASE_NOOP, "noop");
worker_register_job_name(ACLK_DATABASE_NODE_STATE, "node state");
@@ -955,6 +955,7 @@ static void aclk_synchronization_event_loop(void *arg)
worker_unregister();
service_exits();
netdata_log_info("ACLK SYNC: Shutting down ACLK synchronization event loop");
+ return NULL;
}
static void aclk_initialize_event_loop(void)
@@ -962,17 +963,10 @@ static void aclk_initialize_event_loop(void)
memset(&aclk_sync_config, 0, sizeof(aclk_sync_config));
completion_init(&aclk_sync_config.start_stop_complete);
- int retries = 0;
- int create_uv_thread_rc = create_uv_thread(&aclk_sync_config.thread, aclk_synchronization_event_loop, &aclk_sync_config, &retries);
- if (create_uv_thread_rc)
- nd_log_daemon(NDLP_ERR, "Failed to create ACLK synchronization thread, error %s, after %d retries", uv_err_name(create_uv_thread_rc), retries);
+ aclk_sync_config.thread = nd_thread_create("ACLKSYNC", NETDATA_THREAD_OPTION_DEFAULT, aclk_synchronization_event_loop, &aclk_sync_config);
+ fatal_assert(NULL != aclk_sync_config.thread);
- fatal_assert(0 == create_uv_thread_rc);
-
- if (retries)
- nd_log_daemon(NDLP_WARNING, "ACLK synchronization thread was created after %d attempts", retries);
completion_wait_for(&aclk_sync_config.start_stop_complete);
-
// Keep completion, just reset it for next use during shutdown
completion_reset(&aclk_sync_config.start_stop_complete);
}
@@ -1074,9 +1068,9 @@ void aclk_synchronization_shutdown(void)
completion_wait_for(&aclk_sync_config.start_stop_complete);
completion_destroy(&aclk_sync_config.start_stop_complete);
- int rc = uv_thread_join(&aclk_sync_config.thread);
+ int rc = nd_thread_join(aclk_sync_config.thread);
if (rc)
- nd_log_daemon(NDLP_ERR, "ACLK: Failed to join synchronization thread, error %s", uv_err_name(rc));
+ nd_log_daemon(NDLP_ERR, "ACLK: Failed to join synchronization thread");
else
nd_log_daemon(NDLP_INFO, "ACLK: synchronization thread shutdown completed");
}
diff --git a/src/database/sqlite/sqlite_metadata.c b/src/database/sqlite/sqlite_metadata.c
index 668cdc17d432de..4ae08afe2754f6 100644
--- a/src/database/sqlite/sqlite_metadata.c
+++ b/src/database/sqlite/sqlite_metadata.c
@@ -223,7 +223,7 @@ struct metadata_cmd {
};
struct meta_config_s {
- uv_thread_t thread;
+ ND_THREAD *thread;
uv_loop_t loop;
uv_async_t async;
uv_timer_t timer_req;
@@ -1772,7 +1772,7 @@ struct work_payload {
};
struct host_context_load_thread {
- uv_thread_t thread;
+ ND_THREAD *thread;
RRDHOST *host;
sqlite3 *db_meta_thread;
sqlite3 *db_context_thread;
@@ -1784,13 +1784,13 @@ __thread sqlite3 *db_meta_thread = NULL;
__thread sqlite3 *db_context_thread = NULL;
__thread bool main_context_thread = false;
-static void restore_host_context(void *arg)
+static void *restore_host_context(void *arg)
{
struct host_context_load_thread *hclt = arg;
RRDHOST *host = hclt->host;
if (!host)
- return;
+ return NULL;
if (!db_meta_thread) {
if (hclt->db_meta_thread) {
@@ -1837,6 +1837,7 @@ static void restore_host_context(void *arg)
}
__atomic_store_n(&hclt->finished, true, __ATOMIC_RELEASE);
+ return NULL;
}
// Callback after scan of hosts is done
@@ -1866,7 +1867,7 @@ static bool cleanup_finished_threads(struct host_context_load_thread *hclt, size
if (__atomic_load_n(&(hclt[index].finished), __ATOMIC_RELAXED) ||
(wait && __atomic_load_n(&(hclt[index].busy), __ATOMIC_ACQUIRE))) {
- int rc = uv_thread_join(&(hclt[index].thread));
+ int rc = nd_thread_join(hclt[index].thread);
if (rc)
nd_log_daemon(NDLP_WARNING, "Failed to join thread, rc = %d", rc);
__atomic_store_n(&(hclt[index].busy), false, __ATOMIC_RELEASE);
@@ -1933,8 +1934,8 @@ static void ctx_hosts_load(uv_work_t *req)
if (thread_found) {
__atomic_store_n(&hclt[thread_index].busy, true, __ATOMIC_RELAXED);
hclt[thread_index].host = host;
- rc = uv_thread_create(&hclt[thread_index].thread, restore_host_context, &hclt[thread_index]);
- async_exec += (rc == 0);
+ hclt[thread_index].thread = nd_thread_create("CTXLOAD", NETDATA_THREAD_OPTION_DEFAULT, restore_host_context, &hclt[thread_index]);
+ async_exec += (hclt[thread_index].thread != NULL);
// if it failed, mark the thread slot as free
if (rc)
__atomic_store_n(&hclt[thread_index].busy, false, __ATOMIC_RELAXED);
@@ -2153,8 +2154,6 @@ static void metadata_scan_host(RRDHOST *host, BUFFER *work_buffer, bool is_worke
SQLITE_FINALIZE(ml_load_stmt);
SQLITE_FINALIZE(store_dimension);
SQLITE_FINALIZE(store_chart);
-
- return;
}
@@ -2481,10 +2480,11 @@ static void start_metadata_hosts(uv_work_t *req)
#define MAX_SHUTDOWN_TIMEOUT_SECONDS (10)
#define SHUTDOWN_SLEEP_INTERVAL_MS (100)
-static void metadata_event_loop(void *arg)
+static void *metadata_event_loop(void *arg)
{
struct meta_config_s *config = arg;
uv_thread_set_name_np(EVENT_LOOP_NAME);
+ service_register(NULL, NULL, NULL);
worker_register(EVENT_LOOP_NAME);
config->ar = aral_by_size_acquire(sizeof(struct metadata_cmd));
@@ -2714,6 +2714,8 @@ static void metadata_event_loop(void *arg)
worker_unregister();
completion_mark_complete(&config->start_stop_complete);
+
+ return NULL;
}
void metadata_sync_shutdown(void)
@@ -2728,9 +2730,10 @@ void metadata_sync_shutdown(void)
nd_log_daemon(NDLP_DEBUG, "METADATA: Waiting for shutdown ACK");
completion_wait_for(&meta_config.start_stop_complete);
completion_destroy(&meta_config.start_stop_complete);
- int rc = uv_thread_join(&meta_config.thread);
+
+ int rc = nd_thread_join(meta_config.thread);
if (rc)
- nd_log_daemon(NDLP_ERR, "METADATA: Failed to join synchronization thread, error %s", uv_err_name(rc));
+ nd_log_daemon(NDLP_ERR, "METADATA: Failed to join synchronization thread");
else
nd_log_daemon(NDLP_INFO, "METADATA: synchronization thread shutdown completed");
}
@@ -2743,15 +2746,8 @@ void metadata_sync_init(void)
memset(&meta_config, 0, sizeof(meta_config));
completion_init(&meta_config.start_stop_complete);
- int retries = 0;
- int create_uv_thread_rc = create_uv_thread(&meta_config.thread, metadata_event_loop, &meta_config, &retries);
- if (create_uv_thread_rc)
- nd_log_daemon(NDLP_ERR, "Failed to create SQLite metadata sync thread, error %s, after %d retries", uv_err_name(create_uv_thread_rc), retries);
-
- fatal_assert(0 == create_uv_thread_rc);
-
- if (retries)
- nd_log_daemon(NDLP_WARNING, "SQLite metadata sync thread was created after %d attempts", retries);
+ meta_config.thread = nd_thread_create("METASYNC", NETDATA_THREAD_OPTION_DEFAULT, metadata_event_loop, &meta_config);
+ fatal_assert(NULL != meta_config.thread);
// Wait for initialization
completion_wait_for(&meta_config.start_stop_complete);
@@ -2991,11 +2987,7 @@ static void *metadata_unittest_threads(void)
for (int i = 0; i < threads_to_create; i++) {
char buf[100 + 1];
snprintf(buf, sizeof(buf) - 1, "META[%d]", i);
- threads[i] = nd_thread_create(
- buf,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- unittest_queue_metadata,
- &tu);
+ threads[i] = nd_thread_create(buf, NETDATA_THREAD_OPTION_DONT_LOG, unittest_queue_metadata, &tu);
}
(void) uv_async_send(&meta_config.async);
sleep_usec(seconds_to_run * USEC_PER_SEC);
diff --git a/src/exporting/init_connectors.c b/src/exporting/init_connectors.c
index d83fee0bc9c091..5c9b698bac0f05 100644
--- a/src/exporting/init_connectors.c
+++ b/src/exporting/init_connectors.c
@@ -105,7 +105,7 @@ int init_connectors(struct engine *engine)
char threadname[ND_THREAD_TAG_MAX + 1];
snprintfz(threadname, ND_THREAD_TAG_MAX, "%s[%zu]", instance->config.thread_tag, instance->index);
- instance->thread = nd_thread_create(threadname, NETDATA_THREAD_OPTION_JOINABLE, instance->worker, instance);
+ instance->thread = nd_thread_create(threadname, NETDATA_THREAD_OPTION_DEFAULT, instance->worker, instance);
if (!instance->thread) {
netdata_log_error("EXPORTING: cannot create thread worker for instance %s", instance->config.name);
instance->exited = 1;
diff --git a/src/libnetdata/aral/aral.c b/src/libnetdata/aral/aral.c
index d9f1e2e41cbcd1..2316dc8d087d32 100644
--- a/src/libnetdata/aral/aral.c
+++ b/src/libnetdata/aral/aral.c
@@ -1532,11 +1532,7 @@ int aral_stress_test(size_t threads, size_t elements, size_t seconds) {
for(size_t i = 0; i < threads ; i++) {
char tag[ND_THREAD_TAG_MAX + 1];
snprintfz(tag, ND_THREAD_TAG_MAX, "TH[%zu]", i);
- thread_ptrs[i] = nd_thread_create(
- tag,
- NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG,
- aral_test_thread,
- &auc);
+ thread_ptrs[i] = nd_thread_create(tag, NETDATA_THREAD_OPTION_DONT_LOG, aral_test_thread, &auc);
}
size_t malloc_done = 0;
diff --git a/src/libnetdata/dictionary/dictionary-unittest.c b/src/libnetdata/dictionary/dictionary-unittest.c
index 41fbbf71165f46..1c6ec5bf093c48 100644
--- a/src/libnetdata/dictionary/dictionary-unittest.c
+++ b/src/libnetdata/dictionary/dictionary-unittest.c
@@ -690,11 +690,7 @@ static int dictionary_unittest_threads() {
char buf[100 + 1];
snprintf(buf, 100, "dict%d", i);
- tu[i].thread = nd_thread_create(
- buf,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- unittest_dict_thread,
- &tu[i]);
+ tu[i].thread = nd_thread_create(buf, NETDATA_THREAD_OPTION_DONT_LOG, unittest_dict_thread, &tu[i]);
}
sleep_usec(seconds_to_run * USEC_PER_SEC);
@@ -871,17 +867,9 @@ static int dictionary_unittest_view_threads() {
ND_THREAD *master_thread, *view_thread;
tv.join = 0;
- master_thread = nd_thread_create(
- "master",
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- unittest_dict_master_thread,
- &tv);
-
- view_thread = nd_thread_create(
- "view",
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- unittest_dict_view_thread,
- &tv);
+ master_thread = nd_thread_create("master", NETDATA_THREAD_OPTION_DONT_LOG, unittest_dict_master_thread, &tv);
+
+ view_thread = nd_thread_create("view", NETDATA_THREAD_OPTION_DONT_LOG, unittest_dict_view_thread, &tv);
sleep_usec(seconds_to_run * USEC_PER_SEC);
diff --git a/src/libnetdata/functions_evloop/functions_evloop.c b/src/libnetdata/functions_evloop/functions_evloop.c
index 8ffeac73e8d0be..23729ea1aa97ea 100644
--- a/src/libnetdata/functions_evloop/functions_evloop.c
+++ b/src/libnetdata/functions_evloop/functions_evloop.c
@@ -362,12 +362,12 @@ struct functions_evloop_globals *functions_evloop_init(size_t worker_threads, co
char tag_buffer[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag_buffer, NETDATA_THREAD_TAG_MAX, "%s_READER", wg->tag);
- wg->reader_thread = nd_thread_create(tag_buffer, NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
+ wg->reader_thread = nd_thread_create(tag_buffer, NETDATA_THREAD_OPTION_DONT_LOG,
rrd_functions_worker_globals_reader_main, wg);
for(size_t i = 0; i < wg->workers ; i++) {
snprintfz(tag_buffer, NETDATA_THREAD_TAG_MAX, "%s_WORK[%zu]", wg->tag, i+1);
- wg->worker_threads[i] = nd_thread_create(tag_buffer, NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
+ wg->worker_threads[i] = nd_thread_create(tag_buffer, NETDATA_THREAD_OPTION_DONT_LOG,
rrd_functions_worker_globals_worker_main, wg);
}
diff --git a/src/libnetdata/local-sockets/local-sockets.h b/src/libnetdata/local-sockets/local-sockets.h
index f6cb237d68a286..a93290ff18013f 100644
--- a/src/libnetdata/local-sockets/local-sockets.h
+++ b/src/libnetdata/local-sockets/local-sockets.h
@@ -1671,7 +1671,7 @@ static inline void local_sockets_namespaces(LS_STATE *ls) {
workers_data[last_thread].inode = inode;
workers[last_thread] = nd_thread_create(
"local-sockets-worker",
- NETDATA_THREAD_OPTION_JOINABLE,
+ NETDATA_THREAD_OPTION_DEFAULT,
local_sockets_get_namespace_sockets_worker,
&workers_data[last_thread]);
diff --git a/src/libnetdata/locks/benchmark-rw.c b/src/libnetdata/locks/benchmark-rw.c
index deb19d05d3374e..028b852b5e9923 100644
--- a/src/libnetdata/locks/benchmark-rw.c
+++ b/src/libnetdata/locks/benchmark-rw.c
@@ -346,11 +346,8 @@ int rwlocks_stress_test(void) {
};
snprintf(thr_name, sizeof(thr_name), "pthread_rw%d", i);
- pthread_contexts[i].thread = nd_thread_create(
- thr_name,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- benchmark_thread,
- &pthread_contexts[i]);
+ pthread_contexts[i].thread =
+ nd_thread_create(thr_name, NETDATA_THREAD_OPTION_DONT_LOG, benchmark_thread, &pthread_contexts[i]);
// Initialize spinlock contexts
spinlock_contexts[i] = (thread_context_t){
@@ -362,11 +359,8 @@ int rwlocks_stress_test(void) {
};
snprintf(thr_name, sizeof(thr_name), "spin_rw%d", i);
- spinlock_contexts[i].thread = nd_thread_create(
- thr_name,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- benchmark_thread,
- &spinlock_contexts[i]);
+ spinlock_contexts[i].thread =
+ nd_thread_create(thr_name, NETDATA_THREAD_OPTION_DONT_LOG, benchmark_thread, &spinlock_contexts[i]);
}
// Run all configurations
diff --git a/src/libnetdata/locks/benchmark.c b/src/libnetdata/locks/benchmark.c
index e207763f8799bf..17f28ca0c13000 100644
--- a/src/libnetdata/locks/benchmark.c
+++ b/src/libnetdata/locks/benchmark.c
@@ -374,7 +374,7 @@ int locks_stress_test(void) {
snprintf(thr_name, sizeof(thr_name), "%s%d", lock_names[type], i);
threads[type][i].thread = nd_thread_create(
thr_name,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
+ NETDATA_THREAD_OPTION_DONT_LOG,
benchmark_thread,
&threads[type][i]);
}
diff --git a/src/libnetdata/locks/waitq.c b/src/libnetdata/locks/waitq.c
index a5ca235335388d..d7970be6bb9f9d 100644
--- a/src/libnetdata/locks/waitq.c
+++ b/src/libnetdata/locks/waitq.c
@@ -247,10 +247,7 @@ static int unittest_stress(void) {
char thread_name[32];
snprintf(thread_name, sizeof(thread_name), "STRESS%d-%d", prio, t);
threads[thread_idx] = nd_thread_create(
- thread_name,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
- stress_thread,
- &thread_args[thread_idx]);
+ thread_name, NETDATA_THREAD_OPTION_DONT_LOG, stress_thread, &thread_args[thread_idx]);
thread_idx++;
}
}
diff --git a/src/libnetdata/spawn_server/log-forwarder.c b/src/libnetdata/spawn_server/log-forwarder.c
index 1e2958a09b0afc..7bfbe39942263e 100644
--- a/src/libnetdata/spawn_server/log-forwarder.c
+++ b/src/libnetdata/spawn_server/log-forwarder.c
@@ -71,7 +71,7 @@ LOG_FORWARDER *log_forwarder_start(void) {
nd_log(NDLS_COLLECTORS, NDLP_ERR, "Log forwarder: Failed to set non-blocking mode");
lf->running = true;
- lf->thread = nd_thread_create("log-fw", NETDATA_THREAD_OPTION_JOINABLE, log_forwarder_thread_func, lf);
+ lf->thread = nd_thread_create("log-fw", NETDATA_THREAD_OPTION_DEFAULT, log_forwarder_thread_func, lf);
nd_log(NDLS_COLLECTORS, NDLP_INFO, "Log forwarder: created thread pointer: %p", lf->thread);
diff --git a/src/libnetdata/string/string.c b/src/libnetdata/string/string.c
index caae6dcc7f0e59..20a543a9a79135 100644
--- a/src/libnetdata/string/string.c
+++ b/src/libnetdata/string/string.c
@@ -876,7 +876,7 @@ int string_unittest(size_t entries) {
for (int i = 0; i < threads_to_create; i++) {
char buf[100 + 1];
snprintf(buf, 100, "string%d", i);
- threads[i] = nd_thread_create(buf, NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE, string_thread, &tu);
+ threads[i] = nd_thread_create(buf, NETDATA_THREAD_OPTION_DONT_LOG, string_thread, &tu);
}
sleep_usec(seconds_to_run * USEC_PER_SEC);
diff --git a/src/libnetdata/threads/threads.c b/src/libnetdata/threads/threads.c
index 82cc6f8902a80a..3a556ab0e34eef 100644
--- a/src/libnetdata/threads/threads.c
+++ b/src/libnetdata/threads/threads.c
@@ -22,7 +22,7 @@ struct nd_thread {
void *ret; // the return value of start routine
void *(*start_routine) (void *);
NETDATA_THREAD_OPTIONS options;
- pthread_t thread;
+ uv_thread_t thread;
bool cancel_atomic;
#ifdef NETDATA_INTERNAL_CHECKS
@@ -339,15 +339,13 @@ static void nd_thread_exit(ND_THREAD *nti) {
}
spinlock_unlock(&threads_globals.running.spinlock);
- //if (nd_thread_status_check(nti, NETDATA_THREAD_OPTION_JOINABLE) != NETDATA_THREAD_OPTION_JOINABLE) {
spinlock_lock(&threads_globals.exited.spinlock);
DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(threads_globals.exited.list, nti, prev, next);
nti->list = ND_THREAD_LIST_EXITED;
spinlock_unlock(&threads_globals.exited.spinlock);
- //}
}
-static void *nd_thread_starting_point(void *ptr) {
+static void nd_thread_starting_point(void *ptr) {
ND_THREAD *nti = _nd_thread_info = (ND_THREAD *)ptr;
nd_thread_status_set(nti, NETDATA_THREAD_STATUS_STARTED);
@@ -357,12 +355,6 @@ static void *nd_thread_starting_point(void *ptr) {
if(nd_thread_status_check(nti, NETDATA_THREAD_OPTION_DONT_LOG_STARTUP) != NETDATA_THREAD_OPTION_DONT_LOG_STARTUP)
nd_log(NDLS_DAEMON, NDLP_DEBUG, "thread created with task id %d", gettid_cached());
- if(pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL) != 0)
- nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot set pthread cancel type to DEFERRED.");
-
- if(pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, NULL) != 0)
- nd_log(NDLS_DAEMON, NDLP_WARNING, "cannot set pthread cancel state to ENABLE.");
-
signals_block_all_except_deadly();
spinlock_lock(&threads_globals.running.spinlock);
@@ -374,7 +366,6 @@ static void *nd_thread_starting_point(void *ptr) {
nti->ret = nti->start_routine(nti->arg);
nd_thread_exit(nti);
- return nti;
}
ND_THREAD *nd_thread_self(void) {
@@ -385,6 +376,26 @@ bool nd_thread_is_me(ND_THREAD *nti) {
return nti && nti->thread == pthread_self();
}
+
+// utils
+#define MAX_THREAD_CREATE_RETRIES (10)
+#define MAX_THREAD_CREATE_WAIT_MS (1000)
+
+static int create_uv_thread(uv_thread_t *thread, uv_thread_cb thread_func, void *arg, int *retries)
+{
+ int err;
+
+ do {
+ err = uv_thread_create(thread, thread_func, arg);
+ if (err == 0)
+ break;
+
+ sleep_usec(MAX_THREAD_CREATE_WAIT_MS * USEC_PER_MS);
+ } while (err == UV_EAGAIN && ++(*retries) < MAX_THREAD_CREATE_RETRIES);
+
+ return err;
+}
+
ND_THREAD *nd_thread_create(const char *tag, NETDATA_THREAD_OPTIONS options, void *(*start_routine)(void *), void *arg)
{
ND_THREAD *nti = callocz(1, sizeof(*nti));
@@ -394,18 +405,18 @@ ND_THREAD *nd_thread_create(const char *tag, NETDATA_THREAD_OPTIONS options, voi
nti->options = (options & NETDATA_THREAD_OPTIONS_ALL);
strncpyz(nti->tag, tag, ND_THREAD_TAG_MAX);
- if ((options & NETDATA_THREAD_OPTION_JOINABLE) == 0)
- nd_log_daemon(NDLP_INFO, "WARNING: Creating detached thread '%s'", tag);
-
- int ret = pthread_create(&nti->thread, &threads_globals.attr, nd_thread_starting_point, nti);
+ int retries = 0;
+ int ret = create_uv_thread(&nti->thread, nd_thread_starting_point, nti, &retries);
if(ret != 0) {
nd_log(NDLS_DAEMON, NDLP_ERR,
- "failed to create new thread for %s. pthread_create() failed with code %d",
+ "failed to create new thread for %s. uv_thread_create() failed with code %d",
tag, ret);
freez(nti);
return NULL;
}
+ if (retries)
+ nd_log_daemon(NDLP_WARNING, "nd_thread_create required %d attempts", retries);
return nti;
}
@@ -451,22 +462,16 @@ int nd_thread_join(ND_THREAD *nti) {
return 0;
}
- int ret = 0;
- bool joinable = nd_thread_status_check(nti, NETDATA_THREAD_OPTION_JOINABLE);
- if (joinable)
- ret = pthread_join(nti->thread, NULL);
- if(ret != 0) {
+ int ret;
+ if((ret = uv_thread_join(&nti->thread))) {
// we can't join the thread
nd_log(NDLS_DAEMON, NDLP_WARNING,
- "cannot join thread. pthread_join() failed with code %d. (tag=%s)",
+ "cannot join thread. uv_thread_join() failed with code %d. (tag=%s)",
ret, nti->tag);
}
else {
// we successfully joined the thread
- if (joinable)
- nd_log(NDLS_DAEMON, NDLP_DEBUG, "Joining thread '%s', tid %d", nti->tag, nti->tid);
-
nd_thread_status_set(nti, NETDATA_THREAD_STATUS_JOINED);
spinlock_lock(&threads_globals.running.spinlock);
@@ -483,10 +488,7 @@ int nd_thread_join(ND_THREAD *nti) {
}
spinlock_unlock(&threads_globals.exited.spinlock);
- if (joinable)
- freez(nti);
- else
- nti->thread = 0;
+ freez(nti);
}
return ret;
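One behavioral consequence of the join path above, sketched with illustrative names: with detached threads gone, a successful nd_thread_join() always freez()'s the ND_THREAD, so the handle must not be used afterwards.

```c
static void *noop(void *arg) {
    (void) arg;
    return NULL;
}

static void join_once(void) {
    ND_THREAD *t = nd_thread_create("ONCE", NETDATA_THREAD_OPTION_DEFAULT, noop, NULL);
    if (!t)
        return;

    if (nd_thread_join(t) == 0)
        t = NULL; /* the join freez()'d the handle -- never touch it again */
}
```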
diff --git a/src/libnetdata/threads/threads.h b/src/libnetdata/threads/threads.h
index 2da1399220761e..117a8c83c02041 100644
--- a/src/libnetdata/threads/threads.h
+++ b/src/libnetdata/threads/threads.h
@@ -7,15 +7,14 @@
typedef enum __attribute__((packed)) {
NETDATA_THREAD_OPTION_DEFAULT = 0 << 0,
- NETDATA_THREAD_OPTION_JOINABLE = 1 << 0,
- NETDATA_THREAD_OPTION_DONT_LOG_STARTUP = 1 << 1,
- NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP = 1 << 2,
- NETDATA_THREAD_STATUS_STARTED = 1 << 3,
- NETDATA_THREAD_STATUS_FINISHED = 1 << 4,
- NETDATA_THREAD_STATUS_JOINED = 1 << 5,
+ NETDATA_THREAD_OPTION_DONT_LOG_STARTUP = 1 << 0,
+ NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP = 1 << 1,
+ NETDATA_THREAD_STATUS_STARTED = 1 << 2,
+ NETDATA_THREAD_STATUS_FINISHED = 1 << 3,
+ NETDATA_THREAD_STATUS_JOINED = 1 << 4,
} NETDATA_THREAD_OPTIONS;
-#define NETDATA_THREAD_OPTIONS_ALL (NETDATA_THREAD_OPTION_JOINABLE | NETDATA_THREAD_OPTION_DONT_LOG_STARTUP | NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP)
+#define NETDATA_THREAD_OPTIONS_ALL (NETDATA_THREAD_OPTION_DONT_LOG_STARTUP | NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP)
#define NETDATA_THREAD_OPTION_DONT_LOG (NETDATA_THREAD_OPTION_DONT_LOG_STARTUP | NETDATA_THREAD_OPTION_DONT_LOG_CLEANUP)
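/* With NETDATA_THREAD_OPTION_JOINABLE removed (every uv thread is joinable),
 * the remaining flags are repacked from bit 0: the OPTION_* bits are set by
 * callers at creation time, while the STATUS_* bits track the thread's
 * lifecycle and are managed internally by the nd_thread code. */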
#define netdata_thread_cleanup_push(func, arg) pthread_cleanup_push(func, arg)
diff --git a/src/libnetdata/uuid/uuidmap.c b/src/libnetdata/uuid/uuidmap.c
index b45d0b949fb27d..6877d6dd4c4469 100644
--- a/src/libnetdata/uuid/uuidmap.c
+++ b/src/libnetdata/uuid/uuidmap.c
@@ -382,7 +382,7 @@ static int uuidmap_concurrent_unittest(void) {
snprintf(thread_name, sizeof(thread_name), "UUID-TEST-%d", i);
threads[i] = nd_thread_create(
thread_name,
- NETDATA_THREAD_OPTION_DONT_LOG | NETDATA_THREAD_OPTION_JOINABLE,
+ NETDATA_THREAD_OPTION_DONT_LOG,
concurrent_test_thread,
&stats[i]);
}
diff --git a/src/ml/ml_public.cc b/src/ml/ml_public.cc
index 1f8a172f18c858..14899645ea4bb5 100644
--- a/src/ml/ml_public.cc
+++ b/src/ml/ml_public.cc
@@ -443,14 +443,12 @@ void ml_start_threads() {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "%s", "PREDICT");
- Cfg.detection_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
- ml_detect_main, NULL);
+ Cfg.detection_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, ml_detect_main, NULL);
for (size_t idx = 0; idx != Cfg.num_worker_threads; idx++) {
ml_worker_t *worker = &Cfg.workers[idx];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "TRAIN[%zu]", worker->id);
- worker->nd_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
- ml_train_main, worker);
+ worker->nd_thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, ml_train_main, worker);
}
}
diff --git a/src/plugins.d/plugins_d.c b/src/plugins.d/plugins_d.c
index b8ba446596cba5..51e062696a2a25 100644
--- a/src/plugins.d/plugins_d.c
+++ b/src/plugins.d/plugins_d.c
@@ -400,7 +400,7 @@ void *pluginsd_main(void *ptr) {
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "PD[%s]", pluginname);
// spawn a new thread for it
- cd->unsafe.thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
+ cd->unsafe.thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
pluginsd_worker_thread, cd);
}
}
diff --git a/src/streaming/stream-connector.c b/src/streaming/stream-connector.c
index dca444ea59a79f..e88c83a46dd595 100644
--- a/src/streaming/stream-connector.c
+++ b/src/streaming/stream-connector.c
@@ -684,7 +684,7 @@ bool stream_connector_init(struct sender_state *s) {
snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM_SENDER "-CN" "[%d]",
sc->id);
- sc->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE, stream_connector_thread, sc);
+ sc->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, stream_connector_thread, sc);
if (!sc->thread)
nd_log_daemon(NDLP_ERR,
"STREAM CONNECT '%s': failed to create new thread for client.",
diff --git a/src/streaming/stream-replication-sender.c b/src/streaming/stream-replication-sender.c
index 3a09fdace0abf1..ee2d3b0f920b74 100644
--- a/src/streaming/stream-replication-sender.c
+++ b/src/streaming/stream-replication-sender.c
@@ -1719,7 +1719,7 @@ void *replication_thread_main(void *ptr) {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "REPLAY[%zu]", i + 2);
__atomic_add_fetch(&replication_buffers_allocated, sizeof(ND_THREAD *), __ATOMIC_RELAXED);
- replication_globals.main_thread.threads_ptrs[i] = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
+ replication_globals.main_thread.threads_ptrs[i] = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
replication_worker_thread, NULL);
}
}
diff --git a/src/streaming/stream-thread.c b/src/streaming/stream-thread.c
index 857ae37050612c..88980745c848e0 100644
--- a/src/streaming/stream-thread.c
+++ b/src/streaming/stream-thread.c
@@ -728,7 +728,7 @@ static struct stream_thread * stream_thread_assign_and_start(RRDHOST *host) {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, THREAD_TAG_STREAM "[%zu]", sth->id);
- sth->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE, stream_thread, sth);
+ sth->thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, stream_thread, sth);
if (!sth->thread)
nd_log(NDLS_DAEMON, NDLP_ERR, "STREAM THREAD[%zu]: failed to create new thread for client.", sth->id);
}
diff --git a/src/web/api/queries/backfill.c b/src/web/api/queries/backfill.c
index 28ea7a3be4d0f7..496e2ea715b122 100644
--- a/src/web/api/queries/backfill.c
+++ b/src/web/api/queries/backfill.c
@@ -232,7 +232,7 @@ void *backfill_thread(void *ptr) {
for(size_t t = 0; t < threads - 1 ;t++) {
char tag[15];
snprintfz(tag, sizeof(tag), "BACKFILL[%zu]", t + 1);
- th[t] = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE, backfill_worker_thread, NULL);
+ th[t] = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT, backfill_worker_thread, NULL);
}
backfill_worker_thread((void *)0x01);
diff --git a/src/web/server/static/static-threaded.c b/src/web/server/static/static-threaded.c
index 2cd6475b3b202f..963aae58e05dac 100644
--- a/src/web/server/static/static-threaded.c
+++ b/src/web/server/static/static-threaded.c
@@ -384,7 +384,7 @@ void *socket_listen_main_static_threaded(void *ptr) {
char tag[50 + 1];
snprintfz(tag, sizeof(tag) - 1, "WEB[%d]", i+1);
- static_workers_private_data[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_JOINABLE,
+ static_workers_private_data[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
socket_listen_main_static_threaded_worker,
(void *)&static_workers_private_data[i]);
}
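The patch above replaces netdata's pthread-based thread wrapper with libuv threads: `NETDATA_THREAD_OPTION_JOINABLE` disappears because every uv thread is joinable, and `uv_thread_create()` is retried while it fails with `UV_EAGAIN`, which signals a transient lack of OS resources rather than a permanent error. A minimal standalone sketch of that retry pattern, using only public libuv calls (the helper name and retry constants below are illustrative, not netdata's):

```c
#include <stdio.h>
#include <uv.h>

#define MAX_RETRIES   10
#define RETRY_WAIT_MS 1000

static void worker(void *arg) {
    (void)arg;  /* thread body goes here */
}

/* Retry creation only on UV_EAGAIN (transient resource shortage);
 * any other error will not resolve itself and is returned immediately. */
static int create_thread_with_retries(uv_thread_t *t, uv_thread_cb cb, void *arg) {
    int err, retries = 0;
    do {
        err = uv_thread_create(t, cb, arg);
        if (err == 0)
            break;
        uv_sleep(RETRY_WAIT_MS);
    } while (err == UV_EAGAIN && ++retries < MAX_RETRIES);
    return err;
}

int main(void) {
    uv_thread_t t;
    if (create_thread_with_retries(&t, worker, NULL) == 0)
        uv_thread_join(&t);  /* uv threads are always joinable */
    else
        fprintf(stderr, "could not create thread\n");
    return 0;
}
```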
From c8bede3533b35e5bd4189cf255f695cf980b401e Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Tue, 13 May 2025 00:36:21 +0000
Subject: [PATCH 32/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-32-nightly.
---
CHANGELOG.md | 15 ++++++---------
packaging/version | 2 +-
2 files changed, 7 insertions(+), 10 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6ff935c219d138..598ec0d6658c79 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,10 @@
**Merged pull requests:**
+- chore\(go.d/snmp\): small cleanup snmp profiles code [\#20274](https://github.com/netdata/netdata/pull/20274) ([ilyam8](https://github.com/ilyam8))
+- Switch to poll from epoll [\#20273](https://github.com/netdata/netdata/pull/20273) ([stelfrag](https://github.com/stelfrag))
+- build\(deps\): bump golang.org/x/net from 0.39.0 to 0.40.0 in /src/go [\#20270](https://github.com/netdata/netdata/pull/20270) ([dependabot[bot]](https://github.com/apps/dependabot))
+- build\(deps\): bump github.com/miekg/dns from 1.1.65 to 1.1.66 in /src/go [\#20268](https://github.com/netdata/netdata/pull/20268) ([dependabot[bot]](https://github.com/apps/dependabot))
- Update Netdata README with improved structure [\#20265](https://github.com/netdata/netdata/pull/20265) ([kanelatechnical](https://github.com/kanelatechnical))
- Schedule journal file indexing after database file rotation [\#20264](https://github.com/netdata/netdata/pull/20264) ([stelfrag](https://github.com/stelfrag))
- Minor fixes [\#20263](https://github.com/netdata/netdata/pull/20263) ([stelfrag](https://github.com/stelfrag))
@@ -18,9 +22,11 @@
- fix obsolete chart cleanup to properly handle vnodes [\#20254](https://github.com/netdata/netdata/pull/20254) ([ilyam8](https://github.com/ilyam8))
- docs: fix license link and remove GH alerts syntax from FAQ [\#20252](https://github.com/netdata/netdata/pull/20252) ([ilyam8](https://github.com/ilyam8))
- Update Netdata README [\#20251](https://github.com/netdata/netdata/pull/20251) ([kanelatechnical](https://github.com/kanelatechnical))
+- Switch to uv threads [\#20250](https://github.com/netdata/netdata/pull/20250) ([stelfrag](https://github.com/stelfrag))
- fix\(go.d/snmp\): use 32bit counters if 64 aren't available [\#20249](https://github.com/netdata/netdata/pull/20249) ([ilyam8](https://github.com/ilyam8))
- fix\(go.d/snmp\): use ifDescr for interface name if ifName is empty [\#20248](https://github.com/netdata/netdata/pull/20248) ([ilyam8](https://github.com/ilyam8))
- fix\(go.d/sd/snmp\): fix snmpv3 credentials [\#20247](https://github.com/netdata/netdata/pull/20247) ([ilyam8](https://github.com/ilyam8))
+- SNMP first cisco yaml file pass [\#20246](https://github.com/netdata/netdata/pull/20246) ([Ancairon](https://github.com/Ancairon))
- Fix build issue on old distros [\#20243](https://github.com/netdata/netdata/pull/20243) ([stelfrag](https://github.com/stelfrag))
- Session claim id in docker [\#20240](https://github.com/netdata/netdata/pull/20240) ([stelfrag](https://github.com/stelfrag))
- Let the user override the default stack size [\#20236](https://github.com/netdata/netdata/pull/20236) ([stelfrag](https://github.com/stelfrag))
@@ -465,15 +471,6 @@
- allow insecure cloud connections [\#19736](https://github.com/netdata/netdata/pull/19736) ([ktsaou](https://github.com/ktsaou))
- add more information about claiming failures [\#19735](https://github.com/netdata/netdata/pull/19735) ([ktsaou](https://github.com/ktsaou))
- support https\_proxy too [\#19733](https://github.com/netdata/netdata/pull/19733) ([ktsaou](https://github.com/ktsaou))
-- fix json generation of apps.plugin processes function info [\#19732](https://github.com/netdata/netdata/pull/19732) ([ktsaou](https://github.com/ktsaou))
-- add another step when initializing web [\#19731](https://github.com/netdata/netdata/pull/19731) ([ktsaou](https://github.com/ktsaou))
-- improved descriptions of exit reasons [\#19730](https://github.com/netdata/netdata/pull/19730) ([ktsaou](https://github.com/ktsaou))
-- do not post empty reports [\#19729](https://github.com/netdata/netdata/pull/19729) ([ktsaou](https://github.com/ktsaou))
-- docs: clarify Windows Agent limits on free plans [\#19727](https://github.com/netdata/netdata/pull/19727) ([ilyam8](https://github.com/ilyam8))
-- improve status file deduplication [\#19726](https://github.com/netdata/netdata/pull/19726) ([ktsaou](https://github.com/ktsaou))
-- handle flushing state during exit [\#19725](https://github.com/netdata/netdata/pull/19725) ([ktsaou](https://github.com/ktsaou))
-- allow configuring journal v2 unmount time; turn it off for parents [\#19724](https://github.com/netdata/netdata/pull/19724) ([ktsaou](https://github.com/ktsaou))
-- minor status file annotation fixes [\#19723](https://github.com/netdata/netdata/pull/19723) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index c5bfaf8b595c90..de65169d7d0633 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-25-nightly
+v2.5.0-32-nightly
From a78abaa1999b6d40e53dc60824881924235f39f3 Mon Sep 17 00:00:00 2001
From: thiagoftsm
Date: Tue, 13 May 2025 22:29:33 +0000
Subject: [PATCH 33/51] Improve MSSQL (Part III) (#20230)
---
src/collectors/windows.plugin/metadata.yaml | 99 ++++-
src/collectors/windows.plugin/perflib-mssql.c | 372 ++++++++++--------
2 files changed, 288 insertions(+), 183 deletions(-)
diff --git a/src/collectors/windows.plugin/metadata.yaml b/src/collectors/windows.plugin/metadata.yaml
index 23ae5df92bb65e..b5638df2d5e09c 100644
--- a/src/collectors/windows.plugin/metadata.yaml
+++ b/src/collectors/windows.plugin/metadata.yaml
@@ -1609,34 +1609,117 @@ modules:
default_behavior:
auto_detection:
description: |
- The collector automatically detects all of the metrics, no further configuration is required.
+ The collector automatically detects some metrics, but transaction metrics require configuration.
limits:
description: ""
performance_impact:
description: ""
setup:
prerequisites:
- list: []
+ list:
+ - title: Create netdata user
+ description: |
+ Create an SQL Server user with the necessary permissions to collect monitoring data:
+
+ ```tsql
+ USE master;
+ CREATE LOGIN netdata_user WITH PASSWORD = '1ReallyStrongPasswordShouldBeInsertedHere';
+ CREATE USER netdata_user FOR LOGIN netdata_user;
+ GRANT CONNECT SQL TO netdata_user;
+ GRANT VIEW SERVER STATE TO netdata_user;
+ GO
+ ```
+
+ Additionally, enable the [Query Store](https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver16)
+ on each database you want to monitor:
+
+ ```tsql
+ DECLARE @dbname NVARCHAR(max)
+ DECLARE nd_user_cursor CURSOR FOR SELECT name
+ FROM master.dbo.sysdatabases
+ WHERE name NOT IN ('master', 'tempdb')
+
+ OPEN nd_user_cursor
+ FETCH NEXT FROM nd_user_cursor INTO @dbname
+ WHILE @@FETCH_STATUS = 0
+ BEGIN
+ EXECUTE ("USE "+ @dbname+"; CREATE USER netdata_user FOR LOGIN netdata_user; ALTER DATABASE "+@dbname+" SET QUERY_STORE = ON ( QUERY_CAPTURE_MODE = ALL, DATA_FLUSH_INTERVAL_SECONDS = 900 )");
+ FETCH next FROM nd_user_cursor INTO @dbname;
+ END
+ CLOSE nd_user_cursor
+ DEALLOCATE nd_user_cursor
+ GO
+ ```
configuration:
file:
name: "netdata.conf"
- section_name: "[plugin:windows]"
+ section_name: "[plugin:windows:PerflibMSSQL]"
description: "The Netdata main configuration file"
options:
- description: ""
+ description: "These options allow the collector to connect to your MSSQL instance and collect transaction data from it."
folding:
title: "Config option"
enabled: false
list:
- - name: PerflibMSSQL
- description: An option to enable or disable the data collection.
- default_value: yes
+ - name: driver
+ description: ODBC driver used to connect to the SQL Server.
+ default_value: SQL Server
+ required: false
+ - name: server
+ description: Server address or instance name.
+ default_value: empty
+ required: true
+ - name: address
+ description: Alternative to `server`; supports named pipes if the server supports them.
+ default_value: empty
+ required: true
+ - name: uid
+ description: SQL Server user identifier.
+ default_value: empty
+ required: true
+ - name: pwd
+ description: Password for the specified user.
+ default_value: empty
+ required: true
+ - name: additional instances
+ description: Number of additional SQL Server instances to monitor.
+ default_value: 0
+ required: false
+ - name: windows authentication
+ description: Set to yes to use Windows credentials instead of SQL Server authentication.
+ default_value: no
required: false
examples:
folding:
enabled: true
title: ""
- list: []
+ list:
+ - name: One Instance
+ description: An example configuration.
+ folding:
+ enabled: false
+ config: |
+ [plugin:windows:PerflibMSSQL]
+ driver = SQL Server
+ server = 127.0.0.1\\Dev, 1433
+ uid = netdata_user
+ pwd = 1ReallyStrongPasswordShouldBeInsertedHere
+ - name: Two Instances
+ description: An example configuration with two instances.
+ folding:
+ enabled: false
+ config: |
+ [plugin:windows:PerflibMSSQL]
+ driver = SQL Server
+ server = 127.0.0.1\\Dev, 1433
+ uid = netdata_user
+ pwd = 1ReallyStrongPasswordShouldBeInsertedHere
+ additional instances = 1
+ [plugin:windows:PerflibMSSQL1]
+ driver = SQL Server
+ server = 127.0.0.1\\Production, 1434
+ uid = netdata_user
+ pwd = AnotherReallyStrongPasswordShouldBeInsertedHere2
troubleshooting:
problems:
list: []
diff --git a/src/collectors/windows.plugin/perflib-mssql.c b/src/collectors/windows.plugin/perflib-mssql.c
index 17b31f75165413..de8df1268c6f33 100644
--- a/src/collectors/windows.plugin/perflib-mssql.c
+++ b/src/collectors/windows.plugin/perflib-mssql.c
@@ -16,6 +16,7 @@
#define NETDATA_MSSQL_NEXT_TRY (60)
BOOL is_sqlexpress = FALSE;
+ND_THREAD *mssql_query_thread = NULL;
struct netdata_mssql_conn {
const char *driver;
@@ -319,8 +320,8 @@ static ULONGLONG netdata_MSSQL_fill_long_value(SQLHSTMT *stmt, const char *mask,
void dict_mssql_fill_transactions(struct mssql_db_instance *mdi, const char *dbname)
{
- char object_name[NETDATA_MAX_INSTANCE_OBJECT + 1];
- long value;
+ char object_name[NETDATA_MAX_INSTANCE_OBJECT + 1] = {};
+ long value = 0;
SQLLEN col_object_len = 0, col_value_len = 0;
SQLCHAR query[sizeof(NETDATA_QUERY_TRANSACTIONS_MASK) + 2 * NETDATA_MAX_INSTANCE_OBJECT + 1];
@@ -745,8 +746,6 @@ void dict_mssql_insert_locks_cb(const DICTIONARY_ITEM *item __maybe_unused, void
// https://learn.microsoft.com/en-us/sql/relational-databases/performance-monitor/sql-server-locks-object
struct mssql_lock_instance *ptr = value;
ptr->resourceID = strdupz(resource);
- ptr->deadLocks.key = "Number of Deadlocks/sec";
- ptr->lockWait.key = "Lock Waits/sec";
}
void dict_mssql_insert_databases_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
@@ -846,6 +845,7 @@ void dict_mssql_insert_cb(const DICTIONARY_ITEM *item __maybe_unused, void *valu
{
struct mssql_instance *mi = value;
const char *instance = dictionary_acquired_item_name((DICTIONARY_ITEM *)item);
+ bool *create_thread = data;
if (!mi->locks_instances) {
mi->locks_instances = dictionary_create_advanced(
@@ -863,8 +863,11 @@ void dict_mssql_insert_cb(const DICTIONARY_ITEM *item __maybe_unused, void *valu
initialize_mssql_keys(mi);
netdata_read_config_options(&mi->conn);
- if (mi->conn.connectionString)
+ if (mi->conn.connectionString) {
mi->conn.is_connected = netdata_MSSQL_initialize_conection(&mi->conn);
+ if (mi->conn.is_connected)
+ *create_thread = true;
+ }
}
static int mssql_fill_dictionary()
@@ -913,17 +916,75 @@ static int mssql_fill_dictionary()
return (ret == ERROR_SUCCESS) ? 0 : -1;
}
-static int initialize(void)
+int netdata_mssql_reset_value(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
+{
+ struct mssql_db_instance *mdi = value;
+
+ mdi->collecting_data = false;
+
+ return 1;
+}
+
+int dict_mssql_query_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
+{
+ struct mssql_instance *mi = value;
+ static long have_perm = 1;
+
+ if (mi->conn.is_connected && have_perm) {
+ have_perm = metdata_mssql_check_permission(mi);
+ if (!have_perm) {
+ nd_log(
+ NDLS_COLLECTORS,
+ NDLP_ERR,
+ "User %s does not have permission to run queries on %s",
+ mi->conn.username,
+ mi->instanceID);
+ } else {
+ metdata_mssql_fill_dictionary_from_db(mi);
+ dictionary_sorted_walkthrough_read(mi->databases, dict_mssql_databases_run_queries, NULL);
+ }
+ } else {
+ dictionary_sorted_walkthrough_read(mi->databases, netdata_mssql_reset_value, NULL);
+ }
+
+ return 1;
+}
+
+void *netdata_mssql_queries(void *ptr __maybe_unused)
+{
+ heartbeat_t hb;
+ int update_every = *((int *)ptr);
+ heartbeat_init(&hb, update_every * USEC_PER_SEC);
+
+ while (service_running(SERVICE_COLLECTORS)) {
+ (void)heartbeat_next(&hb);
+
+ if (unlikely(!service_running(SERVICE_COLLECTORS)))
+ break;
+
+ dictionary_sorted_walkthrough_read(mssql_instances, dict_mssql_query_cb, &update_every);
+ }
+
+ return NULL;
+}
+
+static int initialize(int update_every)
{
+ static bool create_thread = false;
mssql_instances = dictionary_create_advanced(
DICT_OPTION_DONT_OVERWRITE_VALUE | DICT_OPTION_FIXED_SIZE, NULL, sizeof(struct mssql_instance));
- dictionary_register_insert_callback(mssql_instances, dict_mssql_insert_cb, NULL);
+ dictionary_register_insert_callback(mssql_instances, dict_mssql_insert_cb, &create_thread);
if (mssql_fill_dictionary()) {
return -1;
}
+ if (create_thread)
+ mssql_query_thread = nd_thread_create("mssql_queries",
+ NETDATA_THREAD_OPTION_DEFAULT,
+ netdata_mssql_queries, &update_every);
+
return 0;
}
@@ -1437,17 +1498,17 @@ static void do_mssql_locks(PERF_DATA_BLOCK *pDataBlock, struct mssql_instance *m
rrdset_done(mi->st_deadLocks);
}
-static void mssql_database_backup_restore_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_database_backup_restore_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_backup_restore_operations) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_backup_restore_operations", db, mli->parent->instanceID);
+ if (!mdi->st_db_backup_restore_operations) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_backup_restore_operations", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_backup_restore_operations = rrdset_create_localhost(
+ mdi->st_db_backup_restore_operations = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1462,37 +1523,35 @@ static void mssql_database_backup_restore_chart(struct mssql_db_instance *mli, c
RRDSET_TYPE_LINE);
rrdlabels_add(
- mli->st_db_backup_restore_operations->rrdlabels,
+ mdi->st_db_backup_restore_operations->rrdlabels,
"mssql_instance",
- mli->parent->instanceID,
+ mdi->parent->instanceID,
RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_backup_restore_operations->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ rrdlabels_add(mdi->st_db_backup_restore_operations->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (!mli->rd_db_backup_restore_operations) {
- mli->rd_db_backup_restore_operations =
- rrddim_add(mli->st_db_backup_restore_operations, "backup", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_backup_restore_operations =
+ rrddim_add(mdi->st_db_backup_restore_operations, "backup", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_backup_restore_operations,
- mli->rd_db_backup_restore_operations,
- (collected_number)mli->MSSQLDatabaseBackupRestoreOperations.current.Data);
+ mdi->st_db_backup_restore_operations,
+ mdi->rd_db_backup_restore_operations,
+ (collected_number)mdi->MSSQLDatabaseBackupRestoreOperations.current.Data);
- rrdset_done(mli->st_db_backup_restore_operations);
+ rrdset_done(mdi->st_db_backup_restore_operations);
}
-static void mssql_database_log_flushes_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_database_log_flushes_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_log_flushes) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_log_flushes", db, mli->parent->instanceID);
+ if (!mdi->st_db_log_flushes) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_log_flushes", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_log_flushes = rrdset_create_localhost(
+ mdi->st_db_log_flushes = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1506,31 +1565,31 @@ static void mssql_database_log_flushes_chart(struct mssql_db_instance *mli, cons
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_db_log_flushes->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_log_flushes->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_log_flushes->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_log_flushes->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
}
- if (!mli->rd_db_log_flushes) {
- mli->rd_db_log_flushes = rrddim_add(mli->st_db_log_flushes, "flushes", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ if (!mdi->rd_db_log_flushes) {
+ mdi->rd_db_log_flushes = rrddim_add(mdi->st_db_log_flushes, "flushes", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_log_flushes, mli->rd_db_log_flushes, (collected_number)mli->MSSQLDatabaseLogFlushes.current.Data);
+ mdi->st_db_log_flushes, mdi->rd_db_log_flushes, (collected_number)mdi->MSSQLDatabaseLogFlushes.current.Data);
- rrdset_done(mli->st_db_log_flushes);
+ rrdset_done(mdi->st_db_log_flushes);
}
-static void mssql_database_log_flushed_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_database_log_flushed_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_log_flushed) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_log_flushed", db, mli->parent->instanceID);
+ if (!mdi->st_db_log_flushed) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_log_flushed", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_log_flushed = rrdset_create_localhost(
+ mdi->st_db_log_flushed = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1544,31 +1603,29 @@ static void mssql_database_log_flushed_chart(struct mssql_db_instance *mli, cons
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_db_log_flushed->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_log_flushed->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ rrdlabels_add(mdi->st_db_log_flushed->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_log_flushed->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (!mli->rd_db_log_flushed) {
- mli->rd_db_log_flushed = rrddim_add(mli->st_db_log_flushed, "flushed", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_log_flushed = rrddim_add(mdi->st_db_log_flushed, "flushed", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_log_flushed, mli->rd_db_log_flushed, (collected_number)mli->MSSQLDatabaseLogFlushed.current.Data);
+ mdi->st_db_log_flushed, mdi->rd_db_log_flushed, (collected_number)mdi->MSSQLDatabaseLogFlushed.current.Data);
- rrdset_done(mli->st_db_log_flushed);
+ rrdset_done(mdi->st_db_log_flushed);
}
-static void mssql_transactions_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_transactions_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_transactions) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_transactions", db, mli->parent->instanceID);
+ if (!mdi->st_db_transactions) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_transactions", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_transactions = rrdset_create_localhost(
+ mdi->st_db_transactions = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1582,34 +1639,32 @@ static void mssql_transactions_chart(struct mssql_db_instance *mli, const char *
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_db_transactions->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ rrdlabels_add(mdi->st_db_transactions->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (!mli->rd_db_transactions) {
- mli->rd_db_transactions =
- rrddim_add(mli->st_db_transactions, "transactions", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_transactions =
+ rrddim_add(mdi->st_db_transactions, "transactions", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_transactions,
- mli->rd_db_transactions,
- (collected_number)mli->MSSQLDatabaseTransactions.current.Data);
+ mdi->st_db_transactions,
+ mdi->rd_db_transactions,
+ (collected_number)mdi->MSSQLDatabaseTransactions.current.Data);
- rrdset_done(mli->st_db_transactions);
+ rrdset_done(mdi->st_db_transactions);
}
-static void mssql_write_transactions_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_write_transactions_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_write_transactions) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_write_transactions", db, mli->parent->instanceID);
+ if (!mdi->st_db_write_transactions) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_write_transactions", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_write_transactions = rrdset_create_localhost(
+ mdi->st_db_write_transactions = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1624,34 +1679,32 @@ static void mssql_write_transactions_chart(struct mssql_db_instance *mli, const
RRDSET_TYPE_LINE);
rrdlabels_add(
- mli->st_db_write_transactions->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_write_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ mdi->st_db_write_transactions->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_write_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (!mli->rd_db_write_transactions) {
- mli->rd_db_write_transactions =
- rrddim_add(mli->st_db_write_transactions, "write", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_write_transactions =
+ rrddim_add(mdi->st_db_write_transactions, "write", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_write_transactions,
- mli->rd_db_write_transactions,
- (collected_number)mli->MSSQLDatabaseWriteTransactions.current.Data);
+ mdi->st_db_write_transactions,
+ mdi->rd_db_write_transactions,
+ (collected_number)mdi->MSSQLDatabaseWriteTransactions.current.Data);
- rrdset_done(mli->st_db_write_transactions);
+ rrdset_done(mdi->st_db_write_transactions);
}
-static void mssql_lockwait_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_lockwait_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_lockwait) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lockwait", db, mli->parent->instanceID);
+ if (!mdi->st_db_lockwait) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lockwait", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_lockwait = rrdset_create_localhost(
+ mdi->st_db_lockwait = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1665,29 +1718,29 @@ static void mssql_lockwait_chart(struct mssql_db_instance *mli, const char *db,
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_db_lockwait->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_lockwait->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_lockwait->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_lockwait->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- mli->rd_db_lockwait = rrddim_add(mli->st_db_lockwait, "lock", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_lockwait = rrddim_add(mdi->st_db_lockwait, "lock", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_lockwait, mli->rd_db_lockwait, (collected_number)mli->MSSQLDatabaseLockWaitSec.current.Data);
+ mdi->st_db_lockwait, mdi->rd_db_lockwait, (collected_number)mdi->MSSQLDatabaseLockWaitSec.current.Data);
- rrdset_done(mli->st_db_lockwait);
+ rrdset_done(mdi->st_db_lockwait);
}
-static void mssql_deadlock_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_deadlock_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_deadlock) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_deadlocks", db, mli->parent->instanceID);
+ if (!mdi->st_db_deadlock) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_deadlocks", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_deadlock = rrdset_create_localhost(
+ mdi->st_db_deadlock = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1701,29 +1754,29 @@ static void mssql_deadlock_chart(struct mssql_db_instance *mli, const char *db,
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_db_deadlock->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_deadlock->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_deadlock->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_deadlock->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- mli->rd_db_deadlock = rrddim_add(mli->st_db_deadlock, "deadlocks", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_db_deadlock = rrddim_add(mdi->st_db_deadlock, "deadlocks", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_db_deadlock, mli->rd_db_deadlock, (collected_number)mli->MSSQLDatabaseDeadLockSec.current.Data);
+ mdi->st_db_deadlock, mdi->rd_db_deadlock, (collected_number)mdi->MSSQLDatabaseDeadLockSec.current.Data);
- rrdset_done(mli->st_db_deadlock);
+ rrdset_done(mdi->st_db_deadlock);
}
-static void mssql_lock_request_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_lock_request_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_lock_requests) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lock_requests", db, mli->parent->instanceID);
+ if (!mdi->st_lock_requests) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lock_requests", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_lock_requests = rrdset_create_localhost(
+ mdi->st_lock_requests = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1737,29 +1790,29 @@ static void mssql_lock_request_chart(struct mssql_db_instance *mli, const char *
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_lock_requests->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_lock_requests->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_lock_requests->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_lock_requests->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- mli->rd_lock_requests = rrddim_add(mli->st_lock_requests, "requests", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_lock_requests = rrddim_add(mdi->st_lock_requests, "requests", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_lock_requests, mli->rd_lock_requests, (collected_number)mli->MSSQLDatabaseLockRequestsSec.current.Data);
+ mdi->st_lock_requests, mdi->rd_lock_requests, (collected_number)mdi->MSSQLDatabaseLockRequestsSec.current.Data);
- rrdset_done(mli->st_lock_requests);
+ rrdset_done(mdi->st_lock_requests);
}
-static void mssql_lock_timeout_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_lock_timeout_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_lock_timeouts) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lock_timeouts", db, mli->parent->instanceID);
+ if (!mdi->st_lock_timeouts) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_lock_timeouts", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_lock_timeouts = rrdset_create_localhost(
+ mdi->st_lock_timeouts = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1773,29 +1826,29 @@ static void mssql_lock_timeout_chart(struct mssql_db_instance *mli, const char *
update_every,
RRDSET_TYPE_LINE);
- rrdlabels_add(mli->st_lock_timeouts->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_lock_timeouts->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_lock_timeouts->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_lock_timeouts->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- mli->rd_lock_timeouts = rrddim_add(mli->st_lock_timeouts, "timeouts", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
+ mdi->rd_lock_timeouts = rrddim_add(mdi->st_lock_timeouts, "timeouts", NULL, 1, 1, RRD_ALGORITHM_INCREMENTAL);
}
rrddim_set_by_pointer(
- mli->st_lock_timeouts, mli->rd_lock_timeouts, (collected_number)mli->MSSQLDatabaseLockTimeoutsSec.current.Data);
+ mdi->st_lock_timeouts, mdi->rd_lock_timeouts, (collected_number)mdi->MSSQLDatabaseLockTimeoutsSec.current.Data);
- rrdset_done(mli->st_lock_timeouts);
+ rrdset_done(mdi->st_lock_timeouts);
}
-static void mssql_active_transactions_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static void mssql_active_transactions_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (!mli->st_db_active_transactions) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_active_transactions", db, mli->parent->instanceID);
+ if (!mdi->st_db_active_transactions) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_active_transactions", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_active_transactions = rrdset_create_localhost(
+ mdi->st_db_active_transactions = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1810,34 +1863,32 @@ static void mssql_active_transactions_chart(struct mssql_db_instance *mli, const
RRDSET_TYPE_LINE);
rrdlabels_add(
- mli->st_db_active_transactions->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_active_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ mdi->st_db_active_transactions->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_active_transactions->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (!mli->rd_db_active_transactions) {
- mli->rd_db_active_transactions =
- rrddim_add(mli->st_db_active_transactions, "active", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ mdi->rd_db_active_transactions =
+ rrddim_add(mdi->st_db_active_transactions, "active", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
}
rrddim_set_by_pointer(
- mli->st_db_active_transactions,
- mli->rd_db_active_transactions,
- (collected_number)mli->MSSQLDatabaseActiveTransactions.current.Data);
+ mdi->st_db_active_transactions,
+ mdi->rd_db_active_transactions,
+ (collected_number)mdi->MSSQLDatabaseActiveTransactions.current.Data);
- rrdset_done(mli->st_db_active_transactions);
+ rrdset_done(mdi->st_db_active_transactions);
}
-static inline void mssql_data_file_size_chart(struct mssql_db_instance *mli, const char *db, int update_every)
+static inline void mssql_data_file_size_chart(struct mssql_db_instance *mdi, const char *db, int update_every)
{
- if (unlikely(!mli->parent->conn.is_connected))
+ if (unlikely(!mdi->parent->conn.is_connected))
return;
char id[RRD_ID_LENGTH_MAX + 1];
- if (unlikely(!mli->st_db_data_file_size)) {
- snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_data_files_size", db, mli->parent->instanceID);
+ if (unlikely(!mdi->st_db_data_file_size)) {
+ snprintfz(id, RRD_ID_LENGTH_MAX, "db_%s_instance_%s_data_files_size", db, mdi->parent->instanceID);
netdata_fix_chart_name(id);
- mli->st_db_data_file_size = rrdset_create_localhost(
+ mdi->st_db_data_file_size = rrdset_create_localhost(
"mssql",
id,
NULL,
@@ -1852,26 +1903,24 @@ static inline void mssql_data_file_size_chart(struct mssql_db_instance *mli, con
RRDSET_TYPE_LINE);
rrdlabels_add(
- mli->st_db_data_file_size->rrdlabels, "mssql_instance", mli->parent->instanceID, RRDLABEL_SRC_AUTO);
- rrdlabels_add(mli->st_db_data_file_size->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- }
+ mdi->st_db_data_file_size->rrdlabels, "mssql_instance", mdi->parent->instanceID, RRDLABEL_SRC_AUTO);
+ rrdlabels_add(mdi->st_db_data_file_size->rrdlabels, "database", db, RRDLABEL_SRC_AUTO);
- if (unlikely(!mli->rd_db_data_file_size)) {
- mli->rd_db_data_file_size = rrddim_add(mli->st_db_data_file_size, "size", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ mdi->rd_db_data_file_size = rrddim_add(mdi->st_db_data_file_size, "size", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
}
- collected_number data = mli->MSSQLDatabaseDataFileSize.current.Data;
- rrddim_set_by_pointer(mli->st_db_data_file_size, mli->rd_db_data_file_size, data);
+ collected_number data = mdi->MSSQLDatabaseDataFileSize.current.Data;
+ rrddim_set_by_pointer(mdi->st_db_data_file_size, mdi->rd_db_data_file_size, data);
- rrdset_done(mli->st_db_data_file_size);
+ rrdset_done(mdi->st_db_data_file_size);
}
int dict_mssql_databases_charts_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
{
- struct mssql_db_instance *mli = value;
+ struct mssql_db_instance *mdi = value;
const char *db = dictionary_acquired_item_name((DICTIONARY_ITEM *)item);
- if (!mli->collecting_data) {
+ if (!mdi->collecting_data) {
goto endchartcb;
}
@@ -1895,7 +1944,7 @@ int dict_mssql_databases_charts_cb(const DICTIONARY_ITEM *item __maybe_unused, v
int i;
for (i = 0; transaction_chart[i]; i++) {
- transaction_chart[i](mli, db, *update_every);
+ transaction_chart[i](mdi, db, *update_every);
}
endchartcb:
@@ -2064,38 +2113,11 @@ static void do_mssql_memory_mgr(PERF_DATA_BLOCK *pDataBlock, struct mssql_instan
}
}
-int netdata_mssql_reset_value(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
-{
- struct mssql_db_instance *mdi = value;
-
- mdi->collecting_data = false;
-
- return 1;
-}
-
int dict_mssql_charts_cb(const DICTIONARY_ITEM *item __maybe_unused, void *value, void *data __maybe_unused)
{
struct mssql_instance *mi = value;
- static long have_perm = 1;
int *update_every = data;
- if (mi->conn.is_connected && have_perm) {
- have_perm = metdata_mssql_check_permission(mi);
- if (!have_perm) {
- nd_log(
- NDLS_COLLECTORS,
- NDLP_ERR,
- "User %s does not have permission to run queries on %s",
- mi->conn.username,
- mi->instanceID);
- } else {
- metdata_mssql_fill_dictionary_from_db(mi);
- dictionary_sorted_walkthrough_read(mi->databases, dict_mssql_databases_run_queries, NULL);
- }
- } else {
- dictionary_sorted_walkthrough_read(mi->databases, netdata_mssql_reset_value, NULL);
- }
-
static void (*doMSSQL[])(PERF_DATA_BLOCK *, struct mssql_instance *, int) = {
do_mssql_general_stats,
do_mssql_errors,
@@ -2132,7 +2154,7 @@ int do_PerflibMSSQL(int update_every, usec_t dt __maybe_unused)
static bool initialized = false;
if (unlikely(!initialized)) {
- if (initialize())
+ if (initialize(update_every))
return -1;
initialized = true;
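The structural change in this patch: the ODBC queries that previously ran inline in `dict_mssql_charts_cb()` now run in a dedicated `mssql_queries` thread, created in `initialize()` only when at least one instance actually connected, and paced once per `update_every` seconds by netdata's heartbeat, so the Perflib collection path is never blocked waiting on SQL Server. A portable sketch of the same pattern, with `uv_sleep()` and a flag standing in for the internal `heartbeat_next()` and `service_running()` calls (`poll_instances()` is a hypothetical placeholder for the dictionary walkthrough):

```c
#include <stdbool.h>
#include <uv.h>

static volatile bool collectors_running = true;  /* stand-in for service_running(SERVICE_COLLECTORS) */
static int update_every = 1;                     /* seconds between query rounds */

static void poll_instances(void) {
    /* run the slow ODBC queries here, off the main collection path;
     * stand-in for dictionary_sorted_walkthrough_read(mssql_instances, ...) */
}

static void query_thread_main(void *arg) {
    int every = *((int *)arg);
    while (collectors_running) {
        uv_sleep((unsigned)every * 1000);  /* crude stand-in for heartbeat_next() */
        if (!collectors_running)
            break;
        poll_instances();
    }
}

int main(void) {
    uv_thread_t t;
    bool create_thread = true;  /* in the patch, set by dict_mssql_insert_cb() on a successful connect */
    if (!create_thread)
        return 0;               /* idle setups pay no cost: no connection, no thread */
    uv_thread_create(&t, query_thread_main, &update_every);
    uv_sleep(3000);             /* let a few rounds run, then shut down */
    collectors_running = false;
    uv_thread_join(&t);
    return 0;
}
```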
From 2c1c998319f5b06288ef1d9115c42a24e7d2d57e Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Wed, 14 May 2025 00:23:18 +0000
Subject: [PATCH 34/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-34-nightly.
---
CHANGELOG.md | 2 +-
packaging/version | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 598ec0d6658c79..925f82c8989e0e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -31,6 +31,7 @@
- Session claim id in docker [\#20240](https://github.com/netdata/netdata/pull/20240) ([stelfrag](https://github.com/stelfrag))
- Let the user override the default stack size [\#20236](https://github.com/netdata/netdata/pull/20236) ([stelfrag](https://github.com/stelfrag))
- Revert "Revert "fix\(go.d/couchdb\): correct db size charts unit"" [\#20235](https://github.com/netdata/netdata/pull/20235) ([ilyam8](https://github.com/ilyam8))
+- Improve MSSQL \(Part III\) [\#20230](https://github.com/netdata/netdata/pull/20230) ([thiagoftsm](https://github.com/thiagoftsm))
- Make all threads joinable and join on agent shutdown [\#20228](https://github.com/netdata/netdata/pull/20228) ([stelfrag](https://github.com/stelfrag))
## [v2.5.1](https://github.com/netdata/netdata/tree/v2.5.1) (2025-05-08)
@@ -470,7 +471,6 @@
- Capture deadly signals [\#19737](https://github.com/netdata/netdata/pull/19737) ([ktsaou](https://github.com/ktsaou))
- allow insecure cloud connections [\#19736](https://github.com/netdata/netdata/pull/19736) ([ktsaou](https://github.com/ktsaou))
- add more information about claiming failures [\#19735](https://github.com/netdata/netdata/pull/19735) ([ktsaou](https://github.com/ktsaou))
-- support https\_proxy too [\#19733](https://github.com/netdata/netdata/pull/19733) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index de65169d7d0633..fc587c16b36939 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-32-nightly
+v2.5.0-34-nightly
From ed624904613bb4f14a93e9ad043d744e5098d505 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Wed, 14 May 2025 10:00:56 +0300
Subject: [PATCH 35/51] docs: update mssql meta (#20278)
---
src/collectors/windows.plugin/metadata.yaml | 78 +++++++++++----------
src/go/qq.md | 0
2 files changed, 42 insertions(+), 36 deletions(-)
create mode 100644 src/go/qq.md
diff --git a/src/collectors/windows.plugin/metadata.yaml b/src/collectors/windows.plugin/metadata.yaml
index b5638df2d5e09c..75f4a9a7d755bf 100644
--- a/src/collectors/windows.plugin/metadata.yaml
+++ b/src/collectors/windows.plugin/metadata.yaml
@@ -1609,7 +1609,10 @@ modules:
default_behavior:
auto_detection:
description: |
- The collector automatically detects some metrics, but transaction metrics require configuration.
+ The collector automatically discovers and monitors standard SQL Server metrics without additional setup. However, for transaction-level metrics, you must:
+
+            - Complete the "Configure SQL Server for Monitoring" steps in the Setup → Prerequisites section.
+ - Configure a database connection (see Setup → Configuration → Examples).
limits:
description: ""
performance_impact:
@@ -1617,39 +1620,42 @@ modules:
setup:
prerequisites:
list:
- - title: Create netdata user
+ - title: Configure SQL Server for Monitoring
description: |
- Create an SQL Server user with the necessary permissions to collect monitoring data:
-
- ```tsql
- USE master;
- CREATE LOGIN netdata_user WITH PASSWORD = '1ReallyStrongPasswordShouldBeInsertedHere';
- CREATE USER netdata_user FOR LOGIN netdata_user;
- GRANT CONNECT SQL TO netdata_user;
- GRANT VIEW SERVER STATE TO netdata_user;
- GO
- ```
-
- Additionally, enable the [Query Store](https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver16)
- on each database you want to monitor:
-
- ```tsql
- DECLARE @dbname NVARCHAR(max)
- DECLARE nd_user_cursor CURSOR FOR SELECT name
- FROM master.dbo.sysdatabases
- WHERE name NOT IN ('master', 'tempdb')
-
- OPEN nd_user_cursor
- FETCH NEXT FROM nd_user_cursor INTO @dbname
- WHILE @@FETCH_STATUS = 0
- BEGIN
- EXECUTE ("USE "+ @dbname+"; CREATE USER netdata_user FOR LOGIN netdata_user; ALTER DATABASE "+@dbname+" SET QUERY_STORE = ON ( QUERY_CAPTURE_MODE = ALL, DATA_FLUSH_INTERVAL_SECONDS = 900 )");
- FETCH next FROM nd_user_cursor INTO @dbname;
- END
- CLOSE nd_user_cursor
- DEALLOCATE nd_user_cursor
- GO
- ```
+ 1. **Create Monitoring User**
+
+ Create an SQL Server user with the necessary permissions to collect monitoring data:
+
+ ```tsql
+ USE master;
+ CREATE LOGIN netdata_user WITH PASSWORD = '1ReallyStrongPasswordShouldBeInsertedHere';
+ CREATE USER netdata_user FOR LOGIN netdata_user;
+ GRANT CONNECT SQL TO netdata_user;
+ GRANT VIEW SERVER STATE TO netdata_user;
+ GO
+ ```
+
+ 2. **Enable Query Store**
+
+ Enable the [Query Store](https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver16) and grant access to the monitoring user on all relevant databases:
+
+ ```tsql
+ DECLARE @dbname NVARCHAR(max)
+ DECLARE nd_user_cursor CURSOR FOR SELECT name
+ FROM master.dbo.sysdatabases
+ WHERE name NOT IN ('master', 'tempdb')
+
+ OPEN nd_user_cursor
+ FETCH NEXT FROM nd_user_cursor INTO @dbname
+ WHILE @@FETCH_STATUS = 0
+ BEGIN
+ EXECUTE ("USE "+ @dbname+"; CREATE USER netdata_user FOR LOGIN netdata_user; ALTER DATABASE "+@dbname+" SET QUERY_STORE = ON ( QUERY_CAPTURE_MODE = ALL, DATA_FLUSH_INTERVAL_SECONDS = 900 )");
+ FETCH next FROM nd_user_cursor INTO @dbname;
+ END
+ CLOSE nd_user_cursor
+ DEALLOCATE nd_user_cursor
+ GO
+ ```
configuration:
file:
name: "netdata.conf"
@@ -1694,8 +1700,8 @@ modules:
enabled: true
title: ""
list:
- - name: One Instance
- description: An example configuration.
+ - name: Single Instance
+ description: An example configuration with one instance.
folding:
enabled: false
config: |
@@ -1704,7 +1710,7 @@ modules:
server = 127.0.0.1\\Dev, 1433
uid = netdata_user
pwd = 1ReallyStrongPasswordShouldBeInsertedHere
- - name: Two Instances
+ - name: Multiple Instances
description: An example configuration with two instances.
folding:
enabled: false
diff --git a/src/go/qq.md b/src/go/qq.md
new file mode 100644
index 00000000000000..e69de29bb2d1d6
From 1fa0a4f51ba6d74a1993cc70b5646f28d9e835f0 Mon Sep 17 00:00:00 2001
From: Netdata bot <43409846+netdatabot@users.noreply.github.com>
Date: Wed, 14 May 2025 09:11:02 +0200
Subject: [PATCH 36/51] Regenerate integrations docs (#20279)
Co-authored-by: ilyam8 <22274335+ilyam8@users.noreply.github.com>
---
.../integrations/ms_sql_server.md | 87 +++++++++++++++++--
1 file changed, 81 insertions(+), 6 deletions(-)
diff --git a/src/collectors/windows.plugin/integrations/ms_sql_server.md b/src/collectors/windows.plugin/integrations/ms_sql_server.md
index e022eaa71bf7ca..f50028dcab1c33 100644
--- a/src/collectors/windows.plugin/integrations/ms_sql_server.md
+++ b/src/collectors/windows.plugin/integrations/ms_sql_server.md
@@ -38,7 +38,10 @@ This collector only supports collecting metrics from a single instance of this i
#### Auto-Detection
-The collector automatically detects all of the metrics, no further configuration is required.
+The collector automatically discovers and monitors standard SQL Server metrics without additional setup. However, for transaction-level metrics, you must:
+
+- Complete the "Configure SQL Server for Monitoring" steps in the Setup → Prerequisites section.
+- Configure a database connection (see Setup → Configuration → Examples).
#### Limits
@@ -130,14 +133,51 @@ There are no alerts configured by default for this integration.
### Prerequisites
-No action required.
+#### Configure SQL Server for Monitoring
+
+1. **Create Monitoring User**
+
+ Create an SQL Server user with the necessary permissions to collect monitoring data:
+
+ ```tsql
+ USE master;
+ CREATE LOGIN netdata_user WITH PASSWORD = '1ReallyStrongPasswordShouldBeInsertedHere';
+ CREATE USER netdata_user FOR LOGIN netdata_user;
+ GRANT CONNECT SQL TO netdata_user;
+ GRANT VIEW SERVER STATE TO netdata_user;
+ GO
+ ```
+
+2. **Enable Query Store**
+
+ Enable the [Query Store](https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver16) and grant access to the monitoring user on all relevant databases:
+
+ ```tsql
+ DECLARE @dbname NVARCHAR(max)
+ DECLARE nd_user_cursor CURSOR FOR SELECT name
+ FROM master.dbo.sysdatabases
+ WHERE name NOT IN ('master', 'tempdb')
+
+ OPEN nd_user_cursor
+ FETCH NEXT FROM nd_user_cursor INTO @dbname
+ WHILE @@FETCH_STATUS = 0
+ BEGIN
+ EXECUTE ("USE "+ @dbname+"; CREATE USER netdata_user FOR LOGIN netdata_user; ALTER DATABASE "+@dbname+" SET QUERY_STORE = ON ( QUERY_CAPTURE_MODE = ALL, DATA_FLUSH_INTERVAL_SECONDS = 900 )");
+ FETCH next FROM nd_user_cursor INTO @dbname;
+ END
+ CLOSE nd_user_cursor
+ DEALLOCATE nd_user_cursor
+ GO
+ ```
+
+
### Configuration
#### File
The configuration file name for this integration is `netdata.conf`.
-Configuration for this specific integration is located in the `[plugin:windows]` section within that file.
+Configuration for this specific integration is located in the `[plugin:windows:PerflibMSSQL]` section within that file.
The file format is a modified INI syntax. The general structure is:
@@ -158,13 +198,48 @@ sudo ./edit-config netdata.conf
```
#### Options
-
+These options allow the collector to connect to your MSSQL instance and collect transaction data from it.
| Name | Description | Default | Required |
|:----|:-----------|:-------|:--------:|
-| PerflibMSSQL | An option to enable or disable the data collection. | yes | no |
+| driver | ODBC driver used to connect to the SQL Server. | SQL Server | no |
+| server | Server address or instance name. | empty | yes |
+| address | Alternative to `server`; supports named pipes if the server supports them. | empty | yes |
+| uid | SQL Server user identifier. | empty | yes |
+| pwd | Password for the specified user. | empty | yes |
+| additional instances | Number of additional SQL Server instances to monitor. | 0 | no |
+| windows authentication | Set to yes to use Windows credentials instead of SQL Server authentication. | no | no |
#### Examples
-There are no configuration examples.
+##### Single Instance
+
+An example configuration with one instance.
+
+```yaml
+[plugin:windows:PerflibMSSQL]
+ driver = SQL Server
+ server = 127.0.0.1\\Dev, 1433
+ uid = netdata_user
+ pwd = 1ReallyStrongPasswordShouldBeInsertedHere
+
+```
+##### Multiple Instances
+
+An example configuration with two instances.
+
+```yaml
+[plugin:windows:PerflibMSSQL]
+ driver = SQL Server
+ server = 127.0.0.1\\Dev, 1433
+ uid = netdata_user
+ pwd = 1ReallyStrongPasswordShouldBeInsertedHere
+ additional instances = 1
+[plugin:windows:PerflibMSSQL1]
+ driver = SQL Server
+ server = 127.0.0.1\\Production, 1434
+ uid = netdata_user
+ pwd = AnotherReallyStrongPasswordShouldBeInsertedHere2
+
+```
From 5c8d4dd1aa60e2928fcb234813d7cadd2ca128ca Mon Sep 17 00:00:00 2001
From: kanelatechnical
Date: Wed, 14 May 2025 21:29:34 +0300
Subject: [PATCH 37/51] Improved StatsD documentation (#20282)
Co-authored-by: ilyam8
---
src/collectors/statsd.plugin/README.md | 1087 ++++++++++++------------
1 file changed, 532 insertions(+), 555 deletions(-)
diff --git a/src/collectors/statsd.plugin/README.md b/src/collectors/statsd.plugin/README.md
index 25313ffa3d19ec..849c0b0cdc4516 100644
--- a/src/collectors/statsd.plugin/README.md
+++ b/src/collectors/statsd.plugin/README.md
@@ -1,161 +1,257 @@
-# StatsD
+# StatsD Collector
-[StatsD](https://github.com/statsd/statsd) is a system to collect data from any application. Applications send metrics to it,
-usually via non-blocking UDP communication, and StatsD servers collect these metrics, perform a few simple calculations on
-them and push them to backend time-series databases.
+## What is StatsD?
-If you want to learn more about the StatsD protocol, we have written a
-[blog post](https://blog.netdata.cloud/introduction-to-statsd/) about it!
+[StatsD](https://github.com/statsd/statsd) is a system for collecting metrics from applications. Your applications send metrics to StatsD, usually via non-blocking UDP communication, and StatsD servers collect these metrics, perform simple calculations, and push them to time-series databases.
+Learn more about the [StatsD protocol](https://blog.netdata.cloud/introduction-to-statsd/).
-Netdata is a fully featured statsd server. It can collect statsd formatted metrics, visualize
-them on its dashboards and store them in it's database for long-term retention.
+## Overview
-Netdata statsd is inside Netdata (an internal plugin, running inside the Netdata daemon), it is
-configured via `netdata.conf` and by-default listens on standard statsd port 8125. Netdata supports
-both TCP and UDP packets at the same time.
+| Feature | Description |
+|----------------------------|----------------------------------------------------------------------------|
+| **Metric Collection** | Collect real-time metrics from any application supporting StatsD protocol |
+| **Visualization** | View metrics as private charts (one per metric) or custom synthetic charts |
+| **Supported Metric Types** | Gauges, Counters, Meters, Timers, Histograms, Sets, Dictionaries |
+| **Transport** | Both UDP (low-overhead) and TCP (reliable, higher volume) supported |
+| **Performance** | Can collect millions of metrics per second using just 1 CPU core |
+| **Integration** | Built directly into Netdata - no extra installation needed |
+| **Language Support** | Use with Python, Node.js, Java, Go, Ruby, Shell scripts, and more |
-Since statsd is embedded in Netdata, it means you now have a statsd server embedded on all your servers.
+:::tip
-Netdata statsd is fast. It can collect several millions of metrics per second on modern hardware, using
-just 1 CPU core. The implementation uses two threads: one thread collects metrics, another thread updates
-the charts from the collected data.
+Want a hands-on example? [Jump to the K6 StatsD Walkthrough](#step-by-step-guide-monitoring-k6-with-statsd)
-## Available StatsD synthetic application charts
+:::
-Netdata ships with a few synthetic chart definitions to automatically present application metrics into a
-more uniform way. These synthetic charts are configuration files (you can create your own) that re-arrange
-statsd metrics into a more meaningful way.
+## Supported Metric Types Summary
-On synthetic charts, we can have alerts as with any metric and chart.
+| Metric Type | Purpose | Format | Summary |
+|--------------|-----------------------------------------------|---------------------|-------------------------------------------------------------------------------------------------|
+| Gauges | Report current values | `name:value\|g` | Report latest value; can increment/decrement; supports sampling & tags. |
+| Counters | Count events | `name:value\|c/C/m` | Report rate & event count; `:value` optional (default 1); supports sampling & tags. |
+| Meters | Count events (rate-focused) | `name:value\|m` | Report rate & event count; `:value` optional (default 1); supports sampling & tags. |
+| Timers | Statistical analysis of values (duration) | `name:value\|ms` | Report min, max, avg, percentiles, median, stddev, count; supports sampling & tags. |
+| Histograms | Statistical analysis of values (distribution) | `name:value\|h` | Report min, max, avg, percentiles, median, stddev, count; supports sampling & tags. |
+| Sets | Count unique occurrences | `name:value\|s` | Report unique count & event count; sampling NOT supported; values as text; supports tags. |
+| Dictionaries | Count occurrences of distinct values | `name:value\|d` | Report counts per value & total updates; sampling NOT supported; values as text; supports tags. |
-- [K6 load testing tool](https://k6.io)
- - **Description:** k6 is a developer-centric, free and open-source load testing tool built for making performance testing a productive and enjoyable experience.
- - [Documentation](/src/collectors/statsd.plugin/k6.md)
- - [Configuration](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/k6.conf)
-- [Asterisk](https://www.asterisk.org/)
- - **Description:** Asterisk is an Open Source PBX and telephony toolkit.
- - [Documentation](/src/collectors/statsd.plugin/asterisk.md)
- - [Configuration](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/asterisk.conf)
+### How StatsD Works with Netdata
+
+```mermaid
+graph TD
+ A[Your Application] -->|Sends metrics| B[Netdata StatsD]
+ B -->|Creates| C[Private Charts]
+ B -->|Creates| D[Synthetic Charts]
+ B -->|Stores in| E[Database]
+
+ style A fill:#f9f9f9,stroke:#333,stroke-width:1px
+ style B fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style C fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style D fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style E fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+```
+
+## Netdata as a StatsD Server
+
+Netdata comes with a **fully-featured StatsD server built in**. You can:
+
+- Collect StatsD-formatted metrics
+- Visualize them on the Netdata dashboard
+- Store them in Netdata's database for long-term retention
+
+Since StatsD is embedded in Netdata, **you effectively have a StatsD server on every system where Netdata is installed.**
-## Metrics supported by Netdata
+:::note
-Netdata fully supports the StatsD protocol and also extends it to support more advanced Netdata specific use cases.
-All StatsD client libraries can be used with Netdata too.
+**Netdata's StatsD implementation is incredibly fast.** It can collect **several million metrics per second** on modern hardware using just one CPU core. The implementation uses two threads: one collects metrics, and the other updates the charts.
-- **Gauges**
+:::
- The application sends `name:value|g`, where `value` is any **decimal/fractional** number, StatsD reports the
- latest value collected and the number of times it was updated (events).
+## Pre-configured StatsD Applications
- The application may increment or decrement a previous value, by setting the first character of the value to
- `+` or `-` (so, the only way to set a gauge to an absolute negative value, is to first set it to zero).
+Netdata includes **synthetic chart definitions** to automatically present application metrics consistently. These are defined in configuration files that you can use as-is or customize.
- [Sampling rate](#sampling-rates) is supported.
- [Tags](#tags) are supported for changing chart units, family and dimension name.
+For synthetic charts, you can set up alerts just like with any other metric or chart.
- When a gauge is not collected and the setting is not to show gaps on the charts (the default), the last value will be shown, until a data collection event changes it.
+Currently available applications:
-- **Counters** and **Meters**
+- [K6 load testing tool](https://k6.io)
+ - **Description:** k6 is a developer-centric, free, and open-source load testing tool for performance testing
+ - [Documentation](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/k6.md)
+ - [Configuration](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/k6.conf)
+- [Asterisk](https://www.asterisk.org/)
+ - **Description:** Asterisk is an Open Source PBX and telephony toolkit
+ - [Documentation](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/asterisk.md)
+ - [Configuration](https://github.com/netdata/netdata/blob/master/src/collectors/statsd.plugin/asterisk.conf)
+
+## Supported Metric Types
+
+Netdata fully supports the StatsD protocol and extends it for more advanced use cases. All StatsD client libraries are compatible with Netdata.
+
+```mermaid
+graph TD
+ A[Application] -->|Sends| B[Metrics]
+ B --> C[Gauges]
+ B --> D[Counters]
+ B --> E[Timers]
+ B --> F[Histograms]
+ B --> G[Sets]
+ B --> H[Dictionaries]
+
+ style A fill:#f9f9f9,stroke:#333,stroke-width:1px
+ style B fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style C fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style D fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style E fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style F fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style G fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style H fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+```
+
+
+### Gauges
+
- The application sends `name:value|c`, `name:value|C` or `name:value|m`, where `value` is a positive or negative **integer** number of events occurred, StatsD reports the **rate** and the number of times it was updated (events).
+**Purpose:** Report current values (e.g., cache memory used by an application server)
- `:value` can be omitted and StatsD will assume it is `1`. `|c`, `|C` and `|m` can be omitted and StatsD will assume it is `|m`. So, the application may send just `name` and StatsD will parse it as `name:1|m`.
+**Format:** `name:value|g`
- - Counters use `|c` (etsy/StatsD compatible) or `|C` (brubeck compatible)
- - Meters use `|m`
+- `value` can be any decimal/fractional number
+- StatsD reports the latest value and the number of updates (events)
+- You can increment/decrement previous values by prefixing with `+` or `-`
+- Sampling rate is supported
+- Tags can change chart units, family, and [dimension](https://learn.netdata.cloud/docs/developer-and-contributor-corner/glossary#d) name
+- When not collected, the last value will be shown if "show gaps" is disabled (default)
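+
+For illustration, here's how a few gauge updates could be sent from the shell (the metric name is made up; see the [Shell Script](#shell-script) section for details on the `nc` options):
+
+```sh
+# set the gauge to an absolute value
+echo "myapp.cache.memory:4096|g" | nc -u -w 0 localhost 8125
+# increment and decrement the previous value
+echo "myapp.cache.memory:+512|g" | nc -u -w 0 localhost 8125
+echo "myapp.cache.memory:-128|g" | nc -u -w 0 localhost 8125
+```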
- [Sampling rate](#sampling-rates) is supported.
- [Tags](#tags) are supported for changing chart units, family and dimension name.
+
- When a counter or meter is not collected, StatsD **defaults** to showing a zero value, until a data collection event changes the value.
-
-- **Timers** and **Histograms**
+
+### Counters and Meters
+
- The application sends `name:value|ms` or `name:value|h`, where `value` is any **decimal/fractional** number, StatsD reports **min**, **max**, **average**, **95th percentile**, **median** and **standard deviation** and the total number of times it was updated (events). Internally it also calculates the **sum**, which is available for synthetic charts.
+**Purpose:** Count events (e.g., number of file downloads)
- - Timers use `|ms`
- - Histograms use `|h`
-
- The only difference between the two, is the `units` of the charts, as timers report *milliseconds*.
+**Format:** `name:value|c`, `name:value|C`, or `name:value|m`
- [Sampling rate](#sampling-rates) is supported.
- [Tags](#tags) are supported for changing chart units and family.
+- `value` must be an integer (positive or negative)
+- StatsD reports the rate and update count (events)
+- `:value` can be omitted (defaults to 1)
+- `|c`, `|C` and `|m` can be omitted (defaults to `|m`)
+- Counters use `|c` (etsy/StatsD compatible) or `|C` (brubeck compatible)
+- Meters use `|m`
+- Sampling rate is supported
+- Tags can change chart units, family, and dimension name
+- When not collected, StatsD shows zero until a new value arrives
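+
+For example, counting events from the shell (the metric name is illustrative):
+
+```sh
+# count one download event (`:value` defaults to 1 when omitted)
+echo "myapp.downloads|c" | nc -u -w 0 localhost 8125
+# count ten download events at once
+echo "myapp.downloads:10|c" | nc -u -w 0 localhost 8125
+```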
- When a counter or meter is not collected, StatsD **defaults** to showing a zero value, until a data collection event changes the value.
+
-- **Sets**
+
+### Timers and Histograms
+
- The application sends `name:value|s`, where `value` is anything (**number or text**, leading and trailing spaces are removed), StatsD reports the number of unique values sent and the number of times it was updated (events).
+**Purpose:** Statistical analysis of values (e.g., request duration, file sizes)
- Sampling rate is **not** supported for Sets. `value` is always considered text (so `01` and `1` are considered different).
+**Format:** `name:value|ms` or `name:value|h`
- [Tags](#tags) are supported for changing chart units and family.
+- `value` can be any decimal/fractional number
+- StatsD reports min, max, average, 95th percentile, median, standard deviation, and update count
+- Timers use `|ms` and report in milliseconds
+- Histograms use `|h`
+- Sampling rate is supported
+- Tags can change chart units and family
+- When not collected, StatsD shows zero until a new value arrives
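+
+For example, reporting durations and sizes (metric names are illustrative):
+
+```sh
+# a request that took 320 milliseconds (timer)
+echo "myapp.request.duration:320|ms" | nc -u -w 0 localhost 8125
+# a downloaded file of 1048576 bytes (histogram)
+echo "myapp.download.size:1048576|h" | nc -u -w 0 localhost 8125
+```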
- When a set is not collected, Netdata **defaults** to showing a zero value, until a data collection event changes the value.
+
-- **Dictionaries**
+
+### Sets
+
- The application sends `name:value|d`, where `value` is anything (**number or text**, leading and trailing spaces are removed), StatsD reports the number of events sent for each `value` and the total times `name` was updated (events).
+**Purpose:** Count unique occurrences (e.g., unique users, unique filenames)
- Sampling rate is **not** supported for Dictionaries. `value` is always considered text (so `01` and `1` are considered different).
+**Format:** `name:value|s`
- [Tags](#tags) are supported for changing chart units and family.
+- `value` can be any string or number (leading/trailing spaces are removed)
+- StatsD reports the count of unique values and update count
+- Sampling rate is NOT supported
+- Values are always treated as text (so `01` and `1` are different)
+- Tags can change chart units and family
+- When not collected, StatsD shows zero until a new value arrives
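+
+For example, counting unique users (names are illustrative):
+
+```sh
+# each distinct value is counted once, no matter how often it repeats
+echo "myapp.unique_users:alice|s" | nc -u -w 0 localhost 8125
+echo "myapp.unique_users:bob|s" | nc -u -w 0 localhost 8125
+echo "myapp.unique_users:alice|s" | nc -u -w 0 localhost 8125  # still 2 unique values
+```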
- When a set is not collected, Netdata **defaults** to showing a zero value, until a data collection event changes the value.
+
-#### Sampling Rates
+
+### Dictionaries
+
-The application may append `|@sampling_rate`, where `sampling_rate` is a number from `0.0` to `1.0` in order for StatD to extrapolate the value and predict the total for the entire period. If the application reports to StatsD a value for 1/10th of the time, it can append `|@0.1` to the metrics it sends to statsd.
+**Purpose:** Count occurrences of distinct values
-#### Tags
+**Format:** `name:value|d`
-The application may append `|#tag1:value1,tag2:value2,tag3:value3` etc, where `tagX` and `valueX` are strings. `:valueX` can be omitted.
+- `value` can be any string or number (leading/trailing spaces are removed)
+- StatsD reports the count of events for each `value` and total updates
+- Sampling rate is NOT supported
+- Values are always treated as text (so `01` and `1` are different)
+- Tags can change chart units and family
+- When not collected, StatsD shows zero until a new value arrives
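+
+For example, counting occurrences per response code (the metric name is illustrative):
+
+```sh
+# StatsD reports a separate count for each distinct value
+echo "myapp.response_code:200|d" | nc -u -w 0 localhost 8125
+echo "myapp.response_code:500|d" | nc -u -w 0 localhost 8125
+echo "myapp.response_code:200|d" | nc -u -w 0 localhost 8125
+```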
-Currently, Netdata uses only 2 tags:
+
- * `units=string` which sets the units of the chart that is automatically generated
- * `family=string` which sets the family of the chart that is automatically generated (the family is the submenu of the dashboard)
- * `name=string` which sets the name of the dimension of the chart that is automatically generated (only for counters, meters, gauges)
+## Advanced Features
-Other tags are parsed, but currently are ignored.
+
+### Sampling Rates
+
-Charts are not updated to change units or dimension names once they are created. So, either send the tags on every event, or use the special `zinit` value to initiaze the charts at the beginning. `zinit` is a special value that can be used on any chart, to have netdata initialize the charts, without actually setting any values to them. So, instead of sending `my.metric:VALUE|c|#units=bytes,name=size` every time, the application can send at the beginning `my.metric:zinit|c|#units=bytes,name=size` and then `my.metric:VALUE|c`.
+You can append `|@sampling_rate` to metrics, where `sampling_rate` is between 0.0 and 1.0. This tells StatsD to extrapolate the value for the entire period.
-#### Overlapping metrics
+Example: If your application reports data for only 1/10th of events, append `|@0.1` to have StatsD calculate the total.
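+
+On the wire, that looks like this (the metric name is illustrative):
+
+```sh
+# only 1 in 10 events is reported; StatsD extrapolates the total
+echo "myapp.requests:1|c|@0.1" | nc -u -w 0 localhost 8125
+```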
+
-Netdata's StatsD server maintains different indexes for each of the metric types supported. This means the same metric `name` may exist under different types concurrently.
+
+### Tags
+
-#### How to name your metrics
+You can append `|#tag1:value1,tag2:value2,tag3:value3` to metrics. Netdata currently uses these tags:
-A good practice is to name your metrics like `application.operation.metric`, where:
+- `units=string` - Sets the units of the automatically generated chart
+- `family=string` - Sets the family (dashboard submenu) of the chart
+- `name=string` - Sets the [dimension](https://learn.netdata.cloud/docs/developer-and-contributor-corner/glossary#d) name (for counters, meters, gauges only)
-- `application` is the application name - Netdata will automatically create a dashboard section based on the first keyword of the metrics, so you can have all your applications in different sections.
-- `operation` is the operation your application is executing, like `dbquery`, `request`, `response`, etc.
-- `metric` is anything you want to name your metric as. Netdata will automatically append the metric type (meter, counter, gauge, set, dictionary, timer, histogram) to the generated chart.
+:::tip
-Using [Tags](#tags) you can also change the submenus of the dashboard, the units of the charts and for meters, counters and gauges, the name of dimension. So, you can have a usable default view without using [Synthetic StatsD charts](#synthetic-statsd-charts)
+For consistency, either send tags with every event or use the special `zinit` value to initialize charts. For example, send `my.metric:zinit|c|#units=bytes,name=size` at the beginning, then just `my.metric:VALUE|c` afterward.
-#### Multiple metrics per packet
+:::
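+
+Putting tags and `zinit` together (the metric name is illustrative):
+
+```sh
+# initialize the chart once, without recording a value
+echo "myapp.upload.size:zinit|c|#units=bytes,name=size" | nc -u -w 0 localhost 8125
+# subsequent events can then omit the tags
+echo "myapp.upload.size:1024|c" | nc -u -w 0 localhost 8125
+```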
-Netdata accepts multiple metrics per packet if each is terminated with a newline (`\n`) at the end.
+
-#### TCP packets
+
+### Sending Multiple Metrics
+
-Netdata listens for both TCP and UDP packets. For TCP, is it important to always append `\n` on each metric, as Netdata will use the newline character to detect if a metric is split into multiple TCP packets.
+You can send multiple metrics in a single packet by separating them with newlines (`\n`).
+#### TCP Packets
-#### UDP packets
+Netdata listens for both TCP and UDP packets. With TCP, always append `\n` to each metric so Netdata can detect metrics split across multiple TCP packets.
-When sending multiple metrics over a single UDP message, it is important not to exceed the network MTU, which is usually 1500 bytes.
+#### UDP Packets
-Netdata will accept UDP packets up to 9000 bytes, but the underlying network will not exceed MTU.
+When sending multiple metrics in a UDP message, keep the total size under the network MTU (usually 1500 bytes).
-> You can read more about the network maximum transmission unit(MTU) in this cloudflare [article](https://www.cloudflare.com/en-gb/learning/network-layer/what-is-mtu/).
+:::important
+
+Netdata will accept UDP packets up to 9000 bytes, but your network equipment may fragment any packets exceeding the MTU.
+
+:::
+
+
## Configuration
-You can find the configuration at `/etc/netdata/netdata.conf`:
+You can find the StatsD configuration in `/etc/netdata/netdata.conf`:
```
[statsd]
@@ -181,126 +277,140 @@ You can find the configuration at `/etc/netdata/netdata.conf`:
# bind to = udp:localhost:8125 tcp:localhost:8125
```
-### StatsD main config options
-
-- `enabled = yes|no`
+## Configuration Architecture
- controls if StatsD will be enabled for this Netdata. The default is enabled.
+### How the StatsD Configuration Works
-- `default port = 8125`
+Netdata's StatsD chart system uses three key sections in its configuration:
- controls the default port StatsD will use if no port is defined in the following setting.
+```mermaid
+graph TD
+ A[[statsd.d config]] --> B[app]
+ A --> C[dictionary]
+ A --> D[chart definitions]
+ B --> E[metric filtering]
+ C --> F[renaming for display]
+ D --> G[chart family/context/units/priorities]
+
+ classDef default fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ classDef config fill:#f9f9f9,stroke:#333,stroke-width:1px
+
+ class A config
+ class B,C,D,E,F,G default
+```
-- `bind to = udp:localhost tcp:localhost`
+The diagram shows how the configuration flows:
- is a space separated list of IPs and ports to listen to. The format is `PROTOCOL:IP:PORT` - if `PORT` is omitted, the `default port` will be used. If `IP` is IPv6, it needs to be enclosed in `[]`. `IP` can also be `*` (to listen on all IPs) or even a hostname.
+1. The central `statsd.d config` connects to **three main components**:
+ - The **application** configuration
+ - The **dictionary** system
+ - **Chart definitions**
-- `update every (flushInterval) = 1s` controls the frequency StatsD will push the collected metrics to Netdata charts.
+2. Each of these components serves a specific purpose:
+ - The **app** component handles **metric filtering**
+ - The **dictionary** manages **renaming metrics** for display
+ - **Chart definitions determine properties** like family, context, units, and priorities
-- `decimal detail = 1000` controls the number of fractional digits in gauges and histograms. Netdata collects metrics using signed 64-bit integers and their fractional detail is controlled using multipliers and divisors. This setting is used to multiply all collected values to convert them to integers and is also set as the divisors, so that the final data will be a floating point number with this fractional detail (1000 = X.0 - X.999, 10000 = X.0 - X.9999, etc).
+This structure allows for flexible and powerful metric configuration within Netdata's StatsD implementation.
-The rest of the settings are discussed below.
+### Key Configuration Options
-## StatsD charts
+- **`enabled = yes|no`** - Controls whether StatsD is enabled
+- **`default port = 8125`** - The default port if not specified in binding
+- **`bind to = udp:localhost tcp:localhost`** - Space-separated list of IPs and ports to listen on
+- **`update every (flushInterval) = 1s`** - How often StatsD updates Netdata charts
+- **`decimal detail = 1000`** - Controls decimal precision in gauges and histograms
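+
+Putting these together, a minimal `[statsd]` section in `netdata.conf` could look like this (the values shown are the defaults):
+
+```
+[statsd]
+    enabled = yes
+    default port = 8125
+    bind to = udp:localhost tcp:localhost
+    update every (flushInterval) = 1s
+    decimal detail = 1000
+```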
-Netdata can visualize StatsD collected metrics in 2 ways:
+## StatsD Charts
-1. Each metric gets its own **private chart**. This is the default and does not require any configuration. You can adjust the default parameters.
+Netdata can visualize StatsD collected metrics in two ways:
-2. **Synthetic charts** can be created, combining multiple metrics, independently of their metric types. For this type of charts, special configuration is required, to define the chart title, type, units, its dimensions, etc.
+1. **Private charts** - Each metric gets its own chart (default, no configuration needed)
+2. **Synthetic charts** - Combine multiple metrics into custom charts (requires configuration)
-### Private metric charts
+### Private Metric Charts
-Private charts are controlled with `create private charts for metrics matching = *`. This setting accepts a space-separated list of [simple patterns](/src/libnetdata/simple_pattern/README.md). Netdata will create private charts for all metrics **by default**.
+Private charts are controlled with `create private charts for metrics matching = *`. This setting accepts a space-separated list of [simple patterns](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md). By default, Netdata creates private charts for all metrics.
-For example, to render charts for all `myapp.*` metrics, except `myapp.*.badmetric`, use:
+Example: To create charts for all `myapp.*` metrics except `myapp.*.badmetric`:
```
create private charts for metrics matching = !myapp.*.badmetric myapp.*
```
-You can specify Netdata StatsD to have a different `memory mode` than the rest of the Netdata Agent. You can read more about `memory mode` in the [documentation](/src/database/README.md).
+You can configure a different memory mode specifically for StatsD charts:
-The default behavior is to use the same settings as the rest of the Netdata Agent. If you wish to change them, edit the following settings:
- `private charts memory mode`
- `private charts history`
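+
+For example, to keep StatsD private charts in RAM with roughly an hour of history at one update per second (illustrative values; see the [database documentation](https://github.com/netdata/netdata/blob/master/src/database/README.md) for the available memory modes):
+
+```
+[statsd]
+    private charts memory mode = ram
+    private charts history = 3600
+```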
-### Optimize private metric charts storage
-
-For optimization reasons, Netdata imposes a hard limit on private metric charts. The limit is set via the `max private charts hard limit` setting (which defaults to 1000 charts). Metrics above this hard limit are still collected, but they can only be used in synthetic charts (once a metric is added to chart, it will be sent to backend servers too).
-
-If you have many ephemeral metrics collected (i.e. that you collect values for a certain amount of time), you can set the configuration option `set charts as obsolete after`. Setting a value in seconds here, means that Netdata will mark those metrics (and their private charts) as obsolete after the specified time has passed since the last sent metric value. Those charts will later be deleted according to the setting in `cleanup obsolete charts after`. Setting `set charts as obsolete after` to 0 (which is also the default value) will disable this functionality.
+
+#### Private Chart Examples
+
-Example private charts (automatically generated without any configuration):
+Example of a gauge metric chart:
-#### Counters
-
-- Scope: **count the events of something** (e.g. number of file downloads)
-- Format: `name:INTEGER|c` or `name:INTEGER|C` or `name|c`
-- StatsD increments the counter by the `INTEGER` number supplied (positive, or negative).
-
-
-
-#### Gauges
+
-- Scope: **report the value of something** (e.g. cache memory used by the application server)
-- Format: `name:FLOAT|g`
-- StatsD remembers the last value supplied, and can increment or decrement the latest value if `FLOAT` begins with `+` or `-`.
+Example of a histogram metric chart:
-
+
-#### histograms
+Histogram chart with "sum" unselected:
-- Scope: **statistics on a size of events** (e.g. statistics on the sizes of files downloaded)
-- Format: `name:FLOAT|h`
-- StatsD maintains a list of all the values supplied and provides statistics on them.
+
-
+Example of a counter metric chart:
-The same chart with `sum` unselected, to show the detail of the dimensions supported:
-
+
-#### Meters
+Example of a meter metric chart:
-This is identical to `counter`.
+
-- Scope: **count the events of something** (e.g. number of file downloads)
-- Format: `name:INTEGER|m` or `name|m` or just `name`
-- StatsD increments the counter by the `INTEGER` number supplied (positive, or negative).
+Example of a set metric chart:
-
+
-#### Sets
+Example of a timer metric chart:
-- Scope: **count the unique occurrences of something** (e.g. unique filenames downloaded, or unique users that downloaded files)
-- Format: `name:TEXT|s`
-- StatsD maintains a unique index of all values supplied, and reports the unique entries in it.
+
+
-
+#### Storage Optimization
-#### Timers
+For performance reasons, Netdata limits private charts. The `max private charts hard limit` (default: 1000) controls this. Metrics above this limit can still be used in synthetic charts.
-- Scope: **statistics on the duration of events** (e.g. statistics for the duration of file downloads)
-- Format: `name:FLOAT|ms`
-- StatsD maintains a list of all the values supplied and provides statistics on them.
+For ephemeral metrics, use `set charts as obsolete after` and `cleanup obsolete charts after` to automatically clean up charts that haven't received data recently.
-
+### Synthetic StatsD Charts
-### Synthetic StatsD charts
+Use synthetic charts to create dedicated sections on the dashboard to render your StatsD charts.
-Use synthetic charts to create dedicated sections on the dashboard to render your StatsD charts.
+```mermaid
+graph TD
+ A[StatsD Metrics] --> B[App]
+ A --> C[Dictionary]
+ B --> D[Chart]
+ C --> D
+ D --> E[Dashboard]
+
+ style A fill:#f9f9f9,stroke:#333,stroke-width:1px
+ style B fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style C fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style D fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+ style E fill:#4caf50,stroke:#333,stroke-width:1px,color:black
+```
-Synthetic charts are organized in
+Synthetic charts are organized in:
-- **application** aka section in Netdata Dashboard.
-- **charts for each application** aka family in Netdata Dashboard.
-- **StatsD metrics for each chart** /aka charts and context Netdata Dashboard.
+- **Application** - Section in Netdata Dashboard
+- **Charts for each application** - Family/submenu in the Dashboard
+- **StatsD metrics for each chart** - Charts and context in the Dashboard
-> You can read more about how the Netdata Agent organizes information in the relevant [documentation](/src/web/README.md)
+#### Basic Configuration Structure
-For each application you need to create a `.conf` file in `/etc/netdata/statsd.d`.
+For example, to monitor the application `myapp` using StatsD and Netdata, create the file `/etc/netdata/statsd.d/myapp.conf`:
-For example, if you want to monitor the application `myapp` using StatsD and Netdata, create the file `/etc/netdata/statsd.d/myapp.conf`, with this content:
```
[app]
name = myapp
@@ -313,8 +423,8 @@ For example, if you want to monitor the application `myapp` using StatsD and Net
m1 = metric1
m2 = metric2
-# replace 'mychart' with the chart id
-# the chart will be named: myapp.mychart
+# Chart definition with ID 'mychart'
+# The chart will be named: myapp.mychart
[mychart]
name = mychart
title = my chart title
@@ -327,116 +437,113 @@ For example, if you want to monitor the application `myapp` using StatsD and Net
dimension = myapp.metric2 m2
```
-Using the above configuration `myapp` should get its own section on the dashboard, having one chart with 2 dimensions.
+Using this configuration, `myapp` gets its own dashboard section with one chart containing two [dimensions](https://learn.netdata.cloud/docs/developer-and-contributor-corner/glossary#d).
-`[app]` starts a new application definition. The supported settings in this section are:
+When you send metrics like `myapp.metric1:10|g` and `myapp.metric2:20|g`, you'll see both private charts and your synthetic chart.
-- `name` defines the name of the app.
-- `metrics` is a Netdata [simple pattern](/src/libnetdata/simple_pattern/README.md). This pattern should match all the possible StatsD metrics that will be participating in the application `myapp`.
-- `private charts = yes|no`, enables or disables private charts for the metrics matched.
-- `gaps when not collected = yes|no`, enables or disables gaps on the charts of the application in case that no metrics are collected.
-- `memory mode` sets the memory mode for all charts of the application. The default is the global default for Netdata (not the global default for StatsD private charts). We suggest not to use this (we have commented it out in the example) and let your app use the global default for Netdata, which is our dbengine.
+
+#### Synthetic Chart Example
+
-- `history` sets the size of the round-robin database for this application. The default is the global default for Netdata (not the global default for StatsD private charts). This is only relevant if you use `memory mode = save`. Read more on our documentation for the Agent's [Database](/src/database/README.md).
+Example of a synthetic chart combining multiple metrics:
-`[dictionary]` defines name-value associations. These are used to renaming metrics, when added to synthetic charts. Metric names are also defined at each `dimension` line. However, using the dictionary dimension names can be declared globally, for each app and is the only way to rename dimensions when using patterns. Of course the dictionary can be empty or missing.
+
+
-Then, add any number of charts. Each chart should start with `[id]`. The chart will be called `app_name.id`. `family` controls the submenu on the dashboard. `context` controls the alert templates. `priority` controls the ordering of the charts on the dashboard. The rest of the settings are informational.
+#### Application Section Options
-Add any number of metrics to a chart, using `dimension` lines. These lines accept 5 space separated parameters:
+The `[app]` section defines the application and has these options:
-1. the metric name, as it is collected (it has to be matched by the `metrics =` pattern of the app)
-2. the dimension name, as it should be shown on the chart
-3. an optional selector (type) of the value to shown (see below)
-4. an optional multiplier
-5. an optional divider
-6. optional flags, space separated and enclosed in quotes. All the external plugins `DIMENSION` flags can be used. Currently, the only usable flag is `hidden`, to add the dimension, but not show it on the dashboard. This is usually needed to have the values available for percentage calculation, or use them in alerts.
+:::note
-So, the format is this:
+- **name** - Defines the application name
+- **metrics** - [Simple pattern](https://github.com/netdata/netdata/blob/master/src/libnetdata/simple_pattern/README.md) matching all metrics for this app
+- **private charts** - Enable/disable private charts for matched metrics (yes|no)
+- **gaps when not collected** - Show gaps when no metrics are collected (yes|no)
+- **memory mode** - Sets memory mode for application charts (optional, default is global Netdata setting)
+- **history** - Size of round-robin database (optional, only relevant with `memory mode = save`)
-```
-dimension = [pattern] METRIC NAME TYPE MULTIPLIER DIVIDER OPTIONS
-```
+:::
-`pattern` is a keyword. When set, `METRIC` is expected to be a Netdata [simple pattern](/src/libnetdata/simple_pattern/README.md) that will be used to match all the StatsD metrics to be added to the chart. So, `pattern` automatically matches any number of StatsD metrics, all of which will be added as separate chart dimensions.
+#### Dictionary Section
-`TYPE`, `MULTIPLIER`, `DIVIDER` and `OPTIONS` are optional.
+`[dictionary]` defines name-value pairs for renaming metrics in synthetic charts. This allows you to:
-`TYPE` can be:
+- Define dimension names globally for the whole app
+- Rename dimensions when using patterns
+- Create more human-readable names for technical metrics
-- `events` to show the number of events received by StatsD for this metric
-- `last` to show the last value, as calculated at the flush interval of the metric (the default)
+The dictionary can be empty or omitted if not needed.
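+
+A minimal sketch of a dictionary (names are illustrative):
+
+```
+[dictionary]
+    # raw metric (or dimension NAME) on the left, display name on the right
+    myapp.response.200 = successful requests
+    myapp.response.500 = failed requests
+```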
-Then for histograms and timers the following types are also supported:
+#### Chart Definitions
-- `min`, show the minimum value
-- `max`, show the maximum value
-- `sum`, show the sum of all values
-- `average` (same as `last`)
-- `percentile`, show the 95th percentile (or any other percentile, as configured at StatsD global config)
-- `median`, show the median of all values (i.e. sort all values and get the middle value)
-- `stddev`, show the standard deviation of the values
+Each chart starts with `[id]` and will be named `app_name.id`. Key settings for charts:
-#### Example synthetic charts
+:::note
-StatsD metrics: `foo` and `bar`.
+- **family** - Controls dashboard submenu placement
+- **context** - Controls alert templates
+- **priority** - Controls chart ordering
+- **type** - Chart visualization type (line, area, stacked)
+- **units** - Chart measurement units
-Contents of file `/etc/netdata/statsd.d/foobar.conf`:
+:::
-```
-[app]
- name = foobarapp
- metrics = foo bar
- private charts = yes
-
-[foobar_chart1]
- title = Hey, foo and bar together
- family = foobar_family
- context = foobarapp.foobars
- units = foobars
- type = area
- dimension = foo 'foo me' last 1 1
- dimension = bar 'bar me' last 1 1
-```
-
-Metrics sent to statsd: `foo:10|g` and `bar:20|g`.
+#### Dimension Format
-Private charts:
+Add metrics to charts using `dimension` lines with this format:
-
+```
+dimension = [pattern] METRIC NAME TYPE MULTIPLIER DIVIDER OPTIONS
+```
-Synthetic chart:
+Where:
-
+1. **METRIC** - The metric name as collected (must match the `metrics` pattern)
+2. **NAME** - The dimension name to display (can use dictionary for renaming)
+3. **TYPE** - (Optional) Value selector like `events`, `last`, `min`, `max`, etc.
+4. **MULTIPLIER** - (Optional) Value to multiply the metric by
+5. **DIVIDER** - (Optional) Value to divide the metric by
+6. **OPTIONS** - (Optional) Flags like `hidden` to include but not display a dimension
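+
+For instance (metric and dimension names are illustrative):
+
+```
+# show the latest collected value of myapp.metric1 as dimension "requests"
+dimension = myapp.metric1 requests last 1 1
+# also chart the number of events, but hide the dimension (useful for alerts)
+dimension = myapp.metric1 updates events 1 1 'hidden'
+```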
-#### Renaming StatsD synthetic charts' metrics
+
+#### Renaming StatsD Synthetic Charts' Metrics
+
-You can define a dictionary to rename metrics sent by StatsD clients. This enables you to send response `"200"` and Netdata visualize it as `succesful connection`
+You can define a dictionary to rename metrics sent by StatsD clients. This allows you to transmit the response code `200` while Netdata displays it as `successful connection`.
The `[dictionary]` section accepts any number of `name = value` pairs.
Netdata uses this dictionary as follows:
-1. When a `dimension` has a non-empty `NAME`, that name is looked up at the dictionary.
+1. When a `dimension` has a non-empty `NAME`, that name is looked up in the dictionary
+2. If the above lookup finds nothing, the original StatsD metric name is looked up
+3. If any lookup succeeds, Netdata uses the dictionary's `value` for the dimension name
-2. If the above lookup gives nothing, or the `dimension` has an empty `NAME`, the original StatsD metric name is looked up at the dictionary.
+The dimensions will have the original StatsD metric name as ID and the dictionary value as name.
-3. If any of the above succeeds, Netdata uses the `value` of the dictionary, to set the name of the dimension. The dimensions will have as ID the original StatsD metric name, and as name, the dictionary value.
+You can use the dictionary in two ways:
-Use the dictionary in 2 ways:
+1. Set `dimension = myapp.metric1 ''` and have in the dictionary `myapp.metric1 = metric1 name`
+2. Set `dimension = myapp.metric1 'm1'` and have in the dictionary `m1 = metric1 name`
-1. set `dimension = myapp.metric1 ''` and have at the dictionary `myapp.metric1 = metric1 name`
-2. set `dimension = myapp.metric1 'm1'` and have at the dictionary `m1 = metric1 name`
+In both cases, the dimension will be added with ID `myapp.metric1` and named `metric1 name`. In alerts, you can reference it as either `${myapp.metric1}` or `${metric1 name}`.
-In both cases, the dimension will be added with ID `myapp.metric1` and will be named `metric1 name`. So, in alerts use either of the 2 as `${myapp.metric1}` or `${metric1 name}`.
+:::note
-> keep in mind that if you add multiple times the same StatsD metric to a chart, Netdata will append `TYPE` to the dimension ID, so `myapp.metric1` will be added as `myapp.metric1_last` or `myapp.metric1_events`, etc. If you add multiple times the same metric with the same `TYPE` to a chart, Netdata will also append an incremental counter to the dimension ID, i.e. `myapp.metric1_last1`, `myapp.metric1_last2`, etc.
+If you add the same StatsD metric multiple times to a chart, Netdata will append `TYPE` to the dimension ID, so `myapp.metric1` will become `myapp.metric1_last` or `myapp.metric1_events`. If you add the same metric with the same `TYPE` multiple times, Netdata will also append an incremental counter, e.g., `myapp.metric1_last1`, `myapp.metric1_last2`, etc.
-#### Dimension patterns
+:::
-Netdata allows adding multiple dimensions to a chart, by matching the StatsD metrics with a Netdata simple pattern.
+
-Assume we have an API that provides StatsD metrics for each response code per method it supports, like these:
+
+#### Dimension Patterns
+
+
+Netdata allows adding multiple dimensions to a chart by matching StatsD metrics with a **pattern**.
+
+For example, if you have an API that provides StatsD metrics for each response code per method:
```
myapp.api.get.200
@@ -453,7 +560,7 @@ myapp.api.all.400
myapp.api.all.500
```
-In order to add all the response codes of `myapp.api.get` to a chart, we simply make the following configuration:
+To add all response codes of `myapp.api.get` to a chart:
```
[api_get_responses]
@@ -461,9 +568,9 @@ In order to add all the response codes of `myapp.api.get` to a chart, we simply
dimension = pattern 'myapp.api.get.* '' last 1 1
```
-The above will add dimension named `200`, `400` and `500`. Netdata extracts the wildcard part of the metric name - so the dimensions will be named with whatever the `*` matched.
+This adds dimensions named `200`, `400`, and `500`. Netdata extracts the wildcard part of the metric name.
-You can rename the dimensions with this:
+You can rename these dimensions with the dictionary:
```
[dictionary]
@@ -476,9 +583,11 @@ You can rename the dimensions with this:
dimension = pattern 'myapp.api.get.* 'get.' last 1 1
```
-Note that we added a `NAME` to the dimension line with `get.`. This is prefixed to the wildcarded part of the metric name, to compose the key for looking up the dictionary. So `500` became `get.500` which was looked up to the dictionary to find value `500 cannot connect to db`. This way we can have different dimension names, for each of the API methods (i.e. `get.500 = 500 cannot connect to db` while `post.500 = 500 cannot write to disk`).
+The `NAME` prefix `get.` is combined with the wildcarded part to look up in the dictionary. So `500` becomes `get.500`, which is looked up to find `500 cannot connect to db`.
-To add all 200s across all API methods to a chart, you can do this:
+### More Pattern Examples
+
+To add all 200s across all API methods to a chart:
```
[ok_by_method]
@@ -486,9 +595,9 @@ To add all 200s across all API methods to a chart, you can do this:
dimension = pattern 'myapp.api.*.200 '' last 1 1
```
-The above will add `get`, `post`, `del` and `all` to the chart.
+This adds `get`, `post`, `del`, and `all` to the chart.
-If `all` is not wanted (a `stacked` chart does not need the `all` dimension, since the sum of the dimensions provides the total), the line should be:
+To exclude the `all` method:
```
[ok_by_method]
@@ -496,9 +605,7 @@ If `all` is not wanted (a `stacked` chart does not need the `all` dimension, sin
dimension = pattern '!myapp.api.all.* myapp.api.*.200 '' last 1 1
```
-With the above, all methods except `all` will be added to the chart.
-
-To automatically rename the methods, you can use this:
+To rename methods automatically:
```
[dictionary]
@@ -511,150 +618,130 @@ To automatically rename the methods, you can use this:
dimension = pattern '!myapp.api.all.* myapp.api.*.200 'method.' last 1 1
```
-Using the above, the dimensions will be added as `GET`, `ADD` and `DELETE`.
+This adds dimensions named `GET`, `ADD`, and `DELETE`.
+
-## StatsD examples
+## Using StatsD with Different Languages
-### Python
+
+### Python
+
-It's really easy to instrument your python application with StatsD, for example using [jsocol/pystatsd](https://github.com/jsocol/pystatsd).
+Using [jsocol/pystatsd](https://github.com/jsocol/pystatsd):
```python
import statsd
+
c = statsd.StatsClient('localhost', 8125)
-c.incr('foo') # Increment the 'foo' counter.
+c.incr('foo') # Increment the 'foo' counter.
for i in range(100000000):
- c.incr('bar')
- c.incr('foo')
- if i % 3:
- c.decr('bar')
- c.timing('stats.timed', 320) # Record a 320ms 'stats.timed'.
+ c.incr('bar')
+ c.incr('foo')
+ if i % 3:
+ c.decr('bar')
+ c.timing('stats.timed', 320) # Record a 320ms 'stats.timed'.
```
-You can find detailed documentation in their [documentation page](https://statsd.readthedocs.io/en/v3.3/).
+See the [full documentation](https://statsd.readthedocs.io/en/v3.3/) for more details.
+
-### Javascript and Node.js
+
+### JavaScript and Node.js
+
-Using the client library by [sivy/node-statsd](https://github.com/sivy/node-statsd), you can easily embed StatsD into your Node.js project.
+Using [sivy/node-statsd](https://github.com/sivy/node-statsd):
```javascript
var StatsD = require('node-statsd'),
- client = new StatsD();
+ client = new StatsD();
- // Timing: sends a timing command with the specified milliseconds
- client.timing('response_time', 42);
+// Timing: sends a timing command with the specified milliseconds
+client.timing('response_time', 42);
- // Increment: Increments a stat by a value (default is 1)
- client.increment('my_counter');
+// Increment: Increments a stat by a value (default is 1)
+client.increment('my_counter');
- // Decrement: Decrements a stat by a value (default is -1)
- client.decrement('my_counter');
+// Decrement: Decrements a stat by a value (default is -1)
+client.decrement('my_counter');
- // Using the callback
- client.set(['foo', 'bar'], 42, function(error, bytes){
+// Using the callback
+client.set(['foo', 'bar'], 42, function (error, bytes) {
//this only gets called once after all messages have been sent
- if(error){
- console.error('Oh noes! There was an error:', error);
+ if (error) {
+ console.error('Oh noes! There was an error:', error);
} else {
- console.log('Successfully sent', bytes, 'bytes');
+ console.log('Successfully sent', bytes, 'bytes');
}
- });
-
- // Sampling, tags and callback are optional and could be used in any combination
- client.histogram('my_histogram', 42, 0.25); // 25% Sample Rate
- client.histogram('my_histogram', 42, ['tag']); // User-defined tag
- client.histogram('my_histogram', 42, next); // Callback
- client.histogram('my_histogram', 42, 0.25, ['tag']);
- client.histogram('my_histogram', 42, 0.25, next);
- client.histogram('my_histogram', 42, ['tag'], next);
- client.histogram('my_histogram', 42, 0.25, ['tag'], next);
-```
-### Other languages
-
-You can also use StatsD with:
-- Golang, thanks to [alexcesaro/statsd](https://github.com/alexcesaro/statsd)
-- Ruby, thanks to [reinh/statsd](https://github.com/reinh/statsd)
-- Java, thanks to [DataDog/java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client)
-
+});
-### Shell
-
-Getting the proper support for a programming language is not always easy, but the Unix shell is available on most Unix systems. You can use shell and `nc` to instrument your systems and send metric data to Netdata's StatsD implementation.
-
-Using the method you can send metrics from any script. You can generate events like: backup.started, backup.ended, backup.time, or even tail logs and convert them to metrics.
-
-> **IMPORTANT**:
->
-> To send StatsD messages you need from the `netcat` package, the `nc` command.
-> There are multiple versions of this package. Please try to experiment with the `nc` command you have available on your right system, to find the right parameters.
->
-> In the examples below, we assume the `openbsd-netcat` is installed.
+// Sampling, tags and callback are optional and could be used in any combination
+client.histogram('my_histogram', 42, 0.25); // 25% Sample Rate
+client.histogram('my_histogram', 42, ['tag']); // User-defined tag
+client.histogram('my_histogram', 42, next); // Callback
+client.histogram('my_histogram', 42, 0.25, ['tag']);
+client.histogram('my_histogram', 42, 0.25, next);
+client.histogram('my_histogram', 42, ['tag'], next);
+client.histogram('my_histogram', 42, 0.25, ['tag'], next);
+```
-If you plan to send short StatsD events at sporadic occasions, use UDP. The messages should not be too long (remember, most networks support up to 1500 bytes MTU, which is also the limit for StatsD messages over UDP). The good thing is that using UDP will not block your script, even if the StatsD server is not there (UDP messages are "fire-and-forget").
+
+
+### Other Languages
+
-For UDP use this:
+StatsD clients are available for many languages:
-```sh
-echo "APPLICATION.METRIC:VALUE|TYPE" | nc -u -w 0 localhost 8125
-```
+- Golang: [alexcesaro/statsd](https://github.com/alexcesaro/statsd)
+- Ruby: [reinh/statsd](https://github.com/reinh/statsd)
+- Java: [DataDog/java-dogstatsd-client](https://github.com/DataDog/java-dogstatsd-client)
-`-u` turns on UDP, `-w 0` tells `nc` not to wait for a response from StatsD (idle time to close the connection).
+
-where:
+
+### Shell Script
+
-- `APPLICATION` is any name for your application
-- `METRIC` is the name for the specific metric
-- `VALUE` is the value for that metric (**meters**, **counters**, **gauges**, **timers** and **histograms** accept integer/decimal/fractional numbers, **sets** and **dictionaries** accept strings)
-- `TYPE` is one of `m`, `c`, `g`, `ms`, `h`, `s`, `d` to define the metric type.
+You can use the Unix shell with `nc` to send StatsD metrics from any script.
-For tailing a log and converting it to metrics, do something like this:
+:::important
-```sh
-tail -f some.log | awk 'awk commands to parse the log and format statsd metrics' | nc -N -w 120 localhost 8125
-```
+You'll need the `netcat` package with the `nc` command. Different versions have different parameters, so experiment to find what works on your system. The examples below assume `openbsd-netcat` is installed.
-`-N` tells `nc` to close the socket once it receives EOF on its input. `-w 120` tells `nc` to stop if the connection is idle for 120 seconds. The timeout is needed to stop the `nc` command if you restart Netdata while `nc` is connected to it. Without it, `nc` will sit idle forever.
+:::
-When you embed the above commands to a script, you may notice that all the metrics are sent to StatsD with a delay. They are buffered in the pipes `|`. You can turn them to real-time by prepending each command with `stdbuf -i0 -oL -eL command to be run`, like this:
+#### Using UDP (for sporadic events)
```sh
-stdbuf -i0 -oL -eL tail -f some.log |\
- stdbuf -i0 -oL -eL awk 'awk commands to parse the log and format statsd metrics' |\
- stdbuf -i0 -oL -eL nc -N -w 120 localhost 8125
+echo "APPLICATION.METRIC:VALUE|TYPE" | nc -u -w 0 localhost 8125
```
-If you use `mawk` you also need to run awk with `-W interactive`.
+- `-u` enables UDP
+- `-w 0` tells `nc` not to wait for a response
Examples:
-To set `myapp.used_memory` as gauge to value `123456`, use:
-
```sh
+# Set a gauge value
echo "myapp.used_memory:123456|g|#units:bytes" | nc -u -w 0 localhost 8125
-```
-To increment `myapp.files_sent` by `10`, as a counter, use:
+# Increment a counter
+echo "myapp.files_sent:10|c|#units:files" | nc -u -w 0 localhost 8125
-```sh
-echo "myapp.files_sent:10|c|#units:files" | nc -u -w 0 localhost 8125
+# Send multiple metrics
+printf "myapp.used_memory:123456|g|#units:bytes\nmyapp.files_sent:10|c|#units:files\n" | nc -u -w 0 localhost 8125
```
-You can send multiple metrics like this:
-
-```sh
-# send multiple metrics via UDP
-printf "myapp.used_memory:123456|g|#units:bytes\nmyapp.files_sent:10|c|#units:files\n" | nc -u -w 0 localhost 8125
-```
-
-Remember, for UDP communication each packet should not exceed the MTU. So, if you plan to push too many metrics at once, prefer TCP communication:
+#### Using TCP (for many metrics at once)
```sh
# send multiple metrics via TCP
cat /tmp/statsd.metrics.txt | nc -N -w 120 localhost 8125
```
-You can also use this little function to take care of all the details:
+#### Helper Function for Shell Scripts
+
+This function handles both UDP and TCP automatically:
```sh
#!/usr/bin/env bash
@@ -684,216 +771,137 @@ then
fi
```
-You can use it like this:
+Usage:
```sh
-# first, source it in your script
+# source it in your script
source statsd.sh
-# then, at any point:
+# then use it anywhere
statsd "myapp.used_memory:123456|g|#units:bytes" "myapp.files_sent:10|c|#units:files" ...
-```
-or even at a terminal prompt, like this:
-
-```sh
+# or at command line
./statsd.sh "myapp.used_memory:123456|g|#units:bytes" "myapp.files_sent:10|c|#units:files" ...
```
-The function is smart enough to call `nc` just once and pass all the metrics to it. It will also automatically switch to TCP if the metrics to send are above 1000 bytes.
+The function automatically switches to TCP if the metrics exceed 1000 bytes.
+
-If you have gotten thus far, make sure to check out our [community forums](https://community.netdata.cloud) to share your experience using Netdata with StatsD.
+## Step-by-Step Guide: Monitoring K6 with StatsD
-## StatsD Step By Step Guide
+This guide demonstrates how to use Netdata's StatsD to visualize metrics from [k6](https://k6.io), an open-source load testing tool.
-In this guide, we'll go through a scenario of visualizing our data in Netdata in a matter of seconds using
-[k6](https://k6.io), an open-source tool for automating load testing that outputs metrics to the StatsD format.
+
+### Prerequisites
+
-Although we'll use k6 as the use-case, the same principles can be applied to every application that supports
-the StatsD protocol. Simply enable the StatsD output and point it to the node that runs Netdata, which is `localhost` in this case.
+- A node with [Netdata](https://github.com/netdata/netdata/blob/master/packaging/installer/README.md) installed
+- [k6](https://k6.io/docs/getting-started/installation) installed
-In general, the process for creating a StatsD collector can be summarized in 2 steps:
+
-- Run an experiment by sending StatsD metrics to Netdata, without any prior configuration. This will create
- a chart per metric (called private charts) and will help you verify that everything works as expected from the application side of things.
+
+### The Process in Brief
+
- - Make sure to reload the dashboard tab **after** you start sending data to Netdata.
+1. **Run an experiment** sending StatsD metrics to Netdata without configuration
+ - This creates a private chart per metric
+ - Reload the dashboard after starting to send data
-- Create a configuration file for your app using [edit-config](/docs/netdata-agent/configuration/README.md): `sudo ./edit-config
- statsd.d/myapp.conf`
+2. **Create a configuration file** for your app:
+ ```
+ sudo ./edit-config statsd.d/myapp.conf
+ ```
+ - This organizes metrics into meaningful sections
- - Each app will have it's own section in the right-hand menu.
+
-Now, let's see the above process in detail.
+
+### Understanding Your Metrics
+
-### Prerequisites
+First, understand what metrics your application provides. For k6, check their [metrics documentation](https://k6.io/docs/using-k6/metrics/).
-- A node with the [Netdata](/packaging/installer/README.md) installed.
-- An application to instrument. For this guide, that will be [k6](https://k6.io/docs/getting-started/installation).
+When instrumenting your own code, you'll need to decide:
-### Understanding the metrics
+- What to measure
+- Which StatsD metric type is appropriate for each measurement
-The real in instrumenting an application with StatsD for you is to decide what metrics you
-want to visualize and how you want them grouped. In other words, you need decide which metrics
-will be grouped in the same charts and how the charts will be grouped on Netdata's dashboard.
+
-Start with documentation for the particular application that you want to monitor (or the
-technological stack that you are using). In our case, the
-[k6 documentation](https://k6.io/docs/using-k6/metrics/) has a whole page dedicated to the
-metrics output by k6, along with descriptions.
+
+### Exploring Available Metrics with Private Charts
+
-If you are using StatsD to monitor an existing application, you don't have much control over
-these metrics. For example, k6 has a type called `trend`, which is identical to timers and histograms.
-Thus, _k6 is clearly dictating_ which metrics can be used as histograms and simple gauges.
+Every StatsD metric initially gets its own "private chart." While you'll likely disable this in production, it's helpful during setup to see all available metrics.
-On the other hand, if you are instrumenting your own code, you will need to not only decide what are
-the "things" that you want to measure, but also decide which StatsD metric type is the appropriate for each.
+Private charts clearly show the metric type (gauge, timer, etc.) and available operations for complex types like histograms.
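+
+To verify the pipeline end to end, you can send a throw-away metric by hand before running k6 (the name is arbitrary):
+
+```sh
+echo "k6.smoke_test:1|c" | nc -u -w 0 localhost 8125
+```
+
+A private chart for it should appear on the dashboard within a second.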
+
-### Use private charts to see all available metrics
+
+### Creating a StatsD Configuration File
+
-In Netdata, every metric will receive its own chart, called a `private chart`. Although in the
-final implementation this is something that we will disable, since it can create considerable noise
-(imagine having 100s of metrics), it’s very handy while building the configuration file.
+Use Netdata's [`edit-config`](https://github.com/netdata/netdata/blob/master/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config) to create a new file:
-You can get a quick visual representation of the metrics and their type (e.g it’s a gauge, a timer, etc.).
+```bash
+sudo ./edit-config statsd.d/k6.conf
+```
-An important thing to notice is that StatsD has different types of metrics, as illustrated in the
-[supported metrics](#metrics-supported-by-netdata). Histograms and timers support mathematical operations
-to be performed on top of the baseline metric, like reporting the `average` of the value.
+Start with this basic configuration:
-Here are some examples of default private charts. You can see that the histogram private charts will
-visualize all the available operations.
+```
+[app]
+ name = k6
+ metrics = k6*
+ private charts = yes
+ gaps when not collected = no
+ memory mode = dbengine
+```
-**Gauge private chart**
+
-
+
+### Organizing Metrics
+
-**Histogram private chart**
+Next, decide how to organize metrics in the Netdata dashboard:
-
+1. **Dictionary** - Create human-readable names for technical metrics
+ ```
+ [dictionary]
+ http_req_blocked = Blocked HTTP Requests
+ http_req_connecting = Connecting HTTP Requests
+ http_req_receiving = Receiving HTTP Requests
+ http_reqs = Total HTTP requests
+ ```
-### Create a new StatsD configuration file
+2. **Families** - Group charts into dashboard submenus. For k6, we'll use `k6 native metrics` and `http metrics` families.
-Start by creating a new configuration file under the `statsd.d/` folder in the
-[Netdata config directory](/docs/netdata-agent/configuration/README.md#the-netdata-config-directory).
-Use [`edit-config`](/docs/netdata-agent/configuration/README.md#edit-a-configuration-file-using-edit-config)
-to create a new file called `k6.conf`.
+3. **[Dimensions](https://learn.netdata.cloud/docs/developer-and-contributor-corner/glossary#d)** - Choose which metrics to show and how to group them in charts
-```bash=
-sudo ./edit-config statsd.d/k6.conf
-```
+
+
+
+### Complete Configuration Example
+
-Copy the following configuration into your file as a starting point.
+Here's a complete configuration for k6:
-```text
+```
[app]
name = k6
metrics = k6*
private charts = yes
gaps when not collected = no
memory mode = dbengine
-```
-Next, you need is to understand how to organize metrics in Netdata’s StatsD.
-
-#### Synthetic charts
-
-Netdata lets you group the metrics exposed by your instrumented application with _synthetic charts_.
-
-First, create a `[dictionary]` section to transform the names of the metrics into human-readable equivalents.
-`http_req_blocked`, `http_req_connecting`, `http_req_receiving`, and `http_reqs` are all metrics exposed by k6.
-
-```
[dictionary]
http_req_blocked = Blocked HTTP Requests
http_req_connecting = Connecting HTTP Requests
http_req_receiving = Receiving HTTP Requests
http_reqs = Total HTTP requests
-```
-
-Continue this dictionary process with any other metrics you want to collect with Netdata.
-
-#### Families and context
-
-Families and context are additional ways to group metrics. Families control the submenu at right-hand menu and
-it's a subcategory of the section. Given the metrics given by K6, we are organizing them in 2 major groups,
-or `families`: `k6 native metrics` and `http metrics`.
-
-Context is a second way to group metrics, when the metrics are of the same nature but different origin. In
-our case, if we ran several different load testing experiments side-by-side, we could define the same app,
-but different context (e.g `http_requests.experiment1`, `http_requests.experiment2`).
-
-Find more details about family and context in our [documentation](/src/web/README.md#families).
-
-#### Dimensions
-
-Now, having decided on how we are going to group the charts, we need to define how we are going to group
-metrics into different charts. This is particularly important, since we decide:
-
-- What metrics **not** to show, since they are not useful for our use-case.
-- What metrics to consolidate into the same charts, so as to reduce noise and increase visual correlation.
-
-The dimension option has this syntax: `dimension = [pattern] METRIC NAME TYPE MULTIPLIER DIVIDER OPTIONS`
-
-- **pattern**: A keyword that tells the StatsD server the `METRIC` string is actually a
- [simple pattern](/src/libnetdata/simple_pattern/README.md).
- We don't use simple patterns in the example, but if we wanted to visualize all the `http_req` metrics, we
- could have a single dimension: `dimension = pattern 'k6.http_req*' last 1 1`. Find detailed examples with
- patterns in [dimension patterns](/src/collectors/statsd.plugin/README.md#dimension-patterns).
-
-- **METRIC** The id of the metric as it comes from the client. You can easily find this in the private charts above,
- for example: `k6.http_req_connecting`.
-- **NAME**: The name of the dimension. You can use the dictionary to expand this to something more human-readable.
-
-- **TYPE**:
-
- - For all charts:
- - `events`: The number of events (data points) received by the StatsD server
- - `last`: The last value that the server received
-
- - For histograms and timers:
- - `min`, `max`, `sum`, `average`, `percentile`, `median`, `stddev`: This is helpful if you want to see
- different representations of the same value. You can find an example at the `[iteration_duration]`
- above. Note that the baseline `metric` is the same, but the `name` of the dimension is different,
- since we use the baseline, but we perform a computation on it, creating a different final metric for
- visualization(dimension).
-
-- **MULTIPLIER DIVIDER**: Handy if you want to convert Kilobytes to Megabytes or you want to give negative value.
- The second is handy for better visualization of send/receive. You can find an example at the **packets** submenu of the **IPv4 Networking Section**.
-
-If you define a chart, run Netdata to visualize metrics, and then add or remove a dimension from that chart,
-this will result in a new chart with the same name, confusing Netdata. If you change the dimensions of the chart,
-make sure to also change the `name` of that chart, since it serves as the `id` of that chart in Netdata's storage.
-(e.g http_req --> http_req_1).
-
-#### Finalize your StatsD configuration file
-
-It's time to assemble all the pieces together and create the synthetic charts that will consist our application
-dashboard in Netdata. We can do it in a few simple steps:
-
-- Decide which metrics we want to use (we have viewed all of them as private charts). For example, we want to use
- `k6.http_requests`, `k6.vus`, etc.
-
-- Decide how we want organize them in different synthetic charts. For example, we want `k6.http_requests`, `k6.vus`
- on their own, but `k6.http_req_blocked` and `k6.http_req_connecting` on the same chart.
-
-- For each synthetic chart, we define a **unique** name and a human readable title.
-
-- We decide at which `family` (submenu section) we want each synthetic chart to belong to. For example, here we
- have defined 2 families: `http requests`, `k6_metrics`.
-
-- If we have multiple instances of the same metric, we can define different contexts, (Optional).
-
-- We define a dimension according to the syntax we highlighted above.
-
-- We define a type for each synthetic chart (line, area, stacked)
-
-- We define the units for each synthetic chart.
-
-Following the above steps, we append to the `k6.conf` that we defined above, the following configuration:
-
-```
[http_req_total]
name = http_req_total
title = Total HTTP Requests
@@ -960,82 +968,51 @@ Following the above steps, we append to the `k6.conf` that we defined above, the
type = stacked
```
-Note that Netdata will report the rate for metrics and counters, even if k6 or another application
-sends an _absolute_ number. For example, k6 sends absolute HTTP requests with `http_reqs`,
-but Netdata visualizes that in `requests/second`.
+:::note
+
+Netdata will report the rate for metrics and counters even if your application sends absolute numbers. For example, k6 sends absolute HTTP requests with `http_reqs`, but Netdata visualizes that as `requests/second`.
+
+:::
-To enable this StatsD configuration, [restart Netdata](/docs/netdata-agent/start-stop-restart.md).
+Restart Netdata to enable this configuration.
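+
+For example, on systemd-based systems:
+
+```bash
+sudo systemctl restart netdata
+```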
+
-### Final touches
+
+### Adding Custom Icons and Descriptions
+
-At this point, you have used StatsD to gather metrics for k6, creating a whole new section in your
-Netdata dashboard in the process. Moreover, you can further customize the icon of the particular section,
-as well as the description for each chart.
+You can customize the section icon and add helpful chart descriptions.
-While the following configuration will be placed in a new file, as the documentation suggests, it is
-instructing to use `dashboard_info.js` as a template. Open the file and see how the rest of sections and collectors have been defined.
+Create a custom dashboard info file:
-```javascript=
+```javascript
netdataDashboard.menu = {
'k6': {
title: 'K6 Load Testing',
icon: '',
info: 'k6 is an open-source load testing tool and cloud service providing the best developer experience for API performance testing.'
},
- .
- .
- .
-```
+};
-We can then add a description for each chart. Simply find the following section in `dashboard_info.js` to understand how a chart definitions are used:
-
-```javascript=
-netdataDashboard.context = {
- 'system.cpu': {
- info: function (os) {
- void (os);
- return 'Total CPU utilization (all cores). 100% here means there is no CPU idle time at all. You can get per core usage at the CPUs section and per application usage at the Applications Monitoring section.'
- + netdataDashboard.sparkline(' Keep an eye on iowait ', 'system.cpu', 'iowait', '%', '. If it is constantly high, your disks are a bottleneck and they slow your system down.')
- + netdataDashboard.sparkline(' An important metric worth monitoring, is softirq ', 'system.cpu', 'softirq', '%', '. A constantly high percentage of softirq may indicate network driver issues.');
- },
- valueRange: "[0, 100]"
- },
-```
-
-Afterwards, you can open your `custom_dashboard_info.js`, as suggested in the documentation linked above,
-and add something like the following example:
-
-```javascript=
netdataDashboard.context = {
'k6.http_req_duration': {
- info: "Total time for the request. It's equal to http_req_sending + http_req_waiting + http_req_receiving (i.e. how long did the remote server take to process the request and respond, without the initial DNS lookup/connection times)"
- },
-
+ info: "Total time for the request. It's equal to http_req_sending + http_req_waiting + http_req_receiving (i.e. how long did the remote server take to process the request and respond, without the initial DNS lookup/connection times)"
+ },
+};
```
-The chart is identified as ``.``.
-These descriptions can greatly help the Netdata user who is monitoring your application in the midst of an incident.
+These descriptions help users monitor your application, especially during incidents. The `info` field supports HTML, allowing you to embed links and instructions.
+
-The `info` field supports `html`, embedding useful links and instructions in the description.
+
+### Contributing Your Collector
+
-### Vendoring a new collector
-
-While we learned how to visualize any data source in Netdata using the StatsD protocol, we have also created a new collector.
-
-As long as you use the same underlying collector, every new `myapp.conf` file will create a new data
-source and dashboard section for Netdata. Netdata loads all the configuration files by default, but it will
-**not** create dashboard sections or charts, unless it starts receiving data for that particular data source.
-This means that we can now share our collector with the rest of the Netdata community.
-
-- Make sure you follow the [contributing guide](https://github.com/netdata/.github/edit/main/CONTRIBUTING.md)
-- Fork the netdata/netdata repository
-- Place the configuration file inside `netdata/collectors/statsd.plugin`
-- Add a reference in `netdata/collectors/statsd.plugin/Makefile.am`. For example, if we contribute the `k6.conf` file:
-```Makefile
-dist_statsdconfig_DATA = \
- example.conf \
- k6.conf \
- $(NULL)
-```
+Once you've created a working configuration, consider sharing it with the Netdata community:
+1. Follow the [contributing guide](https://github.com/netdata/.github/blob/main/CONTRIBUTING.md)
+2. Fork the netdata/netdata repository
+3. Place your configuration file in `netdata/collectors/statsd.plugin`
+4. Add a reference in `netdata/collectors/statsd.plugin/Makefile.am`
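+
+   For example, if you contribute the `k6.conf` file:
+
+   ```Makefile
+   dist_statsdconfig_DATA = \
+       example.conf \
+       k6.conf \
+       $(NULL)
+   ```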
+
From 3df95546e78db8772ce16f661d0dd7b0202a0f5e Mon Sep 17 00:00:00 2001
From: thiagoftsm
Date: Wed, 14 May 2025 19:26:02 +0000
Subject: [PATCH 38/51] New Windows Metrics (CPU and Memory) (#20277)
---
src/collectors/all.h | 2 +
src/collectors/windows.plugin/metadata.yaml | 12 ++++++
.../windows.plugin/perflib-memory.c | 37 +++++++++++++++++++
.../windows.plugin/perflib-processes.c | 33 +++++++++++++++++
4 files changed, 84 insertions(+)
diff --git a/src/collectors/all.h b/src/collectors/all.h
index 581f8ae564c90b..3658cb7d1edab4 100644
--- a/src/collectors/all.h
+++ b/src/collectors/all.h
@@ -33,6 +33,7 @@
#define NETDATA_CHART_PRIO_SYSTEM_ACTIVE_PROCESSES 750
#define NETDATA_CHART_PRIO_SYSTEM_CTXT 800
#define NETDATA_CHART_PRIO_SYSTEM_IDLEJITTER 800
+#define NETDATA_CHART_PRIO_SYSTEM_THREAD_QUEUE 801 // Windows only
#define NETDATA_CHART_PRIO_SYSTEM_INTR 900
#define NETDATA_CHART_PRIO_SYSTEM_SOFTIRQS 950
#define NETDATA_CHART_PRIO_SYSTEM_SOFTNET_STAT 955
@@ -127,6 +128,7 @@
#define NETDATA_CHART_PRIO_MEM_SWAP_PAGES 1037 // Windows only
#define NETDATA_CHART_PRIO_MEM_SWAPIO 1038
#define NETDATA_CHART_PRIO_MEM_SYSTEM_POOL 1039 // Windows only
+#define NETDATA_CHART_PRIO_MEM_FREE_SYSTEM_PAGE 1040 // Windows only
#define NETDATA_CHART_PRIO_MEM_ZSWAP 1036
#define NETDATA_CHART_PRIO_MEM_ZSWAPIO 1037
#define NETDATA_CHART_PRIO_MEM_ZSWAP_COMPRESS_RATIO 1038
diff --git a/src/collectors/windows.plugin/metadata.yaml b/src/collectors/windows.plugin/metadata.yaml
index 75f4a9a7d755bf..9408f59651bb02 100644
--- a/src/collectors/windows.plugin/metadata.yaml
+++ b/src/collectors/windows.plugin/metadata.yaml
@@ -203,6 +203,12 @@ modules:
dimensions:
- name: paged
- name: pool-paged
+ - name: mem.system_page_table_entries
+ description: Unused page table entries.
+ unit: "pages"
+ chart_type: line
+ dimensions:
+ - name: free
- meta:
plugin_name: windows.plugin
module_name: PerflibProcesses
@@ -298,6 +304,12 @@ modules:
chart_type: line
dimensions:
- name: switches
+ - name: system.processor_queue_length
+ description: The number of threads in the processor queue.
+ unit: "threads"
+ chart_type: line
+ dimensions:
+ - name: threads
- meta:
plugin_name: windows.plugin
module_name: PerflibStorage
diff --git a/src/collectors/windows.plugin/perflib-memory.c b/src/collectors/windows.plugin/perflib-memory.c
index 5c550587f24302..f993b010e6fec4 100644
--- a/src/collectors/windows.plugin/perflib-memory.c
+++ b/src/collectors/windows.plugin/perflib-memory.c
@@ -27,8 +27,12 @@ struct system_pool {
RRDDIM *rd_paged;
RRDDIM *rd_nonpaged;
+ RRDSET *freeSystemPageTableEntries;
+ RRDDIM *rd_free_system_page_table_entries;
+
COUNTER_DATA pagedData;
COUNTER_DATA nonPagedData;
+ COUNTER_DATA pageTableEntries;
};
struct swap localSwap = {0};
@@ -49,6 +53,7 @@ void initialize_pool_keys(struct system_pool *p)
{
p->pagedData.key = "Pool Paged Bytes";
p->nonPagedData.key = "Pool Nonpaged Bytes";
+ p->pageTableEntries.key = "Free System Page Table Entries";
}
static void initialize(void)
@@ -153,6 +158,37 @@ static void do_memory_system_pool(PERF_DATA_BLOCK *pDataBlock, PERF_OBJECT_TYPE
rrdset_done(localPool.pool);
}
+static void do_memory_page_table_entries(PERF_DATA_BLOCK *pDataBlock, PERF_OBJECT_TYPE *pObjectType, int update_every)
+{
+ perflibGetObjectCounter(pDataBlock, pObjectType, &localPool.pageTableEntries);
+
+ if (!localPool.freeSystemPageTableEntries) {
+ localPool.freeSystemPageTableEntries = rrdset_create_localhost(
+ "mem",
+ "free_system_page_table_entries",
+ NULL,
+ "mem",
+ "mem.system_page_table_entries",
+ "Unused page table entries.",
+ "pages",
+ PLUGIN_WINDOWS_NAME,
+ "PerflibMemory",
+ NETDATA_CHART_PRIO_MEM_FREE_SYSTEM_PAGE,
+ update_every,
+ RRDSET_TYPE_LINE);
+
+ localPool.rd_free_system_page_table_entries =
+ rrddim_add(localPool.freeSystemPageTableEntries, "free", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ }
+
+ rrddim_set_by_pointer(
+ localPool.freeSystemPageTableEntries,
+ localPool.rd_free_system_page_table_entries,
+ (collected_number)localPool.pageTableEntries.current.Data);
+
+ rrdset_done(localPool.freeSystemPageTableEntries);
+}
+
static bool do_memory(PERF_DATA_BLOCK *pDataBlock, int update_every)
{
PERF_OBJECT_TYPE *pObjectType = perflibFindObjectTypeByName(pDataBlock, "Memory");
@@ -187,6 +223,7 @@ static bool do_memory(PERF_DATA_BLOCK *pDataBlock, int update_every)
do_memory_swap(pDataBlock, pObjectType, update_every);
do_memory_system_pool(pDataBlock, pObjectType, update_every);
+ do_memory_page_table_entries(pDataBlock, pObjectType, update_every);
return true;
}
diff --git a/src/collectors/windows.plugin/perflib-processes.c b/src/collectors/windows.plugin/perflib-processes.c
index 25d62fb0676ad6..fbe140633a88fe 100644
--- a/src/collectors/windows.plugin/perflib-processes.c
+++ b/src/collectors/windows.plugin/perflib-processes.c
@@ -12,6 +12,37 @@ static void initialize(void)
;
}
+static void do_processor_queue(PERF_DATA_BLOCK *pDataBlock, PERF_OBJECT_TYPE *pObjectType, int update_every)
+{
+ static RRDSET *st_queue = NULL;
+ static RRDDIM *rd_queue = NULL;
+ static COUNTER_DATA processorQueue = {.key = "Processor Queue Length"};
+ if (!perflibGetObjectCounter(pDataBlock, pObjectType, &processorQueue))
+ return;
+
+ if (!st_queue) {
+ st_queue = rrdset_create_localhost(
+ "system",
+ "processor_queue",
+ NULL,
+ "system",
+ "system.processor_queue_length",
+ "The number of threads in the processor queue.",
+ "threads",
+ _COMMON_PLUGIN_NAME,
+ _COMMON_PLUGIN_MODULE_NAME,
+ NETDATA_CHART_PRIO_SYSTEM_THREAD_QUEUE,
+ update_every,
+ RRDSET_TYPE_LINE);
+
+ rd_queue = rrddim_add(st_queue, "threads", NULL, 1, 1, RRD_ALGORITHM_ABSOLUTE);
+ }
+
+ rrddim_set_by_pointer(st_queue, rd_queue, (collected_number)processorQueue.current.Data);
+
+ rrdset_done(st_queue);
+}
+
static bool do_processes(PERF_DATA_BLOCK *pDataBlock, int update_every)
{
PERF_OBJECT_TYPE *pObjectType = perflibFindObjectTypeByName(pDataBlock, "System");
@@ -36,6 +67,8 @@ static bool do_processes(PERF_DATA_BLOCK *pDataBlock, int update_every)
ULONGLONG totalThreads = threads.current.Data;
common_system_threads(totalThreads, update_every);
}
+
+ do_processor_queue(pDataBlock, pObjectType, update_every);
return true;
}
From 2701387606955e0a7885202348a4c1a486fde021 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Thu, 15 May 2025 00:35:45 +0000
Subject: [PATCH 39/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-39-nightly.
---
CHANGELOG.md | 11 ++++-------
packaging/version | 2 +-
2 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 925f82c8989e0e..2b440f77fdd609 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,10 @@
**Merged pull requests:**
+- Improved StatsD documentation [\#20282](https://github.com/netdata/netdata/pull/20282) ([kanelatechnical](https://github.com/kanelatechnical))
+- Regenerate integrations docs [\#20279](https://github.com/netdata/netdata/pull/20279) ([netdatabot](https://github.com/netdatabot))
+- docs: update mssql meta [\#20278](https://github.com/netdata/netdata/pull/20278) ([ilyam8](https://github.com/ilyam8))
+- New Windows Metrics \(CPU and Memory\) [\#20277](https://github.com/netdata/netdata/pull/20277) ([thiagoftsm](https://github.com/thiagoftsm))
- chore\(go.d/snmp\): small cleanup snmp profiles code [\#20274](https://github.com/netdata/netdata/pull/20274) ([ilyam8](https://github.com/ilyam8))
- Switch to poll from epoll [\#20273](https://github.com/netdata/netdata/pull/20273) ([stelfrag](https://github.com/stelfrag))
- build\(deps\): bump golang.org/x/net from 0.39.0 to 0.40.0 in /src/go [\#20270](https://github.com/netdata/netdata/pull/20270) ([dependabot[bot]](https://github.com/apps/dependabot))
@@ -318,7 +322,6 @@
- Store alert config asynchronously [\#19885](https://github.com/netdata/netdata/pull/19885) ([stelfrag](https://github.com/stelfrag))
- Large-scale cleanup of static build infrastructure. [\#19852](https://github.com/netdata/netdata/pull/19852) ([Ferroin](https://github.com/Ferroin))
- ebpf.plugin: rework memory [\#19844](https://github.com/netdata/netdata/pull/19844) ([thiagoftsm](https://github.com/thiagoftsm))
-- Add Docker tags for the last few nightly builds. [\#19734](https://github.com/netdata/netdata/pull/19734) ([Ferroin](https://github.com/Ferroin))
## [v2.3.2](https://github.com/netdata/netdata/tree/v2.3.2) (2025-04-02)
@@ -465,12 +468,6 @@
- more strict parsing of the output of system-info.sh [\#19745](https://github.com/netdata/netdata/pull/19745) ([ktsaou](https://github.com/ktsaou))
- pass NULL to sensors\_init\(\) when the standard files exist in /etc/ [\#19744](https://github.com/netdata/netdata/pull/19744) ([ktsaou](https://github.com/ktsaou))
- allow coredumps to be generated [\#19743](https://github.com/netdata/netdata/pull/19743) ([ktsaou](https://github.com/ktsaou))
-- work on agent-events crashes [\#19741](https://github.com/netdata/netdata/pull/19741) ([ktsaou](https://github.com/ktsaou))
-- zero mtime when a fallback check fails [\#19740](https://github.com/netdata/netdata/pull/19740) ([ktsaou](https://github.com/ktsaou))
-- fix\(go.d\): ignore sigpipe to exit gracefully [\#19739](https://github.com/netdata/netdata/pull/19739) ([ilyam8](https://github.com/ilyam8))
-- Capture deadly signals [\#19737](https://github.com/netdata/netdata/pull/19737) ([ktsaou](https://github.com/ktsaou))
-- allow insecure cloud connections [\#19736](https://github.com/netdata/netdata/pull/19736) ([ktsaou](https://github.com/ktsaou))
-- add more information about claiming failures [\#19735](https://github.com/netdata/netdata/pull/19735) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index fc587c16b36939..725e52f1ab3711 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-34-nightly
+v2.5.0-39-nightly
From 35531a7576f31601194a735d244cc4aee040aedb Mon Sep 17 00:00:00 2001
From: Netdata bot <43409846+netdatabot@users.noreply.github.com>
Date: Thu, 15 May 2025 07:10:33 +0200
Subject: [PATCH 40/51] Regenerate integrations docs (#20284)
Co-authored-by: thiagoftsm <49162938+thiagoftsm@users.noreply.github.com>
---
src/collectors/windows.plugin/integrations/memory_statistics.md | 1 +
src/collectors/windows.plugin/integrations/system_statistics.md | 1 +
2 files changed, 2 insertions(+)
diff --git a/src/collectors/windows.plugin/integrations/memory_statistics.md b/src/collectors/windows.plugin/integrations/memory_statistics.md
index df25e30a0ee408..40c2560c207a3a 100644
--- a/src/collectors/windows.plugin/integrations/memory_statistics.md
+++ b/src/collectors/windows.plugin/integrations/memory_statistics.md
@@ -71,6 +71,7 @@ Metrics:
| mem.swap_iops | read, write | operations/s |
| mem.swap_pages_io | read, write | pages/s |
| mem.system_pool_size | paged, pool-paged | bytes |
+| mem.system_page_table_entries | free | pages |
diff --git a/src/collectors/windows.plugin/integrations/system_statistics.md b/src/collectors/windows.plugin/integrations/system_statistics.md
index c2a259a3775caa..7c01dc594a0935 100644
--- a/src/collectors/windows.plugin/integrations/system_statistics.md
+++ b/src/collectors/windows.plugin/integrations/system_statistics.md
@@ -71,6 +71,7 @@ Metrics:
| system.processes | running | processes |
| system.threads | threads | threads |
| system.ctxt | switches | context switches/s |
+| system.processor_queue_length | threads | threads |
From cc0502ab96db3fb11a71603ef27099055efbe329 Mon Sep 17 00:00:00 2001
From: Costa Tsaousis
Date: Thu, 15 May 2025 14:51:06 +0300
Subject: [PATCH 41/51] Model Context Protocol Server (MCP) for Netdata
(#20244)
* websocket server implementation, integrated into the web server.
- Full support for RFC 6455 (WebSocket Protocol)
- Compression via the permessage-deflate extension (RFC 7692)
- Supports restricted window sizes, close codes, and fragmentation
- Successfully passes the entire Autobahn Test Suite
- Fast: adaptive buffers for zero malloc/free during processing
* add counters to poll-events to find the cause of infinite loop
* added jsonrpc framework to websocket server
* prototype mcp over websocket
* fix formatting
* completed mcp initialize
* changed the interface of mcp to follow the same pattern all netdata APIs do
* added resource and resource template for contexts
* implemented context categories
* use the first dot on contexts
* in RRDSTATS_RETENTION use just the contents of the response to figure out if the tier is used
* delete mcp prototype
* reset websocket window bits to zero on web_client reset
* do not enable the websocket server, unless internal checks are enabled
---
.gitignore | 2 +
CMakeLists.txt | 63 +-
src/daemon/daemon-shutdown.c | 2 +
src/daemon/pulse/pulse-workers.c | 1 +
.../contexts/api_v2_contexts_agents.c | 108 +-
.../contexts/rrdcontext-context-registry.c | 245 +++
.../contexts/rrdcontext-context-registry.h | 20 +
src/database/contexts/rrdcontext-context.c | 6 +
src/database/contexts/rrdcontext-internal.h | 1 +
src/database/contexts/rrdcontext.h | 2 +
src/database/rrd-metadata.c | 6 +-
src/database/rrd-metadata.h | 1 +
src/database/rrd-retention.c | 125 ++
src/database/rrd-retention.h | 47 +
.../circular_buffer/circular_buffer.c | 138 +-
.../circular_buffer/circular_buffer.h | 18 +-
src/libnetdata/http/http_defs.h | 2 +
src/libnetdata/log/nd_log.h | 64 +-
src/libnetdata/socket/nd-poll.c | 2 +-
src/libnetdata/socket/poll-events.c | 42 +-
src/libnetdata/socket/poll-events.h | 9 +-
src/libnetdata/url/url.h | 4 -
src/streaming/stream-receiver-connection.c | 3 +
src/streaming/stream-sender.c | 2 +
src/web/api/http_header.c | 135 ++
src/web/mcp/adapters/mcp-websocket.c | 124 ++
src/web/mcp/adapters/mcp-websocket.h | 30 +
src/web/mcp/mcp-context.c | 102 +
src/web/mcp/mcp-context.h | 11 +
src/web/mcp/mcp-initialize.c | 252 +++
src/web/mcp/mcp-initialize.h | 11 +
src/web/mcp/mcp-notifications.c | 131 ++
src/web/mcp/mcp-notifications.h | 11 +
src/web/mcp/mcp-prompts.c | 131 ++
src/web/mcp/mcp-prompts.h | 11 +
src/web/mcp/mcp-resources.c | 496 +++++
src/web/mcp/mcp-resources.h | 11 +
src/web/mcp/mcp-system.c | 114 ++
src/web/mcp/mcp-system.h | 11 +
src/web/mcp/mcp-tools.c | 218 +++
src/web/mcp/mcp-tools.h | 11 +
src/web/mcp/mcp-websocket-test.html | 1272 +++++++++++++
src/web/mcp/mcp.c | 436 +++++
src/web/mcp/mcp.h | 134 ++
src/web/server/static/static-threaded.c | 14 +-
src/web/server/static/static-threaded.h | 1 +
src/web/server/web_client.c | 47 +-
src/web/server/web_client.h | 22 +
src/web/server/web_server.c | 2 +
.../config/fuzzingclient.json | 15 +
.../websocket/autobahn-test-suite/run-test.sh | 15 +
src/web/websocket/websocket-buffer.h | 221 +++
src/web/websocket/websocket-compression.c | 248 +++
src/web/websocket/websocket-compression.h | 58 +
src/web/websocket/websocket-echo-test.html | 1659 +++++++++++++++++
src/web/websocket/websocket-echo.c | 56 +
src/web/websocket/websocket-echo.h | 17 +
src/web/websocket/websocket-handshake.c | 436 +++++
src/web/websocket/websocket-internal.h | 252 +++
src/web/websocket/websocket-jsonrpc.c | 311 +++
src/web/websocket/websocket-jsonrpc.h | 56 +
src/web/websocket/websocket-message.c | 190 ++
src/web/websocket/websocket-receive.c | 882 +++++++++
src/web/websocket/websocket-send.c | 411 ++++
src/web/websocket/websocket-thread.c | 515 +++++
src/web/websocket/websocket-thread.h | 11 +
src/web/websocket/websocket-utils.c | 105 ++
src/web/websocket/websocket.c | 264 +++
src/web/websocket/websocket.h | 106 ++
69 files changed, 10322 insertions(+), 157 deletions(-)
create mode 100644 src/database/contexts/rrdcontext-context-registry.c
create mode 100644 src/database/contexts/rrdcontext-context-registry.h
create mode 100644 src/database/rrd-retention.c
create mode 100644 src/database/rrd-retention.h
create mode 100644 src/web/mcp/adapters/mcp-websocket.c
create mode 100644 src/web/mcp/adapters/mcp-websocket.h
create mode 100644 src/web/mcp/mcp-context.c
create mode 100644 src/web/mcp/mcp-context.h
create mode 100644 src/web/mcp/mcp-initialize.c
create mode 100644 src/web/mcp/mcp-initialize.h
create mode 100644 src/web/mcp/mcp-notifications.c
create mode 100644 src/web/mcp/mcp-notifications.h
create mode 100644 src/web/mcp/mcp-prompts.c
create mode 100644 src/web/mcp/mcp-prompts.h
create mode 100644 src/web/mcp/mcp-resources.c
create mode 100644 src/web/mcp/mcp-resources.h
create mode 100644 src/web/mcp/mcp-system.c
create mode 100644 src/web/mcp/mcp-system.h
create mode 100644 src/web/mcp/mcp-tools.c
create mode 100644 src/web/mcp/mcp-tools.h
create mode 100644 src/web/mcp/mcp-websocket-test.html
create mode 100644 src/web/mcp/mcp.c
create mode 100644 src/web/mcp/mcp.h
create mode 100644 src/web/websocket/autobahn-test-suite/config/fuzzingclient.json
create mode 100755 src/web/websocket/autobahn-test-suite/run-test.sh
create mode 100644 src/web/websocket/websocket-buffer.h
create mode 100644 src/web/websocket/websocket-compression.c
create mode 100644 src/web/websocket/websocket-compression.h
create mode 100644 src/web/websocket/websocket-echo-test.html
create mode 100644 src/web/websocket/websocket-echo.c
create mode 100644 src/web/websocket/websocket-echo.h
create mode 100644 src/web/websocket/websocket-handshake.c
create mode 100644 src/web/websocket/websocket-internal.h
create mode 100644 src/web/websocket/websocket-jsonrpc.c
create mode 100644 src/web/websocket/websocket-jsonrpc.h
create mode 100644 src/web/websocket/websocket-message.c
create mode 100644 src/web/websocket/websocket-receive.c
create mode 100644 src/web/websocket/websocket-send.c
create mode 100644 src/web/websocket/websocket-thread.c
create mode 100644 src/web/websocket/websocket-thread.h
create mode 100644 src/web/websocket/websocket-utils.c
create mode 100644 src/web/websocket/websocket.c
create mode 100644 src/web/websocket/websocket.h
diff --git a/.gitignore b/.gitignore
index b44a44cab2e6fa..df15e100449088 100644
--- a/.gitignore
+++ b/.gitignore
@@ -195,3 +195,5 @@ packaging/tools/agent-events/parseBehaviorTest.go
packaging/tools/agent-events/server
packaging/tools/agent-events/go.mod
packaging/tools/agent-events/go.sum
+
+**/.claude/settings.local.json
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 6df39f76e73427..89bac6c8292597 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1621,6 +1621,8 @@ set(RRD_PLUGIN_FILES
src/database/rrd.h
src/database/rrd-metadata.c
src/database/rrd-metadata.h
+ src/database/rrd-retention.c
+ src/database/rrd-retention.h
src/database/rrdset.c
src/database/storage-engine.c
src/database/storage-engine.h
@@ -1690,6 +1692,8 @@ set(RRD_PLUGIN_FILES
src/database/pattern-array.c
src/database/pattern-array.h
src/database/contexts/rrdcontext-queues.c
+ src/database/contexts/rrdcontext-context-registry.c
+ src/database/contexts/rrdcontext-context-registry.h
)
if(ENABLE_DBENGINE)
@@ -1825,22 +1829,57 @@ set(STREAMING_PLUGIN_FILES
)
set(WEB_PLUGIN_FILES
- src/web/server/web_client.c
- src/web/server/web_client.h
- src/web/server/web_server.c
- src/web/server/web_server.h
- src/web/server/static/static-threaded.c
- src/web/server/static/static-threaded.h
- src/web/server/web_client_cache.c
- src/web/server/web_client_cache.h
- src/web/api/v3/api_v3_stream_info.c
- src/web/api/v3/api_v3_stream_path.c
- src/web/api/queries/backfill.c
- src/web/api/queries/backfill.h
src/web/api/functions/function-metrics-cardinality.c
src/web/api/functions/function-metrics-cardinality.h
+ src/web/api/queries/backfill.c
+ src/web/api/queries/backfill.h
src/web/api/request_source.c
src/web/api/request_source.h
+ src/web/api/v3/api_v3_stream_info.c
+ src/web/api/v3/api_v3_stream_path.c
+ src/web/mcp/adapters/mcp-websocket.c
+ src/web/mcp/adapters/mcp-websocket.h
+ src/web/mcp/mcp-context.c
+ src/web/mcp/mcp-context.h
+ src/web/mcp/mcp-initialize.c
+ src/web/mcp/mcp-initialize.h
+ src/web/mcp/mcp-notifications.c
+ src/web/mcp/mcp-notifications.h
+ src/web/mcp/mcp-prompts.c
+ src/web/mcp/mcp-prompts.h
+ src/web/mcp/mcp-resources.c
+ src/web/mcp/mcp-resources.h
+ src/web/mcp/mcp-system.c
+ src/web/mcp/mcp-system.h
+ src/web/mcp/mcp-tools.c
+ src/web/mcp/mcp-tools.h
+ src/web/mcp/mcp.c
+ src/web/mcp/mcp.h
+ src/web/server/static/static-threaded.c
+ src/web/server/static/static-threaded.h
+ src/web/server/web_client.c
+ src/web/server/web_client.h
+ src/web/server/web_client_cache.c
+ src/web/server/web_client_cache.h
+ src/web/server/web_server.c
+ src/web/server/web_server.h
+ src/web/websocket/websocket-buffer.h
+ src/web/websocket/websocket-compression.c
+ src/web/websocket/websocket-compression.h
+ src/web/websocket/websocket-echo.c
+ src/web/websocket/websocket-echo.h
+ src/web/websocket/websocket-handshake.c
+ src/web/websocket/websocket-internal.h
+ src/web/websocket/websocket-jsonrpc.c
+ src/web/websocket/websocket-jsonrpc.h
+ src/web/websocket/websocket-message.c
+ src/web/websocket/websocket-receive.c
+ src/web/websocket/websocket-send.c
+ src/web/websocket/websocket-thread.c
+ src/web/websocket/websocket-thread.h
+ src/web/websocket/websocket-utils.c
+ src/web/websocket/websocket.c
+ src/web/websocket/websocket.h
)
set(CLAIM_PLUGIN_FILES
diff --git a/src/daemon/daemon-shutdown.c b/src/daemon/daemon-shutdown.c
index 7403e3335d5ed4..714cbceb604d09 100644
--- a/src/daemon/daemon-shutdown.c
+++ b/src/daemon/daemon-shutdown.c
@@ -23,6 +23,7 @@ void rrd_functions_inflight_destroy(void);
void cgroup_netdev_link_destroy(void);
void bearer_tokens_destroy(void);
void alerts_by_x_cleanup(void);
+void websocket_threads_join(void);
static bool abort_on_fatal = true;
@@ -294,6 +295,7 @@ static void netdata_cleanup_and_exit(EXIT_REASON reason, bool abnormal, bool exi
if (!abnormal)
add_agent_event(EVENT_AGENT_SHUTDOWN_TIME, (int64_t)(now_monotonic_usec() - shutdown_start_time));
+ websocket_threads_join();
nd_thread_join_threads();
sqlite_close_databases();
watcher_step_complete(WATCHER_STEP_ID_CLOSE_SQL_DATABASES);
diff --git a/src/daemon/pulse/pulse-workers.c b/src/daemon/pulse/pulse-workers.c
index 25e89a9681ec5d..614ae2c458113a 100644
--- a/src/daemon/pulse/pulse-workers.c
+++ b/src/daemon/pulse/pulse-workers.c
@@ -153,6 +153,7 @@ static struct worker_utilization all_workers_utilization[] = {
{ .name = "PROFILER", .family = "workers profile", .priority = 1000000 },
{ .name = "PGCEVICT", .family = "workers dbengine eviction", .priority = 1000000 },
{ .name = "BACKFILL", .family = "workers backfill", .priority = 1000000 },
+ { .name = "WEBSOCKET", .family = "workers websocket", .priority = 1000000 },
// has to be terminated with a NULL
{ .name = NULL, .family = NULL }
diff --git a/src/database/contexts/api_v2_contexts_agents.c b/src/database/contexts/api_v2_contexts_agents.c
index 31f2f80263fcbd..356f1d8baff46c 100644
--- a/src/database/contexts/api_v2_contexts_agents.c
+++ b/src/database/contexts/api_v2_contexts_agents.c
@@ -3,20 +3,10 @@
#include "api_v2_contexts.h"
#include "aclk/aclk_capas.h"
#include "database/rrd-metadata.h"
+#include "database/rrd-retention.h"
void build_info_to_json_object(BUFFER *b);
-static time_t round_retention(time_t retention_seconds) {
- if(retention_seconds > 60 * 86400)
- retention_seconds = HOWMANY(retention_seconds, 86400) * 86400;
- else if(retention_seconds > 86400)
- retention_seconds = HOWMANY(retention_seconds, 3600) * 3600;
- else
- retention_seconds = HOWMANY(retention_seconds, 60) * 60;
-
- return retention_seconds;
-}
-
void buffer_json_agents_v2(BUFFER *wb, struct query_timings *timings, time_t now_s, bool info, bool array) {
if(!now_s)
now_s = now_realtime_sec();
@@ -73,6 +63,7 @@ void buffer_json_agents_v2(BUFFER *wb, struct query_timings *timings, time_t now
{
buffer_json_member_add_uint64(wb, "collected", metadata.contexts.collected);
buffer_json_member_add_uint64(wb, "available", metadata.contexts.available);
+ buffer_json_member_add_uint64(wb, "unique", metadata.contexts.unique);
}
buffer_json_object_close(wb);
@@ -85,77 +76,40 @@ void buffer_json_agents_v2(BUFFER *wb, struct query_timings *timings, time_t now
}
buffer_json_object_close(wb); // api
- buffer_json_member_add_array(wb, "db_size");
- size_t group_seconds;
- for (size_t tier = 0; tier < nd_profile.storage_tiers; tier++) {
- STORAGE_ENGINE *eng = localhost->db[tier].eng;
- if (!eng) continue;
-
- group_seconds = get_tier_grouping(tier) * localhost->rrd_update_every;
- uint64_t max = storage_engine_disk_space_max(eng->seb, localhost->db[tier].si);
- uint64_t used = storage_engine_disk_space_used(eng->seb, localhost->db[tier].si);
-#ifdef ENABLE_DBENGINE
- if (!max && eng->seb == STORAGE_ENGINE_BACKEND_DBENGINE) {
- max = rrdeng_get_directory_free_bytes_space(multidb_ctx[tier]);
- max += used;
- }
-#endif
- time_t first_time_s = storage_engine_global_first_time_s(eng->seb, localhost->db[tier].si);
-// size_t currently_collected_metrics = storage_engine_collected_metrics(eng->seb, localhost->db[tier].si);
+    // Get retention information for all storage tiers
+ RRDSTATS_RETENTION retention = rrdstats_retention_collect();
- NETDATA_DOUBLE percent;
- if (used && max)
- percent = (NETDATA_DOUBLE) used * 100.0 / (NETDATA_DOUBLE) max;
- else
- percent = 0.0;
+ buffer_json_member_add_array(wb, "db_size");
+ for (size_t i = 0; i < retention.storage_tiers; i++) {
+ RRD_STORAGE_TIER *tier_info = &retention.tiers[i];
+ if (!tier_info->backend || tier_info->tier != i)
+ continue;
buffer_json_add_array_item_object(wb);
- buffer_json_member_add_uint64(wb, "tier", tier);
- char human_duration[128];
- duration_snprintf_time_t(human_duration, sizeof(human_duration), (stime_t)group_seconds);
- buffer_json_member_add_string(wb, "granularity", human_duration);
-
- buffer_json_member_add_uint64(wb, "metrics", storage_engine_metrics(eng->seb, localhost->db[tier].si));
- buffer_json_member_add_uint64(wb, "samples", storage_engine_samples(eng->seb, localhost->db[tier].si));
-
- if(used || max) {
- buffer_json_member_add_uint64(wb, "disk_used", used);
- buffer_json_member_add_uint64(wb, "disk_max", max);
- buffer_json_member_add_double(wb, "disk_percent", percent);
+ buffer_json_member_add_uint64(wb, "tier", tier_info->tier);
+ buffer_json_member_add_string(wb, "granularity", tier_info->granularity_human);
+ buffer_json_member_add_uint64(wb, "metrics", tier_info->metrics);
+ buffer_json_member_add_uint64(wb, "samples", tier_info->samples);
+
+ if(tier_info->disk_used || tier_info->disk_max) {
+ buffer_json_member_add_uint64(wb, "disk_used", tier_info->disk_used);
+ buffer_json_member_add_uint64(wb, "disk_max", tier_info->disk_max);
+        // Round disk_percent to 2 decimal places
+ double rounded_percent = floor(tier_info->disk_percent * 100.0 + 0.5) / 100.0;
+ buffer_json_member_add_double(wb, "disk_percent", rounded_percent);
}
- if(first_time_s < now_s) {
- time_t retention = now_s - first_time_s;
-
- buffer_json_member_add_time_t(wb, "from", first_time_s);
- buffer_json_member_add_time_t(wb, "to", now_s);
- buffer_json_member_add_time_t(wb, "retention", retention);
-
- duration_snprintf(human_duration, sizeof(human_duration),
- round_retention(retention), "s", false);
-
- buffer_json_member_add_string(wb, "retention_human", human_duration);
-
- if(used || max) { // we have disk space information
- time_t time_retention = 0;
-#ifdef ENABLE_DBENGINE
- time_retention = multidb_ctx[tier]->config.max_retention_s;
-#endif
- time_t space_retention = (time_t)((NETDATA_DOUBLE)(now_s - first_time_s) * 100.0 / percent);
- time_t actual_retention = MIN(space_retention, time_retention ? time_retention : space_retention);
-
- duration_snprintf(
- human_duration, sizeof(human_duration),
- (int)time_retention, "s", false);
-
- buffer_json_member_add_time_t(wb, "requested_retention", time_retention);
- buffer_json_member_add_string(wb, "requested_retention_human", human_duration);
-
- duration_snprintf(human_duration, sizeof(human_duration),
- (int)round_retention(actual_retention), "s", false);
-
- buffer_json_member_add_time_t(wb, "expected_retention", actual_retention);
- buffer_json_member_add_string(wb, "expected_retention_human", human_duration);
+ if(tier_info->first_time_s < tier_info->last_time_s) {
+ buffer_json_member_add_time_t(wb, "from", tier_info->first_time_s);
+ buffer_json_member_add_time_t(wb, "to", tier_info->last_time_s);
+ buffer_json_member_add_time_t(wb, "retention", tier_info->retention);
+ buffer_json_member_add_string(wb, "retention_human", tier_info->retention_human);
+
+ if(tier_info->disk_used || tier_info->disk_max) {
+ buffer_json_member_add_time_t(wb, "requested_retention", tier_info->requested_retention);
+ buffer_json_member_add_string(wb, "requested_retention_human", tier_info->requested_retention_human);
+ buffer_json_member_add_time_t(wb, "expected_retention", tier_info->expected_retention);
+ buffer_json_member_add_string(wb, "expected_retention_human", tier_info->expected_retention_human);
}
}
buffer_json_object_close(wb);
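
For illustration, one entry of the `db_size` array built above might look like this (hypothetical values; the disk and retention members are emitted only when the corresponding data exists):

```json
{
  "tier": 0,
  "granularity": "1s",
  "metrics": 2450,
  "samples": 183000000,
  "disk_used": 1073741824,
  "disk_max": 2147483648,
  "disk_percent": 50.0,
  "from": 1715000000,
  "to": 1715601000,
  "retention": 601000,
  "retention_human": "7d",
  "requested_retention": 1209600,
  "requested_retention_human": "14d",
  "expected_retention": 1202000,
  "expected_retention_human": "14d"
}
```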
diff --git a/src/database/contexts/rrdcontext-context-registry.c b/src/database/contexts/rrdcontext-context-registry.c
new file mode 100644
index 00000000000000..cfa0ea7e2a9f45
--- /dev/null
+++ b/src/database/contexts/rrdcontext-context-registry.c
@@ -0,0 +1,245 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "rrdcontext.h"
+#include "rrdcontext-internal.h"
+#include "rrdcontext-context-registry.h"
+
+// The registry - using a raw JudyL array
+// Key: STRING pointer
+// Value: reference count (size_t)
+static Pvoid_t context_registry_judyl = NULL;
+
+// Spinlock to protect access to the registry
+static SPINLOCK context_registry_spinlock = SPINLOCK_INITIALIZER;
+
+// Clean up the context registry
+void rrdcontext_context_registry_destroy(void) {
+ spinlock_lock(&context_registry_spinlock);
+
+ Word_t index = 0;
+ Pvoid_t *PValue;
+
+ // Free the strings we've held references to
+ PValue = JudyLFirst(context_registry_judyl, &index, PJE0);
+ while (PValue) {
+ // Each string has been duplicated when added, so free it
+ string_freez((STRING *)index);
+ PValue = JudyLNext(context_registry_judyl, &index, PJE0);
+ }
+
+ // Free the entire Judy array
+ JudyLFreeArray(&context_registry_judyl, PJE0);
+
+ spinlock_unlock(&context_registry_spinlock);
+}
+
+// Add a context to the registry or increment its reference count
+bool rrdcontext_context_registry_add(STRING *context) {
+ if (unlikely(!context))
+ return false;
+
+ bool is_new = false;
+
+ spinlock_lock(&context_registry_spinlock);
+
+ // Get or insert a slot for this context
+ Pvoid_t *PValue = JudyLIns(&context_registry_judyl, (Word_t)context, PJE0);
+
+ if (unlikely(PValue == PJERR)) {
+ // Memory allocation error
+ internal_error(true, "RRDCONTEXT: JudyL memory allocation failed in rrdcontext_context_registry_add()");
+ spinlock_unlock(&context_registry_spinlock);
+ return false;
+ }
+
+ size_t count = (size_t)(Word_t)*PValue;
+
+ if (count == 0) {
+ // This is a new context - duplicate the string to increase its reference count
+ string_dup(context);
+ is_new = true;
+ }
+
+ // Increment the reference count
+ *PValue = (void *)(Word_t)(count + 1);
+
+ spinlock_unlock(&context_registry_spinlock);
+
+ return is_new;
+}
+
+// Remove a context from the registry or decrement its reference count
+bool rrdcontext_context_registry_remove(STRING *context) {
+ if (unlikely(!context))
+ return false;
+
+ bool is_last = false;
+
+ spinlock_lock(&context_registry_spinlock);
+
+ // Try to get the value for this context
+ Pvoid_t *PValue = JudyLGet(context_registry_judyl, (Word_t)context, PJE0);
+
+ if (PValue) {
+ size_t count = (size_t)(Word_t)*PValue;
+
+ if (count > 1) {
+ // More than one reference, just decrement
+ *PValue = (void *)(Word_t)(count - 1);
+ }
+ else {
+ // Last reference - remove it and free the string
+ int ret;
+ ret = JudyLDel(&context_registry_judyl, (Word_t)context, PJE0);
+ if (ret == 1) {
+ string_freez(context);
+ is_last = true;
+ }
+ }
+ }
+
+ spinlock_unlock(&context_registry_spinlock);
+
+ return is_last;
+}
+
+// Get the current number of unique contexts
+size_t rrdcontext_context_registry_unique_count(void) {
+ Word_t count = 0;
+
+ spinlock_lock(&context_registry_spinlock);
+
+ // Count entries manually
+ Word_t index = 0;
+ Pvoid_t *PValue = JudyLFirst(context_registry_judyl, &index, PJE0);
+
+ while (PValue) {
+ count++;
+ PValue = JudyLNext(context_registry_judyl, &index, PJE0);
+ }
+
+ spinlock_unlock(&context_registry_spinlock);
+
+ return (size_t)count;
+}
+
+void rrdcontext_context_registry_json_mcp_array(BUFFER *wb, SIMPLE_PATTERN *pattern) {
+ spinlock_lock(&context_registry_spinlock);
+
+ buffer_json_member_add_array(wb, "header");
+ buffer_json_add_array_item_string(wb, "context");
+ buffer_json_add_array_item_string(wb, "number_of_nodes_having_it");
+ buffer_json_array_close(wb);
+
+ buffer_json_member_add_array(wb, "contexts");
+
+ Word_t index = 0;
+ bool first = true;
+ Pvoid_t *PValue;
+ while ((PValue = JudyLFirstThenNext(context_registry_judyl, &index, &first))) {
+ if (!index || !*PValue) continue;
+
+ const char *context_name = string2str((STRING *)index);
+
+ // Skip if we have a pattern and it doesn't match
+ if (pattern && !simple_pattern_matches(pattern, context_name))
+ continue;
+
+ buffer_json_add_array_item_array(wb);
+ buffer_json_add_array_item_string(wb, context_name);
+        buffer_json_add_array_item_uint64(wb, (size_t)(Word_t)*PValue);
+ buffer_json_array_close(wb);
+ }
+
+ buffer_json_array_close(wb);
+
+ spinlock_unlock(&context_registry_spinlock);
+}
+
+// Implementation to extract and output unique context categories
+void rrdcontext_context_registry_json_mcp_categories_array(BUFFER *wb, SIMPLE_PATTERN *pattern) {
+ spinlock_lock(&context_registry_spinlock);
+
+ // JudyL array to store unique category STRINGs as keys and counts as values
+ Pvoid_t categories_judyl = NULL;
+
+ // Header information
+ buffer_json_member_add_array(wb, "header");
+ buffer_json_add_array_item_string(wb, "category");
+ buffer_json_add_array_item_string(wb, "number_of_contexts");
+ buffer_json_array_close(wb);
+
+ buffer_json_member_add_array(wb, "categories");
+
+ // First pass: count occurrences of each category
+ Word_t index = 0;
+ bool first = true;
+ Pvoid_t *PValue;
+ while ((PValue = JudyLFirstThenNext(context_registry_judyl, &index, &first))) {
+ if (!index || !*PValue) continue;
+
+ const char *context_name = string2str((STRING *)index);
+
+        // Find the first dot in the context name
+ const char *first_dot = strchr(context_name, '.');
+
+        // Create a STRING for the category (everything up to the first dot)
+ STRING *category_str;
+ if (first_dot) {
+            // Create a STRING with the part before the first dot
+ category_str = string_strndupz(context_name, first_dot - context_name);
+ } else {
+ // No dots, use the entire context as the category
+ category_str = string_strdupz(context_name);
+ }
+
+ if (!category_str) continue;
+
+ // Get or insert a slot for this category
+ Pvoid_t *CategoryValue = JudyLIns(&categories_judyl, (Word_t)category_str, PJE0);
+
+ if (CategoryValue) {
+ // Check if this is a new entry
+ size_t count = (size_t)(Word_t)*CategoryValue;
+ if (count > 0) {
+ // Already exists, free our reference (JudyL already has one)
+ string_freez(category_str);
+ }
+ // Increment the count
+ *CategoryValue = (void *)(Word_t)(count + 1);
+ } else {
+ // Failed to insert, free the STRING
+ string_freez(category_str);
+ }
+ }
+
+ // Second pass: output the unique categories and their counts
+ index = 0;
+ first = true;
+ while ((PValue = JudyLFirstThenNext(categories_judyl, &index, &first))) {
+ if (!index) continue;
+
+ STRING *category_str = (STRING *)index;
+ const char *category = string2str(category_str);
+
+ // Apply pattern filtering here, on the category itself
+ if (!pattern || simple_pattern_matches(pattern, category)) {
+ size_t count = (size_t)(Word_t)*PValue;
+
+ buffer_json_add_array_item_array(wb);
+ buffer_json_add_array_item_string(wb, category);
+ buffer_json_add_array_item_uint64(wb, count);
+ buffer_json_array_close(wb);
+ }
+
+ // Free the STRING object as we go
+ string_freez(category_str);
+ }
+
+ buffer_json_array_close(wb);
+
+ // Free the JudyL array (values were already freed in the loop above)
+ JudyLFreeArray(&categories_judyl, PJE0);
+
+ spinlock_unlock(&context_registry_spinlock);
+}
diff --git a/src/database/contexts/rrdcontext-context-registry.h b/src/database/contexts/rrdcontext-context-registry.h
new file mode 100644
index 00000000000000..4a7f2d783bc3e0
--- /dev/null
+++ b/src/database/contexts/rrdcontext-context-registry.h
@@ -0,0 +1,20 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_RRDCONTEXT_CONTEXT_REGISTRY_H
+#define NETDATA_RRDCONTEXT_CONTEXT_REGISTRY_H
+
+#include "libnetdata/libnetdata.h"
+
+void rrdcontext_context_registry_destroy(void);
+
+bool rrdcontext_context_registry_add(STRING *context);
+bool rrdcontext_context_registry_remove(STRING *context);
+
+size_t rrdcontext_context_registry_unique_count(void);
+
+void rrdcontext_context_registry_json_mcp_array(BUFFER *wb, SIMPLE_PATTERN *pattern);
+
+// Helper function to get context categories (extracts prefixes before the first dot)
+void rrdcontext_context_registry_json_mcp_categories_array(BUFFER *wb, SIMPLE_PATTERN *pattern);
+
+#endif // NETDATA_RRDCONTEXT_CONTEXT_REGISTRY_H
\ No newline at end of file
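
The registry API above is exercised by the rrdcontext insert/delete callbacks in the next hunk. As a minimal sketch of the intended reference-counted semantics (hypothetical standalone caller; `string_strdupz`/`string_freez` are netdata's interned-STRING API):

```c
#include "rrdcontext-context-registry.h"

// Hypothetical caller illustrating the reference-counted registry semantics.
static void registry_usage_sketch(void) {
    STRING *ctx = string_strdupz("system.cpu");

    bool is_new = rrdcontext_context_registry_add(ctx);      // true: first reference
    rrdcontext_context_registry_add(ctx);                    // false: refcount bumped to 2

    size_t unique = rrdcontext_context_registry_unique_count();  // 1

    rrdcontext_context_registry_remove(ctx);                 // false: one reference left
    bool is_last = rrdcontext_context_registry_remove(ctx);  // true: last reference gone

    string_freez(ctx);  // release the caller's own reference
    (void)is_new; (void)unique; (void)is_last;
}
```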
diff --git a/src/database/contexts/rrdcontext-context.c b/src/database/contexts/rrdcontext-context.c
index 96b2cc29f2f630..93c9d2856c1e37 100644
--- a/src/database/contexts/rrdcontext-context.c
+++ b/src/database/contexts/rrdcontext-context.c
@@ -28,6 +28,9 @@ static void rrdcontext_insert_callback(const DICTIONARY_ITEM *item __maybe_unuse
rc->rrdhost = host;
rc->flags = rc->flags & RRD_FLAGS_ALLOWED_EXTERNALLY_ON_NEW_OBJECTS; // no need for atomics at constructor
+
+ // Add the context to the registry to track unique contexts
+ rrdcontext_context_registry_add(rc->id);
if(rc->hub.version) {
// we are loading data from the SQL database
@@ -95,6 +98,9 @@ static void rrdcontext_delete_callback(const DICTIONARY_ITEM *item __maybe_unuse
// update the count of contexts
__atomic_sub_fetch(&rc->rrdhost->rrdctx.contexts_count, 1, __ATOMIC_RELAXED);
+
+ // Remove the context from the registry
+ rrdcontext_context_registry_remove(rc->id);
rrdcontext_del_from_hub_queue(rc, false);
rrdcontext_del_from_pp_queue(rc, false);
diff --git a/src/database/contexts/rrdcontext-internal.h b/src/database/contexts/rrdcontext-internal.h
index 53a47d36464bec..9fa985197ab4a3 100644
--- a/src/database/contexts/rrdcontext-internal.h
+++ b/src/database/contexts/rrdcontext-internal.h
@@ -4,6 +4,7 @@
#define NETDATA_RRDCONTEXT_INTERNAL_H 1
#include "rrdcontext.h"
+#include "rrdcontext-context-registry.h"
#include "../sqlite/sqlite_context.h"
#include "../../aclk/schema-wrappers/rrdcontext-context.h"
#include "../../aclk/aclk_contexts_api.h"
diff --git a/src/database/contexts/rrdcontext.h b/src/database/contexts/rrdcontext.h
index a1a0ab37ce3a7b..9c698723ad167d 100644
--- a/src/database/contexts/rrdcontext.h
+++ b/src/database/contexts/rrdcontext.h
@@ -760,5 +760,7 @@ static inline bool query_target_has_percentage_units(QUERY_TARGET *qt) {
uint32_t rrdcontext_queue_version(RRDCONTEXT_QUEUE_JudyLSet *queue);
int32_t rrdcontext_queue_entries(RRDCONTEXT_QUEUE_JudyLSet *queue);
+#include "rrdcontext-context-registry.h"
+
#endif // NETDATA_RRDCONTEXT_H
diff --git a/src/database/rrd-metadata.c b/src/database/rrd-metadata.c
index 0577195bb2d89a..66c95dfa3b8059 100644
--- a/src/database/rrd-metadata.c
+++ b/src/database/rrd-metadata.c
@@ -3,6 +3,7 @@
#define RRDHOST_INTERNALS
#include "rrd.h"
#include "rrd-metadata.h"
+#include "contexts/rrdcontext-context-registry.h"
// Collect metrics metadata from all hosts
RRDSTATS_METADATA rrdstats_metadata_collect(void) {
@@ -10,7 +11,7 @@ RRDSTATS_METADATA rrdstats_metadata_collect(void) {
.nodes = { .total = 0, .receiving = 0, .sending = 0, .archived = 0 },
.metrics = { .collected = 0, .available = 0 },
.instances = { .collected = 0, .available = 0 },
- .contexts = { .collected = 0, .available = 0 }
+ .contexts = { .collected = 0, .available = 0, .unique = 0 }
};
rrd_rdlock();
@@ -45,6 +46,9 @@ RRDSTATS_METADATA rrdstats_metadata_collect(void) {
dfe_done(host);
rrd_rdunlock();
+
+ // Get the count of unique contexts from our registry
+ metadata.contexts.unique = rrdcontext_context_registry_unique_count();
return metadata;
}
\ No newline at end of file
diff --git a/src/database/rrd-metadata.h b/src/database/rrd-metadata.h
index b493d9537d2357..5bd87f74d4f37e 100644
--- a/src/database/rrd-metadata.h
+++ b/src/database/rrd-metadata.h
@@ -27,6 +27,7 @@ typedef struct rrdstats_metadata {
struct {
size_t collected;
size_t available;
+ size_t unique; // Count of unique contexts across all hosts
} contexts;
} RRDSTATS_METADATA;
diff --git a/src/database/rrd-retention.c b/src/database/rrd-retention.c
new file mode 100644
index 00000000000000..b9e183c44df2d1
--- /dev/null
+++ b/src/database/rrd-retention.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#define RRDHOST_INTERNALS
+#include "rrd.h"
+#include "rrd-retention.h"
+#include "libnetdata/parsers/duration.h"
+
+// Round retention time to more human-readable values (days/hours/minutes)
+static time_t round_retention(time_t retention_seconds) {
+ if(retention_seconds > 60 * 86400)
+ retention_seconds = HOWMANY(retention_seconds, 86400) * 86400;
+ else if(retention_seconds > 86400)
+ retention_seconds = HOWMANY(retention_seconds, 3600) * 3600;
+ else
+ retention_seconds = HOWMANY(retention_seconds, 60) * 60;
+
+ return retention_seconds;
+}
+
+// Collect retention statistics from all tiers
+RRDSTATS_RETENTION rrdstats_retention_collect(void) {
+ time_t now_s = now_realtime_sec();
+
+ // Initialize the retention structure
+ RRDSTATS_RETENTION retention = {
+ .storage_tiers = 0
+ };
+
+ rrd_rdlock();
+
+ if(!localhost) {
+ rrd_rdunlock();
+ return retention;
+ }
+
+ // Count the available storage tiers
+ retention.storage_tiers = nd_profile.storage_tiers;
+
+ // Iterate through all available storage tiers
+ for(size_t tier = 0; tier < retention.storage_tiers && tier < RRD_MAX_STORAGE_TIERS; tier++) {
+ STORAGE_ENGINE *eng = localhost->db[tier].eng;
+ if(!eng)
+ continue;
+
+ RRD_STORAGE_TIER *tier_info = &retention.tiers[tier];
+ tier_info->tier = tier;
+ tier_info->backend = eng->seb;
+ tier_info->group_seconds = get_tier_grouping(tier) * localhost->rrd_update_every;
+
+ // Format human-readable granularity
+ duration_snprintf_time_t(tier_info->granularity_human, sizeof(tier_info->granularity_human), (time_t)tier_info->group_seconds);
+
+ // Get metrics and samples counts
+ tier_info->metrics = storage_engine_metrics(eng->seb, localhost->db[tier].si);
+ tier_info->samples = storage_engine_samples(eng->seb, localhost->db[tier].si);
+
+ // Get disk usage information
+ tier_info->disk_max = storage_engine_disk_space_max(eng->seb, localhost->db[tier].si);
+ tier_info->disk_used = storage_engine_disk_space_used(eng->seb, localhost->db[tier].si);
+
+#ifdef ENABLE_DBENGINE
+ if(!tier_info->disk_max && eng->seb == STORAGE_ENGINE_BACKEND_DBENGINE) {
+ tier_info->disk_max = rrdeng_get_directory_free_bytes_space(multidb_ctx[tier]);
+ tier_info->disk_max += tier_info->disk_used;
+ }
+#endif
+
+ // Calculate disk usage percentage
+ if(tier_info->disk_used && tier_info->disk_max)
+ tier_info->disk_percent = (double)tier_info->disk_used * 100.0 / (double)tier_info->disk_max;
+ else
+ tier_info->disk_percent = 0.0;
+
+ // Get retention information
+ tier_info->first_time_s = storage_engine_global_first_time_s(eng->seb, localhost->db[tier].si);
+ tier_info->last_time_s = now_s;
+
+ if(tier_info->first_time_s < tier_info->last_time_s) {
+ tier_info->retention = tier_info->last_time_s - tier_info->first_time_s;
+
+ // Format human-readable retention
+ duration_snprintf(tier_info->retention_human, sizeof(tier_info->retention_human),
+ round_retention(tier_info->retention), "s", false);
+
+ if(tier_info->disk_used || tier_info->disk_max) {
+ // Get requested retention time
+ tier_info->requested_retention = 0;
+#ifdef ENABLE_DBENGINE
+ if(eng->seb == STORAGE_ENGINE_BACKEND_DBENGINE)
+ tier_info->requested_retention = multidb_ctx[tier]->config.max_retention_s;
+#endif
+
+ // Format human-readable requested retention
+ duration_snprintf(tier_info->requested_retention_human, sizeof(tier_info->requested_retention_human),
+ (int)tier_info->requested_retention, "s", false);
+
+ // Calculate expected retention based on current usage
+ time_t space_retention = 0;
+ if(tier_info->disk_percent > 0)
+ space_retention = (time_t)((double)(now_s - tier_info->first_time_s) * 100.0 / tier_info->disk_percent);
+
+ tier_info->expected_retention = (tier_info->requested_retention && tier_info->requested_retention < space_retention)
+ ? tier_info->requested_retention
+ : space_retention;
+
+ // Format human-readable expected retention
+ duration_snprintf(tier_info->expected_retention_human, sizeof(tier_info->expected_retention_human),
+ (int)round_retention(tier_info->expected_retention), "s", false);
+ }
+ }
+ else {
+ // No data yet in this tier
+ tier_info->retention = 0;
+ tier_info->retention_human[0] = '\0';
+ tier_info->requested_retention = 0;
+ tier_info->requested_retention_human[0] = '\0';
+ tier_info->expected_retention = 0;
+ tier_info->expected_retention_human[0] = '\0';
+ }
+ }
+
+ rrd_rdunlock();
+
+ return retention;
+}
\ No newline at end of file
diff --git a/src/database/rrd-retention.h b/src/database/rrd-retention.h
new file mode 100644
index 00000000000000..0fc587b6948aa4
--- /dev/null
+++ b/src/database/rrd-retention.h
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_RRD_RETENTION_H
+#define NETDATA_RRD_RETENTION_H
+
+#include "libnetdata/libnetdata.h"
+#include "storage-engine.h"
+
+// Maximum number of storage tiers the system supports
+#define RRD_MAX_STORAGE_TIERS 32
+
+// Structure to hold information about each storage tier
+typedef struct rrd_storage_tier {
+ size_t tier; // Tier number
+ STORAGE_ENGINE_BACKEND backend; // Storage engine backend (RRDDIM or DBENGINE)
+ size_t group_seconds; // Granularity in seconds
+ char granularity_human[32]; // Human-readable granularity string
+
+ size_t metrics; // Number of metrics in this tier
+ size_t samples; // Number of samples in this tier
+
+ uint64_t disk_used; // Disk space used in bytes
+ uint64_t disk_max; // Maximum available disk space in bytes
+ double disk_percent; // Disk usage percentage (0.0-100.0)
+
+ time_t first_time_s; // Oldest timestamp in this tier
+ time_t last_time_s; // Most recent timestamp in this tier
+ time_t retention; // Current retention in seconds (last_time_s - first_time_s)
+ char retention_human[32]; // Human-readable current retention
+
+ time_t requested_retention; // Configured maximum retention in seconds
+ char requested_retention_human[32]; // Human-readable configured retention
+
+ time_t expected_retention; // Expected retention based on current usage
+ char expected_retention_human[32]; // Human-readable expected retention
+} RRD_STORAGE_TIER;
+
+// Main structure to hold retention information across all tiers
+typedef struct rrdstats_retention {
+ size_t storage_tiers; // Number of available storage tiers
+ RRD_STORAGE_TIER tiers[RRD_MAX_STORAGE_TIERS]; // Array of tier information
+} RRDSTATS_RETENTION;
+
+// Function to collect retention statistics
+RRDSTATS_RETENTION rrdstats_retention_collect(void);
+
+#endif // NETDATA_RRD_RETENTION_H
\ No newline at end of file
diff --git a/src/libnetdata/circular_buffer/circular_buffer.c b/src/libnetdata/circular_buffer/circular_buffer.c
index 7ffe6b8bc79a7e..8e8e0207a4dc3d 100644
--- a/src/libnetdata/circular_buffer/circular_buffer.c
+++ b/src/libnetdata/circular_buffer/circular_buffer.c
@@ -1,7 +1,10 @@
#include "../libnetdata.h"
-struct circular_buffer *cbuffer_new(size_t initial, size_t max, size_t *statistics) {
- struct circular_buffer *buf = mallocz(sizeof(struct circular_buffer));
+// Initialize a pre-allocated circular buffer
+void cbuffer_init(struct circular_buffer *buf, size_t initial, size_t max, size_t *statistics) {
+ if (unlikely(!buf))
+ return;
+
buf->size = initial;
buf->data = mallocz(initial);
buf->write = 0;
@@ -10,19 +13,44 @@ struct circular_buffer *cbuffer_new(size_t initial, size_t max, size_t *statisti
buf->statistics = statistics;
if(buf->statistics)
- __atomic_add_fetch(buf->statistics, sizeof(struct circular_buffer) + buf->size, __ATOMIC_RELAXED);
+ __atomic_add_fetch(buf->statistics, buf->size, __ATOMIC_RELAXED);
+}
+
+// Clean up the resources of a pre-allocated circular buffer (does not free the struct itself)
+void cbuffer_cleanup(struct circular_buffer *buf) {
+ if (unlikely(!buf))
+ return;
+
+ if(buf->statistics)
+ __atomic_sub_fetch(buf->statistics, buf->size, __ATOMIC_RELAXED);
+
+ freez(buf->data);
+ buf->data = NULL;
+ buf->size = 0;
+ buf->write = 0;
+ buf->read = 0;
+}
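+
+// Usage sketch (illustrative): the init/cleanup pair allows embedding the
+// buffer in a larger structure, instead of heap-allocating it with cbuffer_new():
+//
+//   struct circular_buffer cb;
+//   cbuffer_init(&cb, 4096, 1024 * 1024, NULL);
+//   ... use cbuffer_add_unsafe(), cbuffer_next_unsafe(), etc ...
+//   cbuffer_cleanup(&cb);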
+
+// Allocate and initialize a new circular buffer
+struct circular_buffer *cbuffer_new(size_t initial, size_t max, size_t *statistics) {
+ struct circular_buffer *buf = mallocz(sizeof(struct circular_buffer));
+ cbuffer_init(buf, initial, max, statistics);
+
+ if(buf->statistics)
+ __atomic_add_fetch(buf->statistics, sizeof(struct circular_buffer), __ATOMIC_RELAXED);
return buf;
}
+// Free a circular buffer allocated with cbuffer_new
void cbuffer_free(struct circular_buffer *buf) {
if (unlikely(!buf))
return;
if(buf->statistics)
- __atomic_sub_fetch(buf->statistics, sizeof(struct circular_buffer) + buf->size, __ATOMIC_RELAXED);
+ __atomic_sub_fetch(buf->statistics, sizeof(struct circular_buffer), __ATOMIC_RELAXED);
- freez(buf->data);
+ cbuffer_cleanup(buf);
freez(buf);
}
@@ -63,13 +91,18 @@ static int cbuffer_realloc_unsafe(struct circular_buffer *buf) {
return 0;
}
+ALWAYS_INLINE
+size_t cbuffer_used_size_unsafe(struct circular_buffer *buf) {
+ return (buf->write >= buf->read) ? (buf->write - buf->read) : (buf->size - buf->read + buf->write);
+}
+
+ALWAYS_INLINE
size_t cbuffer_available_size_unsafe(struct circular_buffer *buf) {
- size_t len = (buf->write >= buf->read) ? (buf->write - buf->read) : (buf->size - buf->read + buf->write);
- return buf->max_size - len;
+ return buf->max_size - cbuffer_used_size_unsafe(buf);
}
int cbuffer_add_unsafe(struct circular_buffer *buf, const char *d, size_t d_len) {
- size_t len = (buf->write >= buf->read) ? (buf->write - buf->read) : (buf->size - buf->read + buf->write);
+ size_t len = cbuffer_used_size_unsafe(buf);
while (d_len + len >= buf->size) {
if (cbuffer_realloc_unsafe(buf)) {
return 1;
@@ -90,6 +123,7 @@ int cbuffer_add_unsafe(struct circular_buffer *buf, const char *d, size_t d_len)
}
// Assume the caller does not remove too many bytes (i.e. read must not jump over write)
+ALWAYS_INLINE
void cbuffer_remove_unsafe(struct circular_buffer *buf, size_t num) {
buf->read += num;
// Assume num < size (i.e. caller cannot remove more bytes than are in the buffer)
@@ -97,6 +131,7 @@ void cbuffer_remove_unsafe(struct circular_buffer *buf, size_t num) {
buf->read -= buf->size;
}
+ALWAYS_INLINE
size_t cbuffer_next_unsafe(struct circular_buffer *buf, char **start) {
if (start != NULL)
*start = buf->data + buf->read;
@@ -107,7 +142,92 @@ size_t cbuffer_next_unsafe(struct circular_buffer *buf, char **start) {
return buf->size - buf->read;
}
+ALWAYS_INLINE
void cbuffer_flush(struct circular_buffer*buf) {
buf->write = 0;
buf->read = 0;
-}
\ No newline at end of file
+}
+
+// Ensures that the requested size is available as a contiguous block in the buffer
+// Returns true if there's enough data and it's now contiguous, false otherwise
+bool cbuffer_ensure_unwrapped_size(struct circular_buffer *buf, size_t size) {
+ if (unlikely(!buf || !buf->data))
+ return false;
+
+ size_t used = cbuffer_used_size_unsafe(buf);
+ if(used < size)
+ return false;
+
+ char *unwrapped;
+ size_t unwrapped_size = cbuffer_next_unsafe(buf, &unwrapped);
+ if(unwrapped_size >= size)
+ return true;
+
+ size_t wrapped_size = used - unwrapped_size;
+
+ char *tmp = mallocz(unwrapped_size);
+ memcpy(tmp, unwrapped, unwrapped_size);
+ cbuffer_remove_unsafe(buf, unwrapped_size);
+
+ memmove(buf->data + unwrapped_size, buf->data, wrapped_size);
+ memcpy(buf->data, tmp, unwrapped_size);
+ freez(tmp);
+
+ buf->read = 0;
+ buf->write = unwrapped_size + wrapped_size;
+
+ return true;
+}
+
+// Reserve space in the circular buffer for direct writing
+// Returns a pointer to the reserved space, or NULL if reservation fails
+char *cbuffer_reserve_unsafe(struct circular_buffer *buf, size_t size) {
+ if (unlikely(!buf || !buf->data || size == 0))
+ return NULL;
+
+ // First, make sure we have enough space in the buffer
+ size_t len = cbuffer_used_size_unsafe(buf);
+ while (size + len >= buf->size) {
+ if (cbuffer_realloc_unsafe(buf)) {
+ // Can't grow buffer anymore
+ return NULL;
+ }
+ }
+
+ if(buf->write + size > buf->size) {
+ if (!cbuffer_ensure_unwrapped_size(buf, len))
+ return NULL;
+
+ if(buf->read != 0 && buf->write + size > buf->size) {
+ // It is a contiguous buffer, but we need to move the data
+ // Move the data to the beginning of the buffer
+ memmove(buf->data, buf->data + buf->read, buf->write - buf->read);
+ buf->write -= buf->read;
+ buf->read = 0;
+ }
+ }
+
+ // Check if we can write contiguously from the current write position
+ if (buf->write + size <= buf->size) {
+ // Simple case - we have enough space at the current write position
+ return buf->data + buf->write;
+ }
+ else {
+        // unreachable: cbuffer_ensure_unwrapped_size() succeeded above, so the region is contiguous and fits
+ return NULL;
+ }
+}
+
+// Commit the reserved space after writing to it
+// Size should be less than or equal to the size passed to cbuffer_reserve_unsafe
+void cbuffer_commit_reserved_unsafe(struct circular_buffer *buf, size_t size) {
+ if (unlikely(!buf || !buf->data || size == 0))
+ return;
+
+ // Update the write pointer
+ buf->write += size;
+
+ // Handle wrap-around if we've gone past the buffer boundary
+ if (buf->write >= buf->size)
+ buf->write -= buf->size;
+}
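+
+// Usage sketch (illustrative - `out`, `needed` and fill_payload() are hypothetical names):
+//
+//   char *out = cbuffer_reserve_unsafe(buf, needed);      // get `needed` contiguous bytes
+//   if (out) {
+//       size_t written = fill_payload(out, needed);       // write directly into the buffer
+//       cbuffer_commit_reserved_unsafe(buf, written);     // commit only what was written
+//   }
+//
+// Note: reserving may unwrap or move data, so the returned pointer is only
+// valid until the next operation on the buffer.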
diff --git a/src/libnetdata/circular_buffer/circular_buffer.h b/src/libnetdata/circular_buffer/circular_buffer.h
index 9d29a84d70c77b..6de2361904718b 100644
--- a/src/libnetdata/circular_buffer/circular_buffer.h
+++ b/src/libnetdata/circular_buffer/circular_buffer.h
@@ -9,12 +9,28 @@ struct circular_buffer {
char *data;
};
+// Allocation/deallocation functions
struct circular_buffer *cbuffer_new(size_t initial, size_t max, size_t *statistics);
void cbuffer_free(struct circular_buffer *buf);
+
+// Static allocation support
+void cbuffer_init(struct circular_buffer *buf, size_t initial, size_t max, size_t *statistics);
+void cbuffer_cleanup(struct circular_buffer *buf);
+
+// Buffer operations
int cbuffer_add_unsafe(struct circular_buffer *buf, const char *d, size_t d_len);
void cbuffer_remove_unsafe(struct circular_buffer *buf, size_t num);
size_t cbuffer_next_unsafe(struct circular_buffer *buf, char **start);
size_t cbuffer_available_size_unsafe(struct circular_buffer *buf);
-void cbuffer_flush(struct circular_buffer*buf);
+void cbuffer_flush(struct circular_buffer *buf);
+
+// Reserve/commit operations for direct buffer access
+char *cbuffer_reserve_unsafe(struct circular_buffer *buf, size_t size);
+void cbuffer_commit_reserved_unsafe(struct circular_buffer *buf, size_t size);
+
+// Check if a size is wrapped in buffer and unwrap if necessary
+bool cbuffer_ensure_unwrapped_size(struct circular_buffer *buf, size_t size);
+
+size_t cbuffer_used_size_unsafe(struct circular_buffer *buf);
#endif
diff --git a/src/libnetdata/http/http_defs.h b/src/libnetdata/http/http_defs.h
index e1e26863efc032..93d7acd5a8cc8f 100644
--- a/src/libnetdata/http/http_defs.h
+++ b/src/libnetdata/http/http_defs.h
@@ -9,6 +9,7 @@
// HTTP_CODES 1XX
#define HTTP_RESP_SWITCH_PROTO 101
+#define HTTP_RESP_WEBSOCKET_HANDSHAKE 101 // WebSocket uses 101 Switching Protocols
// HTTP_CODES 2XX Success
#define HTTP_RESP_OK 200
@@ -51,6 +52,7 @@ typedef enum __attribute__((__packed__)) {
HTTP_REQUEST_MODE_FILECOPY = 5,
HTTP_REQUEST_MODE_OPTIONS = 6,
HTTP_REQUEST_MODE_STREAM = 7,
+ HTTP_REQUEST_MODE_WEBSOCKET = 8,
} HTTP_REQUEST_MODE;
ENUM_STR_DEFINE_FUNCTIONS_EXTERN(HTTP_REQUEST_MODE);
diff --git a/src/libnetdata/log/nd_log.h b/src/libnetdata/log/nd_log.h
index 788792ade29ba0..e244a4525a3be9 100644
--- a/src/libnetdata/log/nd_log.h
+++ b/src/libnetdata/log/nd_log.h
@@ -80,42 +80,34 @@ struct log_stack_entry {
void log_stack_pop(void *ptr);
void log_stack_push(struct log_stack_entry *lgs);
-#define D_WEB_BUFFER 0x0000000000000001
-#define D_WEB_CLIENT 0x0000000000000002
-#define D_LISTENER 0x0000000000000004
-#define D_WEB_DATA 0x0000000000000008
-#define D_OPTIONS 0x0000000000000010
-#define D_PROCNETDEV_LOOP 0x0000000000000020
-#define D_RRD_STATS 0x0000000000000040
-#define D_WEB_CLIENT_ACCESS 0x0000000000000080
-#define D_TC_LOOP 0x0000000000000100
-#define D_DEFLATE 0x0000000000000200
-#define D_CONFIG 0x0000000000000400
-#define D_PLUGINSD 0x0000000000000800
-#define D_CHILDS 0x0000000000001000
-#define D_EXIT 0x0000000000002000
-#define D_CHECKS 0x0000000000004000
-#define D_NFACCT_LOOP 0x0000000000008000
-#define D_PROCFILE 0x0000000000010000
-#define D_RRD_CALLS 0x0000000000020000
-#define D_DICTIONARY 0x0000000000040000
-#define D_MEMORY 0x0000000000080000
-#define D_CGROUP 0x0000000000100000
-#define D_REGISTRY 0x0000000000200000
-#define D_VARIABLES 0x0000000000400000
-#define D_HEALTH 0x0000000000800000
-#define D_CONNECT_TO 0x0000000001000000
-#define D_RRDHOST 0x0000000002000000
-#define D_LOCKS 0x0000000004000000
-#define D_EXPORTING 0x0000000008000000
-#define D_STATSD 0x0000000010000000
-#define D_POLLFD 0x0000000020000000
-#define D_STREAM 0x0000000040000000
-#define D_ANALYTICS 0x0000000080000000
-#define D_RRDENGINE 0x0000000100000000
-#define D_ACLK 0x0000000200000000
-#define D_REPLICATION 0x0000002000000000
-#define D_SYSTEM 0x8000000000000000
+#define D_WEB_BUFFER (1ULL << 0)
+#define D_WEB_CLIENT (1ULL << 1)
+#define D_LISTENER (1ULL << 2)
+#define D_WEB_DATA (1ULL << 3)
+#define D_OPTIONS (1ULL << 4)
+#define D_PROCNETDEV_LOOP (1ULL << 5)
+#define D_RRD_STATS (1ULL << 6)
+#define D_WEB_CLIENT_ACCESS (1ULL << 7)
+#define D_TC_LOOP (1ULL << 8)
+#define D_DEFLATE (1ULL << 9)
+#define D_CONFIG (1ULL << 10)
+#define D_PLUGINSD (1ULL << 11)
+#define D_PROCFILE (1ULL << 12)
+#define D_RRD_CALLS (1ULL << 13)
+#define D_DICTIONARY (1ULL << 14)
+#define D_CGROUP (1ULL << 15)
+#define D_REGISTRY (1ULL << 16)
+#define D_HEALTH (1ULL << 17)
+#define D_LOCKS (1ULL << 18)
+#define D_EXPORTING (1ULL << 19)
+#define D_STATSD (1ULL << 20)
+#define D_STREAM (1ULL << 21)
+#define D_ANALYTICS (1ULL << 22)
+#define D_RRDENGINE (1ULL << 23)
+#define D_ACLK (1ULL << 24)
+#define D_WEBSOCKET (1ULL << 25)
+#define D_MCP (1ULL << 26)
+#define D_SYSTEM (1ULL << 27)
extern uint64_t debug_flags;
extern const char *program_name;
diff --git a/src/libnetdata/socket/nd-poll.c b/src/libnetdata/socket/nd-poll.c
index 401d3098ec1480..034b5f996da329 100644
--- a/src/libnetdata/socket/nd-poll.c
+++ b/src/libnetdata/socket/nd-poll.c
@@ -6,7 +6,7 @@
#define POLLRDHUP 0
#endif
-#if defined(OS_LINUX_DISABLE_EPOLL_DUE_TO_BUG)
+#if defined(OS_LINUX_DISABLED_DUE_TO_BUG_IN_KERNEL_EPOLL)
#include
struct fd_info {
diff --git a/src/libnetdata/socket/poll-events.c b/src/libnetdata/socket/poll-events.c
index 5cfd2a3ae7a5ca..e2ffefbc0b258e 100644
--- a/src/libnetdata/socket/poll-events.c
+++ b/src/libnetdata/socket/poll-events.c
@@ -1,9 +1,10 @@
// SPDX-License-Identifier: GPL-3.0-or-later
+#include "daemon/static_threads.h"
#include "libnetdata/libnetdata.h"
static inline void poll_process_updated_events(POLLINFO *pi) {
- if(pi->events != pi->events_we_wait_for) {
+ if(pi->events != pi->events_we_wait_for && !(pi->flags & POLLINFO_FLAG_REMOVED_FROM_POLL)) {
if(!nd_poll_upd(pi->p->ndpl, pi->fd, pi->events))
nd_log(NDLS_DAEMON, NDLP_ERR, "Failed to update socket %d to nd_poll", pi->fd);
pi->events_we_wait_for = pi->events;
@@ -72,15 +73,26 @@ POLLINFO *poll_add_fd(POLLJOB *p
return pi;
}
+void poll_process_remove_from_poll(POLLINFO *pi) {
+ POLLJOB *p = pi->p;
+
+ if(!nd_poll_del(p->ndpl, pi->fd))
+ nd_log(NDLS_DAEMON, NDLP_ERR,
+ "Failed to delete socket %d from nd_poll() - is the socket already closed?", pi->fd);
+ else
+ pi->flags |= POLLINFO_FLAG_REMOVED_FROM_POLL;
+}
+
static inline void poll_close_fd(POLLINFO *pi, const char *func) {
POLLJOB *p = pi->p;
DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(p->ll, pi, prev, next);
- if(!nd_poll_del(p->ndpl, pi->fd))
- // this is ok, if the socket is already closed
- nd_log(NDLS_DAEMON, NDLP_DEBUG,
+ if(!(pi->flags & POLLINFO_FLAG_REMOVED_FROM_POLL) && !nd_poll_del(p->ndpl, pi->fd))
+ nd_log(NDLS_DAEMON, NDLP_ERR,
"Failed to delete socket %d from nd_poll() - called from %s() - is the socket already closed?",
pi->fd, func);
+ else
+ pi->flags |= POLLINFO_FLAG_REMOVED_FROM_POLL;
if(pi->flags & POLLINFO_FLAG_CLIENT_SOCKET) {
pi->del_callback(pi);
@@ -394,6 +406,14 @@ void poll_events(LISTEN_SOCKETS *sockets
CLEANUP_FUNCTION_REGISTER(poll_events_cleanup) cleanup_ptr = &p;
+ size_t iteration_counter = 0,
+ timeout_counter = 0,
+ errors_counter = 0,
+ read_counter = 0,
+ writes_counter = 0,
+ unhandled_counter = 0,
+ cleanup_counter = 0;
+
while(!check_to_stop_callback() && !nd_thread_signaled_to_cancel()) {
if(unlikely(timer_usec)) {
now_usec = now_boottime_usec();
@@ -424,6 +444,7 @@ void poll_events(LISTEN_SOCKETS *sockets
nd_poll_result_t result;
retval = nd_poll_wait(p.ndpl, ND_CHECK_CANCELLABILITY_WHILE_WAITING_EVERY_MS, &result);
+ iteration_counter++;
time_t now = now_boottime_sec();
if(unlikely(retval == -1)) {
@@ -431,20 +452,23 @@ void poll_events(LISTEN_SOCKETS *sockets
break;
}
else if(unlikely(!retval)) {
+ timeout_counter++;
// timeout
;
}
else {
POLLINFO *pi = (POLLINFO *)result.data;
- if(result.events & (ND_POLL_HUP | ND_POLL_INVALID | ND_POLL_ERROR))
+ if(result.events & (ND_POLL_HUP | ND_POLL_INVALID | ND_POLL_ERROR)) {
+ errors_counter++;
poll_process_error(pi, result.events);
-
+ }
else if(result.events & ND_POLL_WRITE) {
+ writes_counter++;
poll_process_send(pi, now);
}
-
else if(result.events & ND_POLL_READ) {
+ read_counter++;
if (pi->flags & POLLINFO_FLAG_CLIENT_SOCKET) {
if (pi->socktype == SOCK_DGRAM)
poll_process_udp_read(pi, now);
@@ -496,6 +520,8 @@ void poll_events(LISTEN_SOCKETS *sockets
}
}
else {
+ unhandled_counter++;
+
nd_log(NDLS_DAEMON, NDLP_ERR,
"POLLFD: LISTENER: socket slot %zu (fd %d) client %s port %s unhandled event id %d."
, i
@@ -510,6 +536,8 @@ void poll_events(LISTEN_SOCKETS *sockets
}
if(unlikely(p.checks_every > 0 && now - last_check > p.checks_every)) {
+ cleanup_counter++;
+
last_check = now;
// cleanup old sockets
diff --git a/src/libnetdata/socket/poll-events.h b/src/libnetdata/socket/poll-events.h
index cb16551eca8b52..f5d3aa113c6cb9 100644
--- a/src/libnetdata/socket/poll-events.h
+++ b/src/libnetdata/socket/poll-events.h
@@ -5,9 +5,10 @@
#include "nd-poll.h"
-#define POLLINFO_FLAG_SERVER_SOCKET 0x00000001
-#define POLLINFO_FLAG_CLIENT_SOCKET 0x00000002
-#define POLLINFO_FLAG_DONT_CLOSE 0x00000004
+#define POLLINFO_FLAG_SERVER_SOCKET (1U << 0)
+#define POLLINFO_FLAG_CLIENT_SOCKET (1U << 1)
+#define POLLINFO_FLAG_DONT_CLOSE (1U << 2)
+#define POLLINFO_FLAG_REMOVED_FROM_POLL (1U << 3)
typedef struct poll POLLJOB;
typedef struct pollinfo POLLINFO;
@@ -80,6 +81,8 @@ int poll_default_rcv_callback(POLLINFO *pi, nd_poll_event_t *events);
void poll_default_del_callback(POLLINFO *pi);
void *poll_default_add_callback(POLLINFO *pi, nd_poll_event_t *events, void *data);
+void poll_process_remove_from_poll(POLLINFO *pi);
+
POLLINFO *poll_add_fd(POLLJOB *p
, int fd
, int socktype
diff --git a/src/libnetdata/url/url.h b/src/libnetdata/url/url.h
index 67e57ed3df189d..495e1c47f4a6f6 100644
--- a/src/libnetdata/url/url.h
+++ b/src/libnetdata/url/url.h
@@ -19,10 +19,6 @@ char to_hex(char code);
/* IMPORTANT: be sure to free() the returned string after use */
char *url_encode(const char *str);
-/* Returns a url-decoded version of str */
-/* IMPORTANT: be sure to free() the returned string after use */
-char *url_decode(char *str);
-
char *url_decode_r(char *to, const char *url, size_t size);
bool url_is_request_complete_and_extract_payload(const char *begin, const char *end, size_t length, BUFFER **post_payload);
diff --git a/src/streaming/stream-receiver-connection.c b/src/streaming/stream-receiver-connection.c
index ba940b573a3edd..03af70b7551589 100644
--- a/src/streaming/stream-receiver-connection.c
+++ b/src/streaming/stream-receiver-connection.c
@@ -137,6 +137,7 @@ static int stream_receiver_response_too_busy_now(struct web_client *w) {
}
static void stream_receiver_takeover_web_connection(struct web_client *w, struct receiver_state *rpt) {
+ // Set the file descriptor and ssl from the web client
rpt->sock.fd = w->fd;
rpt->sock.ssl = w->ssl;
@@ -151,6 +152,8 @@ static void stream_receiver_takeover_web_connection(struct web_client *w, struct
w->fd = -1;
buffer_flush(w->response.data);
+
+ web_server_remove_current_socket_from_poll();
}
static void stream_send_error_on_taken_over_connection(struct receiver_state *rpt, const char *msg) {
diff --git a/src/streaming/stream-sender.c b/src/streaming/stream-sender.c
index 50e843c177610b..ee2766f76d4ff0 100644
--- a/src/streaming/stream-sender.c
+++ b/src/streaming/stream-sender.c
@@ -448,6 +448,8 @@ static void stream_sender_move_running_to_connector_or_remove_internal(struct st
stream_sender_log_disconnection(sth, s, reason, receiver_reason);
+    // IMPORTANT: make sure it is REMOVED from nd_poll() before closing the socket;
+    // otherwise, undefined behavior can occur when the fd number is reused while
+    // still registered with epoll()
nd_sock_close(&s->sock);
stream_parent_set_host_disconnect_reason(s->host, reason, now_realtime_sec());
diff --git a/src/web/api/http_header.c b/src/web/api/http_header.c
index 76353bb140a075..1a7b848874eca6 100644
--- a/src/web/api/http_header.c
+++ b/src/web/api/http_header.c
@@ -62,6 +62,10 @@ static void http_header_origin(struct web_client *w, const char *v, size_t len _
static void http_header_connection(struct web_client *w, const char *v, size_t len __maybe_unused) {
if(strcasestr(v, "keep-alive"))
web_client_enable_keepalive(w);
+
+ // Check for WebSocket upgrade request
+ if(strcasestr(v, "upgrade"))
+ web_client_set_websocket_handshake(w);
}
static void http_header_dnt(struct web_client *w, const char *v, size_t len __maybe_unused) {
@@ -175,6 +179,130 @@ static void http_header_x_netdata_auth(struct web_client *w, const char *v, size
}
}
+// Handle WebSocket-specific headers
+static void http_header_upgrade(struct web_client *w, const char *v, size_t len __maybe_unused) {
+ if(strcasecmp(v, "websocket") == 0) {
+ web_client_set_websocket(w);
+ }
+}
+
+static void http_header_sec_websocket_key(struct web_client *w, const char *v, size_t len __maybe_unused) {
+ if(web_client_is_websocket(w)) {
+ // Store the websocket key for later use in the handshake
+ freez(w->websocket.key);
+ w->websocket.key = strdupz(v);
+ }
+}
+
+static void http_header_sec_websocket_version(struct web_client *w, const char *v, size_t len __maybe_unused) {
+ if(web_client_is_websocket(w)) {
+ // We only support version 13, which will be checked during handshake
+ // No need to store this as we only accept one version
+ if(strcmp(v, "13") != 0) {
+ netdata_log_debug(D_WEB_CLIENT, "%llu: WebSocket version %s not supported, only version 13 is supported", w->id, v);
+ web_client_clear_websocket(w);
+ }
+ }
+}
+
+static void http_header_sec_websocket_protocol(struct web_client *w, const char *v, size_t len __maybe_unused) {
+ if(web_client_is_websocket(w)) {
+        // Map the requested protocol to its id for later evaluation during the handshake
+ w->websocket.protocol = WEBSOCKET_PROTOCOL_2id(v);
+ }
+}
+
+static void http_header_sec_websocket_extensions(struct web_client *w, const char *v, size_t len __maybe_unused) {
+ if(web_client_is_websocket(w)) {
+ // Reset extension flags
+ w->websocket.ext_flags = WS_EXTENSION_NONE;
+
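+        // A typical header value this parser handles looks like (illustrative example):
+        //   Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits; server_max_window_bits=10
+        // (multiple extensions may be comma-separated; parameters are semicolon-separated)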
+ // Check if "permessage-deflate" is requested
+ if (strstr(v, "permessage-deflate") != NULL) {
+ // Parse extension parameters
+ char extension_copy[1024];
+ strncpy(extension_copy, v, sizeof(extension_copy) - 1);
+ extension_copy[sizeof(extension_copy) - 1] = '\0';
+
+ char *token, *saveptr;
+ token = strtok_r(extension_copy, ",", &saveptr);
+
+ while (token) {
+ // Trim leading/trailing spaces
+ char *ext = token;
+ while (*ext && isspace(*ext)) ext++;
+ char *end = ext + strlen(ext) - 1;
+ while (end > ext && isspace(*end)) *end-- = '\0';
+
+ // Check if this is permessage-deflate extension
+ if (strncmp(ext, "permessage-deflate", 18) == 0) {
+ w->websocket.ext_flags |= WS_EXTENSION_PERMESSAGE_DEFLATE;
+
+ // Parse parameters
+ char *params = ext + 18;
+ if (*params == ';') {
+ params++;
+
+ char *param, *param_saveptr;
+                    param = strtok_r(params, ";", &param_saveptr);
+
+ while (param) {
+ // Trim leading/trailing spaces
+ while (*param && isspace(*param)) param++;
+ end = param + strlen(param) - 1;
+ while (end > param && isspace(*end)) *end-- = '\0';
+
+ // Client no context takeover
+ if (strcmp(param, "client_no_context_takeover") == 0)
+ w->websocket.ext_flags |= WS_EXTENSION_CLIENT_NO_CONTEXT_TAKEOVER;
+
+ // Server no context takeover
+ else if (strcmp(param, "server_no_context_takeover") == 0)
+ w->websocket.ext_flags |= WS_EXTENSION_SERVER_NO_CONTEXT_TAKEOVER;
+
+ // Server max window bits
+ else if (strncmp(param, "server_max_window_bits=", 23) == 0) {
+ w->websocket.server_max_window_bits = str2u(param + 23);
+ if(w->websocket.server_max_window_bits >= 8 && w->websocket.server_max_window_bits <= 15)
+ w->websocket.ext_flags |= WS_EXTENSION_SERVER_MAX_WINDOW_BITS;
+ }
+ // Server max window bits without value
+ else if (strcmp(param, "server_max_window_bits") == 0) {
+ w->websocket.ext_flags |= WS_EXTENSION_SERVER_MAX_WINDOW_BITS;
+ w->websocket.server_max_window_bits = 0; // Default
+ }
+
+ // Client max window bits with value
+ else if (strncmp(param, "client_max_window_bits=", 23) == 0) {
+ w->websocket.client_max_window_bits = str2u(param + 23);
+ if(w->websocket.client_max_window_bits >= 8 && w->websocket.client_max_window_bits <= 15)
+ w->websocket.ext_flags |= WS_EXTENSION_CLIENT_MAX_WINDOW_BITS;
+ }
+ // Client max window bits without value
+ else if (strcmp(param, "client_max_window_bits") == 0) {
+ w->websocket.ext_flags |= WS_EXTENSION_CLIENT_MAX_WINDOW_BITS;
+ w->websocket.client_max_window_bits = 0; // Default
+ }
+
+                    param = strtok_r(NULL, ";", &param_saveptr);
+ }
+ }
+
+ break; // Found and parsed permessage-deflate
+ }
+
+ token = strtok_r(NULL, ",", &saveptr);
+ }
+ }
+
+ netdata_log_debug(D_WEB_CLIENT, "%llu: Client requested WebSocket extensions: %s, "
+ "enabled flags: %u, client_max_window_bits: %u, server_max_window_bits: %u",
+ w->id, v, w->websocket.ext_flags,
+ w->websocket.client_max_window_bits,
+ w->websocket.server_max_window_bits);
+ }
+}
+
struct {
uint32_t hash;
const char *key;
@@ -196,6 +324,13 @@ struct {
{ .hash = 0, .key = "X-Netdata-User-Name", .cb = http_header_x_netdata_user_name },
{ .hash = 0, .key = "X-Netdata-Auth", .cb = http_header_x_netdata_auth },
+ // WebSocket headers
+ { .hash = 0, .key = "Upgrade", .cb = http_header_upgrade },
+ { .hash = 0, .key = "Sec-WebSocket-Key", .cb = http_header_sec_websocket_key },
+ { .hash = 0, .key = "Sec-WebSocket-Version", .cb = http_header_sec_websocket_version },
+ { .hash = 0, .key = "Sec-WebSocket-Protocol",.cb = http_header_sec_websocket_protocol },
+ { .hash = 0, .key = "Sec-WebSocket-Extensions",.cb = http_header_sec_websocket_extensions },
+
// for historical reasons.
// there are a few nightly versions of netdata UI that incorrectly use this instead of X-Netdata-Auth
{ .hash = 0, .key = "Authorization", .cb = http_header_x_netdata_auth },
diff --git a/src/web/mcp/adapters/mcp-websocket.c b/src/web/mcp/adapters/mcp-websocket.c
new file mode 100644
index 00000000000000..7d6a1ef8b3e829
--- /dev/null
+++ b/src/web/mcp/adapters/mcp-websocket.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "mcp-websocket.h"
+#include "web/websocket/websocket-internal.h"
+
+// Store the MCP context in the WebSocket client's data field
+void mcp_websocket_set_context(struct websocket_server_client *wsc, MCP_CLIENT *ctx) {
+ if (!wsc) return;
+ wsc->user_data = ctx;
+}
+
+// Get the MCP context from a WebSocket client
+MCP_CLIENT *mcp_websocket_get_context(struct websocket_server_client *wsc) {
+ if (!wsc) return NULL;
+ return (MCP_CLIENT *)wsc->user_data;
+}
+
+// WebSocket buffer sender function for the MCP adapter
+int mcp_websocket_send_buffer(struct websocket_server_client *wsc, BUFFER *buffer) {
+ if (!wsc || !buffer) return -1;
+
+ const char *text = buffer_tostring(buffer);
+ if (!text || !*text) return -1;
+
+ return websocket_protocol_send_text(wsc, text);
+}
+
+// Create a response context for a WebSocket client
+static MCP_CLIENT *mcp_websocket_create_context(struct websocket_server_client *wsc) {
+ if (!wsc) return NULL;
+
+ MCP_CLIENT *ctx = mcp_create_client(MCP_TRANSPORT_WEBSOCKET, wsc);
+ mcp_websocket_set_context(wsc, ctx);
+
+ return ctx;
+}
+
+// WebSocket connection handler for MCP
+void mcp_websocket_on_connect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ // Create the MCP context
+ MCP_CLIENT *ctx = mcp_websocket_create_context(wsc);
+ if (!ctx) {
+ websocket_protocol_send_close(wsc, WS_CLOSE_INTERNAL_ERROR, "Failed to create MCP context");
+ return;
+ }
+
+ websocket_debug(wsc, "MCP client connected");
+}
+
+// WebSocket message handler for MCP - receives message and routes to MCP
+void mcp_websocket_on_message(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode) {
+ if (!wsc || !message || length == 0) return;
+
+ // Only handle text messages
+ if (opcode != WS_OPCODE_TEXT) {
+ websocket_debug(wsc, "Ignoring binary message");
+ return;
+ }
+
+ // Get the MCP context
+ MCP_CLIENT *ctx = mcp_websocket_get_context(wsc);
+ if (!ctx) {
+ websocket_debug(wsc, "MCP context not found");
+ websocket_protocol_send_close(wsc, WS_CLOSE_INTERNAL_ERROR, "MCP context not found");
+ return;
+ }
+
+ // Parse the JSON-RPC request
+ struct json_object *request = NULL;
+ enum json_tokener_error jerr = json_tokener_success;
+ request = json_tokener_parse_verbose(message, &jerr);
+
+ if (!request || jerr != json_tokener_success) {
+ websocket_debug(wsc, "Failed to parse JSON-RPC request: %s", json_tokener_error_desc(jerr));
+ CLEAN_BUFFER *b = buffer_create(0, NULL);
+ mcp_jsonrpc_error(b, NULL, 0, -32700);
+ mcp_websocket_send_buffer(wsc, b);
+ return;
+ }
+
+ // Pass the request to the MCP handler
+ mcp_handle_request(ctx, request);
+
+ // Free the request object
+ json_object_put(request);
+}
+
+// WebSocket close handler for MCP
+void mcp_websocket_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "MCP client closing (code: %d, reason: %s)", code, reason ? reason : "none");
+
+ // Clean up the MCP context
+ MCP_CLIENT *ctx = mcp_websocket_get_context(wsc);
+ if (ctx) {
+ mcp_free_client(ctx);
+ mcp_websocket_set_context(wsc, NULL);
+ }
+}
+
+// WebSocket disconnect handler for MCP
+void mcp_websocket_on_disconnect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "MCP client disconnected");
+
+ // Clean up the MCP context
+ MCP_CLIENT *ctx = mcp_websocket_get_context(wsc);
+ if (ctx) {
+ mcp_free_client(ctx);
+ mcp_websocket_set_context(wsc, NULL);
+ }
+}
+
+// Register WebSocket callbacks for MCP
+void mcp_websocket_adapter_initialize(void) {
+ // Initialize the MCP subsystem
+ mcp_initialize_subsystem();
+
+ netdata_log_info("MCP WebSocket adapter initialized");
+}
\ No newline at end of file
diff --git a/src/web/mcp/adapters/mcp-websocket.h b/src/web/mcp/adapters/mcp-websocket.h
new file mode 100644
index 00000000000000..4970d58af1bbbe
--- /dev/null
+++ b/src/web/mcp/adapters/mcp-websocket.h
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_ADAPTER_WEBSOCKET_H
+#define NETDATA_MCP_ADAPTER_WEBSOCKET_H
+
+#include "web/websocket/websocket-internal.h"
+#include "web/mcp/mcp.h"
+
+// Initialize the WebSocket adapter for MCP
+void mcp_websocket_adapter_initialize(void);
+
+// WebSocket protocol handler callbacks for MCP
+void mcp_websocket_on_connect(struct websocket_server_client *wsc);
+void mcp_websocket_on_message(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode);
+void mcp_websocket_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason);
+void mcp_websocket_on_disconnect(struct websocket_server_client *wsc);
+
+// Helper functions for the WebSocket adapter
+int mcp_websocket_send_json(struct websocket_server_client *wsc, struct json_object *json);
+int mcp_websocket_send_buffer(struct websocket_server_client *wsc, BUFFER *buffer);
+
+// Get and set MCP context from a WebSocket client
+MCP_CLIENT *mcp_websocket_get_context(struct websocket_server_client *wsc);
+void mcp_websocket_set_context(struct websocket_server_client *wsc, MCP_CLIENT *ctx);
+
+// Convenience wrappers for sending responses
+void mcp_websocket_send_error_response(struct websocket_server_client *wsc, int code, const char *message, uint64_t id);
+void mcp_websocket_send_success_response(struct websocket_server_client *wsc, struct json_object *result, uint64_t id);
+
+#endif // NETDATA_MCP_ADAPTER_WEBSOCKET_H
\ No newline at end of file
diff --git a/src/web/mcp/mcp-context.c b/src/web/mcp/mcp-context.c
new file mode 100644
index 00000000000000..3f3a2d6208e142
--- /dev/null
+++ b/src/web/mcp/mcp-context.c
@@ -0,0 +1,102 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP Context Namespace
+ *
+ * The MCP Context namespace provides methods for managing contextual information
+ * exchanged between clients and servers. Context represents stateful information
+ * that enhances the interaction between components.
+ *
+ * Key features of the context namespace:
+ *
+ * 1. Context Management:
+ * - Provide contextual information to the server (context/provide)
+ * - Clear specific context data (context/clear)
+ * - Check the status of current context (context/status)
+ *
+ * 2. Context Persistence:
+ * - Save context for future use (context/save)
+ * - Load previously saved context (context/load)
+ *
+ * Context in MCP can include:
+ * - User preferences and settings
+ * - Session-specific information
+ * - Authentication and authorization details
+ * - Client capabilities and limitations
+ * - Conversation or interaction history
+ *
+ * In the Netdata environment, context might include:
+ * - User display preferences (theme, date formats, etc.)
+ * - View configurations (dashboard layouts, chart settings)
+ * - Filtering and query preferences
+ * - Historical interaction patterns
+ * - Authentication tokens and permissions
+ *
+ * Context can be transient (per session) or persistent (saved across sessions),
+ * and may be scoped to specific interactions or broadly applied.
+ */
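+
+/*
+ * Illustrative JSON-RPC request for this namespace (the params shape below is
+ * an assumption for demonstration only - all handlers in this file are stubs):
+ *
+ *   { "jsonrpc": "2.0", "id": 1, "method": "context/provide",
+ *     "params": { "preferences": { "theme": "dark" } } }
+ */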
+
+#include "mcp-context.h"
+#include "mcp-initialize.h"
+
+// Stub implementations for all context namespace methods (transport-agnostic)
+static MCP_RETURN_CODE mcp_context_method_provide(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'context/provide' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_context_method_clear(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'context/clear' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_context_method_status(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'context/status' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_context_method_save(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'context/save' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_context_method_load(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'context/load' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// Context namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_context_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP context method: %s", method);
+
+ // Flush previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "provide") == 0) {
+ rc = mcp_context_method_provide(mcpc, params, id);
+ }
+ else if (strcmp(method, "clear") == 0) {
+ rc = mcp_context_method_clear(mcpc, params, id);
+ }
+ else if (strcmp(method, "status") == 0) {
+ rc = mcp_context_method_status(mcpc, params, id);
+ }
+ else if (strcmp(method, "save") == 0) {
+ rc = mcp_context_method_save(mcpc, params, id);
+ }
+ else if (strcmp(method, "load") == 0) {
+ rc = mcp_context_method_load(mcpc, params, id);
+ }
+ else {
+ // Method not found in context namespace
+ buffer_sprintf(mcpc->error, "Method 'context/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
diff --git a/src/web/mcp/mcp-context.h b/src/web/mcp/mcp-context.h
new file mode 100644
index 00000000000000..982a3a1599f75d
--- /dev/null
+++ b/src/web/mcp/mcp-context.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_CONTEXT_H
+#define NETDATA_MCP_CONTEXT_H
+
+#include "mcp.h"
+
+// Context namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_context_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_CONTEXT_H
diff --git a/src/web/mcp/mcp-initialize.c b/src/web/mcp/mcp-initialize.c
new file mode 100644
index 00000000000000..72450adfba1981
--- /dev/null
+++ b/src/web/mcp/mcp-initialize.c
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "mcp-initialize.h"
+#include "database/rrd-metadata.h"
+#include "database/rrd-retention.h"
+#include "daemon/common.h"
+
+// Initialize handler - provides information about what's available (transport-agnostic)
+MCP_RETURN_CODE mcp_method_initialize(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+    if (!mcpc)
+        return MCP_RC_ERROR; // no client context, so there is no error buffer to write to
+
+ // Extract client's requested protocol version
+ struct json_object *protocol_version_obj = NULL;
+ if (json_object_object_get_ex(params, "protocolVersion", &protocol_version_obj)) {
+ const char *version_str = json_object_get_string(protocol_version_obj);
+
+ // Convert to our enum
+ mcpc->protocol_version = MCP_PROTOCOL_VERSION_2id(version_str);
+
+ // If unknown version, default to the latest we support
+ if (mcpc->protocol_version == MCP_PROTOCOL_VERSION_UNKNOWN) {
+ mcpc->protocol_version = MCP_PROTOCOL_VERSION_LATEST;
+ }
+ } else {
+ // No version specified, default to oldest version for compatibility
+ mcpc->protocol_version = MCP_PROTOCOL_VERSION_2024_11_05;
+ }
+
+ netdata_log_debug(D_MCP, "MCP initialize request from client %s version %s, protocol version %s",
+ string2str(mcpc->client_name), string2str(mcpc->client_version),
+ MCP_PROTOCOL_VERSION_2str(mcpc->protocol_version));
+
+ // Initialize result buffer with JSON structure
+ mcp_init_success_result(mcpc, id);
+
+ // Use rrdstats_metadata_collect to get infrastructure statistics
+ RRDSTATS_METADATA metadata = rrdstats_metadata_collect();
+
+ // Use rrdstats_retention_collect to get retention information
+ RRDSTATS_RETENTION retention = rrdstats_retention_collect();
+
+ buffer_json_member_add_object(mcpc->result, "result");
+
+ // Add protocol version based on what client requested
+ buffer_json_member_add_string(mcpc->result, "protocolVersion",
+ MCP_PROTOCOL_VERSION_2str(mcpc->protocol_version));
+
+ // Add server info object
+ buffer_json_member_add_object(mcpc->result, "serverInfo");
+ buffer_json_member_add_string(mcpc->result, "name", "Netdata");
+ buffer_json_member_add_string(mcpc->result, "version", NETDATA_VERSION);
+ buffer_json_object_close(mcpc->result); // Close serverInfo
+
+ // Add capabilities object according to MCP standard
+ buffer_json_member_add_object(mcpc->result, "capabilities");
+
+ // Tools capabilities
+ buffer_json_member_add_object(mcpc->result, "tools");
+ buffer_json_member_add_boolean(mcpc->result, "listChanged", false);
+ buffer_json_member_add_boolean(mcpc->result, "asyncExecution", true);
+ buffer_json_member_add_boolean(mcpc->result, "batchExecution", true);
+ buffer_json_object_close(mcpc->result); // Close tools
+
+ // Resources capabilities
+ buffer_json_member_add_object(mcpc->result, "resources");
+ buffer_json_member_add_boolean(mcpc->result, "listChanged", true);
+ buffer_json_member_add_boolean(mcpc->result, "subscribe", true);
+ buffer_json_object_close(mcpc->result); // Close resources
+
+ // Prompts capabilities
+ buffer_json_member_add_object(mcpc->result, "prompts");
+ buffer_json_member_add_boolean(mcpc->result, "listChanged", false);
+ buffer_json_object_close(mcpc->result); // Close prompts
+
+ // Notification capabilities
+ buffer_json_member_add_object(mcpc->result, "notifications");
+ buffer_json_member_add_boolean(mcpc->result, "push", true);
+ buffer_json_member_add_boolean(mcpc->result, "subscription", true);
+ buffer_json_object_close(mcpc->result); // Close notifications
+
+ // Add logging capabilities
+ buffer_json_member_add_object(mcpc->result, "logging");
+ buffer_json_object_close(mcpc->result); // Close logging
+
+ // Add version-specific capabilities
+ if (mcpc->protocol_version >= MCP_PROTOCOL_VERSION_2025_03_26) {
+ // Add completions capability - new in 2025-03-26
+ buffer_json_member_add_object(mcpc->result, "completions");
+ buffer_json_object_close(mcpc->result); // Close completions
+ }
+
+ buffer_json_object_close(mcpc->result); // Close capabilities
+
+ // Add dynamic instructions based on server profile
+ char instructions[1024];
+
+ const char *common =
+ "Use the resources to identify the systems, components and applications being monitored,\n"
+ "and the alerts that have been configured.\n"
+ "\n"
+ "Use the tools to perform queries on metrics and logs, seek for outliers and anomalies,\n"
+ "perform root cause analysis and get live information about processes, network connections,\n"
+ "containers, VMs, systemd/windows services, sensors, kubernetes clusters, and more.\n"
+ "\n"
+ "Tools can also help in investigating currently raised alerts and their past transitions.";
+
+ // Determine server role based on metadata
+ if (metadata.nodes.total > 1) {
+ // This is a parent node with child nodes streaming to it
+ snprintfz(instructions, sizeof(instructions),
+ "This is a Netdata Parent Server hosting metrics and logs for %zu node%s.\n\n%s",
+ metadata.nodes.total, (metadata.nodes.total == 1) ? "" : "s", common);
+ }
+ else {
+ // This is a standalone server
+ snprintfz(instructions, sizeof(instructions),
+ "This is Netdata on a Standalone Server.\n\n%s", common);
+ }
+
+ buffer_json_member_add_string(mcpc->result, "instructions", instructions);
+
+ // Add _meta field (optional)
+ buffer_json_member_add_object(mcpc->result, "_meta");
+ buffer_json_member_add_string(mcpc->result, "generator", "netdata");
+
+ // Get current time and calculate uptimes
+ time_t now = now_realtime_sec();
+ time_t system_uptime_seconds = now_boottime_sec();
+ time_t netdata_uptime_seconds = now - netdata_start_time;
+
+ buffer_json_member_add_int64(mcpc->result, "timestamp", (int64_t)now);
+
+ // Add system uptime info - both raw seconds and human-readable format
+ char human_readable[128];
+ duration_snprintf_time_t(human_readable, sizeof(human_readable), system_uptime_seconds);
+
+ buffer_json_member_add_object(mcpc->result, "system_uptime");
+ buffer_json_member_add_int64(mcpc->result, "seconds", (int64_t)system_uptime_seconds);
+ buffer_json_member_add_string(mcpc->result, "human", human_readable);
+ buffer_json_object_close(mcpc->result); // Close system_uptime
+
+ // Add netdata uptime info - both raw seconds and human-readable format
+ duration_snprintf_time_t(human_readable, sizeof(human_readable), netdata_uptime_seconds);
+
+ buffer_json_member_add_object(mcpc->result, "netdata_uptime");
+ buffer_json_member_add_int64(mcpc->result, "seconds", (int64_t)netdata_uptime_seconds);
+ buffer_json_member_add_string(mcpc->result, "human", human_readable);
+ buffer_json_object_close(mcpc->result); // Close netdata_uptime
+
+ // Add infrastructure statistics to metadata
+ buffer_json_member_add_object(mcpc->result, "infrastructure");
+
+ // Add nodes statistics
+ buffer_json_member_add_object(mcpc->result, "nodes");
+ buffer_json_member_add_int64(mcpc->result, "total", metadata.nodes.total);
+ buffer_json_member_add_int64(mcpc->result, "receiving_from_children", metadata.nodes.receiving);
+ buffer_json_member_add_int64(mcpc->result, "sending_to_next_parent", metadata.nodes.sending);
+ buffer_json_member_add_int64(mcpc->result, "archived_but_available_for_queries", metadata.nodes.archived);
+ buffer_json_member_add_string(mcpc->result, "info", "Nodes (or hosts, or servers, or devices) are Netdata Agent installations or virtual Netdata nodes or SNMP devices.");
+ buffer_json_object_close(mcpc->result); // Close nodes
+
+ // Add metrics statistics
+ buffer_json_member_add_object(mcpc->result, "metrics");
+ buffer_json_member_add_int64(mcpc->result, "currently_being_collected", metadata.metrics.collected);
+ buffer_json_member_add_int64(mcpc->result, "total_available_for_queries", metadata.metrics.available);
+ buffer_json_member_add_string(mcpc->result, "info", "Metrics are unique time-series in the Netdata time-series database.");
+ buffer_json_object_close(mcpc->result); // Close metrics
+
+ // Add instances statistics
+ buffer_json_member_add_object(mcpc->result, "instances");
+ buffer_json_member_add_int64(mcpc->result, "currently_being_collected", metadata.instances.collected);
+ buffer_json_member_add_int64(mcpc->result, "total_available_for_queries", metadata.instances.available);
+ buffer_json_member_add_string(mcpc->result, "info", "Instances are collections of metrics referring to a component (system, disk, network interface, application, process, container, etc).");
+ buffer_json_object_close(mcpc->result); // Close instances
+
+ // Add contexts statistics
+ buffer_json_member_add_object(mcpc->result, "contexts");
+ buffer_json_member_add_int64(mcpc->result, "unique_across_all_nodes", metadata.contexts.unique);
+ buffer_json_member_add_string(mcpc->result, "info", "Contexts are distinct charts shown on the Netdata dashboards, like system.cpu (system CPU utilization), or net.net (network interfaces bandwidth). When monitoring applications, the context usually includes the application name.");
+ buffer_json_object_close(mcpc->result); // Close contexts
+
+ // Add retention information
+ if (retention.storage_tiers > 0) {
+ buffer_json_member_add_object(mcpc->result, "retention");
+ buffer_json_member_add_array(mcpc->result, "tiers");
+
+ for (size_t i = 0; i < retention.storage_tiers; i++) {
+ RRD_STORAGE_TIER *tier_info = &retention.tiers[i];
+
+ // Skip empty tiers
+ if (tier_info->metrics == 0 && tier_info->samples == 0)
+ continue;
+
+ buffer_json_add_array_item_object(mcpc->result);
+
+ // Add basic tier info
+ buffer_json_member_add_int64(mcpc->result, "tier", tier_info->tier);
+ buffer_json_member_add_string(mcpc->result, "backend",
+ tier_info->backend == STORAGE_ENGINE_BACKEND_DBENGINE ? "dbengine" :
+ tier_info->backend == STORAGE_ENGINE_BACKEND_RRDDIM ? "ram" : "unknown");
+ buffer_json_member_add_int64(mcpc->result, "granularity", tier_info->group_seconds);
+ buffer_json_member_add_string(mcpc->result, "granularity_human", tier_info->granularity_human);
+
+ // Add metrics info
+ buffer_json_member_add_int64(mcpc->result, "metrics", tier_info->metrics);
+ buffer_json_member_add_int64(mcpc->result, "samples", tier_info->samples);
+
+ // Add storage info when available
+ if (tier_info->disk_max > 0) {
+ buffer_json_member_add_int64(mcpc->result, "disk_used", tier_info->disk_used);
+ buffer_json_member_add_int64(mcpc->result, "disk_max", tier_info->disk_max);
+ // Format disk_percent to have only 2 decimal places
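+                // e.g. 12.3456 becomes 12.35 (scale, round half up, rescale)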
+ double rounded_percent = floor(tier_info->disk_percent * 100.0 + 0.5) / 100.0;
+ buffer_json_member_add_double(mcpc->result, "disk_percent", rounded_percent);
+ }
+
+ // Add retention info
+ if (tier_info->retention > 0) {
+ buffer_json_member_add_int64(mcpc->result, "first_time_s", tier_info->first_time_s);
+ buffer_json_member_add_int64(mcpc->result, "last_time_s", tier_info->last_time_s);
+ buffer_json_member_add_int64(mcpc->result, "retention", tier_info->retention);
+ buffer_json_member_add_string(mcpc->result, "retention_human", tier_info->retention_human);
+
+ if (tier_info->requested_retention > 0) {
+ buffer_json_member_add_int64(mcpc->result, "requested_retention", tier_info->requested_retention);
+ buffer_json_member_add_string(mcpc->result, "requested_retention_human", tier_info->requested_retention_human);
+ }
+
+ if (tier_info->expected_retention > 0) {
+ buffer_json_member_add_int64(mcpc->result, "expected_retention", tier_info->expected_retention);
+ buffer_json_member_add_string(mcpc->result, "expected_retention_human", tier_info->expected_retention_human);
+ }
+ }
+
+ buffer_json_object_close(mcpc->result); // Close tier object
+ }
+
+ buffer_json_array_close(mcpc->result); // Close tiers array
+ buffer_json_member_add_string(mcpc->result, "info", "Metrics retention information for each storage tier in the Netdata database.\nHigher tiers can provide min, max, average, sum and anomaly rate with the same accuracy as tier 0.\nTiers are automatically selected during query.");
+ buffer_json_object_close(mcpc->result); // Close retention
+ }
+
+ buffer_json_object_close(mcpc->result); // Close infrastructure
+ buffer_json_object_close(mcpc->result); // Close _meta
+ buffer_json_object_close(mcpc->result); // Close result object
+ buffer_json_finalize(mcpc->result); // Finalize JSON
+
+ return MCP_RC_OK;
+}
diff --git a/src/web/mcp/mcp-initialize.h b/src/web/mcp/mcp-initialize.h
new file mode 100644
index 00000000000000..9abdf24c054c7d
--- /dev/null
+++ b/src/web/mcp/mcp-initialize.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_INITIALIZE_H
+#define NETDATA_MCP_INITIALIZE_H
+
+#include "mcp.h"
+
+// Initialize method handler (transport-agnostic)
+MCP_RETURN_CODE mcp_method_initialize(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_INITIALIZE_H
\ No newline at end of file
diff --git a/src/web/mcp/mcp-notifications.c b/src/web/mcp/mcp-notifications.c
new file mode 100644
index 00000000000000..e7c7bc6e8929d8
--- /dev/null
+++ b/src/web/mcp/mcp-notifications.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP Notifications Namespace
+ *
+ * The MCP Notifications namespace provides methods for managing and handling notifications.
+ * In the MCP protocol, notifications enable real-time communication of events from server to client,
+ * and from client to server.
+ *
+ * Key features of the notifications namespace:
+ *
+ * 1. Initialization:
+ * - Clients notify the server when they're initialized (notifications/initialized)
+ * - This initiates the notification subsystem
+ *
+ * 2. Subscription Management:
+ * - Subscribe to specific notification types (notifications/subscribe)
+ * - Unsubscribe from notifications (notifications/unsubscribe)
+ * - Configure notification settings (notifications/getSettings)
+ *
+ * 3. Notification Handling:
+ * - Acknowledge received notifications (notifications/acknowledge)
+ * - View notification history (notifications/getHistory)
+ * - Send notifications from client to server (notifications/send)
+ *
+ * Notifications in MCP are bidirectional:
+ * - Server-to-client notifications inform about system events, alerts, changes
+ * - Client-to-server notifications provide user actions and status updates
+ *
+ * In the Netdata context, notifications include:
+ * - Health monitoring alerts
+ * - System status changes
+ * - Configuration changes
+ * - Resource availability updates
+ * - Client status updates
+ *
+ * Notifications can be transient or persistent, prioritized, and may require
+ * acknowledgment depending on their type and importance.
+ */
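+
+/*
+ * Illustrative JSON-RPC message: a notification carries no "id", which is why
+ * mcp_notifications_method_initialized() below treats id == 0 as "no reply needed":
+ *
+ *   { "jsonrpc": "2.0", "method": "notifications/initialized" }
+ */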
+
+#include "mcp-notifications.h"
+#include "mcp-initialize.h"
+
+// Implementation of notifications/initialized (transport-agnostic)
+static MCP_RETURN_CODE mcp_notifications_method_initialized(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id) {
+ // This is just a notification, just log it
+ netdata_log_debug(D_MCP, "Client sent notifications/initialized notification");
+
+ // No response needed if this is a notification (id == 0)
+ if (id == 0) return MCP_RC_OK;
+
+ // If it was a request (has id), send an empty success response
+ mcp_init_success_result(mcpc, id);
+ buffer_json_finalize(mcpc->result);
+
+ return MCP_RC_OK;
+}
+
+// Stub implementations for other notifications methods (transport-agnostic)
+static MCP_RETURN_CODE mcp_notifications_method_subscribe(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/subscribe' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_notifications_method_unsubscribe(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/unsubscribe' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_notifications_method_acknowledge(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/acknowledge' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_notifications_method_getHistory(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/getHistory' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_notifications_method_send(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/send' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_notifications_method_getSettings(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'notifications/getSettings' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// Notifications namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_notifications_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP notifications method: %s", method);
+
+ // Flush previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "initialized") == 0) {
+ rc = mcp_notifications_method_initialized(mcpc, params, id);
+ }
+ else if (strcmp(method, "subscribe") == 0) {
+ rc = mcp_notifications_method_subscribe(mcpc, params, id);
+ }
+ else if (strcmp(method, "unsubscribe") == 0) {
+ rc = mcp_notifications_method_unsubscribe(mcpc, params, id);
+ }
+ else if (strcmp(method, "acknowledge") == 0) {
+ rc = mcp_notifications_method_acknowledge(mcpc, params, id);
+ }
+ else if (strcmp(method, "getHistory") == 0) {
+ rc = mcp_notifications_method_getHistory(mcpc, params, id);
+ }
+ else if (strcmp(method, "send") == 0) {
+ rc = mcp_notifications_method_send(mcpc, params, id);
+ }
+ else if (strcmp(method, "getSettings") == 0) {
+ rc = mcp_notifications_method_getSettings(mcpc, params, id);
+ }
+ else {
+ // Method not found in notifications namespace
+ buffer_sprintf(mcpc->error, "Method 'notifications/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
+
diff --git a/src/web/mcp/mcp-notifications.h b/src/web/mcp/mcp-notifications.h
new file mode 100644
index 00000000000000..84c74005833d23
--- /dev/null
+++ b/src/web/mcp/mcp-notifications.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_NOTIFICATIONS_H
+#define NETDATA_MCP_NOTIFICATIONS_H
+
+#include "mcp.h"
+
+// Notifications namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_notifications_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_NOTIFICATIONS_H
\ No newline at end of file
diff --git a/src/web/mcp/mcp-prompts.c b/src/web/mcp/mcp-prompts.c
new file mode 100644
index 00000000000000..ed9f8d48b0d84f
--- /dev/null
+++ b/src/web/mcp/mcp-prompts.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP Prompts Namespace
+ *
+ * The MCP Prompts namespace provides methods for managing and executing prompts.
+ * In the MCP protocol, prompts are text templates that guide AI generation for specific tasks.
+ * Prompts are user-controlled interactions that leverage AI capabilities in predefined ways.
+ *
+ * Key features of the prompts namespace:
+ *
+ * 1. Prompt Management:
+ * - List available prompts (prompts/list)
+ * - Get details about specific prompts (prompts/get)
+ * - Save custom prompts (prompts/save)
+ * - Delete prompts (prompts/delete)
+ * - Organize prompts into categories (prompts/getCategories)
+ *
+ * 2. Prompt Execution:
+ * - Execute prompts with input parameters (prompts/execute)
+ * - View execution history (prompts/getHistory)
+ *
+ * Prompts differ from tools in that they are:
+ * - More flexible and text-oriented
+ * - Designed for natural language processing
+ * - Often used for analysis and summarization
+ * - Usually invoked explicitly by users rather than by the model
+ *
+ * In the Netdata context, prompts might include:
+ * - Analyzing a time period of metrics for anomalies
+ * - Summarizing system health
+ * - Creating natural language explanations of charts
+ * - Helping users create custom alert configurations
+ * - Generating analysis reports
+ *
+ * Prompts typically use templating to insert user-provided context into predefined templates,
+ * making them powerful for specific analysis tasks while maintaining predictable outputs.
+ */
+
+#include "mcp-prompts.h"
+#include "mcp-initialize.h"
+
+// Implementation of prompts/list (transport-agnostic)
+static MCP_RETURN_CODE mcp_prompts_method_list(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id) {
+ if (!mcpc || id == 0) return MCP_RC_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Add empty prompts array
+ buffer_json_member_add_array(mcpc->result, "prompts");
+ buffer_json_array_close(mcpc->result); // Close prompts array
+
+ // Close the result object
+ buffer_json_finalize(mcpc->result);
+
+ return MCP_RC_OK;
+}
+
+// Stub implementations for other prompts methods (transport-agnostic)
+static MCP_RETURN_CODE mcp_prompts_method_execute(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/execute' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_prompts_method_get(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/get' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_prompts_method_save(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/save' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_prompts_method_delete(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/delete' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_prompts_method_getCategories(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/getCategories' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_prompts_method_getHistory(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'prompts/getHistory' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// Prompts namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_prompts_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP prompts method: %s", method);
+
+ // Flush previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "list") == 0) {
+ rc = mcp_prompts_method_list(mcpc, params, id);
+ }
+ else if (strcmp(method, "execute") == 0) {
+ rc = mcp_prompts_method_execute(mcpc, params, id);
+ }
+ else if (strcmp(method, "get") == 0) {
+ rc = mcp_prompts_method_get(mcpc, params, id);
+ }
+ else if (strcmp(method, "save") == 0) {
+ rc = mcp_prompts_method_save(mcpc, params, id);
+ }
+ else if (strcmp(method, "delete") == 0) {
+ rc = mcp_prompts_method_delete(mcpc, params, id);
+ }
+ else if (strcmp(method, "getCategories") == 0) {
+ rc = mcp_prompts_method_getCategories(mcpc, params, id);
+ }
+ else if (strcmp(method, "getHistory") == 0) {
+ rc = mcp_prompts_method_getHistory(mcpc, params, id);
+ }
+ else {
+ // Method not found in prompts namespace
+ buffer_sprintf(mcpc->error, "Method 'prompts/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
diff --git a/src/web/mcp/mcp-prompts.h b/src/web/mcp/mcp-prompts.h
new file mode 100644
index 00000000000000..84f4104a80688a
--- /dev/null
+++ b/src/web/mcp/mcp-prompts.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_PROMPTS_H
+#define NETDATA_MCP_PROMPTS_H
+
+#include "mcp.h"
+
+// Prompts namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_prompts_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_PROMPTS_H
\ No newline at end of file
diff --git a/src/web/mcp/mcp-resources.c b/src/web/mcp/mcp-resources.c
new file mode 100644
index 00000000000000..29265f2f0cf2c3
--- /dev/null
+++ b/src/web/mcp/mcp-resources.c
@@ -0,0 +1,496 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP Resources Namespace
+ *
+ * The MCP Resources namespace provides methods for accessing and managing resources on the server.
+ * In the MCP protocol, resources are application-controlled data stores that provide context to the model.
+ * Resources are passive, meaning they provide data but don't perform actions on their own.
+ *
+ * Key features of the resources namespace:
+ *
+ * 1. Resource Discovery:
+ * - Clients can list available resources (resources/list)
+ * - Get detailed descriptions and schemas (resources/describe, resources/getSchema)
+ * - Search for resources matching specific criteria (resources/search)
+ *
+ * 2. Resource Access:
+ * - Retrieve specific resources or portions of resources (resources/get)
+ * - Access resources by ID or path
+ * - Resources can be structured or unstructured
+ *
+ * 3. Resource Subscriptions:
+ * - Subscribe to updates for specific resources (resources/subscribe)
+ * - Unsubscribe from resources (resources/unsubscribe)
+ * - Get real-time updates when subscribed resources change
+ *
+ * In the Netdata context, resources include:
+ * - metrics: Time-series data collected from various sources
+ * - logs: Log entries from system and application logs
+ * - alerts: Health monitoring alerts and notifications
+ * - functions: Live infrastructure snapshots providing real-time views
+ * - nodes: Monitored infrastructure nodes with their metadata
+ *
+ * Resources can be hierarchical or flat, and may support different access patterns
+ * (e.g., time-based querying for metrics, full-text search for logs).
+ */
+
+#include "mcp-resources.h"
+#include "mcp-initialize.h"
+#include "database/contexts/rrdcontext.h"
+
+// Audience enum - bitmask for the intended audience of a resource
+typedef enum {
+ RESOURCE_AUDIENCE_USER = 1 << 0, // Resource useful for users
+ RESOURCE_AUDIENCE_ASSISTANT = 1 << 1, // Resource useful for assistants
+ RESOURCE_AUDIENCE_BOTH = RESOURCE_AUDIENCE_USER | RESOURCE_AUDIENCE_ASSISTANT
+} RESOURCE_AUDIENCE;
+
+// Function pointer type for resource read callbacks
+typedef MCP_RETURN_CODE (*resource_read_fn)(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id);
+
+// Function pointer type for resource size callbacks
+typedef size_t (*resource_size_fn)(void);
+
+// Resource structure definition
+typedef struct {
+ const char *name; // Resource name
+ const char *uri; // Resource URI
+ const char *description; // Human-readable description
+ HTTP_CONTENT_TYPE content_type; // Content type enum
+ RESOURCE_AUDIENCE audience; // Intended audience
+ double priority; // Priority (0.0-1.0)
+ resource_read_fn read_fn; // Callback function to read the resource
+ resource_size_fn size_fn; // Optional callback function to return approximate size in bytes
+} MCP_RESOURCE;
+
+// Resource template structure definition
+typedef struct {
+ const char *name; // Template name
+ const char *uri_template; // URI template following RFC 6570
+ const char *description; // Human-readable description
+ HTTP_CONTENT_TYPE content_type; // Content type enum
+ RESOURCE_AUDIENCE audience; // Intended audience
+ double priority; // Priority (0.0-1.0)
+} MCP_RESOURCE_TEMPLATE;
+
+// Basic implementation of the contexts resource read function
+static MCP_RETURN_CODE mcp_resource_read_contexts(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || !params || id == 0) return MCP_RC_INTERNAL_ERROR;
+
+ // Extract URI from params to check for query parameters
+ struct json_object *uri_obj = NULL;
+ json_object_object_get_ex(params, "uri", &uri_obj);
+ const char *uri = json_object_get_string(uri_obj);
+
+ SIMPLE_PATTERN *pattern = NULL;
+
+ // Check if we have a query parameter
+ if (uri && strstr(uri, "?like=")) {
+ const char *like_param = strstr(uri, "?like=") + 6; // Skip past "?like="
+
+ // Decode the query parameter
+ const char *decoded_query = mcp_uri_decode(mcpc, like_param);
+
+ // Create a simple pattern
+ if (decoded_query && *decoded_query)
+ pattern = simple_pattern_create(decoded_query, "|", SIMPLE_PATTERN_EXACT, false);
+ }
+
+ mcp_init_success_result(mcpc, id);
+ buffer_json_member_add_object(mcpc->result, "result");
+
+ // Add the filtered contexts
+ rrdcontext_context_registry_json_mcp_array(mcpc->result, pattern);
+
+ // Add instructions
+    buffer_json_member_add_string(mcpc->result, "instructions",
+        "Additional information per context (like title, dimensions, unit, label\n"
+        "keys and possible values, the list of nodes collecting it, and its retention)\n"
+        "can be obtained by reading URIs in the format 'nd://contexts/{context}'\n"
+        "(like nd://contexts/system.cpu.user).\n\n"
+        "You can search contexts using glob-like patterns via the 'like' parameter:\n"
+        "nd://contexts?like=*sql*|*db*|*redis*|*mongo*\n"
+        "to find postgresql, mysql, mariadb and mongodb related contexts.\n\n"
+        "For a high-level overview of monitoring categories, use nd://context-categories");
+
+ buffer_json_finalize(mcpc->result);
+
+ if (pattern)
+ simple_pattern_free(pattern);
+
+ return MCP_RC_OK;
+}
+
+// Implementation of the context categories resource read function
+static MCP_RETURN_CODE mcp_resource_read_context_categories(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || !params || id == 0) return MCP_RC_INTERNAL_ERROR;
+
+ // Extract URI from params to check for query parameters
+ struct json_object *uri_obj = NULL;
+ json_object_object_get_ex(params, "uri", &uri_obj);
+ const char *uri = json_object_get_string(uri_obj);
+
+ SIMPLE_PATTERN *pattern = NULL;
+
+ // Check if we have a query parameter
+ if (uri && strstr(uri, "?like=")) {
+ const char *like_param = strstr(uri, "?like=") + 6; // Skip past "?like="
+
+ // Decode the query parameter
+ const char *decoded_query = mcp_uri_decode(mcpc, like_param);
+
+ // Create a simple pattern
+ if (decoded_query && *decoded_query)
+ pattern = simple_pattern_create(decoded_query, "|", SIMPLE_PATTERN_EXACT, false);
+ }
+
+ mcp_init_success_result(mcpc, id);
+ buffer_json_member_add_object(mcpc->result, "result");
+
+ // Add the filtered context categories
+ rrdcontext_context_registry_json_mcp_categories_array(mcpc->result, pattern);
+
+ // Add instructions
+    buffer_json_member_add_string(mcpc->result, "instructions",
+        "Context categories provide a high-level overview of what's being monitored.\n"
+        "Each category represents a group of related contexts (e.g., 'system.cpu' for CPU metrics).\n\n"
+        "To explore all contexts within a specific category, use the pattern:\n"
+        "nd://contexts?like={category}.*\n\n"
+        "For example, if the category is 'redis', see all Redis-related contexts with:\n"
+        "nd://contexts?like=redis.*\n\n"
+        "You can search categories using glob-like patterns with the 'like' parameter:\n"
+        "nd://context-categories?like=*sql*|*db*|*mongo*\n"
+        "to find postgresql, mysql, mariadb and mongodb related categories.");
+
+ buffer_json_finalize(mcpc->result);
+
+ if (pattern)
+ simple_pattern_free(pattern);
+
+ return MCP_RC_OK;
+}
+
+// Size estimation functions for resources
+static size_t mcp_resource_contexts_size(void) {
+ CLEAN_BUFFER *wb = buffer_create(0, NULL);
+ buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY);
+ rrdcontext_context_registry_json_mcp_array(wb, NULL);
+ buffer_json_finalize(wb);
+ return buffer_strlen(wb);
+}
+
+static size_t mcp_resource_context_categories_size(void) {
+ CLEAN_BUFFER *wb = buffer_create(0, NULL);
+ buffer_json_initialize(wb, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY);
+ rrdcontext_context_registry_json_mcp_categories_array(wb, NULL);
+ buffer_json_finalize(wb);
+ return buffer_strlen(wb);
+}
+
+// Static array of all available resources
+static const MCP_RESOURCE mcp_resources[] = {
+ {
+ .name = "contexts",
+ .uri = "nd://contexts",
+ .description =
+ "Primary discovery mechanism for what's being monitored.\n"
+ "Contexts are the equivalent of charts in Netdata dashboards and they are multi-node and multi-instance.\n"
+ "Usually contexts have the same set of label keys and common or similar dimensions.\n"
+ "Supports searches for contexts using glob-like patterns with the 'like=' parameter.\n",
+ .content_type = CT_APPLICATION_JSON,
+ .audience = RESOURCE_AUDIENCE_BOTH,
+ .priority = 1.0,
+ .read_fn = mcp_resource_read_contexts,
+ .size_fn = mcp_resource_contexts_size
+ },
+ {
+ .name = "context-categories",
+ .uri = "nd://context-categories",
+ .description =
+ "High-level categories of contexts being monitored.\n"
+ "Provides a summarized view of monitoring domains by grouping contexts by their prefix.\n"
+ "Useful for getting a quick overview of what's being monitored without detailed breakdown.\n",
+ .content_type = CT_APPLICATION_JSON,
+ .audience = RESOURCE_AUDIENCE_BOTH,
+ .priority = 0.9,
+ .read_fn = mcp_resource_read_context_categories,
+ .size_fn = mcp_resource_context_categories_size
+ },
+ // Add more resources here as they are implemented
+ // Example:
+ // {
+ // .name = "nodes",
+ // .uri = "nd://nodes",
+ // .description = "Infrastructure discovery...",
+ // ...
+ // },
+};
+
+// Static array of all available resource templates
+static const MCP_RESOURCE_TEMPLATE mcp_resource_templates[] = {
+ {
+ .name = "Contexts Search",
+ .uri_template = "nd://contexts{?like}",
+ .description =
+ "Search for monitoring contexts by matching their names against glob-like patterns.\n"
+ "The 'like' parameter accepts pipe-separated patterns with wildcards\n"
+ "(e.g., '?like=*sql*|*db*|*redis*|*mongo*|*{db-name}*' for common database-related contexts).",
+ .content_type = CT_APPLICATION_JSON,
+ .audience = RESOURCE_AUDIENCE_BOTH,
+ .priority = 1.0
+ },
+ {
+ .name = "Context Categories Search",
+ .uri_template = "nd://context-categories{?like}",
+ .description =
+ "Search for high-level context categories by matching their names against glob-like patterns.\n"
+ "The 'like' parameter accepts pipe-separated patterns with wildcards\n"
+ "(e.g., '?like=*sql*|*db*|*redis*|*mongo*|*{db-name}*' for common database-related categories).",
+ .content_type = CT_APPLICATION_JSON,
+ .audience = RESOURCE_AUDIENCE_BOTH,
+ .priority = 0.9
+ },
+ // Add more templates here as they are implemented
+};
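The `uriTemplate` values follow RFC 6570 form-style query expansion: `{?like}` expands to a `?like=` query string whose value must be URL-encoded, which `mcp_uri_decode()` reverses on the server side. A small sketch with an illustrative pattern:

```c
#include <stdio.h>

// Expanding "nd://contexts{?like}" by hand: '|' separates alternative glob
// patterns and travels percent-encoded as %7C inside the URI.
int main(void) {
    char uri[256];
    snprintf(uri, sizeof(uri), "nd://contexts?like=%s", "*sql*%7C*db*");
    printf("%s\n", uri);  // prints: nd://contexts?like=*sql*%7C*db*
    return 0;
}
```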
+
+// Implementation of resources/list (transport-agnostic)
+static MCP_RETURN_CODE mcp_resources_method_list(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || !params || !id) return MCP_RC_INTERNAL_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Create a resources array object
+ buffer_json_member_add_array(mcpc->result, "resources");
+
+ // Iterate through our resources array and add each one
+ for (size_t i = 0; i < _countof(mcp_resources); i++) {
+ const MCP_RESOURCE *resource = &mcp_resources[i];
+
+ buffer_json_add_array_item_object(mcpc->result);
+
+ // Add required fields
+ buffer_json_member_add_string(mcpc->result, "name", resource->name);
+ buffer_json_member_add_string(mcpc->result, "uri", resource->uri);
+
+ // Add optional fields
+ if (resource->description) {
+ buffer_json_member_add_string(mcpc->result, "description", resource->description);
+ }
+
+ // Convert the content_type enum to string
+ const char *mime_type = content_type_id2string(resource->content_type);
+ if (mime_type) {
+ buffer_json_member_add_string(mcpc->result, "mimeType", mime_type);
+ }
+
+ // Add size information if available
+ if (resource->size_fn) {
+ size_t size = resource->size_fn();
+ if (size > 0) {
+ buffer_json_member_add_uint64(mcpc->result, "size", size);
+ }
+ }
+
+ // Add audience annotations if specified
+ if (resource->audience != 0) {
+ buffer_json_member_add_object(mcpc->result, "annotations");
+
+ buffer_json_member_add_array(mcpc->result, "audience");
+
+ if (resource->audience & RESOURCE_AUDIENCE_USER) {
+ buffer_json_add_array_item_string(mcpc->result, "user");
+ }
+
+ if (resource->audience & RESOURCE_AUDIENCE_ASSISTANT) {
+ buffer_json_add_array_item_string(mcpc->result, "assistant");
+ }
+
+ buffer_json_array_close(mcpc->result); // Close audience array
+
+ // Add priority if it's non-zero
+ if (resource->priority > 0) {
+ buffer_json_member_add_double(mcpc->result, "priority", resource->priority);
+ }
+
+ buffer_json_object_close(mcpc->result); // Close annotations object
+ }
+
+ buffer_json_object_close(mcpc->result); // Close resource object
+ }
+
+ buffer_json_array_close(mcpc->result); // Close resources array
+    buffer_json_finalize(mcpc->result);  // Finalize the JSON, as the other list handlers do
+
+ // For now, no need for pagination since we have a small number of resources
+ // If we add many resources later, implement cursor-based pagination here
+
+ return MCP_RC_OK;
+}
+
+// Implementation of resources/read (transport-agnostic)
+static MCP_RETURN_CODE mcp_resources_method_read(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || id == 0 || !params) return MCP_RC_INTERNAL_ERROR;
+
+ // Extract URI from params
+ struct json_object *uri_obj = NULL;
+ if (!json_object_object_get_ex(params, "uri", &uri_obj)) {
+ buffer_strcat(mcpc->error, "Missing 'uri' parameter");
+ return MCP_RC_INVALID_PARAMS;
+ }
+
+ const char *uri = json_object_get_string(uri_obj);
+ if (!uri) {
+ buffer_strcat(mcpc->error, "Invalid 'uri' parameter");
+ return MCP_RC_INVALID_PARAMS;
+ }
+
+ netdata_log_debug(D_MCP, "MCP resources/read for URI: %s", uri);
+
+ // Find the matching resource in our array
+ for (size_t i = 0; i < _countof(mcp_resources); i++) {
+ const MCP_RESOURCE *resource = &mcp_resources[i];
+
+ // Get the URI without query parameters for matching
+ const char *query_start = strchr(uri, '?');
+ size_t base_uri_length = query_start ? (size_t)(query_start - uri) : strlen(uri);
+
+ // Check if the base URI matches the resource URI
+ if (strlen(resource->uri) == base_uri_length &&
+ strncmp(resource->uri, uri, base_uri_length) == 0) {
+
+ // Found matching resource, check if read function exists
+ if (resource->read_fn) {
+ // Call the resource-specific read function
+ return resource->read_fn(mcpc, params, id);
+ }
+ else {
+ // No read function implemented
+ buffer_strcat(mcpc->error, "Resource reading not implemented");
+ return MCP_RC_NOT_IMPLEMENTED;
+ }
+ }
+ }
+
+ // No matching resource found
+ buffer_sprintf(mcpc->error, "Unknown resource URI: %s", uri);
+ return MCP_RC_NOT_FOUND;
+}
+
+// Implementation of resources/templates/list (transport-agnostic)
+static MCP_RETURN_CODE mcp_resources_method_templates_list(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || !params || !id) return MCP_RC_INTERNAL_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Create a resourceTemplates array object
+ buffer_json_member_add_object(mcpc->result, "result");
+ buffer_json_member_add_array(mcpc->result, "resourceTemplates");
+
+ // Iterate through our templates array and add each one
+ for (size_t i = 0; i < _countof(mcp_resource_templates); i++) {
+ const MCP_RESOURCE_TEMPLATE *template = &mcp_resource_templates[i];
+
+ buffer_json_add_array_item_object(mcpc->result);
+
+ // Add required fields
+ buffer_json_member_add_string(mcpc->result, "name", template->name);
+ buffer_json_member_add_string(mcpc->result, "uriTemplate", template->uri_template);
+
+ // Add optional fields
+ if (template->description) {
+ buffer_json_member_add_string(mcpc->result, "description", template->description);
+ }
+
+ // Convert the content_type enum to string
+ const char *mime_type = content_type_id2string(template->content_type);
+ if (mime_type) {
+ buffer_json_member_add_string(mcpc->result, "mimeType", mime_type);
+ }
+
+ // Add audience annotations if specified
+ if (template->audience != 0) {
+ buffer_json_member_add_object(mcpc->result, "annotations");
+
+ buffer_json_member_add_array(mcpc->result, "audience");
+
+ if (template->audience & RESOURCE_AUDIENCE_USER) {
+ buffer_json_add_array_item_string(mcpc->result, "user");
+ }
+
+ if (template->audience & RESOURCE_AUDIENCE_ASSISTANT) {
+ buffer_json_add_array_item_string(mcpc->result, "assistant");
+ }
+
+ buffer_json_array_close(mcpc->result); // Close audience array
+
+ // Add priority if it's non-zero
+ if (template->priority > 0) {
+ buffer_json_member_add_double(mcpc->result, "priority", template->priority);
+ }
+
+ buffer_json_object_close(mcpc->result); // Close annotations object
+ }
+
+ buffer_json_object_close(mcpc->result); // Close template object
+ }
+
+ buffer_json_array_close(mcpc->result); // Close resourceTemplates array
+ buffer_json_object_close(mcpc->result); // Close result object
+ buffer_json_finalize(mcpc->result);
+
+ // For now, no need for pagination since we have a small number of templates
+ // If we add many templates later, implement cursor-based pagination here
+
+ return MCP_RC_OK;
+}
+
+// Implementation of resources/subscribe (transport-agnostic)
+static MCP_RETURN_CODE mcp_resources_method_subscribe(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || !id || !params) return MCP_RC_INTERNAL_ERROR;
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// Implementation of resources/unsubscribe (transport-agnostic)
+static MCP_RETURN_CODE mcp_resources_method_unsubscribe(MCP_CLIENT *mcpc, struct json_object *params, uint64_t id) {
+ if (!mcpc || id == 0 || !params) return MCP_RC_INTERNAL_ERROR;
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// Resources namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_resources_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP resources method: %s", method);
+
+ // Clear previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "list") == 0) {
+ rc = mcp_resources_method_list(mcpc, params, id);
+ }
+ else if (strcmp(method, "read") == 0) {
+ rc = mcp_resources_method_read(mcpc, params, id);
+ }
+ else if (strcmp(method, "templates/list") == 0) {
+ rc = mcp_resources_method_templates_list(mcpc, params, id);
+ }
+ else if (strcmp(method, "subscribe") == 0) {
+ rc = mcp_resources_method_subscribe(mcpc, params, id);
+ }
+ else if (strcmp(method, "unsubscribe") == 0) {
+ rc = mcp_resources_method_unsubscribe(mcpc, params, id);
+ }
+ else {
+ // Method not found in resources namespace
+ buffer_sprintf(mcpc->error, "Method 'resources/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
diff --git a/src/web/mcp/mcp-resources.h b/src/web/mcp/mcp-resources.h
new file mode 100644
index 00000000000000..97570ea37b351e
--- /dev/null
+++ b/src/web/mcp/mcp-resources.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_RESOURCES_H
+#define NETDATA_MCP_RESOURCES_H
+
+#include "mcp.h"
+
+// Resources namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_resources_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_RESOURCES_H
\ No newline at end of file
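Putting `resources/read` together: params arrive already parsed, the dispatcher strips any query string before matching the base URI against the static table, and the matched resource's `read_fn` builds the reply. A direct-call sketch, assuming an existing `MCP_CLIENT`; the URI and the id (7) are illustrative:

```c
#include <json-c/json.h>
#include "mcp-resources.h"

static void example_read(MCP_CLIENT *mcpc) {
    struct json_object *params = json_object_new_object();
    json_object_object_add(params, "uri",
                           json_object_new_string("nd://contexts?like=*redis*"));

    if (mcp_resources_route(mcpc, "read", params, 7) == MCP_RC_OK)
        mcp_send_response_buffer(mcpc);  // contexts filtered to *redis*

    json_object_put(params);  // also drops the "uri" string it owns
}
```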
diff --git a/src/web/mcp/mcp-system.c b/src/web/mcp/mcp-system.c
new file mode 100644
index 00000000000000..3918e82b0330a2
--- /dev/null
+++ b/src/web/mcp/mcp-system.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP System Namespace
+ *
+ * The MCP System namespace provides methods for querying and managing the server system.
+ * These methods provide information about the server's state, health, and performance,
+ * and allow for basic administrative operations.
+ *
+ * Key features of the system namespace:
+ *
+ * 1. System Information:
+ * - Get server health status (system/health)
+ * - Get detailed version information (system/version)
+ * - Get server performance metrics (system/metrics)
+ * - Get current system status (system/status)
+ *
+ * 2. System Management:
+ * - Request server restart (system/restart)
+ *
+ * System methods typically require elevated permissions, as they can affect
+ * the operation of the server and may provide sensitive information.
+ *
+ * In the Netdata context, system methods provide:
+ * - Netdata Agent version and build information
+ * - Server metrics (CPU, memory usage, uptime, etc.)
+ * - Runtime configuration status
+ * - Agent health and operational status
+ * - Administrative operations for authorized users
+ *
+ * These methods are particularly useful for:
+ * - System administrators monitoring Netdata servers
+ * - Tools that need to check for version compatibility
+ * - Health monitoring systems tracking Netdata itself
+ * - Administrative interfaces
+ */
+
+#include "mcp-system.h"
+#include "mcp-initialize.h"
+#include "config.h" // Include config.h for NETDATA_VERSION
+
+// Stub implementations for system methods (transport-agnostic)
+static MCP_RETURN_CODE mcp_system_method_health(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'system/health' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_system_method_version(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id) {
+ if (!mcpc || id == 0) return MCP_RC_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Add version information
+ buffer_json_member_add_string(mcpc->result, "name", "Netdata");
+ buffer_json_member_add_string(mcpc->result, "version", NETDATA_VERSION);
+ buffer_json_member_add_string(mcpc->result, "mcpVersion", MCP_PROTOCOL_VERSION_2str(MCP_PROTOCOL_VERSION_LATEST));
+
+ // Close the result object
+ buffer_json_finalize(mcpc->result);
+
+ return MCP_RC_OK;
+}
+
+static MCP_RETURN_CODE mcp_system_method_metrics(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'system/metrics' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_system_method_restart(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'system/restart' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_system_method_status(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'system/status' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+// System namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_system_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP system method: %s", method);
+
+ // Flush previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "health") == 0) {
+ rc = mcp_system_method_health(mcpc, params, id);
+ }
+ else if (strcmp(method, "version") == 0) {
+ rc = mcp_system_method_version(mcpc, params, id);
+ }
+ else if (strcmp(method, "metrics") == 0) {
+ rc = mcp_system_method_metrics(mcpc, params, id);
+ }
+ else if (strcmp(method, "restart") == 0) {
+ rc = mcp_system_method_restart(mcpc, params, id);
+ }
+ else if (strcmp(method, "status") == 0) {
+ rc = mcp_system_method_status(mcpc, params, id);
+ }
+ else {
+ // Method not found in system namespace
+ buffer_sprintf(mcpc->error, "Method 'system/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
diff --git a/src/web/mcp/mcp-system.h b/src/web/mcp/mcp-system.h
new file mode 100644
index 00000000000000..1477407f6a72de
--- /dev/null
+++ b/src/web/mcp/mcp-system.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_SYSTEM_H
+#define NETDATA_MCP_SYSTEM_H
+
+#include "mcp.h"
+
+// System namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_system_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_SYSTEM_H
\ No newline at end of file
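`system/version` is the only method implemented in this namespace so far; on success the result buffer carries `name`, `version`, and `mcpVersion`. A sketch with an illustrative id:

```c
#include "mcp-system.h"

// Ask the agent for its version; the id (3) is illustrative, and params
// may be NULL because the handler ignores them.
static void example_version(MCP_CLIENT *mcpc) {
    if (mcp_system_route(mcpc, "version", NULL, 3) == MCP_RC_OK)
        mcp_send_response_buffer(mcpc);
}
```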
diff --git a/src/web/mcp/mcp-tools.c b/src/web/mcp/mcp-tools.c
new file mode 100644
index 00000000000000..ae81cf91234144
--- /dev/null
+++ b/src/web/mcp/mcp-tools.c
@@ -0,0 +1,218 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+/**
+ * MCP Tools Namespace
+ *
+ * The MCP Tools namespace provides methods for discovering and executing tools offered by the server.
+ * In the MCP protocol, tools are discrete operations that clients can invoke to perform specific actions.
+ *
+ * Tools are model-controlled actions - meaning the AI decides when and how to use them based on context.
+ * Each tool has a defined input schema that specifies required and optional parameters.
+ *
+ * Key features of the tools namespace:
+ *
+ * 1. Tool Discovery:
+ * - Clients can list available tools (tools/list)
+ * - Get detailed descriptions of specific tools (tools/describe)
+ * - Understand what parameters a tool requires (through JSON Schema)
+ *
+ * 2. Tool Execution:
+ * - Execute tools with specific parameters (tools/execute)
+ * - Validate parameters without execution (tools/validate)
+ * - Asynchronous execution is supported for long-running tools
+ *
+ * 3. Execution Management:
+ * - Check execution status (tools/status)
+ * - Cancel running executions (tools/cancel)
+ *
+ * In the Netdata context, tools provide access to operations like:
+ * - Exploring metrics and their relationships
+ * - Analyzing time-series data patterns
+ * - Finding correlations between metrics
+ * - Root cause analysis for anomalies
+ * - Summarizing system health
+ *
+ * Each tool execution is assigned a unique ID, allowing clients to track and manage executions.
+ */
+
+#include "mcp-tools.h"
+#include "mcp-initialize.h"
+
+// Return a list of available tools (transport-agnostic)
+static MCP_RETURN_CODE mcp_tools_method_list(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id) {
+ if (!mcpc || id == 0) return MCP_RC_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Create tools array
+ buffer_json_member_add_array(mcpc->result, "tools");
+
+ // Add explore_metrics tool
+ buffer_json_add_array_item_object(mcpc->result);
+ buffer_json_member_add_string(mcpc->result, "name", "explore_metrics");
+ buffer_json_member_add_string(mcpc->result, "description",
+ "Explore Netdata's time-series metrics with support for high-resolution data");
+
+ // Add input schema for metrics tool
+ buffer_json_member_add_object(mcpc->result, "inputSchema");
+ buffer_json_member_add_string(mcpc->result, "type", "object");
+ buffer_json_member_add_string(mcpc->result, "title", "MetricsQuery");
+
+ // Properties
+ buffer_json_member_add_object(mcpc->result, "properties");
+
+ // Context property
+ buffer_json_member_add_object(mcpc->result, "context");
+ buffer_json_member_add_string(mcpc->result, "type", "string");
+ buffer_json_member_add_string(mcpc->result, "title", "Context");
+ buffer_json_object_close(mcpc->result); // Close context
+
+ // After property
+ buffer_json_member_add_object(mcpc->result, "after");
+ buffer_json_member_add_string(mcpc->result, "type", "integer");
+ buffer_json_member_add_string(mcpc->result, "title", "After");
+ buffer_json_object_close(mcpc->result); // Close after
+
+ // Before property
+ buffer_json_member_add_object(mcpc->result, "before");
+ buffer_json_member_add_string(mcpc->result, "type", "integer");
+ buffer_json_member_add_string(mcpc->result, "title", "Before");
+ buffer_json_object_close(mcpc->result); // Close before
+
+ // Points property
+ buffer_json_member_add_object(mcpc->result, "points");
+ buffer_json_member_add_string(mcpc->result, "type", "integer");
+ buffer_json_member_add_string(mcpc->result, "title", "Points");
+ buffer_json_object_close(mcpc->result); // Close points
+
+ // Group property
+ buffer_json_member_add_object(mcpc->result, "group");
+ buffer_json_member_add_string(mcpc->result, "type", "string");
+ buffer_json_member_add_string(mcpc->result, "title", "Group");
+ buffer_json_object_close(mcpc->result); // Close group
+
+ buffer_json_object_close(mcpc->result); // Close properties
+
+ // Required properties
+ buffer_json_member_add_array(mcpc->result, "required");
+ buffer_json_add_array_item_string(mcpc->result, "context");
+ buffer_json_array_close(mcpc->result); // Close required
+
+ buffer_json_object_close(mcpc->result); // Close inputSchema
+ buffer_json_object_close(mcpc->result); // Close explore_metrics tool
+
+ // Add explore_nodes tool
+ buffer_json_add_array_item_object(mcpc->result);
+ buffer_json_member_add_string(mcpc->result, "name", "explore_nodes");
+ buffer_json_member_add_string(mcpc->result, "description",
+ "Discover and explore all monitored nodes in your infrastructure");
+
+ // Add input schema for nodes tool
+ buffer_json_member_add_object(mcpc->result, "inputSchema");
+ buffer_json_member_add_string(mcpc->result, "type", "object");
+ buffer_json_member_add_string(mcpc->result, "title", "NodesQuery");
+
+ // Properties
+ buffer_json_member_add_object(mcpc->result, "properties");
+
+ // Filter property
+ buffer_json_member_add_object(mcpc->result, "filter");
+ buffer_json_member_add_string(mcpc->result, "type", "string");
+ buffer_json_member_add_string(mcpc->result, "title", "Filter");
+ buffer_json_object_close(mcpc->result); // Close filter
+
+ buffer_json_object_close(mcpc->result); // Close properties
+ buffer_json_object_close(mcpc->result); // Close inputSchema
+ buffer_json_object_close(mcpc->result); // Close explore_nodes tool
+
+ buffer_json_array_close(mcpc->result); // Close tools array
+ buffer_json_finalize(mcpc->result); // Finalize the JSON
+
+ return MCP_RC_OK;
+}
+
+// Stub implementations for other tools methods (transport-agnostic)
+static MCP_RETURN_CODE mcp_tools_method_execute(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'tools/execute' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_tools_method_cancel(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'tools/cancel' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_tools_method_status(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'tools/status' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_tools_method_validate(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'tools/validate' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_tools_method_describe(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id __maybe_unused) {
+ buffer_sprintf(mcpc->error, "Method 'tools/describe' not implemented yet");
+ return MCP_RC_NOT_IMPLEMENTED;
+}
+
+static MCP_RETURN_CODE mcp_tools_method_getCapabilities(MCP_CLIENT *mcpc, struct json_object *params __maybe_unused, uint64_t id) {
+ if (!mcpc || id == 0) return MCP_RC_ERROR;
+
+ // Initialize success response
+ mcp_init_success_result(mcpc, id);
+
+ // Add capabilities as result object properties
+ buffer_json_member_add_boolean(mcpc->result, "listChanged", false);
+ buffer_json_member_add_boolean(mcpc->result, "asyncExecution", true);
+ buffer_json_member_add_boolean(mcpc->result, "batchExecution", true);
+
+ // Close the result object
+ buffer_json_finalize(mcpc->result);
+
+ return MCP_RC_OK;
+}
+
+// Tools namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_tools_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id) {
+ if (!mcpc || !method) return MCP_RC_INTERNAL_ERROR;
+
+ netdata_log_debug(D_MCP, "MCP tools method: %s", method);
+
+ // Flush previous buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ MCP_RETURN_CODE rc;
+
+ if (strcmp(method, "list") == 0) {
+ rc = mcp_tools_method_list(mcpc, params, id);
+ }
+ else if (strcmp(method, "execute") == 0) {
+ rc = mcp_tools_method_execute(mcpc, params, id);
+ }
+ else if (strcmp(method, "cancel") == 0) {
+ rc = mcp_tools_method_cancel(mcpc, params, id);
+ }
+ else if (strcmp(method, "status") == 0) {
+ rc = mcp_tools_method_status(mcpc, params, id);
+ }
+ else if (strcmp(method, "validate") == 0) {
+ rc = mcp_tools_method_validate(mcpc, params, id);
+ }
+ else if (strcmp(method, "describe") == 0) {
+ rc = mcp_tools_method_describe(mcpc, params, id);
+ }
+ else if (strcmp(method, "getCapabilities") == 0) {
+ rc = mcp_tools_method_getCapabilities(mcpc, params, id);
+ }
+ else {
+ // Method not found in tools namespace
+ buffer_sprintf(mcpc->error, "Method 'tools/%s' not implemented yet", method);
+ rc = MCP_RC_NOT_IMPLEMENTED;
+ }
+
+ return rc;
+}
diff --git a/src/web/mcp/mcp-tools.h b/src/web/mcp/mcp-tools.h
new file mode 100644
index 00000000000000..7071467f166195
--- /dev/null
+++ b/src/web/mcp/mcp-tools.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_TOOLS_H
+#define NETDATA_MCP_TOOLS_H
+
+#include "mcp.h"
+
+// Tools namespace method dispatcher (transport-agnostic)
+MCP_RETURN_CODE mcp_tools_route(MCP_CLIENT *mcpc, const char *method, struct json_object *params, uint64_t id);
+
+#endif // NETDATA_MCP_TOOLS_H
\ No newline at end of file
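Of the tools methods, only `list` and `getCapabilities` return data today: `list` serves the static catalog with JSON Schema input definitions (only `context` is required for `explore_metrics`), and `getCapabilities` advertises async and batch execution. A sketch with illustrative ids:

```c
#include "mcp-tools.h"

static void example_tools(MCP_CLIENT *mcpc) {
    if (mcp_tools_route(mcpc, "list", NULL, 11) == MCP_RC_OK)
        mcp_send_response_buffer(mcpc);  // tool catalog with input schemas

    if (mcp_tools_route(mcpc, "getCapabilities", NULL, 12) == MCP_RC_OK)
        mcp_send_response_buffer(mcpc);  // {"listChanged":false,"asyncExecution":true,...}
}
```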
diff --git a/src/web/mcp/mcp-websocket-test.html b/src/web/mcp/mcp-websocket-test.html
new file mode 100644
index 00000000000000..02fb2954824c85
--- /dev/null
+++ b/src/web/mcp/mcp-websocket-test.html
@@ -0,0 +1,1272 @@
+<!DOCTYPE html>
+<!--
+  Netdata MCP WebSocket Test: a self-contained 1272-line test harness with a
+  connection bar (connect/disconnect plus a "Disconnected" status indicator),
+  flow buttons for each namespace (Initialize, Tools, Resources, Prompts,
+  Notifications, Context, System), a per-flow method list, a JSON-RPC 2.0
+  request editor (requires "jsonrpc": "2.0" and "method", with an optional
+  "id"), and a response panel that starts as "No response yet."
+  Full markup of the test page omitted.
+-->
\ No newline at end of file
diff --git a/src/web/mcp/mcp.c b/src/web/mcp/mcp.c
new file mode 100644
index 00000000000000..19a927fbf7be0c
--- /dev/null
+++ b/src/web/mcp/mcp.c
@@ -0,0 +1,436 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "mcp.h"
+#include "mcp-initialize.h"
+#include "mcp-tools.h"
+#include "mcp-resources.h"
+#include "mcp-prompts.h"
+#include "mcp-notifications.h"
+#include "mcp-context.h"
+#include "mcp-system.h"
+#include "adapters/mcp-websocket.h"
+
+// Define the enum to string mapping for protocol versions
+ENUM_STR_MAP_DEFINE(MCP_PROTOCOL_VERSION) = {
+ { .id = MCP_PROTOCOL_VERSION_2024_11_05, .name = "2024-11-05" },
+ { .id = MCP_PROTOCOL_VERSION_2025_03_26, .name = "2025-03-26" },
+ { .id = MCP_PROTOCOL_VERSION_UNKNOWN, .name = "unknown" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(MCP_PROTOCOL_VERSION, MCP_PROTOCOL_VERSION_UNKNOWN, "unknown");
+
+// Define the enum to string mapping for return codes
+ENUM_STR_MAP_DEFINE(MCP_RETURN_CODE) = {
+ { .id = MCP_RC_OK, .name = "OK" },
+ { .id = MCP_RC_ERROR, .name = "ERROR" },
+ { .id = MCP_RC_INVALID_PARAMS, .name = "INVALID_PARAMS" },
+ { .id = MCP_RC_NOT_FOUND, .name = "NOT_FOUND" },
+ { .id = MCP_RC_INTERNAL_ERROR, .name = "INTERNAL_ERROR" },
+ { .id = MCP_RC_NOT_IMPLEMENTED, .name = "NOT_IMPLEMENTED" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(MCP_RETURN_CODE, MCP_RC_ERROR, "ERROR");
+
+// Decode a URI component using mcpc's pre-allocated buffer
+// Returns a pointer to the decoded string which is valid until the next call
+const char *mcp_uri_decode(MCP_CLIENT *mcpc, const char *src) {
+ if(!mcpc || !src || !*src)
+ return src;
+
+ // Prepare the buffer
+ buffer_flush(mcpc->uri);
+ buffer_need_bytes(mcpc->uri, strlen(src) + 1);
+
+ // Perform URL decoding
+ char *d = url_decode_r(mcpc->uri->buffer, src, mcpc->uri->size);
+ if (!d || !*d)
+ return src;
+
+ // Ensure the buffer's length is updated
+ mcpc->uri->len = strlen(d);
+
+ return buffer_tostring(mcpc->uri);
+}
+
+// Create a response context for a transport session
+MCP_CLIENT *mcp_create_client(MCP_TRANSPORT transport, void *transport_ctx) {
+ MCP_CLIENT *ctx = callocz(1, sizeof(MCP_CLIENT));
+
+ ctx->transport = transport;
+ ctx->protocol_version = MCP_PROTOCOL_VERSION_UNKNOWN; // Will be set during initialization
+
+ // Set capabilities based on transport type
+ switch (transport) {
+ case MCP_TRANSPORT_WEBSOCKET:
+ ctx->websocket = (struct websocket_server_client *)transport_ctx;
+ ctx->capabilities = MCP_CAPABILITY_ASYNC_COMMUNICATION |
+ MCP_CAPABILITY_SUBSCRIPTIONS |
+ MCP_CAPABILITY_NOTIFICATIONS;
+ break;
+
+ case MCP_TRANSPORT_HTTP:
+ ctx->http = (struct web_client *)transport_ctx;
+ ctx->capabilities = MCP_CAPABILITY_NONE; // HTTP has no special capabilities
+ break;
+
+ default:
+ ctx->generic = transport_ctx;
+ ctx->capabilities = MCP_CAPABILITY_NONE;
+ break;
+ }
+
+ // Default client info (will be updated later from actual client)
+ ctx->client_name = string_strdupz("unknown");
+ ctx->client_version = string_strdupz("0.0.0");
+
+ // Initialize response buffers
+ ctx->result = buffer_create(4096, NULL);
+ ctx->error = buffer_create(1024, NULL);
+
+ // Initialize utility buffers
+ ctx->uri = buffer_create(1024, NULL);
+
+ return ctx;
+}
+
+// Free a response context
+void mcp_free_client(MCP_CLIENT *mcpc) {
+ if (mcpc) {
+ string_freez(mcpc->client_name);
+ string_freez(mcpc->client_version);
+
+ // Free response buffers
+ buffer_free(mcpc->result);
+ buffer_free(mcpc->error);
+
+ // Free utility buffers
+ buffer_free(mcpc->uri);
+
+ freez(mcpc);
+ }
+}
+
+// Map internal MCP_RETURN_CODE to JSON-RPC error code
+static int mcp_map_return_code_to_jsonrpc_error(MCP_RETURN_CODE rc) {
+ switch (rc) {
+ case MCP_RC_OK:
+ return 0; // Not an error
+ case MCP_RC_INVALID_PARAMS:
+ return -32602; // JSON-RPC Invalid params
+ case MCP_RC_NOT_FOUND:
+ return -32601; // JSON-RPC Method not found
+ case MCP_RC_INTERNAL_ERROR:
+ return -32603; // JSON-RPC Internal error
+ case MCP_RC_NOT_IMPLEMENTED:
+ return -32601; // Use method not found for not implemented
+ case MCP_RC_ERROR:
+ default:
+ return -32000; // JSON-RPC Server error
+ }
+}
+
+void mcp_init_success_result(MCP_CLIENT *mcpc, uint64_t id) {
+    buffer_flush(mcpc->result);
+ buffer_json_initialize(mcpc->result, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY);
+ buffer_json_member_add_string(mcpc->result, "jsonrpc", "2.0");
+
+ if(id)
+ buffer_json_member_add_uint64(mcpc->result, "id", id);
+
+ buffer_flush(mcpc->error);
+}
+
+void mcp_jsonrpc_error(BUFFER *result, const char *error, uint64_t id, int jsonrpc_code) {
+ buffer_flush(result);
+ buffer_json_initialize(result, "\"", "\"", 0, true, BUFFER_JSON_OPTIONS_MINIFY);
+ buffer_json_member_add_string(result, "jsonrpc", "2.0");
+
+ if (id)
+ buffer_json_member_add_uint64(result, "id", id);
+
+    // JSON-RPC 2.0 requires errors to be an object with "code" and "message" members
+    buffer_json_member_add_object(result, "error");
+    buffer_json_member_add_int64(result, "code", jsonrpc_code);
+    buffer_json_member_add_string(result, "message", (error && *error) ? error : "unknown error");
+    buffer_json_object_close(result);
+
+ buffer_json_finalize(result);
+}
+
+MCP_RETURN_CODE mcp_error_result(MCP_CLIENT *mcpc, uint64_t id, MCP_RETURN_CODE rc) {
+ mcp_jsonrpc_error(mcpc->result,
+ buffer_strlen(mcpc->error) ? buffer_tostring(mcpc->error) : MCP_RETURN_CODE_2str(rc),
+ id, mcp_map_return_code_to_jsonrpc_error(rc));
+ return rc;
+}
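`mcp_error_result` prefers the message accumulated in `mcpc->error`, falling back to the stringified return code, and maps the internal rc onto a standard JSON-RPC code. An illustrative reply (id 9 is made up; actual output is minified):

```c
/* mcp_error_result(mcpc, 9, MCP_RC_NOT_IMPLEMENTED), with mcpc->error
 * holding "Method 'tools/execute' not implemented yet", produces:
 *
 *   {"jsonrpc":"2.0","id":9,
 *    "error":{"code":-32601,"message":"Method 'tools/execute' not implemented yet"}}
 */
```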
+
+// Send the content of a buffer using the appropriate transport
+int mcp_send_response_buffer(MCP_CLIENT *mcpc) {
+ if (!mcpc || !mcpc->result || !buffer_strlen(mcpc->result)) return -1;
+
+ switch (mcpc->transport) {
+ case MCP_TRANSPORT_WEBSOCKET:
+ return mcp_websocket_send_buffer(mcpc->websocket, mcpc->result);
+
+ case MCP_TRANSPORT_HTTP:
+ netdata_log_error("MCP: HTTP adapter not implemented yet");
+ return -1;
+
+ default:
+ netdata_log_error("MCP: Unknown transport type %u", mcpc->transport);
+ return -1;
+ }
+}
+
+// Parse and extract client info from initialize request params
+static void mcp_extract_client_info(MCP_CLIENT *ctx, struct json_object *params) {
+ if (!ctx || !params) return;
+
+ struct json_object *client_info_obj = NULL;
+ struct json_object *client_name_obj = NULL;
+ struct json_object *client_version_obj = NULL;
+
+ if (json_object_object_get_ex(params, "clientInfo", &client_info_obj)) {
+ if (json_object_object_get_ex(client_info_obj, "name", &client_name_obj)) {
+ string_freez(ctx->client_name);
+ ctx->client_name = string_strdupz(json_object_get_string(client_name_obj));
+ }
+ if (json_object_object_get_ex(client_info_obj, "version", &client_version_obj)) {
+ string_freez(ctx->client_version);
+ ctx->client_version = string_strdupz(json_object_get_string(client_version_obj));
+ }
+ }
+}
+
+// Handle a JSON-RPC method call - the result is always filled with a jsonrpc response
+static MCP_RETURN_CODE mcp_single_request(MCP_CLIENT *mcpc, struct json_object *request) {
+ if (!mcpc || !request) {
+ return MCP_RC_ERROR;
+ }
+
+ // Flush buffers before processing the request
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ // Extract JSON-RPC fields
+ struct json_object *method_obj = NULL;
+ struct json_object *params_obj = NULL;
+ struct json_object *id_obj = NULL;
+ struct json_object *jsonrpc_obj = NULL;
+
+ // Validate jsonrpc version
+ if (!json_object_object_get_ex(request, "jsonrpc", &jsonrpc_obj) ||
+ strcmp(json_object_get_string(jsonrpc_obj), "2.0") != 0) {
+ buffer_strcat(mcpc->error, "Invalid or missing jsonrpc version");
+ mcp_error_result(mcpc, 0, MCP_RC_INVALID_PARAMS);
+ return MCP_RC_INVALID_PARAMS;
+ }
+
+ // Extract method
+ if (!json_object_object_get_ex(request, "method", &method_obj)) {
+ buffer_strcat(mcpc->error, "Missing method field");
+ mcp_error_result(mcpc, 0, MCP_RC_INVALID_PARAMS);
+ return MCP_RC_INVALID_PARAMS;
+ }
+
+ const char *method = json_object_get_string(method_obj);
+
+    // Extract params (optional)
+    bool params_owned = false;
+    if (json_object_object_get_ex(request, "params", &params_obj)) {
+        if (json_object_get_type(params_obj) != json_type_object) {
+            buffer_strcat(mcpc->error, "params must be an object");
+            mcp_error_result(mcpc, 0, MCP_RC_INVALID_PARAMS);
+            return MCP_RC_INVALID_PARAMS;
+        }
+    } else {
+        // Create an empty params object if none provided - this reference is
+        // owned here and must be released before returning
+        params_obj = json_object_new_object();
+        params_owned = true;
+    }
+
+ // Extract ID (optional, for notifications)
+ uint64_t id = 0;
+ bool has_id = json_object_object_get_ex(request, "id", &id_obj);
+
+ if (has_id) {
+ if (json_object_get_type(id_obj) == json_type_int) {
+ id = json_object_get_int64(id_obj);
+ }
+ else if (json_object_get_type(id_obj) == json_type_string) {
+ const char *id_str = json_object_get_string(id_obj);
+ char *endptr;
+ id = strtoull(id_str, &endptr, 10);
+ if (*endptr != '\0') {
+ // If the string is not a number, use a hash of the string as the ID
+ id = 0;
+ while (*id_str) {
+ id = id * 31 + (*id_str++);
+ }
+ }
+ }
+ }
+
+ netdata_log_debug(D_WEB_CLIENT, "MCP: Handling method call: %s (id: %"PRIu64")", method, id);
+
+ // Handle method calls based on namespace
+ MCP_RETURN_CODE rc;
+
+ if(!method || !*method) {
+ buffer_strcat(mcpc->error, "Empty method name");
+ rc = MCP_RC_INVALID_PARAMS;
+ }
+ else if (strncmp(method, "tools/", 6) == 0) {
+ // Tools namespace
+ rc = mcp_tools_route(mcpc, method + 6, params_obj, id);
+ }
+ else if (strncmp(method, "resources/", 10) == 0) {
+ // Resources namespace
+ rc = mcp_resources_route(mcpc, method + 10, params_obj, id);
+ }
+ else if (strncmp(method, "notifications/", 14) == 0) {
+ // Notifications namespace
+ rc = mcp_notifications_route(mcpc, method + 14, params_obj, id);
+ }
+ else if (strncmp(method, "prompts/", 8) == 0) {
+ // Prompts namespace
+ rc = mcp_prompts_route(mcpc, method + 8, params_obj, id);
+ }
+ else if (strncmp(method, "context/", 8) == 0) {
+ // Context namespace
+ rc = mcp_context_route(mcpc, method + 8, params_obj, id);
+ }
+ else if (strncmp(method, "system/", 7) == 0) {
+ // System namespace
+ rc = mcp_system_route(mcpc, method + 7, params_obj, id);
+ }
+ else if (strcmp(method, "initialize") == 0) {
+ // Extract client info from initialize request
+ mcp_extract_client_info(mcpc, params_obj);
+ netdata_log_debug(D_WEB_CLIENT, "MCP initialize request from client %s v%s",
+ string2str(mcpc->client_name), string2str(mcpc->client_version));
+
+ // Handle initialize method
+ rc = mcp_method_initialize(mcpc, params_obj, id);
+ }
+ else {
+ buffer_sprintf(mcpc->error, "Method '%s' not found", method);
+ rc = MCP_RC_NOT_FOUND;
+ }
+
+    // Release the empty params object created above, if we own it
+    if (params_owned)
+        json_object_put(params_obj);
+
+    // If this is a notification (no ID), don't generate a response
+    if (!has_id) {
+        return rc;
+    }
+
+    // For requests with IDs, ensure we have a valid response
+    if (rc != MCP_RC_OK && !buffer_strlen(mcpc->result)) {
+        mcp_error_result(mcpc, id, rc);
+    }
+
+    if (!buffer_strlen(mcpc->result)) {
+        buffer_strcat(mcpc->error, "method generated empty result");
+        mcp_error_result(mcpc, id, MCP_RC_INTERNAL_ERROR);
+    }
+
+    return rc;
+}
+
+// Main MCP entry point - handle a JSON-RPC request (can be single or batch)
+MCP_RETURN_CODE mcp_handle_request(MCP_CLIENT *mcpc, struct json_object *request) {
+ if (!mcpc || !request) return MCP_RC_INTERNAL_ERROR;
+
+ // Clear previous response buffers
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ // Check if this is a batch request (JSON array)
+ if (json_object_get_type(request) == json_type_array) {
+ int array_len = json_object_array_length(request);
+
+ // Empty batch should return nothing according to JSON-RPC 2.0 spec
+ if (array_len == 0) {
+ return MCP_RC_OK;
+ }
+
+ // Create a temporary buffer for building the batch response
+ BUFFER *batch_buffer = buffer_create(4096, NULL);
+ buffer_flush(batch_buffer);
+
+ // Start the JSON array for batch response
+ buffer_strcat(batch_buffer, "[");
+
+ // Track if we've added any responses (for comma handling)
+ bool has_responses = false;
+
+ // Process each request in the batch
+ for (int i = 0; i < array_len; i++) {
+ struct json_object *req_item = json_object_array_get_idx(request, i);
+
+ // Process the individual request
+ buffer_flush(mcpc->result);
+ buffer_flush(mcpc->error);
+
+ // Extract ID to determine if it's a request or notification
+ struct json_object *id_obj = NULL;
+ bool has_id = json_object_object_get_ex(req_item, "id", &id_obj);
+
+ // Call the single request handler
+ mcp_single_request(mcpc, req_item);
+
+ // For notifications (no id), don't add to response
+ if (!has_id || buffer_strlen(mcpc->result) == 0) {
+ continue;
+ }
+
+ // Add comma if this isn't the first response
+ if (has_responses) {
+ buffer_strcat(batch_buffer, ", ");
+ }
+
+ // Add the response to the batch
+ buffer_strcat(batch_buffer, buffer_tostring(mcpc->result));
+ has_responses = true;
+ }
+
+ // If no responses were added (all notifications), don't send anything per JSON-RPC spec
+ if (!has_responses) {
+ buffer_free(batch_buffer);
+ return MCP_RC_OK;
+ }
+
+ // Close the JSON array
+ buffer_strcat(batch_buffer, "]");
+
+ // Copy batch response to client's result buffer
+ buffer_flush(mcpc->result);
+ buffer_strcat(mcpc->result, buffer_tostring(batch_buffer));
+ buffer_free(batch_buffer);
+
+ // Send the batch response
+ mcp_send_response_buffer(mcpc);
+
+ return MCP_RC_OK;
+ }
+ else {
+ // Handle single request
+ MCP_RETURN_CODE rc = mcp_single_request(mcpc, request);
+
+ // Extract ID to determine if it's a request or notification
+ struct json_object *id_obj = NULL;
+ bool has_id = json_object_object_get_ex(request, "id", &id_obj);
+
+ // Only send responses for requests with IDs, not for notifications
+ if (has_id && buffer_strlen(mcpc->result) > 0) {
+ mcp_send_response_buffer(mcpc);
+ }
+
+ return rc;
+ }
+}
+
+// Initialize the MCP subsystem
+void mcp_initialize_subsystem(void) {
+ netdata_log_info("MCP subsystem initialized");
+}
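End to end, a transport adapter only has to parse the frame and call the entry point; `mcp_handle_request` routes the request, builds the reply, and sends it itself. A sketch, assuming an `MCP_CLIENT` created by the adapter (the payload is illustrative; a JSON array would exercise the batch path instead):

```c
#include <json-c/json.h>
#include "mcp.h"

static void example_frame(MCP_CLIENT *mcpc, const char *payload) {
    struct json_object *request = json_tokener_parse(payload);
    if (!request)
        return;  // a real adapter should reply with JSON-RPC -32700 Parse error

    mcp_handle_request(mcpc, request);  // sends the response itself, if any
    json_object_put(request);
}

// e.g. example_frame(mcpc,
//     "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/list\",\"params\":{}}");
```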
diff --git a/src/web/mcp/mcp.h b/src/web/mcp/mcp.h
new file mode 100644
index 00000000000000..cc20eb73d40346
--- /dev/null
+++ b/src/web/mcp/mcp.h
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_MCP_H
+#define NETDATA_MCP_H
+
+#include "libnetdata/libnetdata.h"
+#include <json-c/json.h>
+
+// MCP protocol versions
+typedef enum {
+ MCP_PROTOCOL_VERSION_UNKNOWN = 0,
+ MCP_PROTOCOL_VERSION_2024_11_05 = 20241105, // Using numeric date format for natural ordering
+ MCP_PROTOCOL_VERSION_2025_03_26 = 20250326,
+ // Add future versions here
+
+ // Always keep this pointing to the latest version
+ MCP_PROTOCOL_VERSION_LATEST = MCP_PROTOCOL_VERSION_2025_03_26
+} MCP_PROTOCOL_VERSION;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(MCP_PROTOCOL_VERSION);
+
+// JSON-RPC error codes (standard)
+#define MCP_ERROR_PARSE_ERROR -32700
+#define MCP_ERROR_INVALID_REQUEST -32600
+#define MCP_ERROR_METHOD_NOT_FOUND -32601
+#define MCP_ERROR_INVALID_PARAMS -32602
+#define MCP_ERROR_INTERNAL_ERROR -32603
+// Server error codes (implementation-defined)
+#define MCP_ERROR_SERVER_ERROR_MIN -32099
+#define MCP_ERROR_SERVER_ERROR_MAX -32000
+
+// Content types (for messages and tool responses)
+typedef enum {
+ MCP_CONTENT_TYPE_TEXT = 0,
+ MCP_CONTENT_TYPE_IMAGE = 1,
+ MCP_CONTENT_TYPE_AUDIO = 2, // New in 2025-03-26
+} MCP_CONTENT_TYPE;
+
+// Forward declarations for transport-specific types
+struct websocket_server_client;
+struct web_client;
+
+// Transport types for MCP
+typedef enum {
+ MCP_TRANSPORT_UNKNOWN = 0,
+ MCP_TRANSPORT_WEBSOCKET,
+ MCP_TRANSPORT_HTTP,
+ // Add more as needed
+} MCP_TRANSPORT;
+
+// Transport capabilities
+typedef enum {
+ MCP_CAPABILITY_NONE = 0,
+ MCP_CAPABILITY_ASYNC_COMMUNICATION = (1 << 0), // Can send messages at any time
+ MCP_CAPABILITY_SUBSCRIPTIONS = (1 << 1), // Supports subscriptions
+ MCP_CAPABILITY_NOTIFICATIONS = (1 << 2), // Supports notifications
+ // Add more as needed
+} MCP_CAPABILITY;
+
+// Return codes for MCP functions
+typedef enum {
+ MCP_RC_OK = 0, // Success, result buffer contains valid response
+ MCP_RC_ERROR = 1, // Generic error, error buffer contains message
+ MCP_RC_INVALID_PARAMS = 2, // Invalid parameters in request
+ MCP_RC_NOT_FOUND = 3, // Resource or method not found
+ MCP_RC_INTERNAL_ERROR = 4, // Internal server error
+ MCP_RC_NOT_IMPLEMENTED = 5 // Method not implemented
+ // Can add more specific errors as needed
+} MCP_RETURN_CODE;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(MCP_RETURN_CODE);
+
+// Response handling context
+typedef struct mcp_client {
+ // Transport type and capabilities
+ MCP_TRANSPORT transport;
+ MCP_CAPABILITY capabilities;
+
+ // Protocol version (detected during initialization)
+ MCP_PROTOCOL_VERSION protocol_version;
+
+ // Transport-specific context
+ union {
+ struct websocket_server_client *websocket; // WebSocket client
+ struct web_client *http; // HTTP client
+ void *generic; // Generic context
+ };
+
+ // Client information
+ STRING *client_name; // Client name (for logging, interned)
+ STRING *client_version; // Client version (for logging, interned)
+
+ // Response buffers
+ BUFFER *result; // Pre-allocated buffer for success responses
+ BUFFER *error; // Pre-allocated buffer for error messages
+
+ // Utility buffers
+ BUFFER *uri; // Pre-allocated buffer for URI decoding
+} MCP_CLIENT;
+
+// Helper function to convert string version to numeric version
+MCP_PROTOCOL_VERSION mcp_protocol_version_from_string(const char *version_str);
+
+// Helper function to convert numeric version to string version
+const char *mcp_protocol_version_to_string(MCP_PROTOCOL_VERSION version);
+
+// Create a response context for a transport session
+MCP_CLIENT *mcp_create_client(MCP_TRANSPORT transport, void *transport_ctx);
+
+// Free a response context
+void mcp_free_client(MCP_CLIENT *mcpc);
+
+// Helper functions for creating and sending JSON-RPC responses
+
+// Functions to initialize and build MCP responses
+void mcp_init_success_result(MCP_CLIENT *mcpc, uint64_t id);
+MCP_RETURN_CODE mcp_error_result(MCP_CLIENT *mcpc, uint64_t id, MCP_RETURN_CODE rc);
+void mcp_jsonrpc_error(BUFFER *result, const char *error, uint64_t id, int jsonrpc_code);
+
+// Send prepared buffer content as response
+int mcp_send_response_buffer(MCP_CLIENT *mcpc);
+
+// Check if a capability is supported by the transport
+static inline bool mcp_has_capability(MCP_CLIENT *mcpc, MCP_CAPABILITY capability) {
+ return mcpc && (mcpc->capabilities & capability);
+}
+
+// Initialize the MCP subsystem
+void mcp_initialize_subsystem(void);
+
+// Main MCP entry point - handle a JSON-RPC request (single or batch)
+MCP_RETURN_CODE mcp_handle_request(MCP_CLIENT *mcpc, struct json_object *request);
+
+const char *mcp_uri_decode(MCP_CLIENT *mcpc, const char *src);
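+
+// Illustrative usage (a sketch, not part of the API contract): a transport
+// adapter is expected to create a client, dispatch each JSON-RPC request,
+// and send back whichever buffer was prepared; `wsc` and `request` below are
+// assumed to be supplied by the transport:
+//
+//     MCP_CLIENT *mcpc = mcp_create_client(MCP_TRANSPORT_WEBSOCKET, wsc);
+//     MCP_RETURN_CODE rc = mcp_handle_request(mcpc, request);
+//     if (rc == MCP_RC_OK)
+//         mcp_send_response_buffer(mcpc);
+//     mcp_free_client(mcpc);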
+
+#endif // NETDATA_MCP_H
diff --git a/src/web/server/static/static-threaded.c b/src/web/server/static/static-threaded.c
index 963aae58e05dac..aa1692e33a676e 100644
--- a/src/web/server/static/static-threaded.c
+++ b/src/web/server/static/static-threaded.c
@@ -166,6 +166,13 @@ static void web_server_del_callback(POLLINFO *pi) {
worker_is_idle();
}
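+// Thread-local pointer to the POLLINFO currently being serviced, so that a
+// request handler (e.g. the WebSocket handshake below) can detach the socket
+// it is taking over from this thread's poll loop.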
+static __thread POLLINFO *current_thread_pollinfo = NULL;
+
+void web_server_remove_current_socket_from_poll(void) {
+ if(!current_thread_pollinfo) return;
+ poll_process_remove_from_poll(current_thread_pollinfo);
+}
+
static int web_server_rcv_callback(POLLINFO *pi, nd_poll_event_t *events) {
int ret = -1;
worker_is_busy(WORKER_JOB_RCV_DATA);
@@ -184,7 +191,9 @@ static int web_server_rcv_callback(POLLINFO *pi, nd_poll_event_t *events) {
netdata_log_debug(D_WEB_CLIENT, "%llu: processing received data on fd %d.", w->id, fd);
worker_is_idle();
worker_is_busy(WORKER_JOB_PROCESS);
+ current_thread_pollinfo = pi;
web_client_process_request_from_web_server(w);
+ current_thread_pollinfo = NULL;
if (unlikely(w->mode == HTTP_REQUEST_MODE_STREAM)) {
ssize_t rc = web_client_send(w);
@@ -226,7 +235,9 @@ static int web_server_snd_callback(POLLINFO *pi, nd_poll_event_t *events) {
netdata_log_debug(D_WEB_CLIENT, "%llu: sending data on fd %d.", w->id, fd);
+ current_thread_pollinfo = pi;
ssize_t ret = web_client_send(w);
+ current_thread_pollinfo = NULL;
if(unlikely(ret < 0)) {
retval = -1;
@@ -298,8 +309,7 @@ void *socket_listen_main_static_threaded_worker(void *ptr) {
, NULL
, web_client_first_request_timeout
, web_client_timeout
- ,
- nd_profile.update_every * 1000 // timer_milliseconds
+ , nd_profile.update_every * 1000 // timer_milliseconds
, ptr // timer_data
, worker_private->max_sockets
);
diff --git a/src/web/server/static/static-threaded.h b/src/web/server/static/static-threaded.h
index a8c5335ef567d1..0019ad5bb364db 100644
--- a/src/web/server/static/static-threaded.h
+++ b/src/web/server/static/static-threaded.h
@@ -6,5 +6,6 @@
#include "web/server/web_server.h"
void *socket_listen_main_static_threaded(void *ptr);
+void web_server_remove_current_socket_from_poll(void);
#endif //NETDATA_WEB_SERVER_STATIC_THREADED_H
diff --git a/src/web/server/web_client.c b/src/web/server/web_client.c
index da3b8529b14c92..bc6ed72dc3b335 100644
--- a/src/web/server/web_client.c
+++ b/src/web/server/web_client.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-3.0-or-later
#include "web_client.h"
+#include "web/websocket/websocket.h"
// this is an async I/O implementation of the web server request parser
// it is used by all netdata web servers
@@ -163,6 +164,15 @@ static void web_client_reset_allocations(struct web_client *w, bool free_all) {
freez(w->auth_bearer_token);
w->auth_bearer_token = NULL;
+
+ // Free WebSocket resources
+ freez(w->websocket.key);
+ w->websocket.key = NULL;
+
+ w->websocket.ext_flags = WS_EXTENSION_NONE;
+ w->websocket.protocol = WS_PROTOCOL_DEFAULT;
+ w->websocket.client_max_window_bits = 0;
+ w->websocket.server_max_window_bits = 0;
// if we had enabled compression, release it
if(w->response.zinitialized) {
@@ -909,7 +919,11 @@ void web_client_build_http_header(struct web_client *w) {
}
static inline void web_client_send_http_header(struct web_client *w) {
- web_client_build_http_header(w);
+ // For WebSocket handshake, the header is already fully prepared in websocket_handle_handshake
+ // For standard HTTP responses, we need to build the header
+ if (w->response.code != HTTP_RESP_WEBSOCKET_HANDSHAKE) {
+ web_client_build_http_header(w);
+ }
// sent the HTTP header
netdata_log_debug(D_WEB_DATA, "%llu: Sending response HTTP header of size %zu: '%s'"
@@ -1216,8 +1230,6 @@ static inline int web_client_process_url(RRDHOST *host, struct web_client *w, ch
return HTTP_RESP_NOT_FOUND;
}
- debug_flags |= D_RRD_STATS;
-
if(rrdset_flag_check(st, RRDSET_FLAG_DEBUG))
rrdset_flag_clear(st, RRDSET_FLAG_DEBUG);
else
@@ -1303,6 +1315,14 @@ void web_client_process_request_from_web_server(struct web_client *w) {
w->forwarded_for ? w->forwarded_for : w->client_ip);
}
+ // Check if this is a WebSocket upgrade request.
+ // Header parsing has already detected the handshake and set the flags;
+ // the mode stayed GET until now, so switch it to WEBSOCKET for routing.
+ if (w->mode == HTTP_REQUEST_MODE_GET && web_client_has_websocket_handshake(w) && web_client_is_websocket(w)) {
+ w->mode = HTTP_REQUEST_MODE_WEBSOCKET;
+ netdata_log_debug(D_WEB_CLIENT, "%llu: Detected WebSocket handshake request", w->id);
+ }
+
switch(w->mode) {
case HTTP_REQUEST_MODE_STREAM:
if(unlikely(!http_can_access_stream(w))) {
@@ -1313,6 +1333,21 @@ void web_client_process_request_from_web_server(struct web_client *w) {
w->response.code = stream_receiver_accept_connection(
w, (char *)buffer_tostring(w->url_query_string_decoded), NULL);
return;
+
+ case HTTP_REQUEST_MODE_WEBSOCKET:
+ if(unlikely(!http_can_access_dashboard(w))) {
+ web_client_permission_denied_acl(w);
+ return;
+ }
+
+ // Handle WebSocket handshake - this will take over the socket
+ // similar to how stream_receiver_accept_connection works
+ w->response.code = websocket_handle_handshake(w);
+
+ // After this point the socket has been taken over
+ // No need to send a response as the WebSocket handler
+ // has already sent the handshake response
+ return;
case HTTP_REQUEST_MODE_OPTIONS:
if(unlikely(
@@ -1398,7 +1433,7 @@ void web_client_process_request_from_web_server(struct web_client *w) {
// wait for more data
// set to normal to prevent web_server_rcv_callback
// from going into stream mode
- if (w->mode == HTTP_REQUEST_MODE_STREAM)
+ if (w->mode == HTTP_REQUEST_MODE_STREAM || w->mode == HTTP_REQUEST_MODE_WEBSOCKET)
w->mode = HTTP_REQUEST_MODE_GET;
return;
}
@@ -1459,6 +1494,10 @@ void web_client_process_request_from_web_server(struct web_client *w) {
netdata_log_debug(D_WEB_CLIENT, "%llu: STREAM done.", w->id);
break;
+ case HTTP_REQUEST_MODE_WEBSOCKET:
+ netdata_log_debug(D_WEB_CLIENT, "%llu: Done preparing the WEBSOCKET response.", w->id);
+ break;
+
case HTTP_REQUEST_MODE_OPTIONS:
netdata_log_debug(D_WEB_CLIENT,
"%llu: Done preparing the OPTIONS response. Sending data (%zu bytes) to client.",
diff --git a/src/web/server/web_client.h b/src/web/server/web_client.h
index 9ff3e9d93d851f..f9748f9f6d797e 100644
--- a/src/web/server/web_client.h
+++ b/src/web/server/web_client.h
@@ -4,6 +4,7 @@
#define NETDATA_WEB_CLIENT_H 1
#include "libnetdata/libnetdata.h"
+#include "../websocket/websocket.h"
struct web_client;
@@ -66,6 +67,10 @@ typedef enum __attribute__((packed)) {
// transient settings
WEB_CLIENT_FLAG_PROGRESS_TRACKING = (1 << 25), // flag to avoid redoing progress work
+
+ // websocket flags
+ WEB_CLIENT_FLAG_WEBSOCKET_CLIENT = (1 << 26), // this is a websocket client
+ WEB_CLIENT_FLAG_WEBSOCKET_HANDSHAKE = (1 << 27), // websocket handshake detected
} WEB_CLIENT_FLAGS;
#define WEB_CLIENT_FLAG_PATH_WITH_VERSION (WEB_CLIENT_FLAG_PATH_IS_V0|WEB_CLIENT_FLAG_PATH_IS_V1|WEB_CLIENT_FLAG_PATH_IS_V2|WEB_CLIENT_FLAG_PATH_IS_V3)
@@ -116,6 +121,14 @@ typedef enum __attribute__((packed)) {
#define web_client_flags_check_auth(w) web_client_flag_check(w, WEB_CLIENT_FLAG_ALL_AUTHS)
#define web_client_flags_clear_auth(w) web_client_flag_clear(w, WEB_CLIENT_FLAG_ALL_AUTHS)
+#define web_client_is_websocket(w) web_client_flag_check(w, WEB_CLIENT_FLAG_WEBSOCKET_CLIENT)
+#define web_client_set_websocket(w) web_client_flag_set(w, WEB_CLIENT_FLAG_WEBSOCKET_CLIENT)
+#define web_client_clear_websocket(w) web_client_flag_clear(w, WEB_CLIENT_FLAG_WEBSOCKET_CLIENT)
+
+#define web_client_has_websocket_handshake(w) web_client_flag_check(w, WEB_CLIENT_FLAG_WEBSOCKET_HANDSHAKE)
+#define web_client_set_websocket_handshake(w) web_client_flag_set(w, WEB_CLIENT_FLAG_WEBSOCKET_HANDSHAKE)
+#define web_client_clear_websocket_handshake(w) web_client_flag_clear(w, WEB_CLIENT_FLAG_WEBSOCKET_HANDSHAKE)
+
void web_client_reset_permissions(struct web_client *w);
void web_client_set_permissions(struct web_client *w, HTTP_ACCESS access, HTTP_USER_ROLE role, WEB_CLIENT_FLAGS auth);
@@ -187,6 +200,15 @@ struct web_client {
char *origin; // the Origin: header
char *user_agent; // the User-Agent: header
+ // WebSocket related data - NEED TO BE FREED
+ struct {
+ char *key; // the Sec-WebSocket-Key header
+ WEBSOCKET_PROTOCOL protocol; // the selected subprotocol
+ WEBSOCKET_EXTENSION ext_flags; // bit flags for supported extensions
+ uint8_t client_max_window_bits; // client_max_window_bits parameter (8-15)
+ uint8_t server_max_window_bits; // server_max_window_bits parameter (8-15)
+ } websocket;
+
BUFFER *payload; // when this request is a POST, this has the payload
NETDATA_SSL ssl;
diff --git a/src/web/server/web_server.c b/src/web/server/web_server.c
index 4f307ee3476617..058289ed4f3167 100644
--- a/src/web/server/web_server.c
+++ b/src/web/server/web_server.c
@@ -69,6 +69,8 @@ void web_server_listen_sockets_setup(void) {
if(unlikely(debug_flags & D_WEB_CLIENT))
debug_sockets();
+
+ websocket_initialize();
}
diff --git a/src/web/websocket/autobahn-test-suite/config/fuzzingclient.json b/src/web/websocket/autobahn-test-suite/config/fuzzingclient.json
new file mode 100644
index 00000000000000..e49739bce31abb
--- /dev/null
+++ b/src/web/websocket/autobahn-test-suite/config/fuzzingclient.json
@@ -0,0 +1,15 @@
+{
+ "outdir": "./reports/clients",
+ "servers": [
+ {
+ "agent": "Netdata WebSocket Server",
+ "url": "ws://localhost:19999/echo",
+ "options": {
+ "version": 18
+ }
+ }
+ ],
+ "cases": ["*"],
+ "exclude-cases": [],
+ "exclude-agent-cases": {}
+}
diff --git a/src/web/websocket/autobahn-test-suite/run-test.sh b/src/web/websocket/autobahn-test-suite/run-test.sh
new file mode 100755
index 00000000000000..9ec2362f06d07e
--- /dev/null
+++ b/src/web/websocket/autobahn-test-suite/run-test.sh
@@ -0,0 +1,15 @@
+#!/usr/bin/env bash
+
+if [ ! -d "config" ]; then
+ echo "Please create a config directory with the necessary configuration files."
+ exit 1
+fi
+
+mkdir -p reports/clients
+
+docker run -it --rm \
+ -v "${PWD}/config:/config" \
+ -v "${PWD}/reports:/reports" \
+ -p 9001:9001 \
+ crossbario/autobahn-testsuite \
+ wstest -m fuzzingclient -s /config/fuzzingclient.json
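+
+# The test suite writes an HTML report to reports/clients/index.html
+# (the "outdir" configured in fuzzingclient.json).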
diff --git a/src/web/websocket/websocket-buffer.h b/src/web/websocket/websocket-buffer.h
new file mode 100644
index 00000000000000..714a550209f70e
--- /dev/null
+++ b/src/web/websocket/websocket-buffer.h
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_BUFFER_H
+#define NETDATA_WEBSOCKET_BUFFER_H
+
+#include "websocket-internal.h"
+
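+// Unmask a client-to-server payload (RFC 6455 section 5.3): every payload
+// byte is XORed with the 4-byte masking key, cycling through its bytes.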
+ALWAYS_INLINE
+static void websocket_unmask(char *dst, const char *src, size_t length, const unsigned char *mask_key) {
+ for (size_t i = 0; i < length; i++)
+ dst[i] = (char)((unsigned char)src[i] ^ mask_key[i % 4]);
+}
+
+// Initialize an already allocated buffer structure
+ALWAYS_INLINE
+static void wsb_init(WS_BUF *wsb, size_t initial_size) {
+ if (!wsb) return;
+ wsb->data = mallocz(initial_size);
+ wsb->size = initial_size;
+ wsb->length = 0;
+}
+
+// Clean up an embedded buffer (free data but not the buffer structure itself)
+ALWAYS_INLINE
+static void wsb_cleanup(WS_BUF *wsb) {
+ if (!wsb) return;
+ freez(wsb->data);
+ wsb->data = NULL;
+ wsb->size = 0;
+ wsb->length = 0;
+}
+
+// Allocate and initialize a new buffer structure
+ALWAYS_INLINE
+static WS_BUF *wsb_create(size_t initial_size) {
+ WS_BUF *buffer = mallocz(sizeof(WS_BUF));
+ wsb_init(buffer, MAX(initial_size, 1024));
+ return buffer;
+}
+
+// Free buffer structure
+ALWAYS_INLINE
+static void wsb_free(WS_BUF *wsb) {
+ if (!wsb) return;
+ wsb_cleanup(wsb);
+ freez(wsb);
+}
+
+// Resize buffer to a new size
+ALWAYS_INLINE
+static void wsb_resize(WS_BUF *wsb, size_t new_size) {
+ if (new_size <= wsb->size) return;
+ wsb->data = reallocz(wsb->data, new_size);
+ wsb->size = new_size;
+}
+
+// Ensure the buffer has room for 'bytes' more bytes (plus null terminator and decompression padding)
+ALWAYS_INLINE
+static void wsb_need_bytes(WS_BUF *wsb, size_t bytes) {
+ if (!wsb) return;
+
+ // 1 for null + 4 for the final decompression padding
+ size_t wanted_size = wsb->length + bytes + 1 + 4;
+ if (wanted_size < wsb->size)
+ return;
+
+ size_t new_size = wsb->size * 2;
+ if (new_size < wanted_size)
+ new_size = wanted_size;
+
+ wsb_resize(wsb, new_size);
+}
+
+// Reset buffer
+ALWAYS_INLINE
+static void wsb_reset(WS_BUF *wsb) {
+ if (!wsb) return;
+ wsb->length = 0;
+}
+
+// Ensure buffer has null termination for text data
+ALWAYS_INLINE
+static void wsb_null_terminate(WS_BUF *wsb) {
+ if (!wsb) return;
+ wsb_need_bytes(wsb, 1);
+ wsb->data[wsb->length] = '\0';
+}
+
+// Check if buffer is empty
+ALWAYS_INLINE
+static bool wsb_is_empty(const WS_BUF *wsb) {
+ return (!wsb || wsb->length == 0);
+}
+
+// Check if buffer has data
+ALWAYS_INLINE
+static bool wsb_has_data(const WS_BUF *wsb) {
+ return (wsb && wsb->data && wsb->length > 0);
+}
+
+// Get pointer to buffer data
+ALWAYS_INLINE
+static char *wsb_data(WS_BUF *wsb) {
+ return wsb ? wsb->data : NULL;
+}
+
+// Get current buffer length
+ALWAYS_INLINE
+static size_t wsb_length(const WS_BUF *wsb) {
+ return wsb ? wsb->length : 0;
+}
+
+// Get allocated buffer size
+ALWAYS_INLINE
+static size_t wsb_size(const WS_BUF *wsb) {
+ return wsb ? wsb->size : 0;
+}
+
+// Set buffer length (must be <= buffer size)
+ALWAYS_INLINE
+static void wsb_set_length(WS_BUF *wsb, size_t length) {
+ if (!wsb) return;
+
+ if (length > wsb->size)
+ fatal("WEBSOCKET: trying to set length to %zu, but buffer size is %zu", length, wsb->size);
+
+ wsb->length = length;
+}
+
+// Append data to a buffer
+ALWAYS_INLINE
+static char *wsb_append(WS_BUF *wsb, const void *data, size_t length) {
+ if (!wsb || !data || !length)
+ return NULL;
+
+ // Ensure buffer is large enough
+ wsb_need_bytes(wsb, length);
+
+ char *dst = wsb->data + wsb->length;
+
+ // Copy data to end of buffer
+ memcpy(dst, data, length);
+
+ // Update length
+ wsb->length += length;
+
+ return dst;
+}
+
+// Unmask and append binary data to a buffer, returns pointer to beginning of the unmasked data
+ALWAYS_INLINE
+static char *wsb_unmask_and_append(WS_BUF *wsb, const void *masked_data,
+ size_t length, const unsigned char *mask_key) {
+ if (!wsb || !masked_data || !length || !mask_key)
+ return NULL;
+
+ // Ensure buffer is large enough for the new data
+ wsb_need_bytes(wsb, length);
+
+ // Get a pointer to the destination in the expanded buffer
+ char *dst = wsb->data + wsb->length;
+
+ // Unmask the data directly into the buffer by calling websocket_unmask
+ websocket_unmask(dst, (const char *)masked_data, length, mask_key);
+
+ // Update buffer length
+ wsb->length += length;
+
+ return dst;
+}
+
+// Append data to a buffer but don't change the length (use for padding)
+// Returns pointer to beginning of the appended data area
+ALWAYS_INLINE
+static char *wsb_append_padding(WS_BUF *wsb, const void *data, size_t length) {
+ if (!wsb || !data || !length)
+ return NULL;
+
+ // Ensure buffer is large enough for the new data
+ wsb_need_bytes(wsb, length);
+
+ // Get pointer to where the data will be stored
+ char *dst = wsb->data + wsb->length;
+
+ // Copy data to end of buffer
+ memcpy(dst, data, length);
+
+ // Don't update length - this is the difference from wsb_append()
+ // This allows adding "padding" data after the logical end of the buffer
+
+ return dst;
+}
+
+// Remove bytes from the front of the buffer, shifting remaining content forward
+// Returns the number of bytes actually trimmed (may be less than requested if buffer is smaller)
+ALWAYS_INLINE
+static size_t wsb_trim_front(WS_BUF *wsb, size_t bytes_to_trim) {
+ if (!wsb || !wsb->data || bytes_to_trim == 0 || wsb->length == 0)
+ return 0;
+
+ // Cap the trim size to the actual buffer length
+ size_t actual_trim = (bytes_to_trim > wsb->length) ? wsb->length : bytes_to_trim;
+
+ if (actual_trim < wsb->length) {
+ // More data in buffer - shift remaining data to beginning
+ size_t remaining = wsb->length - actual_trim;
+
+ // Shift the remaining data to the beginning of the buffer
+ memmove(wsb->data, wsb->data + actual_trim, remaining);
+
+ // Update buffer length to reflect the shift
+ wsb->length = remaining;
+ } else {
+ // All data was trimmed or the buffer is empty - reset length
+ wsb->length = 0;
+ }
+
+ return actual_trim;
+}
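+
+// Illustrative usage (a sketch): the typical lifecycle of a WS_BUF while
+// reassembling a fragmented message. frame1_payload, frame2_payload,
+// mask_key and process_message() are hypothetical names:
+//
+//     WS_BUF *b = wsb_create(4096);
+//     wsb_append(b, frame1_payload, frame1_len);
+//     wsb_unmask_and_append(b, frame2_payload, frame2_len, mask_key);
+//     wsb_null_terminate(b);   // safe to treat text payloads as C strings
+//     process_message(wsb_data(b), wsb_length(b));
+//     wsb_free(b);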
+
+#endif //NETDATA_WEBSOCKET_BUFFER_H
diff --git a/src/web/websocket/websocket-compression.c b/src/web/websocket/websocket-compression.c
new file mode 100644
index 00000000000000..693c27fa9863bb
--- /dev/null
+++ b/src/web/websocket/websocket-compression.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+
+// Initialize compression resources using the parsed options
+bool websocket_compression_init(WS_CLIENT *wsc) {
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ if (!wsc->compression.enabled) {
+ websocket_debug(wsc, "Compression is disabled");
+ return false;
+ }
+
+ // Initialize deflate (compression) context for server-to-client messages
+ wsc->compression.deflate_stream = mallocz(sizeof(z_stream));
+ wsc->compression.deflate_stream->zalloc = Z_NULL;
+ wsc->compression.deflate_stream->zfree = Z_NULL;
+ wsc->compression.deflate_stream->opaque = Z_NULL;
+
+ // Initialize with negative window bits for raw deflate (no zlib/gzip header)
+ // Use server_max_window_bits for outgoing (server-to-client) messages
+ int ret = deflateInit2(
+ wsc->compression.deflate_stream,
+ wsc->compression.compression_level,
+ Z_DEFLATED,
+ -wsc->compression.server_max_window_bits,
+ WS_COMPRESS_MEMLEVEL,
+ Z_DEFAULT_STRATEGY
+ );
+
+ if (ret != Z_OK) {
+ websocket_error(wsc, "Failed to initialize deflate context: %s (%d)",
+ zError(ret), ret);
+ freez(wsc->compression.deflate_stream);
+ wsc->compression.deflate_stream = NULL;
+ return false;
+ }
+
+ websocket_debug(wsc, "Compression initialized (server window bits: %d)",
+ wsc->compression.server_max_window_bits);
+
+ return true;
+}
+
+// Initialize decompression resources for a client
+bool websocket_decompression_init(WS_CLIENT *wsc) {
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ if (!wsc->compression.enabled) {
+ websocket_debug(wsc, "Decompression is disabled");
+ return false;
+ }
+
+ // Create a new inflate stream
+ wsc->compression.inflate_stream = mallocz(sizeof(z_stream));
+ wsc->compression.inflate_stream->zalloc = Z_NULL;
+ wsc->compression.inflate_stream->zfree = Z_NULL;
+ wsc->compression.inflate_stream->opaque = Z_NULL;
+
+ // Initialize with negative window bits for raw deflate (no zlib/gzip header)
+ // Use client_max_window_bits for incoming (client-to-server) messages
+ int init_ret = inflateInit2(wsc->compression.inflate_stream, -wsc->compression.client_max_window_bits);
+
+ if (init_ret != Z_OK) {
+ websocket_error(wsc, "Failed to initialize inflate stream: %s (%d)",
+ zError(init_ret), init_ret);
+ freez(wsc->compression.inflate_stream);
+ wsc->compression.inflate_stream = NULL;
+ return false;
+ }
+
+ websocket_debug(wsc, "Decompression initialized (client window bits: %d)",
+ wsc->compression.client_max_window_bits);
+
+ return true;
+}
+
+// Clean up compression resources for a WebSocket client
+void websocket_compression_cleanup(WS_CLIENT *wsc) {
+ // Clean up deflate context
+ if (!wsc->compression.deflate_stream)
+ return;
+
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ // Set up dummy I/O pointers to ensure clean state
+ unsigned char dummy_buffer[16] = {0};
+ wsc->compression.deflate_stream->next_in = dummy_buffer;
+ wsc->compression.deflate_stream->avail_in = 0;
+ wsc->compression.deflate_stream->next_out = dummy_buffer;
+ wsc->compression.deflate_stream->avail_out = sizeof(dummy_buffer);
+
+ // Always call deflateEnd to release internal zlib resources
+ // Don't bother with deflateReset as deflateEnd will clean up properly
+ int ret = deflateEnd(wsc->compression.deflate_stream);
+
+ if (ret != Z_OK && ret != Z_DATA_ERROR) {
+ // Z_DATA_ERROR can happen in some edge cases, it's not critical here
+ // as we're cleaning up anyway
+ websocket_debug(wsc, "deflateEnd returned %d: %s", ret, zError(ret));
+ }
+
+ // Free the stream structure
+ freez(wsc->compression.deflate_stream);
+ wsc->compression.deflate_stream = NULL;
+
+ websocket_debug(wsc, "Compression resources cleaned up");
+}
+
+// Clean up decompression resources for a client's inflate stream
+void websocket_decompression_cleanup(WS_CLIENT *wsc) {
+ if (!wsc->compression.inflate_stream)
+ return;
+
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ // End the current inflate stream and free its resources
+ inflateEnd(wsc->compression.inflate_stream);
+ freez(wsc->compression.inflate_stream);
+ wsc->compression.inflate_stream = NULL;
+
+ websocket_debug(wsc, "Decompression resources cleaned up");
+}
+
+// Reset compression resources for a client - calls cleanup and init
+ALWAYS_INLINE
+bool websocket_compression_reset(WS_CLIENT *wsc) {
+ websocket_compression_cleanup(wsc);
+ return websocket_compression_init(wsc);
+}
+
+// Reset decompression resources for a client - calls cleanup and init
+ALWAYS_INLINE
+bool websocket_decompression_reset(WS_CLIENT *wsc) {
+ websocket_decompression_cleanup(wsc);
+ return websocket_decompression_init(wsc);
+}
+
+// Decompress a client's message from payload to u_payload
+bool websocket_client_decompress_message(WS_CLIENT *wsc) {
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ if (!wsc->is_compressed || !wsc->compression.enabled || !wsc->compression.inflate_stream)
+ return false;
+
+ if (wsb_is_empty(&wsc->payload)) {
+ websocket_debug(wsc, "Empty compressed message");
+ wsb_reset(&wsc->u_payload);
+ wsb_null_terminate(&wsc->u_payload);
+ return true;
+ }
+
+ websocket_debug(wsc, "Decompressing message (%zu bytes)", wsb_length(&wsc->payload));
+
+ z_stream *zstrm = wsc->compression.inflate_stream;
+ wsb_reset(&wsc->u_payload);
+
+ // Per RFC 7692, we need to append 4 bytes (00 00 FF FF) to the compressed data
+ // to ensure the inflate operation completes
+ static const unsigned char trailer[4] = {0x00, 0x00, 0xFF, 0xFF};
+ wsb_append_padding(&wsc->payload, trailer, 4);
+
+ zstrm->next_in = (Bytef *)wsb_data(&wsc->payload);
+ zstrm->avail_in = wsb_length(&wsc->payload) + 4;
+ zstrm->next_out = (Bytef *)wsb_data(&wsc->u_payload);
+ zstrm->avail_out = wsb_size(&wsc->u_payload);
+ zstrm->total_in = 0;
+ zstrm->total_out = 0;
+
+ // Decompress with loop for multiple buffer expansions if needed
+ int ret = Z_MEM_ERROR;
+ bool success = false;
+ int retries = 24;
+ size_t wanted_size = MAX(wsb_size(&wsc->u_payload), wsb_length(&wsc->payload) * 2);
+ do {
+ wsb_resize(&wsc->u_payload, wanted_size);
+
+ // Position next_out to point to the end of the currently decompressed data
+ zstrm->next_out = (Bytef *)wsb_data(&wsc->u_payload) + wsb_length(&wsc->u_payload);
+
+ // Only make the newly available space available to zlib
+ zstrm->avail_out = wsb_size(&wsc->u_payload) - wsb_length(&wsc->u_payload);
+
+ // Try to decompress
+ ret = inflate(zstrm, Z_SYNC_FLUSH);
+
+ websocket_debug(wsc, "inflate() returned %d (%s), "
+ "avail_in=%u, avail_out=%u, total_in=%lu, total_out=%lu",
+ ret, zError(ret),
+ zstrm->avail_in, zstrm->avail_out, zstrm->total_in, zstrm->total_out);
+
+ // Handle different return codes from inflate()
+ // Z_STREAM_END - Complete decompression success
+ // Z_OK - Partial success, all input processed or output buffer full
+ // Z_BUF_ERROR - Need more output space
+
+ success = ret == Z_STREAM_END ||
+ (zstrm->avail_in == 0 && zstrm->avail_out > 0 && (ret == Z_OK || ret == Z_BUF_ERROR));
+
+ // Update the buffer's length to include the newly written data
+ wsb_set_length(&wsc->u_payload, wsb_size(&wsc->u_payload) - zstrm->avail_out);
+
+ // Check if we need more output space
+ if (!success && (ret == Z_BUF_ERROR || ret == Z_OK)) {
+ wanted_size = MIN(wanted_size * 2, WEBSOCKET_MAX_UNCOMPRESSED_SIZE);
+ if (wanted_size == WEBSOCKET_MAX_UNCOMPRESSED_SIZE && wanted_size == wsb_size(&wsc->u_payload))
+ break; // we cannot resize more
+ }
+ } while (!success && retries-- > 0);
+
+ if(!success) {
+ // Decompression failed
+ websocket_error(wsc, "Decompression failed: %s (ret = %d, avail_in = %u)", zError(ret), ret, zstrm->avail_in);
+ wsb_reset(&wsc->u_payload);
+ websocket_decompression_reset(wsc);
+ return false;
+ }
+
+ // Log successful decompression with detailed information
+ websocket_debug(wsc, "Successfully decompressed %zu bytes to %zu bytes (ratio: %.2fx)",
+ wsb_length(&wsc->payload), wsb_length(&wsc->u_payload),
+ (double)wsb_length(&wsc->u_payload) / (double)wsb_length(&wsc->payload));
+
+ // Show a preview of the decompressed data
+ websocket_dump_debug(wsc, wsb_data(&wsc->u_payload), wsb_length(&wsc->u_payload), "RX UNCOMPRESSED PAYLOAD");
+
+ // when client context takeover is disabled, reset the decompressor
+ if (!wsc->compression.client_context_takeover) {
+ websocket_debug(wsc, "resetting compression");
+ if(inflateReset2(zstrm, -wsc->compression.client_max_window_bits) != Z_OK) {
+ websocket_debug(wsc, "reset failed, re-initializing compression");
+ if (!websocket_decompression_reset(wsc)) {
+ websocket_debug(wsc, "re-initializing failed, reporting failure");
+ return false;
+ }
+ zstrm = wsc->compression.inflate_stream;
+ }
+ }
+
+ zstrm->next_in = NULL;
+ zstrm->next_out = NULL;
+ zstrm->avail_in = 0;
+ zstrm->avail_out = 0;
+ zstrm->total_in = 0;
+ zstrm->total_out = 0;
+
+ return true;
+}
diff --git a/src/web/websocket/websocket-compression.h b/src/web/websocket/websocket-compression.h
new file mode 100644
index 00000000000000..feb37baaae688a
--- /dev/null
+++ b/src/web/websocket/websocket-compression.h
@@ -0,0 +1,58 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_COMPRESSION_H
+#define NETDATA_WEBSOCKET_COMPRESSION_H
+
+#include "websocket-internal.h"
+
+// WebSocket compression constants
+#define WS_COMPRESS_WINDOW_BITS 15 // Default window bits (RFC 7692)
+#define WS_COMPRESS_MEMLEVEL 8 // Default memory level for zlib
+#define WS_COMPRESS_DEFAULT_LEVEL Z_DEFAULT_COMPRESSION // Default compression level
+#define WS_COMPRESS_MIN_SIZE 64 // Don't compress payloads smaller than this
+
+// WebSocket compression extension types
+typedef enum {
+ WS_COMPRESS_NONE = 0, // No compression
+ WS_COMPRESS_DEFLATE = 1 // permessage-deflate extension
+} WEBSOCKET_COMPRESSION_TYPE;
+
+// WebSocket compression context structure
+typedef struct websocket_compression_context {
+ WEBSOCKET_COMPRESSION_TYPE type; // Compression type
+ bool enabled; // Whether compression is enabled
+ bool client_context_takeover; // Client context takeover
+ bool server_context_takeover; // Server context takeover
+ int client_max_window_bits; // Max window bits for client-to-server messages (8-15)
+ int server_max_window_bits; // Max window bits for server-to-client messages (8-15)
+ int compression_level; // Compression level
+ z_stream *deflate_stream; // Deflate context for outgoing messages (server-to-client)
+ z_stream *inflate_stream; // Inflate context for incoming messages (client-to-server)
+} WEBSOCKET_COMPRESSION_CTX;
+
+#define WEBSOCKET_COMPRESSION_DEFAULTS (WEBSOCKET_COMPRESSION_CTX){ \
+ .type = WS_COMPRESS_NONE, \
+ .enabled = false, \
+ .client_context_takeover = true, \
+ .server_context_takeover = true, \
+ .client_max_window_bits = WS_COMPRESS_WINDOW_BITS, \
+ .server_max_window_bits = WS_COMPRESS_WINDOW_BITS, \
+ .compression_level = WS_COMPRESS_DEFAULT_LEVEL, \
+ .deflate_stream = NULL, \
+ .inflate_stream = NULL, \
+}
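+
+// For reference (RFC 7692): a negotiated offer such as
+//   Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=12; server_no_context_takeover
+// would be represented here as type=WS_COMPRESS_DEFLATE, enabled=true,
+// client_max_window_bits=12 and server_context_takeover=false.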
+
+// Forward declaration
+struct websocket_server_client;
+
+// Function declarations
+bool websocket_compression_init(struct websocket_server_client *wsc);
+void websocket_compression_cleanup(struct websocket_server_client *wsc);
+bool websocket_compression_reset(struct websocket_server_client *wsc);
+
+// Decompression-specific functions
+bool websocket_decompression_init(struct websocket_server_client *wsc);
+void websocket_decompression_cleanup(struct websocket_server_client *wsc);
+bool websocket_decompression_reset(struct websocket_server_client *wsc);
+
+#endif // NETDATA_WEBSOCKET_COMPRESSION_H
\ No newline at end of file
diff --git a/src/web/websocket/websocket-echo-test.html b/src/web/websocket/websocket-echo-test.html
new file mode 100644
index 00000000000000..d1bd0de9db4713
--- /dev/null
+++ b/src/web/websocket/websocket-echo-test.html
@@ -0,0 +1,1659 @@
+<!--
+  websocket-echo-test.html (1659 lines): a browser-based test client for the
+  Netdata WebSocket server. The HTML markup did not survive extraction here;
+  the page provides:
+    - Header: "Netdata WebSocket Test Client" with connection status
+      ("Disconnected") and the negotiated Protocol / Extensions / Compression.
+    - "Basic Testing" tab: send text or binary messages (binary mode is only
+      needed for payloads that are not valid UTF-8), view message compression
+      characteristics, and follow a Messages log.
+    - "Stress Test" tab ("WebSocket Stress Tester"): sends random messages for
+      a configurable duration and verifies the echoed responses, with an
+      optional consistent message size and a rate selectable from 1 to 100
+      messages/second (capped at 100/s because browsers cannot reliably
+      process WebSocket messages faster). Live counters show Messages Sent,
+      Messages Received, Errors, Average Latency (ms), Data Sent and Data
+      Received (KB), plus a Test Log.
+-->
\ No newline at end of file
diff --git a/src/web/websocket/websocket-echo.c b/src/web/websocket/websocket-echo.c
new file mode 100644
index 00000000000000..56bb758d19c01d
--- /dev/null
+++ b/src/web/websocket/websocket-echo.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-echo.h"
+
+// Called when a client is connected and ready to exchange messages
+void echo_on_connect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "Echo protocol client connected");
+
+ // Send a welcome message
+ // websocket_protocol_send_text(wsc, "Welcome to Netdata Echo WebSocket Server");
+}
+
+// Called when a message is received from the client
+void echo_on_message_callback(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode) {
+ if (!wsc || !message)
+ return;
+
+ websocket_debug(wsc, "Echo protocol handling message: type=%s, length=%zu",
+ (opcode == WS_OPCODE_BINARY) ? "binary" : "text",
+ length);
+
+ // Simply echo back the same message with the same opcode
+ websocket_protocol_send_frame(wsc, message, length, opcode, true);
+}
+
+// Called before sending a close frame to the client
+void echo_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "Echo protocol client closing with code %d (%s): %s",
+ code,
+ code == WS_CLOSE_NORMAL ? "Normal" :
+ code == WS_CLOSE_GOING_AWAY ? "Going Away" :
+ code == WS_CLOSE_PROTOCOL_ERROR ? "Protocol Error" :
+ code == WS_CLOSE_INTERNAL_ERROR ? "Internal Error" : "Other",
+ reason ? reason : "No reason provided");
+
+ // Optional: Send a goodbye message
+ // websocket_protocol_send_text(wsc, "Goodbye from Netdata Echo WebSocket Server");
+}
+
+// Called when a client is about to be disconnected
+void echo_on_disconnect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "Echo protocol client disconnected");
+
+ // No cleanup needed for the Echo protocol since it doesn't maintain any state
+}
+
+// Initialize the Echo protocol
+void websocket_echo_initialize(void) {
+ netdata_log_info("Echo protocol initialized");
+}
\ No newline at end of file
diff --git a/src/web/websocket/websocket-echo.h b/src/web/websocket/websocket-echo.h
new file mode 100644
index 00000000000000..d198f045ee32db
--- /dev/null
+++ b/src/web/websocket/websocket-echo.h
@@ -0,0 +1,17 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_ECHO_H
+#define NETDATA_WEBSOCKET_ECHO_H
+
+#include "websocket-internal.h"
+
+// WebSocket protocol handler callbacks for Echo protocol
+void echo_on_connect(struct websocket_server_client *wsc);
+void echo_on_message_callback(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode);
+void echo_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason);
+void echo_on_disconnect(struct websocket_server_client *wsc);
+
+// Initialize Echo protocol - called during WebSocket subsystem initialization
+void websocket_echo_initialize(void);
+
+#endif // NETDATA_WEBSOCKET_ECHO_H
\ No newline at end of file
diff --git a/src/web/websocket/websocket-handshake.c b/src/web/websocket/websocket-handshake.c
new file mode 100644
index 00000000000000..8dd5f43f755540
--- /dev/null
+++ b/src/web/websocket/websocket-handshake.c
@@ -0,0 +1,436 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "web/server/web_client.h"
+#include "websocket-internal.h"
+#include "websocket-jsonrpc.h"
+#include "websocket-echo.h"
+#include "../mcp/adapters/mcp-websocket.h"
+
+// Global array of WebSocket threads
+WEBSOCKET_THREAD websocket_threads[WEBSOCKET_MAX_THREADS];
+
+// Initialize WebSocket thread system
+void websocket_threads_init(void) {
+ for(size_t i = 0; i < WEBSOCKET_MAX_THREADS; i++) {
+ websocket_threads[i].id = i;
+ websocket_threads[i].thread = NULL;
+ websocket_threads[i].running = false;
+ spinlock_init(&websocket_threads[i].spinlock);
+ websocket_threads[i].clients_current = 0;
+ spinlock_init(&websocket_threads[i].clients_spinlock);
+ websocket_threads[i].clients = NULL;
+ websocket_threads[i].ndpl = NULL;
+ websocket_threads[i].cmd.pipe[PIPE_READ] = -1;
+ websocket_threads[i].cmd.pipe[PIPE_WRITE] = -1;
+ }
+}
+
+// Find the thread with the minimum client load and atomically increment its count
+NEVERNULL
+static WEBSOCKET_THREAD *websocket_thread_get_min_load(void) {
+ // Static spinlock to protect the critical section of thread selection
+ static SPINLOCK assign_spinlock = SPINLOCK_INITIALIZER;
+ size_t slot = 0;
+
+ // Critical section: find thread with minimum load and increment its count atomically
+ spinlock_lock(&assign_spinlock);
+
+ // Find the minimum load thread
+ size_t min_clients = websocket_threads[0].clients_current;
+
+ for(size_t i = 1; i < WEBSOCKET_MAX_THREADS; i++) {
+ // Check if this thread has fewer clients
+ if(websocket_threads[i].clients_current < min_clients) {
+ min_clients = websocket_threads[i].clients_current;
+ slot = i;
+ }
+ }
+
+ // Preemptively increment the client count to prevent race conditions
+ // This ensures concurrent client assignments will be properly distributed
+ websocket_threads[slot].clients_current++;
+
+ spinlock_unlock(&assign_spinlock);
+
+ return &websocket_threads[slot];
+}
+
+// Handle socket takeover from web client - similar to stream_receiver_takeover_web_connection
+static void websocket_takeover_web_connection(struct web_client *w, WS_CLIENT *wsc) {
+ // Set the file descriptor and ssl from the web client
+ wsc->sock.fd = w->fd;
+ wsc->sock.ssl = w->ssl;
+
+ w->ssl = NETDATA_SSL_UNSET_CONNECTION;
+
+ WEB_CLIENT_IS_DEAD(w);
+
+ if(web_server_mode == WEB_SERVER_MODE_STATIC_THREADED) {
+ web_client_flag_set(w, WEB_CLIENT_FLAG_DONT_CLOSE_SOCKET);
+ }
+ else {
+ w->fd = -1;
+ }
+
+ // Clear web client buffer
+ buffer_flush(w->response.data);
+
+ web_server_remove_current_socket_from_poll();
+}
+
+// Initialize a thread's poll
+static bool websocket_thread_init_poll(WEBSOCKET_THREAD *wth) {
+ // Create poll instance
+ if(!wth->ndpl) {
+ wth->ndpl = nd_poll_create();
+ if (!wth->ndpl) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to create poll", wth->id);
+ goto cleanup;
+ }
+ }
+
+ // Create command pipe
+ if(wth->cmd.pipe[PIPE_READ] == -1 || wth->cmd.pipe[PIPE_WRITE] == -1) {
+ if (pipe(wth->cmd.pipe) == -1) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to create command pipe: %s", wth->id, strerror(errno));
+ goto cleanup;
+ }
+
+ // Set pipe to non-blocking
+ if(fcntl(wth->cmd.pipe[PIPE_READ], F_SETFL, O_NONBLOCK) == -1) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to set command pipe to non-blocking: %s", wth->id, strerror(errno));
+ goto cleanup;
+ }
+
+ // Add command pipe to poll
+ bool added = nd_poll_add(wth->ndpl, wth->cmd.pipe[PIPE_READ], ND_POLL_READ, &wth->cmd);
+ if(!added) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to add command pipe to poll", wth->id);
+ goto cleanup;
+ }
+ }
+
+ return true;
+
+cleanup:
+ if(wth->cmd.pipe[PIPE_READ] != -1) {
+ close(wth->cmd.pipe[PIPE_READ]);
+ wth->cmd.pipe[PIPE_READ] = -1;
+ }
+ if(wth->cmd.pipe[PIPE_WRITE] != -1) {
+ close(wth->cmd.pipe[PIPE_WRITE]);
+ wth->cmd.pipe[PIPE_WRITE] = -1;
+ }
+ if(wth->ndpl) {
+ nd_poll_destroy(wth->ndpl);
+ wth->ndpl = NULL;
+ }
+ return false;
+}
+
+// Assign a client to a thread
+static WEBSOCKET_THREAD *websocket_thread_assign_client(WS_CLIENT *wsc) {
+ // Get the thread with the minimum load
+ // Note: client count is already atomically incremented inside this function
+ WEBSOCKET_THREAD *wth = websocket_thread_get_min_load();
+
+ // Lock the thread for initialization
+ spinlock_lock(&wth->spinlock);
+
+ // Start the thread if not running
+ if(!wth->thread) {
+ // Initialize poll
+ if(!websocket_thread_init_poll(wth)) {
+ spinlock_unlock(&wth->spinlock);
+ netdata_log_error("WEBSOCKET[%zu]: Failed to initialize poll", wth->id);
+ goto undo;
+ }
+
+ char thread_name[32];
+ snprintf(thread_name, sizeof(thread_name), "WEBSOCK[%zu]", wth->id);
+ wth->thread = nd_thread_create(thread_name, NETDATA_THREAD_OPTION_DEFAULT, websocket_thread, wth);
+ wth->running = true;
+ }
+
+ // Release the thread lock
+ spinlock_unlock(&wth->spinlock);
+
+ // Link thread to client
+ wsc->wth = wth;
+
+ // Send command to add client
+ if(!websocket_thread_send_command(wth, WEBSOCKET_THREAD_CMD_ADD_CLIENT, wsc->id)) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to send add client command", wth->id);
+ goto undo;
+ }
+
+ return wth;
+
+undo:
+ // Roll back the client count increment since assignment failed
+ wsc->wth = NULL;
+
+ if(wth) {
+ spinlock_lock(&wth->clients_spinlock);
+ if (wth->clients_current > 0)
+ wth->clients_current--;
+ spinlock_unlock(&wth->clients_spinlock);
+ }
+
+ return NULL;
+}
+
+// Cancel all WebSocket threads
+void websocket_threads_join(void) {
+ for(size_t i = 0; i < WEBSOCKET_MAX_THREADS; i++) {
+ if(websocket_threads[i].thread) {
+ // Send exit command
+ websocket_thread_send_command(&websocket_threads[i], WEBSOCKET_THREAD_CMD_EXIT, 0);
+
+ // Signal thread to cancel
+ nd_thread_signal_cancel(websocket_threads[i].thread);
+ }
+ }
+
+ // Wait for all threads to exit
+ for(size_t i = 0; i < WEBSOCKET_MAX_THREADS; i++) {
+ if(websocket_threads[i].thread) {
+ nd_thread_join(websocket_threads[i].thread);
+ websocket_threads[i].thread = NULL;
+ websocket_threads[i].running = false;
+ }
+ }
+}
+
+// Check if the current HTTP request is a WebSocket handshake request
+static bool websocket_detect_handshake_request(struct web_client *w) {
+ // We need a valid key and to be flagged as a WebSocket request
+ if (!web_client_is_websocket(w) || !w->websocket.key)
+ return false;
+
+ return true;
+}
+
+// Generate the WebSocket accept key as per RFC 6455
+static char *websocket_generate_handshake_key(const char *client_key) {
+ if (!client_key)
+ return NULL;
+
+ // Concatenate the key with the WebSocket GUID
+ char concat_key[256];
+ snprintfz(concat_key, sizeof(concat_key), "%s%s", client_key, WS_GUID);
+
+ // Create SHA-1 hash
+ unsigned char sha_hash[SHA_DIGEST_LENGTH];
+ SHA1((unsigned char *)concat_key, strlen(concat_key), sha_hash);
+
+ // Convert to base64
+ char *accept_key = mallocz(33); // Base64 of a 20-byte SHA-1 is 28 chars + null terminator; allocate a little extra
+ netdata_base64_encode((unsigned char *)accept_key, sha_hash, SHA_DIGEST_LENGTH);
+
+ return accept_key;
+}
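+
+// Known-answer check (RFC 6455 sections 1.3 and 4.2.2): for the client key
+// "dGhlIHNhbXBsZSBub25jZQ==" this must produce "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".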
+
+static bool websocket_send_first_response(WS_CLIENT *wsc, const char *accept_key, WEBSOCKET_EXTENSION ext_flags, bool url_protocol) {
+ CLEAN_BUFFER *wb = buffer_create(1024, NULL);
+
+ buffer_sprintf(wb,
+ "HTTP/1.1 101 Switching Protocols\r\n"
+ "Server: Netdata\r\n"
+ "Upgrade: websocket\r\n"
+ "Connection: Upgrade\r\n"
+ "Sec-WebSocket-Accept: %s\r\n",
+ accept_key
+ );
+
+ // Add the selected subprotocol
+ if(!url_protocol && wsc->protocol != WS_PROTOCOL_UNKNOWN && wsc->protocol != WS_PROTOCOL_DEFAULT)
+ buffer_sprintf(wb, "Sec-WebSocket-Protocol: %s\r\n", WEBSOCKET_PROTOCOL_2str(wsc->protocol));
+
+ switch (wsc->compression.type) {
+ case WS_COMPRESS_DEFLATE:
+ buffer_strcat(wb, "Sec-WebSocket-Extensions: permessage-deflate");
+
+ // Add parameters if different from defaults
+ if (!wsc->compression.client_context_takeover)
+ buffer_strcat(wb, "; client_no_context_takeover");
+
+ if (!wsc->compression.server_context_takeover)
+ buffer_strcat(wb, "; server_no_context_takeover");
+
+ if(ext_flags & WS_EXTENSION_SERVER_MAX_WINDOW_BITS)
+ buffer_sprintf(wb, "; server_max_window_bits=%d", wsc->compression.server_max_window_bits);
+
+ if(ext_flags & WS_EXTENSION_CLIENT_MAX_WINDOW_BITS)
+ buffer_sprintf(wb, "; client_max_window_bits=%d", wsc->compression.client_max_window_bits);
+
+ buffer_strcat(wb, "\r\n");
+ break;
+
+ default:
+ break;
+ }
+
+ // End of headers
+ buffer_strcat(wb, "Sec-WebSocket-Version: 13\r\n");
+ buffer_strcat(wb, "\r\n");
+
+ // Send the handshake response using ND_SOCK - we're still in the web server thread,
+ // so we need to use the persist version to ensure the complete handshake is sent
+ const char *header_str = buffer_tostring(wb);
+ size_t header_len = buffer_strlen(wb);
+ ssize_t bytes = nd_sock_write_persist(&wsc->sock, header_str, header_len, 20);
+
+ websocket_debug(wsc, "Sent WebSocket handshake response: %zd bytes out of %zu bytes", bytes, header_len);
+ return bytes == (ssize_t)header_len;
+}
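+
+// A typical response produced above (the values shown are illustrative; the
+// protocol and extension lines appear only when negotiated):
+//
+//     HTTP/1.1 101 Switching Protocols
+//     Server: Netdata
+//     Upgrade: websocket
+//     Connection: Upgrade
+//     Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
+//     Sec-WebSocket-Protocol: echo
+//     Sec-WebSocket-Extensions: permessage-deflate; client_max_window_bits=15
+//     Sec-WebSocket-Version: 13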
+
+// Handle the WebSocket handshake procedure
+short int websocket_handle_handshake(struct web_client *w) {
+ if (!websocket_detect_handshake_request(w))
+ return HTTP_RESP_BAD_REQUEST;
+
+ // Generate the accept key
+ char *accept_key = websocket_generate_handshake_key(w->websocket.key);
+ if (!accept_key)
+ return HTTP_RESP_INTERNAL_SERVER_ERROR;
+
+ // Create the WebSocket client object early so we can set up compression
+ WS_CLIENT *wsc = websocket_client_create();
+
+ // Copy client information
+ strncpyz(wsc->client_ip, w->client_ip, sizeof(wsc->client_ip));
+ strncpyz(wsc->client_port, w->client_port, sizeof(wsc->client_port));
+
+ bool url_protocol = false;
+ wsc->protocol = w->websocket.protocol;
+
+ if(wsc->protocol == WS_PROTOCOL_DEFAULT) {
+ const char *path = buffer_tostring(w->url_path_decoded);
+ if (path && path[0] == '/' && path[1])
+ wsc->protocol = WEBSOCKET_PROTOCOL_2id(&path[1]);
+
+ url_protocol = true;
+ }
+
+ // If no protocol is selected by either URL or subprotocol, reject the connection
+ if(wsc->protocol == WS_PROTOCOL_UNKNOWN || wsc->protocol == WS_PROTOCOL_DEFAULT) {
+ netdata_log_error("WEBSOCKET: No valid protocol selected by either URL or subprotocol");
+ freez(accept_key);
+ websocket_client_free(wsc);
+ return HTTP_RESP_BAD_REQUEST;
+ }
+
+ // Take over the connection immediately
+ websocket_takeover_web_connection(w, wsc);
+
+ if((w->websocket.ext_flags & WS_EXTENSION_PERMESSAGE_DEFLATE)) {
+ wsc->compression.enabled = true;
+ wsc->compression.type = WS_COMPRESS_DEFLATE;
+
+ if (w->websocket.ext_flags & WS_EXTENSION_CLIENT_NO_CONTEXT_TAKEOVER)
+ wsc->compression.client_context_takeover = false;
+ else
+ wsc->compression.client_context_takeover = true;
+
+ if (w->websocket.ext_flags & WS_EXTENSION_SERVER_NO_CONTEXT_TAKEOVER)
+ wsc->compression.server_context_takeover = false;
+ else
+ wsc->compression.server_context_takeover = true;
+
+ // Set window bits for both client-to-server and server-to-client directions
+ wsc->compression.client_max_window_bits = w->websocket.client_max_window_bits ? w->websocket.client_max_window_bits : WS_COMPRESS_WINDOW_BITS;
+ wsc->compression.server_max_window_bits = w->websocket.server_max_window_bits ? w->websocket.server_max_window_bits : WS_COMPRESS_WINDOW_BITS;
+ }
+
+ if(!websocket_send_first_response(wsc, accept_key, w->websocket.ext_flags, url_protocol)) {
+ netdata_log_error("WEBSOCKET: Failed to send complete WebSocket handshake response"); // No client yet
+ freez(accept_key);
+ websocket_client_free(wsc);
+ return HTTP_RESP_INTERNAL_SERVER_ERROR;
+ }
+
+ freez(accept_key);
+
+ // Now that we've sent the handshake response successfully, set the connection state to open
+ wsc->state = WS_STATE_OPEN;
+
+ // Set up protocol-specific callbacks based on the selected protocol
+ switch (wsc->protocol) {
+#ifdef NETDATA_INTERNAL_CHECKS
+ case WS_PROTOCOL_JSONRPC:
+ // Set up callbacks for jsonrpc protocol
+ wsc->on_connect = jsonrpc_on_connect;
+ wsc->on_message = jsonrpc_on_message_callback;
+ wsc->on_close = jsonrpc_on_close;
+ wsc->on_disconnect = jsonrpc_on_disconnect;
+ websocket_debug(wsc, "Setting up jsonrpc protocol callbacks");
+ break;
+
+ case WS_PROTOCOL_ECHO:
+ // Set up callbacks for echo protocol
+ wsc->on_connect = echo_on_connect;
+ wsc->on_message = echo_on_message_callback;
+ wsc->on_close = echo_on_close;
+ wsc->on_disconnect = echo_on_disconnect;
+ websocket_debug(wsc, "Setting up echo protocol callbacks");
+ break;
+
+ case WS_PROTOCOL_MCP:
+ // Set up callbacks for MCP protocol
+ wsc->on_connect = mcp_websocket_on_connect;
+ wsc->on_message = mcp_websocket_on_message;
+ wsc->on_close = mcp_websocket_on_close;
+ wsc->on_disconnect = mcp_websocket_on_disconnect;
+ websocket_debug(wsc, "Setting up MCP protocol callbacks");
+ break;
+#endif
+
+ default:
+ // No protocol handler available - this shouldn't happen as we check earlier
+ netdata_log_error("WEBSOCKET: No handler available for protocol %d", wsc->protocol);
+ websocket_client_free(wsc);
+ return HTTP_RESP_BAD_REQUEST;
+ }
+
+ // Register the client in our registry
+ if (!websocket_client_register(wsc)) {
+ websocket_error(wsc, "Failed to register WebSocket client");
+ websocket_client_free(wsc);
+ return HTTP_RESP_WEBSOCKET_HANDSHAKE;
+ }
+
+ // Message structures are already initialized in websocket_client_create()
+
+ // Set socket to non-blocking mode
+ if (fcntl(wsc->sock.fd, F_SETFL, O_NONBLOCK) == -1) {
+ websocket_error(wsc, "Failed to set WebSocket socket to non-blocking mode");
+ websocket_client_free(wsc);
+ return HTTP_RESP_WEBSOCKET_HANDSHAKE;
+ }
+
+ // Assign to a thread
+ WEBSOCKET_THREAD *wth = websocket_thread_assign_client(wsc);
+ if (!wth) {
+ websocket_error(wsc, "Failed to assign WebSocket client to a thread");
+ websocket_client_free(wsc);
+ return HTTP_RESP_WEBSOCKET_HANDSHAKE;
+ }
+
+ nd_log(NDLS_DAEMON, NDLP_DEBUG,
+ "WebSocket connection established with %s:%s using protocol: %s (client ID: %u, thread: %zu), "
+ "compression: %s (client context takeover: %s, server context takeover: %s, "
+ "client window bits: %d, server window bits: %d)",
+ wsc->client_ip, wsc->client_port,
+ WEBSOCKET_PROTOCOL_2str(wsc->protocol),
+ wsc->id, wth->id,
+ wsc->compression.enabled ? "enabled" : "disabled",
+ wsc->compression.client_context_takeover ? "enabled" : "disabled",
+ wsc->compression.server_context_takeover ? "enabled" : "disabled",
+ wsc->compression.client_max_window_bits,
+ wsc->compression.server_max_window_bits);
+
+ // Important: this response code is never sent to the client since we've
+ // already taken over the socket; it only tells the caller what happened.
+ return HTTP_RESP_WEBSOCKET_HANDSHAKE;
+}
diff --git a/src/web/websocket/websocket-internal.h b/src/web/websocket/websocket-internal.h
new file mode 100644
index 00000000000000..39e35b9d9cc92f
--- /dev/null
+++ b/src/web/websocket/websocket-internal.h
@@ -0,0 +1,252 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_INTERNAL_H
+#define NETDATA_WEBSOCKET_INTERNAL_H
+
+#include "websocket.h"
+
+// Maximum number of WebSocket threads
+#define WEBSOCKET_MAX_THREADS 2
+
+#define WORKERS_WEBSOCKET_POLL 0
+#define WORKERS_WEBSOCKET_CMD_READ 1
+#define WORKERS_WEBSOCKET_CMD_EXIT 2
+#define WORKERS_WEBSOCKET_CMD_ADD 3
+#define WORKERS_WEBSOCKET_CMD_DEL 4
+#define WORKERS_WEBSOCKET_CMD_BROADCAST 5
+#define WORKERS_WEBSOCKET_CMD_UNKNOWN 6
+#define WORKERS_WEBSOCKET_SOCK_RECEIVE 7
+#define WORKERS_WEBSOCKET_SOCK_SEND 8
+#define WORKERS_WEBSOCKET_SOCK_ERROR 9
+#define WORKERS_WEBSOCKET_CLIENT_TIMEOUT 10
+#define WORKERS_WEBSOCKET_SEND_PING 11
+#define WORKERS_WEBSOCKET_CLIENT_STUCK 12
+
+#define WORKERS_WEBSOCKET_INCOMPLETE_FRAME 13
+#define WORKERS_WEBSOCKET_COMPLETE_FRAME 14
+#define WORKERS_WEBSOCKET_MESSAGE 15
+#define WORKERS_WEBSOCKET_MSG_PING 16
+#define WORKERS_WEBSOCKET_MSG_PONG 17
+#define WORKERS_WEBSOCKET_MSG_CLOSE 18
+#define WORKERS_WEBSOCKET_MSG_INVALID 19
+
+// Forward declaration for thread structure
+struct websocket_thread;
+
+#include "websocket-compression.h"
+
+// WebSocket protocol constants
+#define WS_GUID "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"
+
+// WebSocket frame constants
+#define WS_FIN 0x80 // Final frame bit
+#define WS_RSV1 0x40 // Reserved bit 1 (used for compression)
+#define WS_MASK 0x80 // Mask bit
+// Frame size limit - affects fragmentation but not total message size
+#define WS_MAX_FRAME_LENGTH (20 * 1024 * 1024) // 20MB max frame size
+
+// Total message size limits - these prevent resource exhaustion
+#define WEBSOCKET_MAX_COMPRESSED_SIZE (20ULL * 1024 * 1024) // 20MB max compressed message
+#define WEBSOCKET_MAX_UNCOMPRESSED_SIZE (200ULL * 1024 * 1024) // 200MB max uncompressed message
+
+// WebSocket frame header structure - used for processing frame headers
+typedef struct websocket_frame_header {
+ unsigned char fin:1;
+ unsigned char rsv1:1;
+ unsigned char rsv2:1;
+ unsigned char rsv3:1;
+ unsigned char opcode:4;
+ unsigned char mask:1;
+ unsigned char len:7;
+
+ unsigned char mask_key[4]; // Masking key (if present)
+ size_t frame_size; // Size of the entire frame
+ size_t header_size; // Size of the header
+ size_t payload_length; // Length of the payload data
+ void *payload; // Pointer to the payload data
+} WEBSOCKET_FRAME_HEADER;
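+
+// For reference, the wire layout this header mirrors (RFC 6455 section 5.2,
+// abridged; the masking key and payload follow the extended length fields):
+//
+//    0                   1                   2                   3
+//    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+//   +-+-+-+-+-------+-+-------------+-------------------------------+
+//   |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
+//   |I|S|S|S|  (4)  |A|     (7)     |           (16/64 bits,        |
+//   |N|V|V|V|       |S|             |  if payload len is 126/127)   |
+//   +-+-+-+-+-------+-+-------------+-------------------------------+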
+
+// Buffer for message data (used for reassembly of fragmented messages)
+typedef struct websocket_buffer {
+ char *data; // Buffer holding message data
+ size_t length; // Current buffer length
+ size_t size; // Allocated buffer size
+} WS_BUF;
+
+// Forward declaration for client structure
+struct websocket_server_client;
+
+// Function prototypes for buffer handling
+
+// Message and payload processing functions
+void websocket_client_message_reset(struct websocket_server_client *wsc);
+bool websocket_client_process_message(struct websocket_server_client *wsc);
+bool websocket_client_decompress_message(struct websocket_server_client *wsc);
+
+// Additional helper functions
+bool websocket_frame_is_control_opcode(WEBSOCKET_OPCODE opcode);
+bool websocket_validate_utf8(const char *data, size_t length);
+
+#include "websocket-buffer.h"
+
+// WebSocket connection context - full structure definition
+struct websocket_server_client {
+ WEBSOCKET_STATE state;
+ ND_SOCK sock; // Socket with SSL abstraction
+ uint32_t id; // Unique client ID
+ size_t max_message_size;
+ time_t connected_t; // Connection timestamp
+ time_t last_activity_t; // Last activity timestamp
+
+ // Buffer for I/O data
+ struct circular_buffer in_buffer; // Incoming raw data (circular buffer)
+ struct circular_buffer out_buffer; // Outgoing raw data (circular buffer)
+ size_t next_frame_size; // The size of the next complete frame to read
+
+ // Connection info
+ char client_ip[INET6_ADDRSTRLEN];
+ char client_port[NI_MAXSERV];
+ WEBSOCKET_PROTOCOL protocol; // The negotiated subprotocol
+
+ // Thread management
+ struct websocket_thread *wth; // The thread handling this client
+ struct websocket_server_client *prev; // Linked list for thread's client management
+ struct websocket_server_client *next; // Linked list for thread's client management
+
+ // Message processing state
+ WS_BUF payload; // Pre-allocated buffer for message data
+ WS_BUF u_payload; // Pre-allocated buffer for uncompressed message data
+ WEBSOCKET_OPCODE opcode; // Current message opcode
+ bool is_compressed; // Whether the current message is compressed
+ bool message_complete; // Whether the current message is complete
+ size_t message_id; // Sequential ID for messages, starting from 0
+ size_t frame_id; // Sequential ID for frames within current message
+
+ // Compression state
+ WEBSOCKET_COMPRESSION_CTX compression;
+
+ // Connection closing state
+ bool flush_and_remove_client; // Flag to indicate we're just flushing buffer before close
+
+ // Protocol handler callbacks
+ void (*on_connect)(struct websocket_server_client *wsc); // Called when a client is successfully connected
+ void (*on_message)(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode); // Called when a message is received
+ void (*on_close)(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason); // Called BEFORE sending close frame
+ void (*on_disconnect)(struct websocket_server_client *wsc); // Called when a client is disconnected
+
+ // User data for application use
+ void *user_data;
+};
+
+// Forward declarations for websocket client
+typedef struct websocket_server_client WS_CLIENT;
+
+// WebSocket thread structure
+typedef struct websocket_thread {
+ size_t id; // Thread ID
+ pid_t tid;
+
+ struct {
+ ND_THREAD *thread; // Thread handle
+ bool running; // Thread running status
+ SPINLOCK spinlock; // Thread spinlock
+ };
+
+ size_t clients_current; // Current number of clients in the thread
+ SPINLOCK clients_spinlock; // Spinlock for client operations
+ struct websocket_server_client *clients; // Head of the clients double-linked list
+
+ nd_poll_t *ndpl; // Poll instance
+
+ struct {
+ int pipe[2]; // Command pipe [0] = read, [1] = write
+ } cmd;
+
+} WEBSOCKET_THREAD;
+
+// Global array of WebSocket threads
+extern WEBSOCKET_THREAD websocket_threads[WEBSOCKET_MAX_THREADS];
+
+// Define JudyL typed structure for WebSocket clients
+DEFINE_JUDYL_TYPED(WS_CLIENTS, struct websocket_server_client *);
+
+// WebSocket thread commands
+#define WEBSOCKET_THREAD_CMD_EXIT 1
+#define WEBSOCKET_THREAD_CMD_ADD_CLIENT 2
+#define WEBSOCKET_THREAD_CMD_REMOVE_CLIENT 3
+#define WEBSOCKET_THREAD_CMD_BROADCAST 4
+
+// Buffer size definitions for WebSocket operations
+#define WEBSOCKET_RECEIVE_BUFFER_SIZE 4096 // Size used for network read operations
+
+// Initial buffer sizes
+#define WEBSOCKET_IN_BUFFER_INITIAL_SIZE 8192UL // Initial size for incoming data buffer
+#define WEBSOCKET_OUT_BUFFER_INITIAL_SIZE 16384UL // Initial size for outgoing data buffer
+#define WEBSOCKET_PAYLOAD_INITIAL_SIZE 8192UL // Initial size for message payload buffer
+#define WEBSOCKET_UNPACKED_INITIAL_SIZE 16384UL // Initial size for uncompressed message buffer
+
+// Maximum buffer sizes to protect against memory exhaustion
+#define WEBSOCKET_IN_BUFFER_MAX_SIZE (20UL * 1024 * 1024) // 20MiB max for incoming data buffer
+#define WEBSOCKET_OUT_BUFFER_MAX_SIZE (20UL * 1024 * 1024) // 20MiB max for outgoing data buffer
+
+NEVERNULL
+WS_CLIENT *websocket_client_create(void);
+
+// Thread management
+void websocket_threads_init(void);
+void websocket_threads_join(void);
+bool websocket_thread_send_command(WEBSOCKET_THREAD *wth, uint8_t cmd, uint32_t id);
+bool websocket_thread_send_broadcast(WEBSOCKET_THREAD *wth, WEBSOCKET_OPCODE opcode, const char *message);
+void *websocket_thread(void *ptr);
+void websocket_thread_enqueue_client(WEBSOCKET_THREAD *wth, struct websocket_server_client *wsc);
+bool websocket_thread_update_client_poll_flags(struct websocket_server_client *wsc);
+
+// Client registry internals
+void websocket_client_free(WS_CLIENT *wsc);
+bool websocket_client_register(struct websocket_server_client *wsc);
+void websocket_client_unregister(struct websocket_server_client *wsc);
+struct websocket_server_client *websocket_client_find_by_id(size_t id);
+
+// Utility functions
+// Validates a WebSocket close code according to RFC 6455
+bool websocket_validate_close_code(uint16_t code);
+void websocket_debug(WS_CLIENT *wsc, const char *format, ...);
+void websocket_info(WS_CLIENT *wsc, const char *format, ...);
+void websocket_error(WS_CLIENT *wsc, const char *format, ...);
+void websocket_dump_debug(WS_CLIENT *wsc, const char *payload, size_t payload_length, const char *format, ...);
+
+// Frame processing result codes
+typedef enum {
+ WS_FRAME_ERROR = -1, // Processing error occurred
+ WS_FRAME_COMPLETE = 0, // Frame processing completed successfully
+ WS_FRAME_NEED_MORE_DATA = 1, // Need more data to complete frame processing
+ WS_FRAME_MESSAGE_READY = 2 // A complete message is ready to be processed
+} WEBSOCKET_FRAME_RESULT;
+
+// Centralized protocol validation functions
+void websocket_protocol_exception(WS_CLIENT *wsc, WEBSOCKET_CLOSE_CODE reason_code, const char *reason_txt);
+
+// Protocol receiver functions - websocket-protocol-rcv.c
+ssize_t websocket_protocol_got_data(WS_CLIENT *wsc, char *data, size_t length);
+
+// Protocol sender functions - websocket-protocol-snd.c
+int websocket_protocol_send_frame(
+ WS_CLIENT *wsc, const char *payload,
+ size_t payload_len, WEBSOCKET_OPCODE opcode, bool use_compression);
+int websocket_protocol_send_text(WS_CLIENT *wsc, const char *text);
+int websocket_protocol_send_binary(WS_CLIENT *wsc, const void *data, size_t length);
+int websocket_protocol_send_close(WS_CLIENT *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason);
+int websocket_protocol_send_ping(WS_CLIENT *wsc, const char *data, size_t length);
+int websocket_protocol_send_pong(WS_CLIENT *wsc, const char *data, size_t length);
+
+// IO functions from old implementation - will be refactored
+ssize_t websocket_receive_data(struct websocket_server_client *wsc);
+ssize_t websocket_write_data(struct websocket_server_client *wsc);
+
+// WebSocket message sending functions
+int websocket_send_message(WS_CLIENT *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode);
+int websocket_broadcast_message(const char *message, WEBSOCKET_OPCODE opcode);
+
+bool websocket_protocol_parse_header_from_buffer(const char *buffer, size_t length,
+ WEBSOCKET_FRAME_HEADER *header);
+#endif // NETDATA_WEBSOCKET_INTERNAL_H
\ No newline at end of file
diff --git a/src/web/websocket/websocket-jsonrpc.c b/src/web/websocket/websocket-jsonrpc.c
new file mode 100644
index 00000000000000..2ae35311bc11a7
--- /dev/null
+++ b/src/web/websocket/websocket-jsonrpc.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-jsonrpc.h"
+
+static int websocket_client_send_json(struct websocket_server_client *wsc, struct json_object *json) {
+ if (!wsc || !json)
+ return -1;
+
+ websocket_debug(wsc, "Sending JSON message");
+
+ // Convert JSON to string
+ const char *json_str = json_object_to_json_string_ext(json, JSON_C_TO_STRING_PLAIN);
+ if (!json_str) {
+ websocket_error(wsc, "Failed to convert JSON to string");
+ return -1;
+ }
+
+ // Send as text message
+ int result = websocket_protocol_send_text(wsc, json_str);
+
+ websocket_debug(wsc, "Sent JSON message, result=%d", result);
+ return result;
+}
+
+// Called when a client is connected and ready to exchange messages
+void jsonrpc_on_connect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "JSON-RPC client connected");
+
+ // Future implementation - can send a welcome message or initialize client state
+}
+
+// Called when a client is about to be disconnected
+void jsonrpc_on_disconnect(struct websocket_server_client *wsc) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "JSON-RPC client disconnected");
+
+ // Future implementation - can clean up any client-specific resources
+}
+
+// Called before sending a close frame to the client
+void jsonrpc_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason) {
+ if (!wsc) return;
+
+ websocket_debug(wsc, "JSON-RPC client closing with code %d (%s): %s",
+ code,
+ code == WS_CLOSE_NORMAL ? "Normal" :
+ code == WS_CLOSE_GOING_AWAY ? "Going Away" :
+ code == WS_CLOSE_PROTOCOL_ERROR ? "Protocol Error" :
+ code == WS_CLOSE_INTERNAL_ERROR ? "Internal Error" : "Other",
+ reason ? reason : "No reason provided");
+
+ // Future implementation - can send a final message before closing
+}
+
+// Adapter function for the on_message callback to match WS_CLIENT callback signature
+void jsonrpc_on_message_callback(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode) {
+ if (!wsc || !message || length == 0)
+ return;
+
+ // JSON-RPC only works with text messages
+ if (opcode != WS_OPCODE_TEXT) {
+ websocket_error(wsc, "JSON-RPC protocol received non-text message, ignoring");
+ return;
+ }
+
+ websocket_debug(wsc, "JSON-RPC callback processing message: length=%zu", length);
+
+ // Process the message
+ websocket_jsonrpc_process_message(wsc, message, length);
+}
+
+// Utility function to extract parameters from a request
+struct json_object *websocket_jsonrpc_get_params(struct json_object *request) {
+ if (!request)
+ return NULL;
+
+ struct json_object *params = NULL;
+ if (json_object_object_get_ex(request, "params", ¶ms)) {
+ return params; // Return the params object if it exists
+ }
+
+ return NULL; // No params found
+}
+
+// Handler for the "echo" method - simply returns the params as the result
+static void jsonrpc_echo_handler(WS_CLIENT *wsc, struct json_object *request, uint64_t id) {
+ // Get the params if available
+ struct json_object *params = websocket_jsonrpc_get_params(request);
+
+ // Clone the params to avoid ownership issues
+ struct json_object *result = params ? json_object_get(params) : json_object_new_object();
+
+ // Send response
+ websocket_jsonrpc_response_result(wsc, result, id);
+}
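+
+// Example exchange for the "echo" method, in JSON-RPC 2.0 wire format:
+// request: {"jsonrpc":"2.0","method":"echo","params":{"hello":"world"},"id":1}
+// response: {"jsonrpc":"2.0","result":{"hello":"world"},"id":1}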
+
+// Define a fixed array of method handlers
+static struct {
+ const char *method;
+ jsonrpc_method_handler handler;
+} jsonrpc_methods[] = {
+ { "echo", jsonrpc_echo_handler },
+
+ // Add more methods here as needed
+ // { "method_name", method_handler_function },
+
+ // Terminator
+ { NULL, NULL }
+};
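+
+// Illustrative sketch of adding a method (a hypothetical "ping" handler, not
+// part of this patch): define a function matching jsonrpc_method_handler and
+// list it above the NULL terminator:
+//
+// static void jsonrpc_ping_handler(WS_CLIENT *wsc, struct json_object *request, uint64_t id) {
+//     (void)request;
+//     websocket_jsonrpc_response_result(wsc, json_object_new_string("pong"), id);
+// }
+//
+// ... and in jsonrpc_methods[]: { "ping", jsonrpc_ping_handler },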
+
+// Initialize the JSON-RPC protocol
+void websocket_jsonrpc_initialize(void) {
+ netdata_log_info("JSON-RPC protocol initialized with built-in methods");
+}
+
+// Find a method handler
+static jsonrpc_method_handler find_method_handler(const char *method) {
+ if (!method)
+ return NULL;
+
+ // Simple linear search through the fixed array
+ for (int i = 0; jsonrpc_methods[i].method != NULL; i++) {
+ if (strcmp(jsonrpc_methods[i].method, method) == 0) {
+ return jsonrpc_methods[i].handler;
+ }
+ }
+
+ return NULL;
+}
+
+// Validate JSON-RPC request according to specification
+bool websocket_jsonrpc_validate_request(struct json_object *request) {
+ if (!request || json_object_get_type(request) != json_type_object)
+ return false;
+
+ // Check for required fields
+ struct json_object *jsonrpc, *method;
+
+ if (!json_object_object_get_ex(request, "jsonrpc", &jsonrpc) ||
+ !json_object_object_get_ex(request, "method", &method))
+ return false;
+
+ // Validate jsonrpc version
+ if (json_object_get_type(jsonrpc) != json_type_string ||
+ strcmp(json_object_get_string(jsonrpc), JSONRPC_VERSION) != 0)
+ return false;
+
+ // Validate method
+ if (json_object_get_type(method) != json_type_string)
+ return false;
+
+ return true;
+}
+
+// Process a JSON-RPC request
+static void process_jsonrpc_request(WS_CLIENT *wsc, struct json_object *request) {
+ if (!websocket_jsonrpc_validate_request(request)) {
+ websocket_jsonrpc_response_error(wsc, JSONRPC_ERROR_INVALID_REQUEST,
+ "Invalid JSON-RPC request", 0);
+ return;
+ }
+
+ // Extract request components
+ struct json_object *method_obj, *id_obj = NULL;
+
+ json_object_object_get_ex(request, "method", &method_obj);
+ const char *method = json_object_get_string(method_obj);
+
+ // Get ID if present (0 indicates a notification that requires no response)
+ uint64_t id = 0;
+ bool has_id = json_object_object_get_ex(request, "id", &id_obj);
+ if (has_id && id_obj && json_object_get_type(id_obj) != json_type_null) {
+ if (json_object_get_type(id_obj) == json_type_int)
+ id = json_object_get_int64(id_obj);
+ else if (json_object_get_type(id_obj) == json_type_string) {
+ // Try to convert string ID to integer if possible
+ const char *id_str = json_object_get_string(id_obj);
+ char *endptr;
+ id = (uint64_t)strtoull(id_str, &endptr, 10);
+ if (*endptr != '\0') {
+ // Not a number, just hash the string for an ID
+ id = simple_hash(id_str);
+ }
+ }
+ }
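+
+ // NOTE: because 0 doubles as the notification sentinel, a request with
+ // id 0 receives no response, and string ids are echoed back as their
+ // numeric hash rather than verbatim - a simplification relative to
+ // JSON-RPC 2.0, which echoes ids exactly as received.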
+
+ // Find handler for the requested method
+ jsonrpc_method_handler handler = find_method_handler(method);
+ if (!handler) {
+ if (has_id) {
+ websocket_jsonrpc_response_error(wsc, JSONRPC_ERROR_METHOD_NOT_FOUND,
+ "Method not found", id);
+ }
+ return;
+ }
+
+ // Call the handler with the request
+ handler(wsc, request, id);
+}
+
+// Process a WebSocket message as JSON-RPC
+bool websocket_jsonrpc_process_message(WS_CLIENT *wsc, const char *message, size_t length) {
+ if (!wsc || !message || length == 0)
+ return false;
+
+ websocket_debug(wsc, "Processing JSON-RPC message: length=%zu", length);
+
+ // Parse the JSON
+ struct json_object *json = json_tokener_parse(message);
+ if (!json) {
+ websocket_error(wsc, "Failed to parse JSON-RPC message");
+ websocket_jsonrpc_response_error(wsc, JSONRPC_ERROR_PARSE_ERROR,
+ "Parse error", 0);
+ return false;
+ }
+
+ // Process based on message type
+ if (json_object_get_type(json) == json_type_array) {
+ // Batch request
+ websocket_debug(wsc, "Processing JSON-RPC batch request");
+
+ int array_len = json_object_array_length(json);
+ for (int i = 0; i < array_len; i++) {
+ struct json_object *request = json_object_array_get_idx(json, i);
+ process_jsonrpc_request(wsc, request);
+ }
+ }
+ else if (json_object_get_type(json) == json_type_object) {
+ // Single request
+ process_jsonrpc_request(wsc, json);
+ }
+ else {
+ // Invalid request
+ websocket_jsonrpc_response_error(wsc, JSONRPC_ERROR_INVALID_REQUEST,
+ "Invalid request", 0);
+ json_object_put(json);
+ return false;
+ }
+
+ json_object_put(json);
+ return true;
+}
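+
+// Example batch request (each element is validated and dispatched independently):
+// [{"jsonrpc":"2.0","method":"echo","params":[1],"id":1},
+// {"jsonrpc":"2.0","method":"echo","params":[2],"id":2}]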
+
+// Create and send a JSON-RPC success response
+void websocket_jsonrpc_response_result(WS_CLIENT *wsc, struct json_object *result, uint64_t id) {
+ if (!wsc || id == 0) // No response for notifications (id == 0)
+ return;
+
+ struct json_object *response = json_object_new_object();
+
+ // Add required fields
+ json_object_object_add(response, "jsonrpc", json_object_new_string(JSONRPC_VERSION));
+
+ // Add result (takes ownership of the result object)
+ if (result) {
+ json_object_object_add(response, "result", result);
+ } else {
+ json_object_object_add(response, "result", json_object_new_object());
+ }
+
+ // Add ID
+ json_object_object_add(response, "id", json_object_new_int64(id));
+
+ // Send the response
+ websocket_client_send_json(wsc, response);
+
+ // Free the response object
+ json_object_put(response);
+}
+
+// Create and send a JSON-RPC error response
+void websocket_jsonrpc_response_error(WS_CLIENT *wsc, JSONRPC_ERROR_CODE code, const char *message, uint64_t id) {
+ websocket_jsonrpc_response_error_with_data(wsc, code, message, NULL, id);
+}
+
+// Create and send a JSON-RPC error response with additional data
+void websocket_jsonrpc_response_error_with_data(WS_CLIENT *wsc, JSONRPC_ERROR_CODE code, const char *message,
+ struct json_object *data, uint64_t id) {
+ if (!wsc || id == 0) // No response for notifications (id == 0)
+ return;
+
+ struct json_object *response = json_object_new_object();
+ struct json_object *error = json_object_new_object();
+
+ // Add required fields
+ json_object_object_add(response, "jsonrpc", json_object_new_string(JSONRPC_VERSION));
+
+ // Add error code and message
+ json_object_object_add(error, "code", json_object_new_int(code));
+ json_object_object_add(error, "message", json_object_new_string(message ? message : "Unknown error"));
+
+ // Add error data if provided
+ if (data) {
+ json_object_object_add(error, "data", data);
+ }
+
+ // Add error object to response
+ json_object_object_add(response, "error", error);
+
+ // Add ID
+ json_object_object_add(response, "id", json_object_new_int64(id));
+
+ // Send the response
+ websocket_client_send_json(wsc, response);
+
+ // Free the response object
+ json_object_put(response);
+}
diff --git a/src/web/websocket/websocket-jsonrpc.h b/src/web/websocket/websocket-jsonrpc.h
new file mode 100644
index 00000000000000..4800a019e27a73
--- /dev/null
+++ b/src/web/websocket/websocket-jsonrpc.h
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_JSONRPC_H
+#define NETDATA_WEBSOCKET_JSONRPC_H
+
+#include "websocket-internal.h"
+
+// JSON-RPC 2.0 protocol constants
+#define JSONRPC_VERSION "2.0"
+
+// JSON-RPC error codes as per specification
+typedef enum {
+ // Official JSON-RPC 2.0 error codes
+ JSONRPC_ERROR_PARSE_ERROR = -32700, // Invalid JSON was received by the server
+ JSONRPC_ERROR_INVALID_REQUEST = -32600, // The JSON sent is not a valid Request object
+ JSONRPC_ERROR_METHOD_NOT_FOUND = -32601, // The method does not exist / is not available
+ JSONRPC_ERROR_INVALID_PARAMS = -32602, // Invalid method parameter(s)
+ JSONRPC_ERROR_INTERNAL_ERROR = -32603, // Internal JSON-RPC error
+
+ // -32000 to -32099 are reserved for implementation-defined server-errors
+ JSONRPC_ERROR_SERVER_ERROR = -32000, // Generic server error
+
+ // Netdata specific error codes (using reserved server-error range)
+ JSONRPC_ERROR_NETDATA_PERMISSION_DENIED = -32030, // Permission denied
+ JSONRPC_ERROR_NETDATA_NOT_SUPPORTED = -32031, // Feature not supported
+ JSONRPC_ERROR_NETDATA_RATE_LIMIT = -32032, // Rate limit exceeded
+} JSONRPC_ERROR_CODE;
+
+// Method handler function type
+typedef void (*jsonrpc_method_handler)(WS_CLIENT *wsc, struct json_object *request, uint64_t id);
+
+// Initialize WebSocket JSON-RPC protocol
+void websocket_jsonrpc_initialize(void);
+
+// Process a WebSocket message as JSON-RPC
+bool websocket_jsonrpc_process_message(WS_CLIENT *wsc, const char *message, size_t length);
+
+// WebSocket protocol handler callbacks for JSON-RPC
+void jsonrpc_on_connect(struct websocket_server_client *wsc);
+void jsonrpc_on_message_callback(struct websocket_server_client *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode);
+void jsonrpc_on_close(struct websocket_server_client *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason);
+void jsonrpc_on_disconnect(struct websocket_server_client *wsc);
+
+// Response functions
+void websocket_jsonrpc_response_result(WS_CLIENT *wsc, struct json_object *result, uint64_t id);
+void websocket_jsonrpc_response_error(WS_CLIENT *wsc, JSONRPC_ERROR_CODE code, const char *message, uint64_t id);
+void websocket_jsonrpc_response_error_with_data(WS_CLIENT *wsc, JSONRPC_ERROR_CODE code, const char *message,
+ struct json_object *data, uint64_t id);
+
+// Helper functions
+bool websocket_jsonrpc_validate_request(struct json_object *request);
+
+// Utility function to extract parameters from a request
+struct json_object *websocket_jsonrpc_get_params(struct json_object *request);
+
+#endif // NETDATA_WEBSOCKET_JSONRPC_H
diff --git a/src/web/websocket/websocket-message.c b/src/web/websocket/websocket-message.c
new file mode 100644
index 00000000000000..9283d32b7581eb
--- /dev/null
+++ b/src/web/websocket/websocket-message.c
@@ -0,0 +1,190 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+
+// Helper function to determine if an opcode is a control opcode
+bool websocket_frame_is_control_opcode(WEBSOCKET_OPCODE opcode) {
+ return (opcode == WS_OPCODE_CLOSE ||
+ opcode == WS_OPCODE_PING ||
+ opcode == WS_OPCODE_PONG);
+}
+
+// Validates that a buffer contains valid UTF-8 encoded data
+// Returns true if the data is valid UTF-8, false otherwise
+bool websocket_validate_utf8(const char *data, size_t length) {
+ if (!data)
+ return length == 0; // Empty data is valid
+
+ const unsigned char *bytes = (const unsigned char *)data;
+ size_t i = 0;
+
+ while (i < length) {
+ // Check for ASCII (single-byte character)
+ if (bytes[i] <= 0x7F) {
+ i++;
+ continue;
+ }
+
+ // Check for 2-byte sequence
+ else if ((bytes[i] & 0xE0) == 0xC0) {
+ // Need at least 2 bytes
+ if (i + 1 >= length)
+ return false;
+
+ // Second byte must be a continuation byte
+ if ((bytes[i+1] & 0xC0) != 0x80)
+ return false;
+
+ // Must not be overlong encoding
+ if (bytes[i] < 0xC2)
+ return false;
+
+ i += 2;
+ }
+
+ // Check for 3-byte sequence
+ else if ((bytes[i] & 0xF0) == 0xE0) {
+ // Need at least 3 bytes
+ if (i + 2 >= length)
+ return false;
+
+ // Second and third bytes must be continuation bytes
+ if ((bytes[i+1] & 0xC0) != 0x80 || (bytes[i+2] & 0xC0) != 0x80)
+ return false;
+
+ // Check for overlong encoding
+ if (bytes[i] == 0xE0 && (bytes[i+1] & 0xE0) == 0x80)
+ return false;
+
+ // Check for UTF-16 surrogates (not allowed in UTF-8)
+ if (bytes[i] == 0xED && (bytes[i+1] & 0xE0) == 0xA0)
+ return false;
+
+ i += 3;
+ }
+
+ // Check for 4-byte sequence
+ else if ((bytes[i] & 0xF8) == 0xF0) {
+ // Need at least 4 bytes
+ if (i + 3 >= length)
+ return false;
+
+ // Second, third, and fourth bytes must be continuation bytes
+ if ((bytes[i+1] & 0xC0) != 0x80 ||
+ (bytes[i+2] & 0xC0) != 0x80 ||
+ (bytes[i+3] & 0xC0) != 0x80)
+ return false;
+
+ // Check for overlong encoding
+ if (bytes[i] == 0xF0 && (bytes[i+1] & 0xF0) == 0x80)
+ return false;
+
+ // Check for values outside Unicode range
+ if (bytes[i] > 0xF4 || (bytes[i] == 0xF4 && bytes[i+1] > 0x8F))
+ return false;
+
+ i += 4;
+ }
+
+ // Invalid UTF-8 leading byte
+ else {
+ return false;
+ }
+ }
+
+ return true;
+}
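+
+// Examples:
+// "abc" -> valid (ASCII)
+// "\xC3\xA9" -> valid (U+00E9, 2-byte sequence)
+// "\xC0\xAF" -> invalid (overlong 2-byte encoding)
+// "\xED\xA0\x80" -> invalid (UTF-16 surrogate U+D800)
+// "\xF4\x90\x80\x80" -> invalid (beyond U+10FFFF)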
+
+// Reset a client's message state for a new message
+void websocket_client_message_reset(WS_CLIENT *wsc) {
+ if (!wsc)
+ return;
+
+ // Reset message buffer
+ wsb_reset(&wsc->payload);
+
+ // Also reset uncompressed buffer to avoid keeping stale data
+ wsb_reset(&wsc->u_payload);
+
+ // Reset client's message state
+ // We set message_complete to true by default (no fragmented message in progress),
+ // but this will be overridden based on the FIN bit for actual frames
+ wsc->message_complete = true;
+ wsc->is_compressed = false;
+ wsc->opcode = WS_OPCODE_TEXT; // Default opcode
+ wsc->frame_id = 0;
+}
+
+// Process a complete message (decompress if needed and call handler)
+bool websocket_client_process_message(WS_CLIENT *wsc) {
+ if (!wsc || !wsc->message_complete)
+ return false;
+
+ worker_is_busy(WORKERS_WEBSOCKET_MESSAGE);
+
+ websocket_debug(wsc, "Processing message (opcode=0x%x, is_compressed=%d, length=%zu)",
+ wsc->opcode, wsc->is_compressed,
+ wsb_length(&wsc->payload));
+
+ // Handle control frames immediately
+ if (wsc->opcode != WS_OPCODE_TEXT && wsc->opcode != WS_OPCODE_BINARY) {
+ websocket_debug(wsc, "Control frame (opcode=0x%x) should not be handled by %s()", wsc->opcode, __FUNCTION__);
+ return false;
+ }
+
+ // At this point, we know we're dealing with a data frame (text or binary)
+ WS_BUF *wsb;
+
+ // Handle decompression if needed
+ if (wsc->is_compressed) {
+ if (!websocket_client_decompress_message(wsc)) {
+ websocket_protocol_exception(wsc, WS_CLOSE_INTERNAL_ERROR, "Decompression failed");
+ return false;
+ }
+ wsb = &wsc->u_payload;
+ }
+ else
+ wsb = &wsc->payload;
+
+ // Text messages must carry valid UTF-8, as required by RFC 6455
+ if (wsc->opcode == WS_OPCODE_TEXT) {
+ wsb_null_terminate(wsb);
+
+ if (!websocket_validate_utf8(wsb_data(wsb), wsb_length(wsb))) {
+ websocket_protocol_exception(wsc, WS_CLOSE_INVALID_PAYLOAD,
+ "Invalid UTF-8 data in text message");
+ return false;
+ }
+ }
+
+ // Hand the complete (and, if needed, decompressed) message to the
+ // registered protocol handler; text payloads were null-terminated and
+ // UTF-8 validated above
+
+ websocket_debug(wsc, "Handling message: type=%s, length=%zu, protocol=%d",
+ (wsc->opcode == WS_OPCODE_BINARY) ? "binary" : "text",
+ wsb_length(wsb), wsc->protocol);
+
+ // Call the message callback if set - this allows protocols to be handled dynamically
+ if (wsc->on_message) {
+ websocket_debug(wsc, "Calling client message handler for protocol %d", wsc->protocol);
+ wsc->on_message(wsc, wsb_data(wsb), wsb_length(wsb), wsc->opcode);
+ }
+ else {
+ // No handler registered - this should not happen as we check during handshake
+ websocket_error(wsc, "No message handler registered for protocol %d", wsc->protocol);
+ return false;
+ }
+
+ // Update client message stats
+ wsc->message_id++;
+ wsc->frame_id = 0;
+
+ // Reset for the next message
+ websocket_client_message_reset(wsc);
+
+ return true;
+}
diff --git a/src/web/websocket/websocket-receive.c b/src/web/websocket/websocket-receive.c
new file mode 100644
index 00000000000000..a78eb1efa72bac
--- /dev/null
+++ b/src/web/websocket/websocket-receive.c
@@ -0,0 +1,882 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+
+// --------------------------------------------------------------------------------------------------------------------
+// reading from the socket
+
+static inline bool cbuffer_has_enough_data_for_next_frame(WS_CLIENT *wsc) {
+ return wsc->next_frame_size > 0 &&
+ cbuffer_used_size_unsafe(&wsc->in_buffer) >= wsc->next_frame_size;
+}
+
+static inline bool cbuffer_next_frame_is_fragmented(WS_CLIENT *wsc) {
+ return cbuffer_has_enough_data_for_next_frame(wsc) &&
+ cbuffer_next_unsafe(&wsc->in_buffer, NULL) < wsc->next_frame_size;
+}
+
+static ssize_t websocket_received_data_process(WS_CLIENT *wsc, ssize_t bytes_read) {
+ if(cbuffer_next_frame_is_fragmented(wsc))
+ cbuffer_ensure_unwrapped_size(&wsc->in_buffer, wsc->next_frame_size);
+
+ char *buffer_pos;
+ size_t contiguous_input = cbuffer_next_unsafe(&wsc->in_buffer, &buffer_pos);
+
+ // Now we have contiguous data for processing
+ ssize_t bytes_consumed = websocket_protocol_got_data(wsc, buffer_pos, contiguous_input);
+ if (bytes_consumed < 0) {
+ if (bytes_consumed < -1) {
+ bytes_consumed = -bytes_consumed;
+ cbuffer_remove_unsafe(&wsc->in_buffer, bytes_consumed);
+ }
+
+ websocket_error(wsc, "Failed to process received data");
+ return -1;
+ }
+
+ // Check if bytes_processed is 0 but this was a successful call
+ // This means we have an incomplete frame and need to keep the entire buffer
+ if (bytes_consumed == 0) {
+ websocket_debug(
+ wsc, "Incomplete frame detected - keeping all %zu bytes in buffer for next read", contiguous_input);
+ return bytes_read; // Return the bytes read so caller knows we made progress
+ }
+
+ // We've processed some data - remove it from the circular buffer
+ cbuffer_remove_unsafe(&wsc->in_buffer, bytes_consumed);
+
+ return bytes_consumed;
+}
+
+// Process incoming WebSocket data
+ssize_t websocket_receive_data(WS_CLIENT *wsc) {
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ worker_is_busy(WORKERS_WEBSOCKET_SOCK_RECEIVE);
+
+ if (!wsc->in_buffer.data || wsc->sock.fd < 0)
+ return -1;
+
+ size_t available_space = WEBSOCKET_RECEIVE_BUFFER_SIZE;
+ if(wsc->next_frame_size > 0) {
+ size_t used_space = cbuffer_used_size_unsafe(&wsc->in_buffer);
+ if(used_space < wsc->next_frame_size) {
+ size_t missing_for_next_frame = wsc->next_frame_size - used_space;
+ available_space = MAX(missing_for_next_frame, WEBSOCKET_RECEIVE_BUFFER_SIZE);
+ }
+ }
+
+ char *buffer = cbuffer_reserve_unsafe(&wsc->in_buffer, available_space);
+ if(!buffer) {
+ websocket_error(wsc, "Not enough space to read %zu bytes", available_space);
+ return -1;
+ }
+
+ // Read data from socket into temporary buffer using ND_SOCK
+ ssize_t bytes_read = nd_sock_read(&wsc->sock, buffer, available_space, 0);
+
+ if (bytes_read <= 0) {
+ if (bytes_read == 0) {
+ // Connection closed
+ websocket_debug(wsc, "Client closed connection");
+ return -1;
+ }
+
+ if (errno == EAGAIN || errno == EWOULDBLOCK)
+ return 0; // No data available right now
+
+ websocket_error(wsc, "Failed to read from client: %s", strerror(errno));
+ return -1;
+ }
+
+ if (bytes_read > (ssize_t)available_space) {
+ websocket_error(wsc, "Received more data (%zd) than available space in buffer (%zd)",
+ bytes_read, available_space);
+ return -1;
+ }
+
+ cbuffer_commit_reserved_unsafe(&wsc->in_buffer, bytes_read);
+
+ // Update last activity time
+ wsc->last_activity_t = now_monotonic_sec();
+
+ // Dump the received data for debugging
+ websocket_dump_debug(wsc, buffer, bytes_read, "RX SOCK %zd bytes", bytes_read);
+
+ if(wsc->next_frame_size == 0 || cbuffer_has_enough_data_for_next_frame(wsc)) {
+ // we don't know the next frame size
+ // or, we know it and we have all the data for it
+
+ // process the received data
+ if(websocket_received_data_process(wsc, bytes_read) < 0)
+ return -1;
+
+ // we may still have wrapped data in the circular buffer that can satisfy the entire next frame
+ if(cbuffer_next_frame_is_fragmented(wsc)) {
+ // we have enough data to process this frame, no need to wait for more input
+ if(websocket_received_data_process(wsc, bytes_read) < 0)
+ return -1;
+ }
+ }
+
+ // Return the number of bytes we processed from this read
+ // Even if bytes_processed is 0, we still read data which will be processed later
+ return bytes_read;
+}
+
+// --------------------------------------------------------------------------------------------------------------------
+
+// Validate a WebSocket close code according to RFC 6455
+bool websocket_validate_close_code(uint16_t code) {
+ // 1000-2999 are reserved for the WebSocket protocol
+ // 3000-3999 are reserved for use by libraries, frameworks, and applications
+ // 4000-4999 are reserved for private use
+
+ // Check if code is in valid ranges
+ if ((code >= 1000 && code <= 1011) || // Protocol-defined codes
+ (code >= 3000 && code <= 4999)) // Application/library/private codes
+ {
+ // Codes 1004, 1005, and 1006 must not be used in a Close frame by an endpoint
+ if (code != WS_CLOSE_RESERVED && code != WS_CLOSE_NO_STATUS && code != WS_CLOSE_ABNORMAL)
+ return true;
+ }
+
+ return false;
+}
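+
+// Examples: 1000 (normal closure) and 3000-4999 (library/application/private
+// codes) are accepted; 1005/1006 (reserved, must not appear on the wire) and
+// 2000 (unassigned protocol range) are rejected.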
+
+// Centralized function to handle WebSocket protocol exceptions
+void websocket_protocol_exception(WS_CLIENT *wsc, WEBSOCKET_CLOSE_CODE reason_code, const char *reason_txt) {
+ if (!wsc) return;
+
+ websocket_error(wsc, "Protocol exception: %s (code: %d, %s)",
+ reason_txt, reason_code, WEBSOCKET_CLOSE_CODE_2str(reason_code));
+
+ // Always send a close frame with the reason
+ websocket_protocol_send_close(wsc, reason_code, reason_txt);
+
+ // Update state based on current state
+ if (wsc->state == WS_STATE_OPEN) {
+ // We're initiating the close - transition to server-initiated closing
+ wsc->state = WS_STATE_CLOSING_SERVER;
+ }
+ else if (wsc->state == WS_STATE_CLOSING_CLIENT || wsc->state == WS_STATE_CLOSING_SERVER) {
+ // Already in closing state, nothing to do
+ websocket_debug(wsc, "Protocol exception during closing state %s",
+ WEBSOCKET_STATE_2str(wsc->state));
+ }
+ else {
+ // For any other state, move straight to CLOSED
+ wsc->state = WS_STATE_CLOSED;
+ }
+
+ // For severe protocol errors, force immediate disconnection
+ if (reason_code == WS_CLOSE_PROTOCOL_ERROR ||
+ reason_code == WS_CLOSE_POLICY_VIOLATION ||
+ reason_code == WS_CLOSE_INVALID_PAYLOAD) {
+
+ websocket_info(wsc, "Forcing immediate disconnection due to protocol exception");
+
+ // First try to flush outgoing data to send the close frame
+ websocket_write_data(wsc);
+
+ // Remove client from thread
+ if (wsc->wth) {
+ websocket_thread_send_command(wsc->wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+ }
+ }
+}
+
+#define WS_ALLOW_USE ( 1)
+#define WS_ALLOW_DISCARD ( 0)
+#define WS_ALLOW_ERROR (-1)
+
+// Centralized function to check if a frame is allowed based on connection state
+static int websocket_is_frame_allowed(WS_CLIENT *wsc, const WEBSOCKET_FRAME_HEADER *header) {
+ if (!wsc || !header)
+ return WS_ALLOW_ERROR;
+
+ bool is_control = websocket_frame_is_control_opcode(header->opcode);
+
+ // Check state-based restrictions
+ switch (wsc->state) {
+ case WS_STATE_OPEN:
+ // In OPEN state, all frames are allowed
+ return WS_ALLOW_USE;
+
+ case WS_STATE_CLOSING_SERVER:
+ // When server initiated closing, only control frames are allowed
+ if (!is_control) {
+ websocket_debug(wsc, "Non-control frame rejected in CLOSING_SERVER state");
+ return WS_ALLOW_DISCARD;
+ }
+ return WS_ALLOW_USE;
+
+ case WS_STATE_CLOSING_CLIENT:
+ // When client initiated closing, we shouldn't process any further frames
+ // All frames in this state should be silently ignored
+ websocket_debug(wsc, "Frame rejected in CLOSING_CLIENT state (will be silently ignored)");
+ return WS_ALLOW_DISCARD;
+
+ case WS_STATE_CLOSED:
+ // In CLOSED state, no frames should be processed
+ websocket_debug(wsc, "Frame rejected in CLOSED state");
+ return WS_ALLOW_DISCARD;
+
+ case WS_STATE_HANDSHAKE:
+ // In HANDSHAKE state, no WebSocket frames should be processed yet
+ websocket_debug(wsc, "Frame rejected in HANDSHAKE state");
+ return WS_ALLOW_ERROR;
+
+ default:
+ // Unknown state - reject frame
+ websocket_debug(wsc, "Frame rejected in unknown state: %d", wsc->state);
+ return WS_ALLOW_DISCARD;
+ }
+}
+
+// Helper function to handle frame header parsing
+bool websocket_protocol_parse_header_from_buffer(const char *buffer, size_t length,
+ WEBSOCKET_FRAME_HEADER *header) {
+ if (!buffer || !header || length < 2) {
+ websocket_debug(NULL, "We need at least 2 bytes to parse a header: buffer=%p, length=%zu", buffer, length);
+ return false;
+ }
+
+ // Get first byte - contains FIN bit, RSV bits, and opcode
+ unsigned char byte1 = (unsigned char)buffer[0];
+ header->fin = (byte1 & WS_FIN) ? 1 : 0;
+ header->rsv1 = (byte1 & WS_RSV1) ? 1 : 0;
+ header->rsv2 = (byte1 & (WS_RSV1 >> 1)) ? 1 : 0;
+ header->rsv3 = (byte1 & (WS_RSV1 >> 2)) ? 1 : 0;
+ header->opcode = byte1 & 0x0F;
+
+ // Get second byte - contains MASK bit and initial length
+ unsigned char byte2 = (unsigned char)buffer[1];
+ header->mask = (byte2 & WS_MASK) ? 1 : 0;
+ header->len = byte2 & 0x7F;
+
+ // Calculate header size and payload length based on length field
+ header->header_size = 2; // Start with 2 bytes for the basic header
+
+ // Determine payload length
+ if (header->len < 126) {
+ header->payload_length = header->len;
+ }
+ else if (header->len == 126) {
+ // 16-bit length
+ if (length < 4) {
+ websocket_debug(NULL, "We need at least 4 bytes to parse this header: buffer=%p, length=%zu", buffer, length);
+ return false; // Not enough data
+ }
+
+ header->payload_length = ((uint64_t)((unsigned char)buffer[2]) << 8) | ((uint64_t)((unsigned char)buffer[3]));
+ header->header_size += 2;
+ }
+ else if (header->len == 127) {
+ // 64-bit length
+ if (length < 10) {
+ websocket_debug(NULL, "We need at least 10 bytes to parse this header: buffer=%p, length=%zu", buffer, length);
+ return false; // Not enough data
+ }
+
+ header->payload_length =
+ ((uint64_t)((unsigned char)buffer[2]) << 56) |
+ ((uint64_t)((unsigned char)buffer[3]) << 48) |
+ ((uint64_t)((unsigned char)buffer[4]) << 40) |
+ ((uint64_t)((unsigned char)buffer[5]) << 32) |
+ ((uint64_t)((unsigned char)buffer[6]) << 24) |
+ ((uint64_t)((unsigned char)buffer[7]) << 16) |
+ ((uint64_t)((unsigned char)buffer[8]) << 8) |
+ ((uint64_t)((unsigned char)buffer[9]));
+ header->header_size += 8;
+ }
+
+ // Read masking key if frame is masked
+ if (header->mask) {
+ if (length < header->header_size + 4) return false; // Not enough data
+
+ // Copy masking key
+ memcpy(header->mask_key, buffer + header->header_size, 4);
+ header->header_size += 4;
+ } else {
+ // Clear mask key if not masked
+ memset(header->mask_key, 0, 4);
+ }
+
+ header->payload = (void *)&buffer[header->header_size];
+ header->frame_size = header->header_size + header->payload_length;
+
+ return true;
+}
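+
+// Worked example: the 11-byte masked client frame 81 85 AA BB CC DD ... parses as
+// byte1 = 0x81 -> FIN=1, RSV1-3=0, opcode=0x1 (text)
+// byte2 = 0x85 -> MASK=1, len=5
+// header_size = 2 + 4 (mask key AA BB CC DD) = 6
+// payload_length = 5, frame_size = 6 + 5 = 11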
+
+// Validate a parsed frame header according to WebSocket protocol rules
+static bool websocket_protocol_validate_header(
+ WS_CLIENT *wsc, WEBSOCKET_FRAME_HEADER *header,
+ uint64_t payload_length, bool in_fragment_sequence) {
+ if (!wsc || !header)
+ return false;
+
+ // Check RSV bits - must be 0 unless extensions are negotiated
+ if (header->rsv2 || header->rsv3) {
+ websocket_error(wsc, "Invalid frame: RSV2 or RSV3 bits set");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "RSV2 or RSV3 bits set");
+ return false;
+ }
+
+ // RSV1 is only valid if compression is enabled
+ if (header->rsv1 && (!wsc->compression.enabled)) {
+ websocket_error(wsc, "Invalid frame: RSV1 bit set but compression not enabled");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "RSV1 bit set without compression");
+ return false;
+ }
+
+ // For continuation frames in a compressed message, RSV1 must be 0 per RFC 7692
+ // Continuation frames for a compressed message must not have RSV1 set
+ if (header->opcode == WS_OPCODE_CONTINUATION && in_fragment_sequence && header->rsv1) {
+ websocket_error(wsc, "Invalid frame: Continuation frame should not have RSV1 bit set");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "RSV1 bit set on continuation frame");
+ return false;
+ }
+
+ // Check opcode validity
+ switch (header->opcode) {
+ case WS_OPCODE_CONTINUATION:
+ // Continuation frames must be in a fragment sequence
+ if (!in_fragment_sequence) {
+ websocket_error(wsc, "Invalid frame: Continuation frame without initial frame");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Continuation frame without initial frame");
+ return false;
+ }
+ break;
+
+ case WS_OPCODE_TEXT:
+ case WS_OPCODE_BINARY:
+ // New data frames cannot start inside a fragment sequence
+ if (in_fragment_sequence) {
+ websocket_error(wsc, "Invalid frame: New data frame during fragmented message");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "New data frame during fragmented message");
+ return false;
+ }
+ break;
+
+ case WS_OPCODE_CLOSE:
+ case WS_OPCODE_PING:
+ case WS_OPCODE_PONG:
+ // Control frames must not be fragmented
+ if (!header->fin) {
+ websocket_error(wsc, "Invalid frame: Fragmented control frame");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Fragmented control frame");
+ return false;
+ }
+
+ // Control frames must have payload ≤ 125 bytes
+ if (payload_length > 125) {
+ websocket_error(wsc, "Invalid frame: Control frame payload too large (%llu bytes)",
+ (unsigned long long)payload_length);
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Control frame payload too large");
+ return false;
+ }
+ break;
+
+ default:
+ // Unknown opcode
+ websocket_error(wsc, "Invalid frame: Unknown opcode: 0x%x", (unsigned int)header->opcode);
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Unknown opcode");
+ return false;
+ }
+
+ // Validate payload length against limits
+ if (payload_length > (uint64_t)WS_MAX_FRAME_LENGTH) {
+ websocket_error(wsc, "Invalid frame: Payload too large (%llu bytes)",
+ (unsigned long long)payload_length);
+ websocket_protocol_exception(wsc, WS_CLOSE_MESSAGE_TOO_BIG, "Frame payload too large");
+ return false;
+ }
+
+ // All checks passed
+ return true;
+}
+
+// Process a control frame directly without creating a message structure
+static bool websocket_protocol_process_control_message(
+ WS_CLIENT *wsc, WEBSOCKET_OPCODE opcode,
+ char *payload, size_t payload_length,
+ bool is_masked, const unsigned char *mask_key) {
+ websocket_debug(wsc, "Processing control frame opcode=0x%x, payload_length=%zu, is_masked=%d, connection state=%d",
+ opcode, payload_length, is_masked, wsc->state);
+
+ // If payload is masked, unmask it first
+ if (is_masked && mask_key && payload && payload_length > 0)
+ websocket_unmask(payload, payload, payload_length, mask_key);
+
+ switch (opcode) {
+ case WS_OPCODE_CLOSE: {
+ worker_is_busy(WORKERS_WEBSOCKET_MSG_CLOSE);
+
+ uint16_t code = WS_CLOSE_NORMAL;
+ char reason[124];
+ reason[0] = '\0'; // Initialize reason string
+
+ // Check for malformed CLOSE frame payload
+ if (payload && payload_length == 1) {
+ websocket_error(wsc, "Invalid CLOSE frame payload length: 1 byte (must be 0 or >= 2 bytes)");
+
+ // This is a protocol violation - handle it consistently through the protocol exception mechanism
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Invalid payload length");
+
+ // Return true since we handled the message
+ return true;
+ }
+ // Parse close code if present
+ else if (payload && payload_length >= 2) {
+ code = ((uint16_t)((unsigned char)payload[0]) << 8) |
+ ((uint16_t)((unsigned char)payload[1]));
+
+ // Validate close code
+ if (!websocket_validate_close_code(code)) {
+ websocket_error(wsc, "Invalid close code: %u", code);
+
+ // This is a protocol violation - handle it through the protocol exception mechanism
+ // This will send a close frame with 1002 (protocol error) and set the appropriate state
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Invalid close code");
+
+ // Return true since we handled the message
+ return true;
+ }
+ // Check UTF-8 validity of the reason text
+ else if (payload_length > 2) {
+ // RFC 6455 requires all control frame payloads (including close reasons) to be valid UTF-8
+ if (!websocket_validate_utf8(payload + 2, payload_length - 2)) {
+ websocket_error(wsc, "Invalid UTF-8 in close frame reason");
+ code = WS_CLOSE_INVALID_PAYLOAD; // 1007 - Invalid frame payload data
+ strncpyz(reason, "Invalid UTF-8 in close reason", sizeof(reason) - 1);
+ }
+ else {
+ // Valid UTF-8, copy the reason
+ strncpyz(reason, payload + 2, MIN(payload_length - 2, sizeof(reason) - 1));
+ }
+ }
+ }
+
+ // Different handling based on connection state
+ if (wsc->state == WS_STATE_OPEN) {
+ // This is the initial CLOSE from client - respond with our own CLOSE
+ websocket_debug(wsc, "Received initial CLOSE frame from client, responding with CLOSE");
+
+ // Send close frame in response
+ websocket_protocol_send_close(wsc, code, reason);
+
+ wsc->state = WS_STATE_CLOSING_CLIENT;
+ wsc->flush_and_remove_client = true;
+
+ // IMPORTANT: do not call websocket_write_data() here
+ // because it prevents wsc->flush_and_remove_client from removing the client!
+ }
+ else if (wsc->state == WS_STATE_CLOSING_SERVER) {
+ // We initiated the closing handshake and now received client's response
+ // This completes the closing handshake
+ websocket_debug(wsc, "Closing handshake complete - received client's CLOSE response to our close");
+
+ // Ensure immediate removal from thread/poll
+ if (wsc->wth) {
+ websocket_info(wsc, "Closing TCP connection after completed handshake (server initiated)");
+ websocket_thread_send_command(wsc->wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+ }
+
+ // The RFC requires us to close the TCP connection immediately now
+ wsc->state = WS_STATE_CLOSED;
+ }
+ else if (wsc->state == WS_STATE_CLOSING_CLIENT) {
+ // Client already sent a CLOSE, and we responded, but they sent another CLOSE
+ // This is not strictly according to protocol, but we'll handle it gracefully
+ websocket_debug(wsc, "Received another CLOSE frame while in client-initiated closing state");
+
+ // Remove client from thread
+ if (wsc->wth) {
+ websocket_info(wsc, "Closing TCP connection (duplicate close from client)");
+ websocket_thread_send_command(wsc->wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+ }
+
+ // Move to CLOSED state and close the connection
+ wsc->state = WS_STATE_CLOSED;
+ }
+ else {
+ // Already in CLOSED state - ignore
+ websocket_debug(wsc, "Ignoring CLOSE frame - connection already in CLOSED state");
+ }
+ return true;
+ }
+
+ case WS_OPCODE_PING:
+ worker_is_busy(WORKERS_WEBSOCKET_MSG_PING);
+
+ // If we're in CLOSING or CLOSED state, decide how to handle PING based on state
+ if (wsc->state == WS_STATE_CLOSING_SERVER || wsc->state == WS_STATE_CLOSING_CLIENT || wsc->state == WS_STATE_CLOSED) {
+ // When we initiated closing, we should still respond to control frames
+ if (wsc->state == WS_STATE_CLOSING_SERVER) {
+ websocket_debug(wsc, "Received PING during server-initiated closing, responding with PONG");
+ return websocket_protocol_send_pong(wsc, payload, payload_length) > 0;
+ }
+
+ // For client-initiated closing or CLOSED, we should ignore control frames
+ websocket_debug(wsc, "Ignoring PING frame - connection in %s state",
+ wsc->state == WS_STATE_CLOSING_CLIENT ? "client closing" : "closed");
+ return true; // Successfully processed (by ignoring)
+ }
+
+ // Ping frame - respond with pong
+ websocket_debug(wsc, "Received PING frame with %zu bytes, responding with PONG", payload_length);
+
+ // Send pong with the same payload
+ return websocket_protocol_send_pong(wsc, payload, payload_length) > 0;
+
+ case WS_OPCODE_PONG:
+ worker_is_busy(WORKERS_WEBSOCKET_MSG_PONG);
+
+ // If we're in CLOSING or CLOSED state, decide how to handle PONG
+ if (wsc->state == WS_STATE_CLOSING_SERVER || wsc->state == WS_STATE_CLOSING_CLIENT || wsc->state == WS_STATE_CLOSED) {
+ // We can safely ignore PONG frames in any closing or closed state
+ websocket_debug(wsc, "Ignoring PONG frame - connection in %s state",
+ wsc->state == WS_STATE_CLOSING_SERVER ? "server closing" :
+ wsc->state == WS_STATE_CLOSING_CLIENT ? "client closing" : "closed");
+ return true; // Successfully processed (by ignoring)
+ }
+
+ // Pong frame - update last activity time
+ websocket_debug(wsc, "Received PONG frame, updating last activity time");
+ wsc->last_activity_t = now_monotonic_sec();
+ return true;
+
+ default:
+ worker_is_busy(WORKERS_WEBSOCKET_MSG_INVALID);
+ break;
+ }
+
+ websocket_error(wsc, "Unknown control opcode: %d", opcode);
+ return false;
+}
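+
+// Example CLOSE payload (after unmasking): 03 E8 62 79 65 decodes as
+// code = (0x03 << 8) | 0xE8 = 1000 (normal closure), reason "bye".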
+
+// Parse a frame from a buffer and append it to the current message if applicable.
+// Returns one of the following:
+// - WS_FRAME_ERROR: An error occurred, connection should be closed
+// - WS_FRAME_NEED_MORE_DATA: More data is needed to complete the frame
+// - WS_FRAME_COMPLETE: Frame was successfully parsed and handled, but is not a complete message yet
+// - WS_FRAME_MESSAGE_READY: Message is ready for processing
+static WEBSOCKET_FRAME_RESULT
+websocket_protocol_consume_frame(WS_CLIENT *wsc, char *data, size_t length, ssize_t *bytes_processed) {
+ if (!wsc || !data || !length || !bytes_processed)
+ return WS_FRAME_ERROR;
+
+ size_t bytes = *bytes_processed = 0;
+
+ // Local variables for frame processing
+ WEBSOCKET_FRAME_HEADER header = { 0 };
+
+ // Step 1: Parse the frame header
+ if (!websocket_protocol_parse_header_from_buffer(data, length, &header)) {
+ websocket_debug(wsc, "Not enough data to parse a complete header: bytes available = %zu",
+ length);
+ return WS_FRAME_NEED_MORE_DATA;
+ }
+
+ if(header.frame_size > wsc->max_message_size)
+ wsc->max_message_size = header.frame_size;
+
+ // Check if we have enough data for the complete frame (header + payload)
+ // If not, don't consume any bytes and wait for more data
+ if (bytes + header.frame_size > length) {
+
+ // let the circular buffer know how much data we need
+ wsc->next_frame_size = header.frame_size;
+
+ worker_is_busy(WORKERS_WEBSOCKET_INCOMPLETE_FRAME);
+ websocket_debug(wsc,
+ "RX FRAME INCOMPLETE (need %zu bytes more): OPCODE=0x%x, FIN=%s, RSV1=%d, RSV2=%d, RSV3=%d, MASK=%s, LEN=%d, "
+ "PAYLOAD_LEN=%zu, HEADER_SIZE=%zu, FRAME_SIZE=%zu, MASK=%02x%02x%02x%02x, bytes available = %zu",
+ (bytes + header.frame_size) - length,
+ header.opcode, header.fin ? "True" : "False", header.rsv1, header.rsv2, header.rsv3,
+ header.mask ? "True" : "False", header.len,
+ header.payload_length, header.header_size, header.frame_size,
+ header.mask_key[0], header.mask_key[1], header.mask_key[2], header.mask_key[3],
+ length);
+
+ return WS_FRAME_NEED_MORE_DATA;
+ }
+ wsc->next_frame_size = 0; // reset it, since we have enough data now
+
+ worker_is_busy(WORKERS_WEBSOCKET_COMPLETE_FRAME);
+
+ // Log detailed header information for debugging
+ websocket_debug(wsc,
+ "RX FRAME: OPCODE=0x%x, FIN=%s, RSV1=%d, RSV2=%d, RSV3=%d, MASK=%s, LEN=%d, "
+ "PAYLOAD_LEN=%zu, HEADER_SIZE=%zu, FRAME_SIZE=%zu, MASK=%02x%02x%02x%02x",
+ header.opcode, header.fin ? "True" : "False", header.rsv1, header.rsv2, header.rsv3,
+ header.mask ? "True" : "False", header.len,
+ header.payload_length, header.header_size, header.frame_size,
+ header.mask_key[0], header.mask_key[1], header.mask_key[2], header.mask_key[3]);
+
+ // Check for invalid RSV bits
+ if (header.rsv2 || header.rsv3 || (header.rsv1 && !wsc->compression.enabled)) {
+ const char *reason = header.rsv2 ? "RSV2 bit set" :
+ (header.rsv3 ? "RSV3 bit set" : "RSV1 bit set without compression");
+
+ // Handle protocol exception in a centralized way
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, reason);
+ return WS_FRAME_ERROR;
+ }
+
+ // Check if this frame is allowed in the current connection state
+ switch(websocket_is_frame_allowed(wsc, &header)) {
+ case WS_ALLOW_USE:
+ break;
+
+ case WS_ALLOW_DISCARD:
+ // we have already logged in websocket_is_frame_allowed();
+ // consume the whole frame (we verified above that it is fully
+ // buffered) so the input buffer keeps draining
+ *bytes_processed = (ssize_t)header.frame_size;
+ return WS_FRAME_COMPLETE;
+
+ default:
+ case WS_ALLOW_ERROR: {
+ char reason[128];
+
+ snprintf(reason, sizeof(reason), "Frame not allowed in %s state",
+ wsc->state == WS_STATE_CLOSING_SERVER ? "server closing" :
+ wsc->state == WS_STATE_CLOSING_CLIENT ? "client closing" :
+ "current");
+
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, reason);
+ return WS_FRAME_ERROR;
+ }
+ }
+
+ // Step 2: Validate the frame header
+ if (!websocket_protocol_validate_header(wsc, &header, header.payload_length, !wsc->message_complete)) {
+ // Invalid frame - websocket_protocol_validate_header() already raised a
+ // protocol exception with the specific close code and reason, so do not
+ // send a second close frame here
+ return WS_FRAME_ERROR;
+ }
+
+ // Advance past the header
+ bytes += header.header_size;
+
+ if (websocket_frame_is_control_opcode(header.opcode)) {
+ // Handle control frames (PING, PONG, CLOSE) directly
+ websocket_debug(wsc, "Handling control frame: opcode=0x%x, payload_length=%zu",
+ (unsigned)header.opcode, header.payload_length);
+
+ // Process the control frame with optional payload
+ char *payload = (header.payload_length > 0) ? (data + bytes) : NULL;
+
+ // Process control message directly without creating a message object
+ if (!websocket_protocol_process_control_message(
+ wsc, (WEBSOCKET_OPCODE)header.opcode,
+ payload, header.payload_length,
+ header.mask ? true : false,
+ header.mask_key)) {
+ websocket_error(wsc, "Failed to process control frame");
+ return WS_FRAME_ERROR;
+ }
+
+ // Update bytes processed
+ if (header.payload_length > 0)
+ bytes += (size_t)header.payload_length;
+
+ *bytes_processed = bytes;
+
+ return WS_FRAME_COMPLETE; // Return COMPLETE so we continue processing other frames
+ }
+
+ // For non-control frames (text, binary, continuation), check if connection is closing
+ if (wsc->state == WS_STATE_CLOSING_SERVER || wsc->state == WS_STATE_CLOSING_CLIENT || wsc->state == WS_STATE_CLOSED) {
+ // Per RFC 6455, once the closing handshake is started, we should ignore non-control frames
+ websocket_debug(wsc, "Ignoring non-control frame (opcode=0x%x) - connection in %s state",
+ header.opcode,
+ wsc->state == WS_STATE_CLOSING_SERVER ? "server closing" :
+ wsc->state == WS_STATE_CLOSING_CLIENT ? "client closing" : "closed");
+
+ // Consume the payload bytes (the header was already consumed above) but don't process the frame
+ bytes += header.payload_length;
+ *bytes_processed = bytes;
+
+ return WS_FRAME_COMPLETE;
+ }
+
+ // Step 3: Handle the frame based on its opcode
+ if (header.opcode == WS_OPCODE_CONTINUATION) {
+ // This is a continuation frame - need an existing message in progress
+ if (wsc->message_complete) {
+ websocket_error(wsc, "Received continuation frame with no message in progress");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "Continuation frame without initial frame");
+ return WS_FRAME_ERROR;
+ }
+
+ // If it's a zero-length frame, we don't need to append any data
+ if (header.payload_length == 0) {
+ // For zero-length non-final frames, just update and continue
+ if (!header.fin) {
+ // Non-final zero-length frame
+ websocket_debug(wsc, "Zero-length non-final continuation frame");
+ *bytes_processed = bytes;
+ wsc->frame_id++;
+ return WS_FRAME_COMPLETE;
+ }
+
+ // Final zero-length frame - mark message as complete
+ wsc->message_complete = true;
+ *bytes_processed = bytes;
+ return WS_FRAME_MESSAGE_READY;
+ }
+
+ // The message buffer length is updated as we append frame data
+ }
+ else {
+ if(!header.payload_length) {
+ websocket_debug(wsc, "Received data frame with zero-length payload (fin=%d)", header.fin);
+
+ // Initialize the client's message state for a new message
+ websocket_client_message_reset(wsc);
+ wsc->opcode = (WEBSOCKET_OPCODE)header.opcode;
+ wsc->is_compressed = header.rsv1 ? true : false;
+
+ // The most important part: for fragmented messages (non-final frames),
+ // we must set message_complete to false, regardless of payload length
+ wsc->message_complete = header.fin;
+
+ // Buffer length is already 0 after reset
+ wsc->frame_id = 0;
+
+ // Check if this is a final frame
+ if (header.fin) {
+ // Final frame - message is already marked as complete
+ *bytes_processed = bytes;
+ return WS_FRAME_MESSAGE_READY;
+ } else {
+ // Non-final frame - continue to next frame
+ *bytes_processed = bytes;
+ wsc->frame_id++;
+ return WS_FRAME_COMPLETE;
+ }
+ }
+
+ // This is a new data frame (TEXT or BINARY)
+ // If we have an existing message in progress, it's an error
+ if (!wsc->message_complete) {
+ websocket_error(wsc, "Received new data frame while another message is in progress");
+ websocket_protocol_exception(wsc, WS_CLOSE_PROTOCOL_ERROR, "New data frame during fragmented message");
+ return WS_FRAME_ERROR;
+ }
+
+ // Initialize the client's message state for a new message
+ websocket_client_message_reset(wsc);
+ wsc->opcode = (WEBSOCKET_OPCODE)header.opcode;
+ wsc->is_compressed = header.rsv1 ? true : false;
+
+ // For fragmented messages (non-final frames), we must set message_complete to false
+ // This needs to be consistently done for both empty and non-empty frames
+ wsc->message_complete = header.fin;
+
+ // Buffer length will be updated when we append the payload data
+ wsc->frame_id = 0;
+ }
+
+ // Step 4: Append payload data to the message
+
+ if (header.payload_length > 0) {
+ char *src = header.payload;
+
+ if (header.mask) {
+ // Payload is masked - need to unmask it first
+ websocket_debug(wsc, "Unmasking and appending payload data at position %zu (key=%02x%02x%02x%02x)",
+ wsb_length(&wsc->payload),
+ header.mask_key[0], header.mask_key[1], header.mask_key[2], header.mask_key[3]);
+
+ // Use the new helper function to unmask and append the data in one step
+ wsb_unmask_and_append(&wsc->payload, src, header.payload_length, header.mask_key);
+ }
+ else {
+ // Payload is not masked - can directly append
+ websocket_debug(wsc, "Appending unmasked payload data at position %zu", wsb_length(&wsc->payload));
+
+ // Append unmasked data directly
+ wsb_append(&wsc->payload, src, header.payload_length);
+ }
+
+ // Dump payload for debugging
+ size_t buffer_length = wsb_length(&wsc->payload);
+ websocket_dump_debug(wsc,
+ wsb_data(&wsc->payload) + (buffer_length - header.payload_length),
+ header.payload_length,
+ "RX FRAME PAYLOAD");
+ }
+
+ // Step 5: At this point, we know we've processed a complete frame
+ wsc->frame_id++;
+
+ bytes += header.payload_length;
+ *bytes_processed = bytes;
+
+ // If this is a final frame, mark the message as complete
+ if (header.fin)
+ return WS_FRAME_MESSAGE_READY;
+
+ // Non-final frame, message is incomplete - move to next frame
+ return WS_FRAME_COMPLETE;
+}
+
+// Process incoming data from the WebSocket client
+// This function's job is to:
+// 1. Consume frames from the input buffer
+// 2. Build messages
+// 3. Process complete messages
+ssize_t websocket_protocol_got_data(WS_CLIENT *wsc, char *data, size_t length) {
+ if (!wsc || !data || !length)
+ return -1;
+
+ // Keep processing frames until we can't process any more
+ size_t processed = 0;
+ while (processed < length) {
+ // Try to consume one complete frame
+ ssize_t consumed = 0;
+ WEBSOCKET_FRAME_RESULT result = websocket_protocol_consume_frame(wsc, data + processed, length - processed, &consumed);
+
+ websocket_debug(wsc, "Frame processing result: %d, processed: %zu/%zu", result, consumed, length);
+
+ // Safety check to ensure we always move forward in the buffer
+ if (consumed == 0 && result != WS_FRAME_NEED_MORE_DATA && result != WS_FRAME_ERROR) {
+ // If we're processing a frame but not consuming bytes, we might be stuck
+ websocket_error(wsc, "Protocol processing stalled - consumed 0 bytes but not waiting for more data (%d)", (int)result);
+ return processed ? -(ssize_t)processed : -1;
+ }
+
+ switch (result) {
+ case WS_FRAME_ERROR:
+ // Error occurred during frame processing
+ websocket_error(wsc, "Error processing WebSocket frame");
+ return processed ? -(ssize_t)processed : -1;
+
+ case WS_FRAME_NEED_MORE_DATA:
+ // Need more data to complete the current frame
+ websocket_debug(wsc, "Need more data to complete the current frame");
+ return (ssize_t)processed;
+
+ case WS_FRAME_COMPLETE:
+ // Frame was processed successfully, but more frames are needed for a complete message
+ websocket_debug(wsc, "Frame complete, but message not yet complete");
+ processed += consumed;
+ continue;
+
+ case WS_FRAME_MESSAGE_READY:
+ worker_is_busy(WORKERS_WEBSOCKET_MESSAGE);
+ processed += consumed;
+
+ wsc->message_complete = true;
+ if (!websocket_client_process_message(wsc))
+ websocket_error(wsc, "Failed to process completed message");
+
+ continue;
+ }
+ }
+
+ return (ssize_t)processed;
+}
+
diff --git a/src/web/websocket/websocket-send.c b/src/web/websocket/websocket-send.c
new file mode 100644
index 00000000000000..d3272f6d382e9c
--- /dev/null
+++ b/src/web/websocket/websocket-send.c
@@ -0,0 +1,411 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+
+// --------------------------------------------------------------------------------------------------------------------
+// writing to the socket
+
+// Actually write data to the client socket
+ssize_t websocket_write_data(WS_CLIENT *wsc) {
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ worker_is_busy(WORKERS_WEBSOCKET_SOCK_SEND);
+
+ if (!wsc->out_buffer.data || wsc->sock.fd < 0)
+ return -1;
+
+ ssize_t bytes_written = 0;
+
+ // Let cbuffer_next_unsafe determine if there's data to write
+ // This correctly handles the circular buffer wrap-around cases
+
+ // Get data to write from circular buffer
+ char *data;
+ size_t data_length = cbuffer_next_unsafe(&wsc->out_buffer, &data);
+ if (data_length == 0)
+ goto done;
+
+ // Dump the data being written for debugging
+ websocket_dump_debug(wsc, data, data_length, "TX SOCK %zu bytes", data_length);
+
+ // In the websocket thread we want non-blocking behavior
+ // Use nd_sock_write with a single retry
+ bytes_written = nd_sock_write(&wsc->sock, data, data_length, 1); // 1 retry for non-blocking write
+
+ if (bytes_written < 0) {
+ websocket_error(wsc, "Failed to write to client: %s", strerror(errno));
+ goto done;
+ }
+
+ // Remove written bytes from circular buffer
+ if (bytes_written > 0)
+ cbuffer_remove_unsafe(&wsc->out_buffer, bytes_written);
+
+done:
+ websocket_thread_update_client_poll_flags(wsc);
+ return bytes_written;
+}
+
+// --------------------------------------------------------------------------------------------------------------------
+
+static inline size_t select_header_size(size_t payload_len) {
+ if (payload_len < 126)
+ return 2;
+ else if (payload_len <= 65535)
+ return 4;
+ else
+ return 10;
+}
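+
+// Examples: a 100-byte payload takes a 2-byte header; a 1 KiB payload a
+// 4-byte header (126 marker + 16-bit length); a 1 MiB payload a 10-byte
+// header (127 marker + 64-bit length). Server-to-client frames are never
+// masked, so no mask key is appended.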
+
+// Create and send a WebSocket frame
+int websocket_protocol_send_frame(
+ WS_CLIENT *wsc, const char *payload, size_t payload_len,
+ WEBSOCKET_OPCODE opcode, bool use_compression) {
+
+ if(!wsc)
+ return -1;
+
+ const char *disconnect_msg = "";
+ z_stream *zstrm = wsc->compression.deflate_stream;
+
+ if (wsc->sock.fd < 0) {
+ disconnect_msg = "Client not connected";
+ goto abnormal_disconnect;
+ }
+
+ // Validate parameters based on WebSocket protocol
+ if (websocket_frame_is_control_opcode(opcode) && payload_len > 125) {
+ disconnect_msg = "Control frame payload too large";
+ goto abnormal_disconnect;
+ }
+
+ // Check if we should actually use compression
+ bool compress = !websocket_frame_is_control_opcode(opcode) &&
+ use_compression &&
+ payload && payload_len &&
+ wsc->compression.enabled &&
+ zstrm &&
+ payload_len >= WS_COMPRESS_MIN_SIZE;
+
+ // Calculate maximum possible compressed size using deflateBound
+ size_t max_compressed_size = payload_len;
+ if (compress) {
+ // Use deflateBound to accurately calculate maximum possible size
+ max_compressed_size = deflateBound(zstrm, payload_len);
+
+ // Add 4 bytes for Z_SYNC_FLUSH trailer (will be removed later)
+ max_compressed_size += 4;
+
+ // Ensure the destination can fit the uncompressed data too
+ max_compressed_size = MAX(payload_len, max_compressed_size);
+ }
+
+ // Determine header size based on maximum potential size
+ size_t header_size = select_header_size(max_compressed_size);
+
+ // Calculate maximum potential frame size
+ size_t max_frame_size = header_size + max_compressed_size;
+
+ // Reserve space in the circular buffer for the entire frame
+ unsigned char *header_dst = (unsigned char *)cbuffer_reserve_unsafe(&wsc->out_buffer, max_frame_size);
+ if (!header_dst) {
+ disconnect_msg = "Buffer full - too much outgoing data";
+ goto abnormal_disconnect;
+ }
+
+ // The payload will be written directly after the header in our reserved buffer
+ char *payload_dst = (char *)(header_dst + header_size);
+ size_t final_payload_len = 0;
+
+ if (compress) {
+ // Setup temporary parameters for the compressor to use
+ // We'll use the space after our header for the compressed data
+ zstrm->next_in = (Bytef *)payload;
+ zstrm->avail_in = payload_len;
+ zstrm->next_out = (Bytef *)payload_dst;
+ zstrm->avail_out = max_compressed_size;
+ zstrm->total_in = 0;
+ zstrm->total_out = 0;
+
+ // Compress with sync flush to ensure data is flushed
+ int ret = deflate(zstrm, Z_SYNC_FLUSH);
+
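+        // With Z_SYNC_FLUSH, deflate() should consume all input in one call.
+        // If it exactly filled the output buffer (avail_out == 0), ask zlib
+        // via deflatePending() whether anything is still buffered internally;
+        // only a fully drained stream counts as success.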
+ bool success = false;
+ if (ret == Z_STREAM_END || (ret == Z_OK && zstrm->avail_in == 0 && zstrm->avail_out > 0))
+ success = true;
+ else if (ret == Z_OK && zstrm->avail_in == 0 && zstrm->avail_out == 0) {
+            unsigned pending = 0;
+            int bits = 0;
+            if(deflatePending(zstrm, &pending, &bits) == Z_OK &&
+                (pending == 0 && bits == 0))
+ success = true;
+ }
+
+ uInt avail_in = zstrm->avail_in;
+ uInt avail_out = zstrm->avail_out;
+ uLong total_in = zstrm->total_in;
+ uLong total_out = zstrm->total_out;
+
+ // we are done - reset the stream to avoid heap-use-after-free issues later
+ if (deflateReset(zstrm) != Z_OK) {
+ disconnect_msg = "Deflate reset failed";
+ goto abnormal_disconnect;
+ }
+
+        // Fall back to the uncompressed payload if compression failed or
+        // produced nothing beyond the 4-byte Z_SYNC_FLUSH trailer
+        if (!success || total_out <= 4) {
+ websocket_error(wsc, "Compression failed: %s "
+ "(ret = %d, avail_in = %u, avail_out = %u, total_in = %lu, total_out = %lu) - "
+ "sending uncompressed payload",
+ zError(ret), ret, avail_in, avail_out, total_in, total_out);
+ compress = false;
+ }
+ else {
+ // As per RFC 7692, remove trailing 4 bytes (00 00 FF FF) from Z_SYNC_FLUSH
+ final_payload_len = max_compressed_size - avail_out - 4;
+
+ websocket_debug(wsc, "Compressed payload from %zu to %zu bytes (%.1f%%)",
+ payload_len, final_payload_len,
+ (double)final_payload_len * 100.0 / (double)payload_len);
+
+ // we may have selected a bigger header size than needed
+ // so we need to move the payload to the right place
+ size_t optimal_header_size = select_header_size(final_payload_len);
+ if(optimal_header_size < header_size) {
+ char *dst = (char *)header_dst + optimal_header_size;
+ char *src = payload_dst;
+ memmove(dst, src, final_payload_len);
+ payload_dst = dst;
+ header_size = optimal_header_size;
+ }
+ }
+
+ // ensure all pointer values are NULL, so that there is no trace back to this compression
+ zstrm->next_in = NULL;
+ zstrm->avail_in = 0;
+ zstrm->next_out = NULL;
+ zstrm->avail_out = 0;
+ zstrm->total_in = 0;
+ zstrm->total_out = 0;
+ }
+
+ if(!compress && payload && payload_len > 0) {
+ memcpy(payload_dst, payload, payload_len);
+ final_payload_len = payload_len;
+ }
+
+ // Write the header
+ // First byte: FIN(1) + RSV1(compress) + RSV2(0) + RSV3(0) + OPCODE(4)
+ header_dst[0] = 0x80 | (compress ? 0x40 : 0) | (opcode & 0x0F);
+
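+    // Server-to-client frames are never masked (RFC 6455 section 5.1), so
+    // the MASK bit stays 0 and no masking key follows the length field.
+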
+ // Write payload length with the appropriate format
+ switch(header_size) {
+ case 2:
+ header_dst[1] = final_payload_len & 0x7F;
+ break;
+
+ case 4:
+ header_dst[1] = 126;
+ header_dst[2] = (final_payload_len >> 8) & 0xFF;
+ header_dst[3] = final_payload_len & 0xFF;
+ break;
+
+ case 10:
+ header_dst[1] = 127;
+ header_dst[2] = (final_payload_len >> 56) & 0xFF;
+ header_dst[3] = (final_payload_len >> 48) & 0xFF;
+ header_dst[4] = (final_payload_len >> 40) & 0xFF;
+ header_dst[5] = (final_payload_len >> 32) & 0xFF;
+ header_dst[6] = (final_payload_len >> 24) & 0xFF;
+ header_dst[7] = (final_payload_len >> 16) & 0xFF;
+ header_dst[8] = (final_payload_len >> 8) & 0xFF;
+ header_dst[9] = final_payload_len & 0xFF;
+ break;
+
+ default:
+ // impossible case - added to avoid compiler warning
+ disconnect_msg = "Invalid header size";
+ goto abnormal_disconnect;
+ }
+
+ // Commit the final frame size (header + payload)
+ size_t final_frame_size = header_size + final_payload_len;
+ cbuffer_commit_reserved_unsafe(&wsc->out_buffer, final_frame_size);
+
+#ifdef NETDATA_INTERNAL_CHECKS
+ // Log frame being sent with detailed format matching the received frame logging
+ WEBSOCKET_FRAME_HEADER header;
+ if(!websocket_protocol_parse_header_from_buffer((const char *)header_dst, header_size, &header)) {
+ disconnect_msg = "Failed to parse the header we generated";
+ goto abnormal_disconnect;
+ }
+
+ websocket_debug(wsc,
+ "TX FRAME: OPCODE=0x%x, FIN=%s, RSV1=%d, RSV2=%d, RSV3=%d, MASK=%s, LEN=%d, "
+ "PAYLOAD_LEN=%zu, HEADER_SIZE=%zu, FRAME_SIZE=%zu, MASK=%02x%02x%02x%02x",
+ header.opcode, header.fin ? "True" : "False", header.rsv1, header.rsv2, header.rsv3,
+ header.mask ? "True" : "False", header.len,
+ header.payload_length, header.header_size, header.frame_size,
+ header.mask_key[0], header.mask_key[1], header.mask_key[2], header.mask_key[3]);
+#endif
+
+ // Make sure the client's poll flags include WRITE
+ websocket_thread_update_client_poll_flags(wsc);
+
+ return (int)final_frame_size;
+
+abnormal_disconnect:
+ websocket_error(wsc, "triggering abnormal disconnect: %s", disconnect_msg);
+ websocket_thread_send_command(wsc->wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+ return -1;
+
+//graceful_disconnect:
+// // the current implementation does not support graceful disconnect - so we do an abnormal one
+// websocket_error(wsc, "triggering graceful disconnect: %s", disconnect_msg);
+// websocket_thread_send_command(wsc->wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+// return -1;
+}
+
+// Send a text message
+int websocket_protocol_send_text(WS_CLIENT *wsc, const char *text) {
+ if (!wsc)
+ return -1;
+
+ // Special handling for null or empty text message
+ if (!text || text[0] == '\0') {
+ websocket_debug(wsc, "Sending empty text message");
+
+ // Use an empty buffer for zero-length text messages
+ static const char empty_data[1] = {0};
+ return websocket_protocol_send_frame(wsc, empty_data, 0, WS_OPCODE_TEXT, false);
+ }
+
+ size_t text_len = strlen(text);
+
+ websocket_debug(wsc, "Sending text message, length=%zu", text_len);
+
+ // Dump text message for debugging
+ websocket_dump_debug(wsc, text, text_len, "TX TEXT MSG");
+
+ // Enable compression for text messages by default
+ return websocket_protocol_send_frame(wsc, text, text_len, WS_OPCODE_TEXT, true);
+}
+
+// Send a binary message
+int websocket_protocol_send_binary(WS_CLIENT *wsc, const void *data, size_t length) {
+ if (!wsc)
+ return -1;
+
+ // Special handling for empty binary message
+ if (!data || length == 0) {
+ websocket_debug(wsc, "Sending empty binary message");
+
+ // Use an empty buffer for zero-length binary messages
+ static const char empty_data[1] = {0};
+ return websocket_protocol_send_frame(wsc, empty_data, 0, WS_OPCODE_BINARY, false);
+ }
+
+ websocket_debug(wsc, "Sending binary message, length=%zu", length);
+
+ // Dump binary message for debugging
+ websocket_dump_debug(wsc, data, length, "TX BIN MSG");
+
+ return websocket_protocol_send_frame(wsc, data, length, WS_OPCODE_BINARY, true);
+}
+
+// Send a close frame
+int websocket_protocol_send_close(WS_CLIENT *wsc, WEBSOCKET_CLOSE_CODE code, const char *reason) {
+ if (!wsc || wsc->sock.fd < 0)
+ return -1;
+
+ // Only send a close frame if we're in a valid state to do so
+ // Per RFC 6455: An endpoint MUST NOT send any more data frames after sending a Close frame
+ // CLOSING_CLIENT means we already responded to a client's close and shouldn't send another
+ if (wsc->state == WS_STATE_CLOSED ||
+ wsc->state == WS_STATE_CLOSING_SERVER ||
+ wsc->state == WS_STATE_CLOSING_CLIENT)
+ return -1;
+
+ // Validate close code
+ if (!websocket_validate_close_code((uint16_t)code)) {
+ websocket_error(wsc, "Invalid close code: %d (%s)", code, WEBSOCKET_CLOSE_CODE_2str(code));
+ code = WS_CLOSE_PROTOCOL_ERROR;
+ reason = "Invalid close code";
+ }
+
+ // Prepare close payload: 2-byte code + optional reason text
+ size_t reason_len = reason ? strlen(reason) : 0;
+
+ // Control frames max size is 125 bytes
+ if (reason_len > 123) {
+ websocket_error(wsc, "Close frame payload too large: %zu bytes (max 123)", reason_len);
+ reason_len = 123; // Truncate reason to fit
+ }
+
+ // Use stack buffer for close frame payload (max 125 bytes per RFC 6455)
+ size_t payload_len = 2 + reason_len;
+ char payload[payload_len];
+
+ // Set status code in network byte order (big-endian)
+ uint16_t code_value = (uint16_t)code;
+ payload[0] = (code_value >> 8) & 0xFF;
+    payload[1] = code_value & 0xFF;
+
+ // Add reason if provided (truncate if necessary)
+ if (reason && reason_len > 0)
+ memcpy(payload + 2, reason, reason_len);
+
+ // Call the close handler if registered - this is used to inject a message on close if needed
+ if(wsc->on_close)
+        wsc->on_close(wsc, code, reason);
+
+ // Send close frame (never compressed)
+ int result = websocket_protocol_send_frame(wsc, payload, payload_len, WS_OPCODE_CLOSE, false);
+
+ return result;
+}
+
+// Send a ping frame
+int websocket_protocol_send_ping(WS_CLIENT *wsc, const char *data, size_t length) {
+ if (!wsc)
+ return -1;
+
+ // Control frames max size is 125 bytes
+ if (length > 125) {
+ websocket_error(wsc, "Ping frame payload too large: %zu bytes (max: 125)",
+ length);
+ return -1;
+ }
+
+ // If no data provided, use empty ping
+ if (!data || length == 0) {
+ data = "";
+ length = 0;
+ }
+
+ // Send ping frame (never compressed)
+ return websocket_protocol_send_frame(wsc, data, length, WS_OPCODE_PING, false);
+}
+
+// Send a pong frame
+int websocket_protocol_send_pong(WS_CLIENT *wsc, const char *data, size_t length) {
+ if (!wsc)
+ return -1;
+
+ // Control frames max size is 125 bytes
+ if (length > 125) {
+ websocket_error(wsc, "Pong frame payload too large: %zu bytes (max: 125)",
+ length);
+ return -1;
+ }
+
+ // If no data provided, use empty pong
+ if (!data || length == 0) {
+ data = "";
+ length = 0;
+ }
+
+ // Send pong frame (never compressed)
+ return websocket_protocol_send_frame(wsc, data, length, WS_OPCODE_PONG, false);
+}
diff --git a/src/web/websocket/websocket-thread.c b/src/web/websocket/websocket-thread.c
new file mode 100644
index 00000000000000..93eb2042b92dd9
--- /dev/null
+++ b/src/web/websocket/websocket-thread.c
@@ -0,0 +1,515 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "daemon/daemon-service.h"
+#include "websocket-internal.h"
+
+static void websocket_thread_client_socket_error(WEBSOCKET_THREAD *wth, WS_CLIENT *wsc, const char *reason) {
+ internal_fatal(wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ worker_is_busy(WORKERS_WEBSOCKET_SOCK_ERROR);
+
+ websocket_debug(wsc, reason);
+
+ // Send command to remove the client
+ // Note: on_disconnect will be called in websocket_thread_remove_client
+ websocket_thread_send_command(wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+}
+
+// Add a client to a thread's poll
+static bool websocket_thread_add_client(WEBSOCKET_THREAD *wth, WS_CLIENT *wsc) {
+ internal_fatal(wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ // Initialize compression with the parsed options
+ websocket_compression_init(wsc);
+ websocket_decompression_init(wsc);
+
+ // Add client to the poll - use the socket fd directly
+ bool added = nd_poll_add(wth->ndpl, wsc->sock.fd, ND_POLL_READ, wsc);
+ if(!added) {
+ websocket_error(wsc, "Failed to add client to poll");
+ return false;
+ }
+
+ // Add client to the thread's client list
+ DOUBLE_LINKED_LIST_APPEND_ITEM_UNSAFE(wth->clients, wsc, prev, next);
+
+ return true;
+}
+
+static void websocket_thread_remove_client(WEBSOCKET_THREAD *wth, WS_CLIENT *wsc) {
+ internal_fatal(wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ // Notify the protocol handler that the client is being disconnected
+ if (wsc->on_disconnect) {
+ websocket_debug(wsc, "Calling on_disconnect callback for protocol %s", WEBSOCKET_PROTOCOL_2str(wsc->protocol));
+ wsc->on_disconnect(wsc);
+ }
+
+    // send a close frame (it is skipped automatically when the connection state does not allow it)
+    websocket_protocol_send_close(wsc, WS_CLOSE_NORMAL, "Connection closed by server");
+
+    // flush any pending outgoing data, including the close frame just queued
+    websocket_write_data(wsc);
+
+ // Remove client from the poll - use socket fd directly
+ bool removed = nd_poll_del(wth->ndpl, wsc->sock.fd);
+ if(!removed) {
+ websocket_debug(wsc, "Failed to remove client %zu from poll", wsc->id);
+ }
+
+ websocket_decompression_cleanup(wsc);
+ websocket_compression_cleanup(wsc);
+
+ // Remove client from the thread's client list
+ DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(wth->clients, wsc, prev, next);
+
+ // Lock the thread clients
+ spinlock_lock(&wth->clients_spinlock);
+
+ if(wth->clients_current > 0)
+ wth->clients_current--;
+
+ // Release the thread clients lock
+ spinlock_unlock(&wth->clients_spinlock);
+
+ websocket_debug(wsc, "Removed and resources freed", wth->id, wsc->id);
+ websocket_client_free(wsc);
+}
+
+// Update a client's poll event flags
+bool websocket_thread_update_client_poll_flags(WS_CLIENT *wsc) {
+ if(!wsc || !wsc->wth || wsc->sock.fd < 0)
+ return false;
+
+ internal_fatal(wsc->wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
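+    // A client scheduled for removal must not be read from any more;
+    // keep only WRITE interest so its outgoing buffer can drain first.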
+ nd_poll_event_t events = wsc->flush_and_remove_client ? 0 : ND_POLL_READ;
+ if(cbuffer_next_unsafe(&wsc->out_buffer, NULL) > 0)
+ events |= ND_POLL_WRITE;
+
+ // Update poll events
+ bool updated = nd_poll_upd(wsc->wth->ndpl, wsc->sock.fd, events);
+ if(!updated)
+ websocket_error(wsc, "Failed to update poll events for client");
+
+ return updated;
+}
+
+struct pipe_header {
+ uint8_t cmd;
+ union {
+ uint32_t id;
+ uint32_t len;
+ };
+};
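+
+// Wire format on the command pipe:
+//   EXIT / ADD_CLIENT / REMOVE_CLIENT: a single pipe_header (id = client id)
+//   BROADCAST: a pipe_header with len = sizeof(opcode) + message length,
+//              followed by the opcode and the message bytes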
+
+// Send command to a thread
+bool websocket_thread_send_command(WEBSOCKET_THREAD *wth, uint8_t cmd, uint32_t id) {
+    if(!wth)
+        return false;
+
+    if(wth->cmd.pipe[PIPE_WRITE] == -1) {
+        netdata_log_error("WEBSOCKET[%zu]: Failed to send command - pipe is not initialized", wth->id);
+        return false;
+    }
+
+ // Prepare command
+ struct pipe_header header = {
+ .cmd = cmd,
+ .id = id,
+ };
+
+ // Lock command pipe for writing
+ spinlock_lock(&wth->spinlock);
+
+ // Write command header
+ ssize_t bytes = write(wth->cmd.pipe[PIPE_WRITE], &header, sizeof(header));
+ if(bytes != sizeof(header)) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to write command header to pipe", wth->id);
+ spinlock_unlock(&wth->spinlock);
+ return false;
+ }
+
+ // Release command pipe
+ spinlock_unlock(&wth->spinlock);
+
+ return true;
+}
+
+bool websocket_thread_send_broadcast(WEBSOCKET_THREAD *wth, WEBSOCKET_OPCODE opcode, const char *message) {
+    if(!wth)
+        return false;
+
+    if(wth->cmd.pipe[PIPE_WRITE] == -1) {
+        netdata_log_error("WEBSOCKET[%zu]: Failed to send broadcast - pipe is not initialized", wth->id);
+        return false;
+    }
+
+ uint32_t message_len = strlen(message);
+
+ // Prepare command
+ struct pipe_header header = {
+ .cmd = WEBSOCKET_THREAD_CMD_BROADCAST,
+ .len = sizeof(opcode) + message_len,
+ };
+
+ // Lock command pipe for writing
+ spinlock_lock(&wth->spinlock);
+
+    // Write the full command header - the reader always consumes sizeof(header)
+    ssize_t bytes = write(wth->cmd.pipe[PIPE_WRITE], &header, sizeof(header));
+    if(bytes != sizeof(header)) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to write command header to pipe", wth->id);
+ spinlock_unlock(&wth->spinlock);
+ return false;
+ }
+
+ // Write the opcode
+ bytes = write(wth->cmd.pipe[PIPE_WRITE], &opcode, sizeof(opcode));
+ if(bytes != sizeof(opcode)) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to write broadcast opcode to pipe", wth->id);
+ spinlock_unlock(&wth->spinlock);
+ return false;
+ }
+
+ // Write the message
+    bytes = write(wth->cmd.pipe[PIPE_WRITE], message, message_len);
+    if(bytes != (ssize_t)message_len) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to write broadcast message to pipe", wth->id);
+ spinlock_unlock(&wth->spinlock);
+ return false;
+ }
+
+ // Release command pipe
+ spinlock_unlock(&wth->spinlock);
+
+ return true;
+}
+
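+// Read up to `size` bytes from the pipe, retrying on short reads.
+// On EAGAIN/EWOULDBLOCK it returns whatever has been read so far (possibly 0);
+// -1 is returned only on a real error.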
+static ssize_t read_pipe_block(int fd, void *buffer, size_t size) {
+ char *buf = buffer;
+ size_t total_read = 0;
+
+ while (total_read < size) {
+ ssize_t bytes = read(fd, buf + total_read, size - total_read);
+
+ if (bytes < 0) {
+ if (errno == EAGAIN || errno == EWOULDBLOCK) {
+ // Non-blocking case, return what we've read so far
+ return (ssize_t)total_read;
+ }
+
+ // Real error occurred
+ return -1;
+
+ }
+ else if (bytes == 0)
+ return (ssize_t)total_read;
+
+ total_read += bytes;
+ }
+
+ return (ssize_t)total_read;
+}
+
+// Process a thread's command pipe
+static void websocket_thread_process_commands(WEBSOCKET_THREAD *wth) {
+ internal_fatal(wth->tid != gettid_cached(), "Function %s() should only be used by the websocket thread", __FUNCTION__ );
+
+ struct pipe_header header;
+
+ // Read all available commands
+ for(;;) {
+ // Read command header
+
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_READ);
+
+ ssize_t bytes = read_pipe_block(wth->cmd.pipe[PIPE_READ], &header, sizeof(header));
+ if(bytes <= 0) {
+ if(bytes < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to read command header from pipe", wth->id);
+ }
+ break;
+ }
+
+ if(bytes != sizeof(header)) {
+ netdata_log_error("WEBSOCKET[%zu]: Read partial command header (%zd/%zu bytes)", wth->id, bytes, sizeof(header));
+ break;
+ }
+
+ // Process command
+ switch(header.cmd) {
+ case WEBSOCKET_THREAD_CMD_EXIT:
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_EXIT);
+ netdata_log_info("WEBSOCKET[%zu] received exit command", wth->id);
+ return;
+
+ case WEBSOCKET_THREAD_CMD_ADD_CLIENT: {
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_ADD);
+ WS_CLIENT *wsc = websocket_client_find_by_id(header.id);
+ if(!wsc) {
+ netdata_log_error("WEBSOCKET[%zu]: Client %u not found for add command", wth->id, header.id);
+ continue;
+ }
+ internal_fatal(wsc->wth != wth, "Client %u added to wrong thread", header.id);
+ wsc->wth = wth;
+ if (websocket_thread_add_client(wth, wsc)) {
+ // Call the on_connect callback if provided to notify protocol handler of new client
+ if (wsc->on_connect) {
+ websocket_debug(wsc, "Calling on_connect callback for protocol %s", WEBSOCKET_PROTOCOL_2str(wsc->protocol));
+ wsc->on_connect(wsc);
+ }
+ }
+ break;
+ }
+
+ case WEBSOCKET_THREAD_CMD_REMOVE_CLIENT: {
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_DEL);
+ WS_CLIENT *wsc = websocket_client_find_by_id(header.id);
+ if(!wsc) {
+ netdata_log_error("WEBSOCKET[%zu]: Client %u not found for remove command", wth->id, header.id);
+ continue;
+ }
+
+ websocket_thread_remove_client(wth, wsc);
+ break;
+ }
+
+ case WEBSOCKET_THREAD_CMD_BROADCAST: {
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_BROADCAST);
+
+ WEBSOCKET_OPCODE opcode;
+ bytes = read_pipe_block(wth->cmd.pipe[PIPE_READ], &opcode, sizeof(opcode));
+ if(bytes != sizeof(opcode)) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to read broadcast opcode from pipe", wth->id);
+ continue;
+ }
+
+ uint32_t message_len = header.len - sizeof(opcode);
+ char message[message_len + 1];
+ bytes = read_pipe_block(wth->cmd.pipe[PIPE_READ], message, message_len);
+ if(bytes != message_len) {
+ netdata_log_error("WEBSOCKET[%zu]: Failed to read broadcast message from pipe", wth->id);
+ continue;
+ }
+
+                // NUL-terminate so the message is also safe to use as a C string
+                message[message_len] = '\0';
+
+ // Send to all clients in this thread
+ spinlock_lock(&wth->clients_spinlock);
+
+ WS_CLIENT *wsc = wth->clients;
+ while(wsc) {
+ if(wsc->state == WS_STATE_OPEN) {
+ websocket_send_message(wsc, message, message_len, opcode);
+ }
+ wsc = wsc->next;
+ }
+
+ spinlock_unlock(&wth->clients_spinlock);
+ break;
+ }
+
+ default:
+ worker_is_busy(WORKERS_WEBSOCKET_CMD_UNKNOWN);
+ netdata_log_error("WEBSOCKET[%zu]: Unknown command %u", wth->id, header.cmd);
+ break;
+ }
+ }
+}
+
+// Thread main function
+void *websocket_thread(void *ptr) {
+ WEBSOCKET_THREAD *wth = (WEBSOCKET_THREAD *)ptr;
+ wth->tid = gettid_uncached();
+
+ worker_register("WEBSOCKET");
+ worker_register_job_name(WORKERS_WEBSOCKET_POLL, "poll");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_READ, "cmd read");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_EXIT, "cmd exit");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_ADD, "cmd add");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_DEL, "cmd del");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_BROADCAST, "cmd bcast");
+ worker_register_job_name(WORKERS_WEBSOCKET_CMD_UNKNOWN, "cmd unknown");
+ worker_register_job_name(WORKERS_WEBSOCKET_SOCK_RECEIVE, "ws rcv");
+ worker_register_job_name(WORKERS_WEBSOCKET_SOCK_SEND, "ws snd");
+ worker_register_job_name(WORKERS_WEBSOCKET_SOCK_ERROR, "ws err");
+ worker_register_job_name(WORKERS_WEBSOCKET_CLIENT_TIMEOUT, "client timeout");
+ worker_register_job_name(WORKERS_WEBSOCKET_SEND_PING, "send ping");
+ worker_register_job_name(WORKERS_WEBSOCKET_CLIENT_STUCK, "client stuck");
+ worker_register_job_name(WORKERS_WEBSOCKET_INCOMPLETE_FRAME, "incomplete frame");
+ worker_register_job_name(WORKERS_WEBSOCKET_COMPLETE_FRAME, "complete frame");
+ worker_register_job_name(WORKERS_WEBSOCKET_MESSAGE, "message");
+ worker_register_job_name(WORKERS_WEBSOCKET_MSG_PING, "rx ping");
+ worker_register_job_name(WORKERS_WEBSOCKET_MSG_PONG, "rx pong");
+ worker_register_job_name(WORKERS_WEBSOCKET_MSG_CLOSE, "rx close");
+ worker_register_job_name(WORKERS_WEBSOCKET_MSG_INVALID, "rx invalid");
+
+ time_t last_cleanup = now_monotonic_sec();
+
+ // Main thread loop
+ while(service_running(SERVICE_STREAMING) && !nd_thread_signaled_to_cancel()) {
+
+ worker_is_idle();
+
+ // Poll for events
+ nd_poll_result_t ev;
+ int rc = nd_poll_wait(wth->ndpl, 100, &ev); // 100ms timeout
+
+ worker_is_busy(WORKERS_WEBSOCKET_POLL);
+
+ if(rc < 0) {
+ if(errno == EAGAIN || errno == EINTR)
+ continue;
+
+ netdata_log_error("WEBSOCKET[%zu]: Poll error: %s", wth->id, strerror(errno));
+ break;
+ }
+
+ // Process poll events
+ if(rc > 0) {
+ // Handle command pipe
+ if(ev.data == &wth->cmd) {
+ if(ev.events & ND_POLL_READ) {
+ // Read and process commands
+ websocket_thread_process_commands(wth);
+ }
+ continue;
+ }
+
+ // Handle client events
+ WS_CLIENT *wsc = (WS_CLIENT *)ev.data;
+ if(!wsc) {
+ netdata_log_error("WEBSOCKET[%zu]: Poll event with NULL client data", wth->id);
+ continue;
+ }
+
+ // Check for errors
+ if(ev.events & ND_POLL_HUP) {
+ websocket_thread_client_socket_error(wth, wsc, "Client hangup");
+ continue;
+ }
+ if(ev.events & ND_POLL_ERROR) {
+ websocket_thread_client_socket_error(wth, wsc, "Socket error");
+ continue;
+ }
+
+ // Process read events
+ if(ev.events & ND_POLL_READ) {
+ if(websocket_receive_data(wsc) < 0) {
+ websocket_thread_client_socket_error(wth, wsc, "Failed to receive data");
+ continue;
+ }
+ }
+
+ // Process write events
+ if(ev.events & ND_POLL_WRITE) {
+ if(websocket_write_data(wsc) < 0) {
+ websocket_thread_client_socket_error(wth, wsc, "Failed to send data");
+ continue;
+ }
+
+ // Check if this client is waiting to be closed after flushing outgoing data
+ if(wsc->flush_and_remove_client && cbuffer_used_size_unsafe(&wsc->out_buffer) == 0) {
+ // All data flushed - remove client
+ websocket_thread_remove_client(wth, wsc);
+ }
+ }
+ }
+
+ worker_is_idle();
+
+ // Periodic cleanup and health checks (every 30 seconds)
+ time_t now = now_monotonic_sec();
+ if(now - last_cleanup > 30) {
+ // Iterate through all clients in this thread
+ spinlock_lock(&wth->clients_spinlock);
+
+ WS_CLIENT *wsc = wth->clients;
+ while(wsc) {
+ WS_CLIENT *next = wsc->next; // Save next in case we remove this client
+
+ if(wsc->state == WS_STATE_OPEN) {
+ // Check if client is idle (no activity for over 120 seconds)
+ if(now - wsc->last_activity_t > 120) {
+ // Client is idle - send a ping to check if it's still alive
+ worker_is_busy(WORKERS_WEBSOCKET_SEND_PING);
+ websocket_protocol_send_ping(wsc, NULL, 0);
+
+ // If no activity for over 300 seconds (5 minutes), consider it dead
+ if(now - wsc->last_activity_t > 300) {
+ worker_is_busy(WORKERS_WEBSOCKET_CLIENT_TIMEOUT);
+ websocket_error(wsc, "Client timed out (no activity for over 5 minutes)");
+ websocket_protocol_exception(wsc, WS_CLOSE_GOING_AWAY, "Timeout - no activity");
+ }
+ }
+ // For normal clients, send periodic pings (every 60 seconds)
+ else if(now - wsc->last_activity_t > 60) {
+ worker_is_busy(WORKERS_WEBSOCKET_SEND_PING);
+ websocket_protocol_send_ping(wsc, NULL, 0);
+ }
+ }
+ else if(wsc->state == WS_STATE_CLOSING_SERVER || wsc->state == WS_STATE_CLOSING_CLIENT) {
+ // If a client is in any CLOSING state for more than 5 seconds, force close it
+ if(now - wsc->last_activity_t > 5) {
+ worker_is_busy(WORKERS_WEBSOCKET_CLIENT_STUCK);
+ websocket_error(wsc, "Forcing close (stuck in %s state)",
+ wsc->state == WS_STATE_CLOSING_SERVER ? "CLOSING_SERVER" : "CLOSING_CLIENT");
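+                    // Defer removal via the command pipe: calling
+                    // websocket_thread_remove_client() here would deadlock on
+                    // clients_spinlock, which this loop already holds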
+ websocket_thread_send_command(wth, WEBSOCKET_THREAD_CMD_REMOVE_CLIENT, wsc->id);
+ }
+ }
+
+ wsc = next;
+ }
+
+ spinlock_unlock(&wth->clients_spinlock);
+
+ last_cleanup = now;
+ }
+ }
+
+ netdata_log_info("WEBSOCKET[%zu] exiting", wth->id);
+
+ // Clean up any remaining clients
+ spinlock_lock(&wth->clients_spinlock);
+
+ // Close all clients in this thread
+ WS_CLIENT *wsc = wth->clients;
+ while(wsc) {
+ WS_CLIENT *next = wsc->next;
+
+ websocket_protocol_send_close(wsc, WS_CLOSE_GOING_AWAY, "Server shutting down");
+ websocket_write_data(wsc);
+ websocket_thread_remove_client(wth, wsc);
+
+ wsc = next;
+ }
+
+ // Reset thread's client list
+ wth->clients = NULL;
+ wth->clients_current = 0;
+
+ spinlock_unlock(&wth->clients_spinlock);
+
+ // Cleanup poll resources
+ if(wth->ndpl) {
+ nd_poll_destroy(wth->ndpl);
+ wth->ndpl = NULL;
+ }
+
+ // Cleanup command pipe
+ if(wth->cmd.pipe[PIPE_READ] != -1) {
+ close(wth->cmd.pipe[PIPE_READ]);
+ wth->cmd.pipe[PIPE_READ] = -1;
+ }
+
+ if(wth->cmd.pipe[PIPE_WRITE] != -1) {
+ close(wth->cmd.pipe[PIPE_WRITE]);
+ wth->cmd.pipe[PIPE_WRITE] = -1;
+ }
+
+ // Mark thread as not running
+ spinlock_lock(&wth->spinlock);
+ wth->running = false;
+ spinlock_unlock(&wth->spinlock);
+
+ return NULL;
+}
diff --git a/src/web/websocket/websocket-thread.h b/src/web/websocket/websocket-thread.h
new file mode 100644
index 00000000000000..b58856bdf1a09e
--- /dev/null
+++ b/src/web/websocket/websocket-thread.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEBSOCKET_THREAD_H
+#define NETDATA_WEBSOCKET_THREAD_H
+
+// This file is kept for backward compatibility
+// All contents have been moved to websocket-internal.h
+
+#include "websocket-internal.h"
+
+#endif // NETDATA_WEBSOCKET_THREAD_H
\ No newline at end of file
diff --git a/src/web/websocket/websocket-utils.c b/src/web/websocket/websocket-utils.c
new file mode 100644
index 00000000000000..2555d46694d032
--- /dev/null
+++ b/src/web/websocket/websocket-utils.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+
+// Debug log function with client, message, and frame IDs
+void websocket_debug(WS_CLIENT *wsc __maybe_unused, const char *format __maybe_unused, ...) {
+#ifdef NETDATA_INTERNAL_CHECKS
+ if (!wsc || !format)
+ return;
+
+ char formatted_message[1024];
+ va_list args;
+ va_start(args, format);
+ vsnprintf(formatted_message, sizeof(formatted_message), format, args);
+ va_end(args);
+
+ // Format the debug message with client, message, and frame IDs
+ netdata_log_debug(D_WEBSOCKET, "WEBSOCKET: C=%u M=%zu F=%zu %s",
+ wsc->id, wsc->message_id, wsc->frame_id, formatted_message);
+#endif /* NETDATA_INTERNAL_CHECKS */
+}
+
+// Info log function with client, message, and frame IDs
+void websocket_info(WS_CLIENT *wsc, const char *format, ...) {
+ if (!wsc || !format)
+ return;
+
+ char formatted_message[1024];
+ va_list args;
+ va_start(args, format);
+ vsnprintf(formatted_message, sizeof(formatted_message), format, args);
+ va_end(args);
+
+ // Format the info message with client, message, and frame IDs
+ netdata_log_info("WEBSOCKET: C=%u M=%zu F=%zu %s",
+ wsc->id, wsc->message_id, wsc->frame_id, formatted_message);
+}
+
+// Error log function with client, message, and frame IDs
+void websocket_error(WS_CLIENT *wsc, const char *format, ...) {
+ if (!wsc || !format)
+ return;
+
+ char formatted_message[1024];
+ va_list args;
+ va_start(args, format);
+ vsnprintf(formatted_message, sizeof(formatted_message), format, args);
+ va_end(args);
+
+ // Format the error message with client, message, and frame IDs
+ netdata_log_error("WEBSOCKET: C=%u M=%zu F=%zu %s",
+ wsc->id, wsc->message_id, wsc->frame_id, formatted_message);
+}
+
+// Debug function that logs a message and dumps payload data for debugging
+void websocket_dump_debug(WS_CLIENT *wsc __maybe_unused, const char *payload __maybe_unused,
+ size_t payload_length __maybe_unused, const char *format __maybe_unused, ...) {
+#ifdef NETDATA_INTERNAL_CHECKS
+ if (!wsc || !format)
+ return;
+
+ // Format the primary message
+ char formatted_message[1024];
+ va_list args;
+ va_start(args, format);
+ vsnprintf(formatted_message, sizeof(formatted_message), format, args);
+ va_end(args);
+
+ // Handle empty payloads explicitly (log message but no hex dump)
+ if (payload_length == 0) {
+ netdata_log_debug(D_WEBSOCKET, "WEBSOCKET: C=%u M=%zu F=%zu %s (EMPTY PAYLOAD - 0 bytes)",
+ wsc->id, wsc->message_id, wsc->frame_id, formatted_message);
+ return;
+ }
+
+ // If payload is provided and not empty, create and log a hex dump
+ if (payload && payload_length > 0) {
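+        // Cap the dump at the first 32 bytes to keep log lines readable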
+ size_t bytes_to_dump = (payload_length < 32) ? payload_length : 32;
+
+ char hex_dump[bytes_to_dump * 2 + 1];
+ char ascii_dump[bytes_to_dump + 1];
+
+ // Create the hex dump
+ size_t i = 0;
+ for (i = 0; i < bytes_to_dump; i++) {
+ unsigned char c = (unsigned char)payload[i];
+ sprintf(hex_dump + i * 2, "%02x", c);
+ ascii_dump[i] = (isprint(c) ? c : '.');
+ }
+
+ hex_dump[i * 2] = '\0';
+ ascii_dump[i] = '\0';
+
+ // Log the hex dump
+ netdata_log_debug(D_WEBSOCKET, "WEBSOCKET: C=%u M=%zu F=%zu %s DUMP %zu/%zu: HEX:[%s]%s, ASCII:[%s]%s",
+ wsc->id, wsc->message_id, wsc->frame_id,
+ formatted_message,
+ bytes_to_dump, payload_length,
+ hex_dump, payload_length > bytes_to_dump ? "..." : "",
+ ascii_dump, payload_length > bytes_to_dump ? "..." : "");
+ }
+#endif /* NETDATA_INTERNAL_CHECKS */
+}
diff --git a/src/web/websocket/websocket.c b/src/web/websocket/websocket.c
new file mode 100644
index 00000000000000..03b97ebea47354
--- /dev/null
+++ b/src/web/websocket/websocket.c
@@ -0,0 +1,264 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#include "websocket-internal.h"
+#include "websocket-echo.h"
+#include "websocket-jsonrpc.h"
+#include "../mcp/adapters/mcp-websocket.h"
+
+ENUM_STR_MAP_DEFINE(WEBSOCKET_PROTOCOL) = {
+ { .id = WS_PROTOCOL_JSONRPC, .name = "jsonrpc" },
+ { .id = WS_PROTOCOL_ECHO, .name = "echo" },
+ { .id = WS_PROTOCOL_MCP, .name = "mcp" },
+ { .id = WS_PROTOCOL_UNKNOWN, .name = "unknown" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(WEBSOCKET_PROTOCOL, WS_PROTOCOL_UNKNOWN, "unknown");
+
+ENUM_STR_MAP_DEFINE(WEBSOCKET_STATE) = {
+ { .id = WS_STATE_HANDSHAKE, .name = "handshake" },
+ { .id = WS_STATE_OPEN, .name = "open" },
+ { .id = WS_STATE_CLOSING_SERVER, .name = "closing_server" },
+ { .id = WS_STATE_CLOSING_CLIENT, .name = "closing_client" },
+ { .id = WS_STATE_CLOSED, .name = "closed" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(WEBSOCKET_STATE, WS_STATE_CLOSED, "closed");
+
+ENUM_STR_MAP_DEFINE(WEBSOCKET_OPCODE) = {
+ { .id = WS_OPCODE_CONTINUATION, .name = "continuation" },
+ { .id = WS_OPCODE_TEXT, .name = "text" },
+ { .id = WS_OPCODE_BINARY, .name = "binary" },
+ { .id = WS_OPCODE_CLOSE, .name = "close" },
+ { .id = WS_OPCODE_PING, .name = "ping" },
+ { .id = WS_OPCODE_PONG, .name = "pong" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(WEBSOCKET_OPCODE, WS_OPCODE_TEXT, "text");
+
+ENUM_STR_MAP_DEFINE(WEBSOCKET_CLOSE_CODE) = {
+ // Standard WebSocket close codes
+ { .id = WS_CLOSE_NORMAL, .name = "normal" },
+ { .id = WS_CLOSE_GOING_AWAY, .name = "going_away" },
+ { .id = WS_CLOSE_PROTOCOL_ERROR, .name = "protocol_error" },
+ { .id = WS_CLOSE_UNSUPPORTED_DATA, .name = "unsupported_data" },
+ { .id = WS_CLOSE_RESERVED, .name = "reserved" },
+ { .id = WS_CLOSE_NO_STATUS, .name = "no_status" },
+ { .id = WS_CLOSE_ABNORMAL, .name = "abnormal" },
+ { .id = WS_CLOSE_INVALID_PAYLOAD, .name = "invalid_payload" },
+ { .id = WS_CLOSE_POLICY_VIOLATION, .name = "policy_violation" },
+ { .id = WS_CLOSE_MESSAGE_TOO_BIG, .name = "message_too_big" },
+ { .id = WS_CLOSE_EXTENSION_MISSING, .name = "extension_missing" },
+ { .id = WS_CLOSE_INTERNAL_ERROR, .name = "internal_error" },
+ { .id = WS_CLOSE_TLS_HANDSHAKE, .name = "tls_handshake_error" },
+
+ // Netdata-specific close codes
+ { .id = WS_CLOSE_NETDATA_TIMEOUT, .name = "timeout" },
+ { .id = WS_CLOSE_NETDATA_SHUTDOWN, .name = "shutdown" },
+ { .id = WS_CLOSE_NETDATA_REJECTED, .name = "rejected" },
+    { .id = WS_CLOSE_NETDATA_RATE_LIMIT, .name = "rate_limit" },
+
+ // terminator
+ { .name = NULL, .id = 0 }
+};
+ENUM_STR_DEFINE_FUNCTIONS(WEBSOCKET_CLOSE_CODE, WS_CLOSE_NORMAL, "normal");
+
+// Private structure for WebSocket server state
+struct websocket_server {
+ WS_CLIENTS_JudyLSet clients; // JudyL array of WebSocket clients
+ size_t client_id_counter; // Counter for generating unique client IDs
+ size_t active_clients; // Number of active clients
+ SPINLOCK spinlock; // Spinlock to protect the registry
+};
+
+// The global (but private) instance of the WebSocket server state
+static struct websocket_server ws_server = (struct websocket_server){
+ .clients = { 0 },
+ .client_id_counter = 0,
+ .active_clients = 0,
+ .spinlock = SPINLOCK_INITIALIZER
+};
+
+// Initialize WebSocket subsystem
+void websocket_initialize(void) {
+ debug_flags |= D_WEBSOCKET;
+
+ // Initialize thread system
+ websocket_threads_init();
+
+ // Initialize protocol handlers
+ websocket_jsonrpc_initialize();
+ websocket_echo_initialize();
+ mcp_websocket_adapter_initialize();
+
+ netdata_log_info("WebSocket server subsystem initialized");
+}
+
+// Create a new WebSocket client with a unique ID
+NEVERNULL
+WS_CLIENT *websocket_client_create(void) {
+ WS_CLIENT *wsc = callocz(1, sizeof(WS_CLIENT));
+
+ spinlock_lock(&ws_server.spinlock);
+ wsc->id = ++ws_server.client_id_counter; // Generate unique ID
+ spinlock_unlock(&ws_server.spinlock);
+
+ wsc->connected_t = now_realtime_sec();
+ wsc->last_activity_t = wsc->connected_t;
+
+ // initialize callbacks to NULL
+ wsc->on_connect = NULL;
+ wsc->on_message = NULL;
+ wsc->on_close = NULL;
+ wsc->on_disconnect = NULL;
+
+ // initialize the ND_SOCK with the web server's SSL context
+ nd_sock_init(&wsc->sock, netdata_ssl_web_server_ctx, false);
+
+ // Initialize circular buffers for I/O with WebSocket-specific sizes and max limits
+ cbuffer_init(&wsc->in_buffer, WEBSOCKET_IN_BUFFER_INITIAL_SIZE, WEBSOCKET_IN_BUFFER_MAX_SIZE, NULL);
+ cbuffer_init(&wsc->out_buffer, WEBSOCKET_OUT_BUFFER_INITIAL_SIZE, WEBSOCKET_OUT_BUFFER_MAX_SIZE, NULL);
+
+ // Initialize pre-allocated message buffer
+ wsb_init(&wsc->payload, WEBSOCKET_PAYLOAD_INITIAL_SIZE);
+
+ // Initialize uncompressed buffer with a larger size since decompressed data can expand
+ // For compressed content, the expanded data can be much larger than the input
+ wsb_init(&wsc->u_payload, WEBSOCKET_UNPACKED_INITIAL_SIZE);
+
+ // Set the initial message state
+ wsc->opcode = WS_OPCODE_TEXT; // Default opcode
+ wsc->is_compressed = false;
+ wsc->message_complete = true; // Not in a fragmented sequence initially
+ wsc->frame_id = 0;
+ wsc->message_id = 0;
+ wsc->compression = WEBSOCKET_COMPRESSION_DEFAULTS;
+
+ return wsc;
+}
+
+// Free a WebSocket client
+void websocket_client_free(WS_CLIENT *wsc) {
+ if (!wsc)
+ return;
+
+ // First unregister from the client registry
+ websocket_client_unregister(wsc);
+
+ // We MUST make sure the socket is not in the poll before closing it
+ // otherwise kernel structures may be corrupted due to socket reuse
+ if(wsc->wth && wsc->wth->ndpl && wsc->sock.fd >= 0)
+ nd_poll_del(wsc->wth->ndpl, wsc->sock.fd);
+
+ // Close socket using ND_SOCK abstraction
+ nd_sock_close(&wsc->sock);
+
+ // Free circular buffers
+ cbuffer_cleanup(&wsc->in_buffer);
+ cbuffer_cleanup(&wsc->out_buffer);
+
+ // Cleanup pre-allocated message and uncompressed buffers
+ wsb_cleanup(&wsc->payload);
+ wsb_cleanup(&wsc->u_payload);
+
+ // Clean up compression resources if needed
+ websocket_compression_cleanup(wsc);
+
+ freez(wsc);
+}
+
+// Register a WebSocket client in the registry
+bool websocket_client_register(WS_CLIENT *wsc) {
+ if (!wsc || wsc->id == 0)
+ return false;
+
+ spinlock_lock(&ws_server.spinlock);
+
+ int added = WS_CLIENTS_SET(&ws_server.clients, wsc->id, wsc);
+ if (!added) {
+ ws_server.active_clients++;
+ websocket_debug(wsc, "WebSocket client registered, total clients: %u", ws_server.active_clients);
+ }
+
+ spinlock_unlock(&ws_server.spinlock);
+
+ return added;
+}
+
+// Unregister a WebSocket client from the registry
+void websocket_client_unregister(WS_CLIENT *wsc) {
+ if (!wsc || wsc->id == 0)
+ return;
+
+ spinlock_lock(&ws_server.spinlock);
+
+ WS_CLIENT *existing = WS_CLIENTS_GET(&ws_server.clients, wsc->id);
+ if (existing && existing == wsc) {
+ WS_CLIENTS_DEL(&ws_server.clients, wsc->id);
+ if (ws_server.active_clients > 0)
+ ws_server.active_clients--;
+
+ websocket_debug(wsc,"WebSocket client unregistered, total clients: %zu", ws_server.active_clients);
+ }
+
+ spinlock_unlock(&ws_server.spinlock);
+}
+
+// Find a WebSocket client by ID
+ALWAYS_INLINE
+WS_CLIENT *websocket_client_find_by_id(size_t id) {
+ if (id == 0)
+ return NULL;
+
+ WS_CLIENT *wsc = NULL;
+
+ spinlock_lock(&ws_server.spinlock);
+ wsc = WS_CLIENTS_GET(&ws_server.clients, id);
+ spinlock_unlock(&ws_server.spinlock);
+
+ return wsc;
+}
+
+// Broadcast a message to all connected WebSocket clients
+int websocket_broadcast_message(const char *message, WEBSOCKET_OPCODE opcode) {
+ if (!message || (opcode != WS_OPCODE_TEXT && opcode != WS_OPCODE_BINARY))
+ return -1;
+
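+    // Note: the return value counts worker threads that accepted the
+    // broadcast command, not the number of clients actually reached.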
+ int success_count = 0;
+
+ // Send broadcast command to all active threads
+ for(size_t i = 0; i < WEBSOCKET_MAX_THREADS; i++) {
+ if(websocket_threads[i].thread && websocket_threads[i].running) {
+ if(websocket_thread_send_broadcast(&websocket_threads[i], opcode, message)) {
+ success_count++;
+ }
+ }
+ }
+
+ return success_count;
+}
+
+// Send a WebSocket message to the client
+int websocket_send_message(WS_CLIENT *wsc, const char *message, size_t length, WEBSOCKET_OPCODE opcode) {
+ if (!wsc || !message || wsc->state != WS_STATE_OPEN)
+ return -1;
+
+ // Use the appropriate protocol function based on opcode
+ if (opcode == WS_OPCODE_TEXT) {
+ return websocket_protocol_send_text(wsc, message);
+ } else if (opcode == WS_OPCODE_BINARY) {
+ return websocket_protocol_send_binary(wsc, message, length);
+ } else {
+ // For other opcodes, use the generic frame sender
+ bool use_compression = wsc->compression.enabled &&
+ !websocket_frame_is_control_opcode(opcode) &&
+ length >= WS_COMPRESS_MIN_SIZE;
+
+ return websocket_protocol_send_frame(wsc, message, length, opcode, use_compression);
+ }
+}
diff --git a/src/web/websocket/websocket.h b/src/web/websocket/websocket.h
new file mode 100644
index 00000000000000..958fb75146aa11
--- /dev/null
+++ b/src/web/websocket/websocket.h
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-3.0-or-later
+
+#ifndef NETDATA_WEB_SERVER_WEBSOCKET_H
+#define NETDATA_WEB_SERVER_WEBSOCKET_H 1
+
+#include "libnetdata/libnetdata.h"
+
+// WebSocket subprotocols supported by Netdata
+typedef enum __attribute__((packed)) {
+ WS_PROTOCOL_DEFAULT = 0, // the protocol is selected from the url
+ WS_PROTOCOL_UNKNOWN, // Unknown or unsupported protocol
+ WS_PROTOCOL_JSONRPC, // JSON-RPC protocol
+ WS_PROTOCOL_ECHO, // Echo protocol
+ WS_PROTOCOL_MCP, // Model Context Protocol
+} WEBSOCKET_PROTOCOL;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(WEBSOCKET_PROTOCOL);
+
+// WebSocket extensions supported by Netdata
+typedef enum __attribute__((packed)) {
+ // RFC 7692
+ WS_EXTENSION_NONE = 0, // No extensions
+ WS_EXTENSION_PERMESSAGE_DEFLATE = (1 << 0), // permessage-deflate
+ WS_EXTENSION_CLIENT_NO_CONTEXT_TAKEOVER = (1 << 1), // client_no_context_takeover
+ WS_EXTENSION_SERVER_NO_CONTEXT_TAKEOVER = (1 << 2), // server_no_context_takeover
+ WS_EXTENSION_SERVER_MAX_WINDOW_BITS = (1 << 3), // server_max_window_bits
+ WS_EXTENSION_CLIENT_MAX_WINDOW_BITS = (1 << 4) // client_max_window_bits
+} WEBSOCKET_EXTENSION;
+
+// Forward declarations
+struct web_client;
+struct websocket_server_client;
+
+// WebSocket connection state
+typedef enum __attribute__((packed)) {
+ WS_STATE_HANDSHAKE = 0, // Initial handshake in progress
+ WS_STATE_OPEN = 1, // Connection established
+ WS_STATE_CLOSING_SERVER = 2, // Server initiated closing handshake
+ WS_STATE_CLOSING_CLIENT = 3, // Client initiated closing handshake
+ WS_STATE_CLOSED = 4 // Connection closed
+} WEBSOCKET_STATE;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(WEBSOCKET_STATE);
+
+// WebSocket message types (opcodes) as per RFC 6455
+typedef enum __attribute__((packed)) {
+ WS_OPCODE_CONTINUATION = 0x0,
+ WS_OPCODE_TEXT = 0x1,
+ WS_OPCODE_BINARY = 0x2,
+ WS_OPCODE_CLOSE = 0x8,
+ WS_OPCODE_PING = 0x9,
+ WS_OPCODE_PONG = 0xA
+} WEBSOCKET_OPCODE;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(WEBSOCKET_OPCODE);
+
+// WebSocket close codes as per RFC 6455
+typedef enum __attribute__((packed)) {
+ // Standard WebSocket close codes
+ WS_CLOSE_NORMAL = 1000, // Normal closure, meaning the purpose for which the connection was established has been fulfilled
+ WS_CLOSE_GOING_AWAY = 1001, // Server/client going away (such as server shutdown or browser navigating away)
+ WS_CLOSE_PROTOCOL_ERROR = 1002, // Protocol error
+ WS_CLOSE_UNSUPPORTED_DATA = 1003, // Client received data it couldn't accept (e.g., server sent binary data when client only supports text)
+ WS_CLOSE_RESERVED = 1004, // Reserved. Specific meaning might be defined in the future.
+ WS_CLOSE_NO_STATUS = 1005, // No status code was provided even though one was expected
+ WS_CLOSE_ABNORMAL = 1006, // Connection closed abnormally (no close frame received)
+ WS_CLOSE_INVALID_PAYLOAD = 1007, // Frame payload data is invalid (e.g., non-UTF-8 data in a text frame)
+ WS_CLOSE_POLICY_VIOLATION = 1008, // Generic message received that violates policy
+ WS_CLOSE_MESSAGE_TOO_BIG = 1009, // Message too big to process
+ WS_CLOSE_EXTENSION_MISSING = 1010, // Client expected the server to negotiate one or more extensions, but server didn't
+ WS_CLOSE_INTERNAL_ERROR = 1011, // Server encountered an unexpected condition preventing it from fulfilling the request
+ WS_CLOSE_TLS_HANDSHAKE = 1015, // Transport Layer Security (TLS) handshake failure
+
+ // Netdata-specific close codes (4000-4999 range is available for private use)
+ WS_CLOSE_NETDATA_TIMEOUT = 4000, // Client timed out due to inactivity
+ WS_CLOSE_NETDATA_SHUTDOWN = 4001, // Server is shutting down
+ WS_CLOSE_NETDATA_REJECTED = 4002, // Connection rejected by server
+    WS_CLOSE_NETDATA_RATE_LIMIT = 4003 // Client exceeded rate limit
+} WEBSOCKET_CLOSE_CODE;
+ENUM_STR_DEFINE_FUNCTIONS_EXTERN(WEBSOCKET_CLOSE_CODE);
+
+/**
+ * WebSocket Protocol Handler Callbacks
+ *
+ * These callbacks are invoked when specific events occur during the WebSocket lifecycle:
+ *
+ * - on_connect: Called when a client successfully connects and is ready to exchange messages.
+ * This happens after the WebSocket handshake is complete and the client is added to a thread.
+ * Use this callback to welcome the client or initialize any protocol-specific state.
+ *
+ * - on_message: Called when a complete message is received from the client.
+ * This is where the protocol processes incoming messages from clients.
+ *
+ * - on_close: Called BEFORE sending a close frame to the client.
+ * This gives the protocol a chance to inject a final message before the connection closes.
+ *
+ * - on_disconnect: Called when a client is about to be disconnected.
+ * Use this callback to clean up any protocol-specific state for the client.
+ */
+
+// Public WebSocket API functions
+
+// WebSocket detection and handshake
+short int websocket_handle_handshake(struct web_client *w);
+
+// Initialize the WebSocket subsystem
+void websocket_initialize(void);
+
+#endif // NETDATA_WEB_SERVER_WEBSOCKET_H
From e38a4953c30dfa69c4aefc1f06f149b6548836d4 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Thu, 15 May 2025 15:10:34 +0300
Subject: [PATCH 42/51] add "unix://" scheme to DOCKER_HOST in run.sh (#20286)
---
packaging/docker/run.sh | 4 +--
.../cgroups.plugin/cgroup-name.sh.in | 28 +++++++++++++------
2 files changed, 22 insertions(+), 10 deletions(-)
diff --git a/packaging/docker/run.sh b/packaging/docker/run.sh
index d17c4f241c24d1..1b28c5dc7b0624 100755
--- a/packaging/docker/run.sh
+++ b/packaging/docker/run.sh
@@ -68,11 +68,11 @@ if [ "${EUID}" -eq 0 ]; then
re='^[0-9]+$'
if [[ $BALENA_PGID =~ $re ]]; then
echo "Netdata detected balena-engine.sock"
- DOCKER_HOST='/var/run/balena-engine.sock'
+ DOCKER_HOST='unix:///var/run/balena-engine.sock'
PGID="$BALENA_PGID"
elif [[ $DOCKER_PGID =~ $re ]]; then
echo "Netdata detected docker.sock"
- DOCKER_HOST="/var/run/docker.sock"
+ DOCKER_HOST="unix:///var/run/docker.sock"
PGID="$DOCKER_PGID"
fi
export PGID
diff --git a/src/collectors/cgroups.plugin/cgroup-name.sh.in b/src/collectors/cgroups.plugin/cgroup-name.sh.in
index e02650f6bf1764..12696d977cfee8 100755
--- a/src/collectors/cgroups.plugin/cgroup-name.sh.in
+++ b/src/collectors/cgroups.plugin/cgroup-name.sh.in
@@ -147,25 +147,37 @@ function docker_like_get_name_command() {
function docker_like_get_name_api() {
local host_var="${1}"
local host="${!host_var}"
- local path="/containers/${2}/json"
+ local container_id="${2}"
+ local path="/containers/${container_id}/json"
+
if [ -z "${host}" ]; then
warning "No ${host_var} is set"
return 1
fi
+
if ! command -v jq >/dev/null 2>&1; then
warning "Can't find jq command line tool. jq is required for netdata to retrieve container name using ${host} API, falling back to docker ps"
return 1
fi
- if [ -S "${host}" ]; then
- info "Running API command: curl --unix-socket \"${host}\" http://localhost${path}"
- JSON=$(curl -sS --unix-socket "${host}" "http://localhost${path}")
+
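+  # Strip an optional URI scheme (e.g. "unix://") so the bare socket path can
+  # be tested with -S; curl's --unix-socket also expects a filesystem path.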
+ if [[ "${host}" =~ ^([a-z]+)://(.*) ]]; then
+ address="${BASH_REMATCH[2]}"
+ else
+ address="${host}"
+ fi
+
+ if [ -S "${address}" ]; then
+ info "Running API command: curl --unix-socket \"${address}\" http://localhost${path}"
+ JSON=$(curl -sS --unix-socket "${address}" "http://localhost${path}")
else
- info "Running API command: curl \"${host}${path}\""
- JSON=$(curl -sS "${host}${path}")
+ info "Running API command: curl \"${address}${path}\""
+ JSON=$(curl -sS "${address}${path}")
fi
+
if OUTPUT=$(echo "${JSON}" | jq -r '.Config.Env[]?, "CONT_NAME=\(.Name)", "IMAGE_NAME=\(.Config.Image)", (.Config.Labels | to_entries[] | "LABEL_\(.key)=\(.value)")') && [ -n "$OUTPUT" ]; then
parse_docker_like_inspect_output "$OUTPUT"
fi
+
return 0
}
@@ -615,8 +627,8 @@ function podman_validate_id() {
# -----------------------------------------------------------------------------
-DOCKER_HOST="${DOCKER_HOST:=/var/run/docker.sock}"
-PODMAN_HOST="${PODMAN_HOST:=/run/podman/podman.sock}"
+DOCKER_HOST="${DOCKER_HOST:=unix:///var/run/docker.sock}"
+PODMAN_HOST="${PODMAN_HOST:=unix:///run/podman/podman.sock}"
CGROUP_PATH="${1}" # the path as it is (e.g. '/docker/efcf4c409')
CGROUP="${2//\//_}" # the modified path (e.g. 'docker_efcf4c409')
EXIT_SUCCESS=0
From 161657906ffd337295c27c5a019cfaba227a07c8 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Fri, 16 May 2025 00:50:39 +0000
Subject: [PATCH 43/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-43-nightly.
---
CHANGELOG.md | 8 +++-----
packaging/version | 2 +-
2 files changed, 4 insertions(+), 6 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 2b440f77fdd609..59b727c79f8d95 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,6 +6,8 @@
**Merged pull requests:**
+- add "unix://" scheme to DOCKER\_HOST in run.sh [\#20286](https://github.com/netdata/netdata/pull/20286) ([ilyam8](https://github.com/ilyam8))
+- Regenerate integrations docs [\#20284](https://github.com/netdata/netdata/pull/20284) ([netdatabot](https://github.com/netdatabot))
- Improved StatsD documentation [\#20282](https://github.com/netdata/netdata/pull/20282) ([kanelatechnical](https://github.com/kanelatechnical))
- Regenerate integrations docs [\#20279](https://github.com/netdata/netdata/pull/20279) ([netdatabot](https://github.com/netdatabot))
- docs: update mssql meta [\#20278](https://github.com/netdata/netdata/pull/20278) ([ilyam8](https://github.com/ilyam8))
@@ -31,6 +33,7 @@
- fix\(go.d/snmp\): use ifDescr for interface name if ifName is empty [\#20248](https://github.com/netdata/netdata/pull/20248) ([ilyam8](https://github.com/ilyam8))
- fix\(go.d/sd/snmp\): fix snmpv3 credentials [\#20247](https://github.com/netdata/netdata/pull/20247) ([ilyam8](https://github.com/ilyam8))
- SNMP first cisco yaml file pass [\#20246](https://github.com/netdata/netdata/pull/20246) ([Ancairon](https://github.com/Ancairon))
+- Model Context Protocol Server \(MCP\) for Netdata Part 1 [\#20244](https://github.com/netdata/netdata/pull/20244) ([ktsaou](https://github.com/ktsaou))
- Fix build issue on old distros [\#20243](https://github.com/netdata/netdata/pull/20243) ([stelfrag](https://github.com/stelfrag))
- Session claim id in docker [\#20240](https://github.com/netdata/netdata/pull/20240) ([stelfrag](https://github.com/stelfrag))
- Let the user override the default stack size [\#20236](https://github.com/netdata/netdata/pull/20236) ([stelfrag](https://github.com/stelfrag))
@@ -463,11 +466,6 @@
- build\(deps\): bump github.com/axiomhq/hyperloglog from 0.2.3 to 0.2.5 in /src/go [\#19750](https://github.com/netdata/netdata/pull/19750) ([dependabot[bot]](https://github.com/apps/dependabot))
- build\(deps\): bump github.com/likexian/whois from 1.15.5 to 1.15.6 in /src/go [\#19749](https://github.com/netdata/netdata/pull/19749) ([dependabot[bot]](https://github.com/apps/dependabot))
- build\(deps\): bump go.mongodb.org/mongo-driver from 1.17.2 to 1.17.3 in /src/go [\#19748](https://github.com/netdata/netdata/pull/19748) ([dependabot[bot]](https://github.com/apps/dependabot))
-- build\(deps\): bump github.com/gosnmp/gosnmp from 1.38.0 to 1.39.0 in /src/go [\#19747](https://github.com/netdata/netdata/pull/19747) ([dependabot[bot]](https://github.com/apps/dependabot))
-- build\(deps\): bump github.com/docker/docker from 28.0.0+incompatible to 28.0.1+incompatible in /src/go [\#19746](https://github.com/netdata/netdata/pull/19746) ([dependabot[bot]](https://github.com/apps/dependabot))
-- more strict parsing of the output of system-info.sh [\#19745](https://github.com/netdata/netdata/pull/19745) ([ktsaou](https://github.com/ktsaou))
-- pass NULL to sensors\_init\(\) when the standard files exist in /etc/ [\#19744](https://github.com/netdata/netdata/pull/19744) ([ktsaou](https://github.com/ktsaou))
-- allow coredumps to be generated [\#19743](https://github.com/netdata/netdata/pull/19743) ([ktsaou](https://github.com/ktsaou))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index 725e52f1ab3711..8ec89c43344785 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-39-nightly
+v2.5.0-43-nightly
From 71e08dba6bfcdb478aaf5872a7b340ec09524f0d Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Fri, 16 May 2025 10:33:29 +0300
Subject: [PATCH 44/51] Improve agent shutdown (#20280)
* Join web threads on shutdown
Join cgroups discover thread
No capabilities on LWT
Free statsd structure / join threads
Signal more services on shutdown
Exporting engine free instance
Join statsd collection threads
* Log a message during shutdown while waiting for pending threads to join
* Add missing lock
* Do not join main WEB thread
* Attempt to join only exited threads
* Switch to sleep_usec
* Make sure to finalize statements before setting thread exit status
* Revert some of the signalling for now
Fix condition for CTXLOAD thread creation failure
---
src/aclk/aclk_tx_msgs.c | 2 +-
src/collectors/cgroups.plugin/sys_fs_cgroup.c | 3 ++
src/collectors/statsd.plugin/statsd.c | 30 +++++++++----------
src/daemon/daemon-shutdown.c | 12 +++++---
src/database/sqlite/sqlite_metadata.c | 3 +-
src/exporting/exporting_engine.c | 1 +
src/health/health_event_loop.c | 3 +-
src/libnetdata/threads/threads.c | 5 ++--
src/web/server/static/static-threaded.c | 26 ++++++++++++++--
9 files changed, 55 insertions(+), 30 deletions(-)
diff --git a/src/aclk/aclk_tx_msgs.c b/src/aclk/aclk_tx_msgs.c
index b5bb2e03f82f0c..2432db17ff04bc 100644
--- a/src/aclk/aclk_tx_msgs.c
+++ b/src/aclk/aclk_tx_msgs.c
@@ -204,7 +204,7 @@ uint16_t aclk_send_agent_connection_update(mqtt_wss_client client, int reachable
.reachable = (reachable ? 1 : 0),
.lwt = 0,
.session_id = aclk_session_newarch,
- .capabilities = aclk_get_agent_capas()
+ .capabilities = aclk_get_agent_capas(),
};
CLAIM_ID claim_id = claim_id_get();
diff --git a/src/collectors/cgroups.plugin/sys_fs_cgroup.c b/src/collectors/cgroups.plugin/sys_fs_cgroup.c
index 7b1f13056ffc46..5052c8834db4b2 100644
--- a/src/collectors/cgroups.plugin/sys_fs_cgroup.c
+++ b/src/collectors/cgroups.plugin/sys_fs_cgroup.c
@@ -1342,6 +1342,9 @@ static void cgroup_main_cleanup(void *pptr) {
sleep_usec(step);
}
}
+ // We should be done, but just in case, avoid blocking shutdown
+ if (__atomic_load_n(&discovery_thread.exited, __ATOMIC_RELAXED))
+ (void) nd_thread_join(discovery_thread.thread);
static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
}
diff --git a/src/collectors/statsd.plugin/statsd.c b/src/collectors/statsd.plugin/statsd.c
index 4a80f67fd6739e..60e173e2019c30 100644
--- a/src/collectors/statsd.plugin/statsd.c
+++ b/src/collectors/statsd.plugin/statsd.c
@@ -234,7 +234,7 @@ typedef struct statsd_app {
struct collection_thread_status {
SPINLOCK spinlock;
- bool running;
+ bool initializing;
uint32_t max_sockets;
ND_THREAD *thread;
@@ -1083,10 +1083,6 @@ void statsd_collector_thread_cleanup(void *pptr) {
struct statsd_udp *d = CLEANUP_FUNCTION_GET_PTR(pptr);
if(!d) return;
- spinlock_lock(&d->status->spinlock);
- d->status->running = false;
- spinlock_unlock(&d->status->spinlock);
-
#ifdef HAVE_RECVMMSG
size_t i;
for (i = 0; i < d->size; i++)
@@ -1107,7 +1103,7 @@ static bool statsd_should_stop(void) {
void *statsd_collector_thread(void *ptr) {
struct collection_thread_status *status = ptr;
spinlock_lock(&status->spinlock);
- status->running = true;
+ status->initializing = false;
spinlock_unlock(&status->spinlock);
worker_register("STATSD");
@@ -2402,17 +2398,18 @@ static void statsd_main_cleanup(void *pptr) {
if (statsd.collection_threads_status) {
int i;
for (i = 0; i < statsd.threads; i++) {
- spinlock_lock(&statsd.collection_threads_status[i].spinlock);
-
- if(statsd.collection_threads_status[i].running) {
- collector_info("STATSD: signalling data collection thread %d to stop...", i + 1);
- nd_thread_signal_cancel(statsd.collection_threads_status[i].thread);
- }
- else
- collector_info("STATSD: data collection thread %d found stopped.", i + 1);
-
- spinlock_unlock(&statsd.collection_threads_status[i].spinlock);
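+            // Wait for the collector thread to leave its initialization
+            // phase before joining it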
+ bool initializing;
+ do {
+ spinlock_lock(&statsd.collection_threads_status[i].spinlock);
+ initializing = statsd.collection_threads_status[i].initializing;
+ spinlock_unlock(&statsd.collection_threads_status[i].spinlock);
+ if (unlikely(initializing))
+ sleep_usec(1000);
+ } while(initializing);
+
+ (void) nd_thread_join(statsd.collection_threads_status[i].thread);
}
+ freez(statsd.collection_threads_status);
}
collector_info("STATSD: closing sockets...");
@@ -2613,6 +2610,7 @@ void *statsd_main(void *ptr) {
char tag[NETDATA_THREAD_TAG_MAX + 1];
snprintfz(tag, NETDATA_THREAD_TAG_MAX, "STATSD_IN[%d]", i + 1);
spinlock_init(&statsd.collection_threads_status[i].spinlock);
+ statsd.collection_threads_status[i].initializing = true;
statsd.collection_threads_status[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
statsd_collector_thread, &statsd.collection_threads_status[i]);
}
diff --git a/src/daemon/daemon-shutdown.c b/src/daemon/daemon-shutdown.c
index 714cbceb604d09..ed66da73b5c0b5 100644
--- a/src/daemon/daemon-shutdown.c
+++ b/src/daemon/daemon-shutdown.c
@@ -85,9 +85,10 @@ void cancel_main_threads(void) {
}
for (i = 0; static_threads[i].name != NULL ; i++) {
- struct netdata_static_thread *st = &static_threads[i];
- if(st->thread && !nd_thread_is_me(static_threads[i].thread))
- nd_thread_join(st->thread);
+ if(static_threads[i].thread && !nd_thread_is_me(static_threads[i].thread)) {
+ if (static_threads[i].enabled == NETDATA_MAIN_THREAD_EXITED)
+ nd_thread_join(static_threads[i].thread);
+ }
}
netdata_log_info("All threads finished.");
@@ -197,8 +198,11 @@ static void netdata_cleanup_and_exit(EXIT_REASON reason, bool abnormal, bool exi
webrtc_close_all_connections();
watcher_step_complete(WATCHER_STEP_ID_CLOSE_WEBRTC_CONNECTIONS);
- service_signal_exit(SERVICE_MAINTENANCE | ABILITY_DATA_QUERIES | ABILITY_WEB_REQUESTS |
+ service_signal_exit(SERVICE_MAINTENANCE | ABILITY_DATA_QUERIES | ABILITY_WEB_REQUESTS | SERVICE_ACLK |
ABILITY_STREAMING_CONNECTIONS | SERVICE_SYSTEMD);
+
+ service_signal_exit(SERVICE_EXPORTERS | SERVICE_HEALTH | SERVICE_WEB_SERVER | SERVICE_HTTPD);
+
watcher_step_complete(WATCHER_STEP_ID_DISABLE_MAINTENANCE_NEW_QUERIES_NEW_WEB_REQUESTS_NEW_STREAMING_CONNECTIONS);
service_wait_exit(SERVICE_MAINTENANCE | SERVICE_SYSTEMD, 5 * USEC_PER_SEC);
diff --git a/src/database/sqlite/sqlite_metadata.c b/src/database/sqlite/sqlite_metadata.c
index 4ae08afe2754f6..4ca34df60ac855 100644
--- a/src/database/sqlite/sqlite_metadata.c
+++ b/src/database/sqlite/sqlite_metadata.c
@@ -1935,7 +1935,8 @@ static void ctx_hosts_load(uv_work_t *req)
__atomic_store_n(&hclt[thread_index].busy, true, __ATOMIC_RELAXED);
hclt[thread_index].host = host;
hclt[thread_index].thread = nd_thread_create("CTXLOAD", NETDATA_THREAD_OPTION_DEFAULT, restore_host_context, &hclt[thread_index]);
- async_exec += (hclt[thread_index].thread != NULL);
+ rc = (hclt[thread_index].thread == NULL);
+ async_exec += (rc == 0);
// if it failed, mark the thread slot as free
if (rc)
__atomic_store_n(&hclt[thread_index].busy, false, __ATOMIC_RELAXED);
diff --git a/src/exporting/exporting_engine.c b/src/exporting/exporting_engine.c
index 3a43763a415564..31971ac13954ca 100644
--- a/src/exporting/exporting_engine.c
+++ b/src/exporting/exporting_engine.c
@@ -106,6 +106,7 @@ static void exporting_clean_engine()
instance = instance->next;
clean_instance(current_instance);
+ freez(current_instance);
}
freez((void *)engine->config.hostname);
diff --git a/src/health/health_event_loop.c b/src/health/health_event_loop.c
index 2d99d5aef6f102..fd5591b05d2b34 100644
--- a/src/health/health_event_loop.c
+++ b/src/health/health_event_loop.c
@@ -702,9 +702,8 @@ static void health_main_cleanup(void *pptr) {
worker_unregister();
static_thread->enabled = NETDATA_MAIN_THREAD_EXITING;
- static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
-
finalize_self_prepared_sql_statements();
+ static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
nd_log(NDLS_DAEMON, NDLP_DEBUG, "Health thread ended.");
}
diff --git a/src/libnetdata/threads/threads.c b/src/libnetdata/threads/threads.c
index 3a556ab0e34eef..f14fdbeea604ba 100644
--- a/src/libnetdata/threads/threads.c
+++ b/src/libnetdata/threads/threads.c
@@ -272,6 +272,7 @@ void nd_thread_join_threads()
if (nti) {
DOUBLE_LINKED_LIST_REMOVE_ITEM_UNSAFE(threads_globals.exited.list, nti, prev, next);
nti->list = ND_THREAD_LIST_NONE;
+ nd_log_daemon(NDLP_DEBUG, "nd_thread_join_threads: Joining thread with id %d (%s) during shutdown", nti->tid, nti->tag);
}
spinlock_unlock(&threads_globals.exited.spinlock);
@@ -457,10 +458,8 @@ int nd_thread_join(ND_THREAD *nti) {
if(!nti)
return ESRCH;
- if(nd_thread_status_check(nti, NETDATA_THREAD_STATUS_JOINED)) {
- freez(nti);
+ if(nd_thread_status_check(nti, NETDATA_THREAD_STATUS_JOINED))
return 0;
- }
int ret;
if((ret = uv_thread_join(&nti->thread))) {
diff --git a/src/web/server/static/static-threaded.c b/src/web/server/static/static-threaded.c
index aa1692e33a676e..1e65a817200fbc 100644
--- a/src/web/server/static/static-threaded.c
+++ b/src/web/server/static/static-threaded.c
@@ -64,7 +64,8 @@ struct web_server_static_threaded_worker {
ND_THREAD *thread;
int id;
- int running;
+ bool initializing;
+ SPINLOCK spinlock;
size_t max_sockets;
@@ -274,7 +275,6 @@ static void socket_listen_main_static_threaded_worker_cleanup(void *pptr) {
worker_private->sends
);
- worker_private->running = 0;
worker_unregister();
}
@@ -284,7 +284,9 @@ static bool web_server_should_stop(void) {
void *socket_listen_main_static_threaded_worker(void *ptr) {
worker_private = ptr;
- worker_private->running = 1;
+ spinlock_lock(&worker_private->spinlock);
+ worker_private->initializing = false;
+ spinlock_unlock(&worker_private->spinlock);
worker_register("WEB");
worker_register_job_name(WORKER_JOB_ADD_CONNECTION, "connect");
worker_register_job_name(WORKER_JOB_DEL_COLLECTION, "disconnect");
@@ -361,6 +363,20 @@ static void socket_listen_main_static_threaded_cleanup(void *pptr) {
listen_sockets_close(&api_sockets);
netdata_log_info("all static web threads stopped.");
+
+ // Let's join all threads
+ for (int i = 1; i < static_threaded_workers_count; i++) {
+ bool initializing;
+ do {
+ spinlock_lock(&static_workers_private_data[i].spinlock);
+ initializing = static_workers_private_data[i].initializing;
+ spinlock_unlock(&static_workers_private_data[i].spinlock);
+ if (unlikely(initializing))
+ sleep_usec(1000);
+ } while(initializing);
+ (void) nd_thread_join(static_workers_private_data[i].thread);
+ }
+
static_thread->enabled = NETDATA_MAIN_THREAD_EXITED;
}
@@ -387,6 +403,8 @@ void *socket_listen_main_static_threaded(void *ptr) {
sizeof(struct web_server_static_threaded_worker));
int i;
+ spinlock_init(&static_workers_private_data[0].spinlock);
+ static_workers_private_data[0].initializing = true;
for (i = 1; i < static_threaded_workers_count; i++) {
static_workers_private_data[i].id = i;
static_workers_private_data[i].max_sockets = max_sockets / static_threaded_workers_count;
@@ -394,6 +412,8 @@ void *socket_listen_main_static_threaded(void *ptr) {
char tag[50 + 1];
snprintfz(tag, sizeof(tag) - 1, "WEB[%d]", i+1);
+ spinlock_init(&static_workers_private_data[i].spinlock);
+ static_workers_private_data[i].initializing = true;
static_workers_private_data[i].thread = nd_thread_create(tag, NETDATA_THREAD_OPTION_DEFAULT,
socket_listen_main_static_threaded_worker,
(void *)&static_workers_private_data[i]);
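
Taken together, the changes in this patch implement one shutdown handshake across statsd, the static web server workers, and the cgroups discovery thread: a spinlock-guarded `initializing` flag is set before `nd_thread_create()`, cleared by the worker as its first action, and polled by the cleanup path before it calls `nd_thread_join()`. Below is a minimal, self-contained sketch of that pattern, with pthreads standing in for `ND_THREAD`/`SPINLOCK` and all names (`worker_slot`, `spawn_worker`, `join_worker`) invented for illustration — not Netdata APIs.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

struct worker_slot {
    pthread_mutex_t lock;   /* stands in for Netdata's SPINLOCK */
    bool initializing;      /* true until the worker is safely joinable */
    pthread_t thread;       /* stands in for ND_THREAD * */
};

static void *worker_main(void *arg) {
    struct worker_slot *slot = arg;

    /* First thing the worker does: mark itself past initialization,
     * mirroring `status->initializing = false` in the patch. */
    pthread_mutex_lock(&slot->lock);
    slot->initializing = false;
    pthread_mutex_unlock(&slot->lock);

    /* ... per-thread work would run here until signalled to stop ... */
    return NULL;
}

static void spawn_worker(struct worker_slot *slot) {
    pthread_mutex_init(&slot->lock, NULL);
    slot->initializing = true;    /* set BEFORE the thread exists */
    pthread_create(&slot->thread, NULL, worker_main, slot);
}

static void join_worker(struct worker_slot *slot) {
    /* Spin (with a 1ms nap) until the worker clears the flag,
     * then join unconditionally - the shape of the cleanup loops above. */
    bool initializing;
    do {
        pthread_mutex_lock(&slot->lock);
        initializing = slot->initializing;
        pthread_mutex_unlock(&slot->lock);
        if (initializing)
            usleep(1000);         /* mirrors sleep_usec(1000) */
    } while (initializing);

    pthread_join(slot->thread, NULL);   /* mirrors nd_thread_join() */
}

int main(void) {
    struct worker_slot slot;
    spawn_worker(&slot);
    join_worker(&slot);
    puts("worker joined cleanly");
    return 0;
}
```

Setting the flag before the thread exists is the point: it closes the window where shutdown could run between `nd_thread_create()` and the worker's first statement. With the old `running` flag, a thread that had not yet set `running = true` was reported "found stopped" and never cancelled or joined.
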
From c87793016585ab5f7a587fa0d2a5ccbcaaf09ddf Mon Sep 17 00:00:00 2001
From: Fotis Voutsas
Date: Fri, 16 May 2025 11:10:57 +0300
Subject: [PATCH 45/51] comment metric tags that could be metrics (#20272)
---
.../snmp.profiles/default/3com-huawei.yaml | 2 ++
.../snmp.profiles/default/_cisco-generic.yaml | 10 ++++++
.../default/_cisco-ipsec-flow-monitor.yaml | 2 ++
.../snmp.profiles/default/_cisco-voice.yaml | 1 +
.../snmp.profiles/default/_cisco-wlc.yaml | 3 ++
.../go.d/snmp.profiles/default/_dell-rac.yaml | 14 ++++++++
.../snmp.profiles/default/_generic-bgp4.yaml | 1 +
.../default/_generic-entity-sensor.yaml | 1 +
.../snmp.profiles/default/_generic-ip.yaml | 2 ++
.../snmp.profiles/default/_generic-lldp.yaml | 1 +
.../snmp.profiles/default/_generic-ospf.yaml | 8 +++++
.../default/_hp-compaq-health.yaml | 2 ++
.../snmp.profiles/default/a10-thunder.yaml | 9 +++++
.../default/alcatel-lucent-ent.yaml | 4 +++
.../default/alcatel-lucent-ind.yaml | 1 +
.../alcatel-lucent-omni-access-wlc.yaml | 2 ++
.../snmp.profiles/default/apc-netbotz.yaml | 9 +++++
.../go.d/snmp.profiles/default/apc-pdu.yaml | 6 ++++
.../go.d/snmp.profiles/default/apc_ups.yaml | 1 +
.../snmp.profiles/default/arista-switch.yaml | 1 +
.../snmp.profiles/default/aruba-switch.yaml | 1 +
.../default/aruba-wireless-controller.yaml | 5 ++-
.../default/avaya-cajun-switch.yaml | 1 +
.../default/avaya-media-gateway.yaml | 3 ++
.../avaya-nortel-ethernet-routing-switch.yaml | 2 +-
.../snmp.profiles/default/avocent-acs.yaml | 1 +
.../default/barracuda-cloudgen.yaml | 5 +++
.../default/brocade-fc-switch.yaml | 2 ++
.../snmp.profiles/default/chatsworth_pdu.yaml | 1 +
.../snmp.profiles/default/checkpoint.yaml | 1 +
.../default/cisco-firepower-asa.yaml | 1 +
.../default/cisco-firepower.yaml | 2 ++
.../default/cisco-ironport-email.yaml | 3 ++
.../snmp.profiles/default/cisco-nexus.yaml | 1 +
.../go.d/snmp.profiles/default/cisco-ucs.yaml | 8 +++++
.../default/citrix-netscaler-sdx.yaml | 2 ++
.../default/citrix-netscaler.yaml | 12 +++++--
.../snmp.profiles/default/cradlepoint.yaml | 1 +
.../snmp.profiles/default/cyberpower-pdu.yaml | 10 ++++++
.../default/dell-emc-data-domain.yaml | 6 ++++
.../go.d/snmp.profiles/default/dell-os10.yaml | 5 +++
.../default/dell-powerconnect.yaml | 2 ++
.../snmp.profiles/default/dell-poweredge.yaml | 16 +++++++++
.../default/dlink-dgs-switch.yaml | 6 ++++
.../snmp.profiles/default/eaton-epdu.yaml | 15 ++++++++
.../go.d/snmp.profiles/default/f5-big-ip.yaml | 6 ++++
.../go.d/snmp.profiles/default/fireeye.yaml | 1 +
.../default/fortinet-appliance.yaml | 3 ++
.../default/fortinet-fortigate.yaml | 2 ++
.../default/hpe-bladesystem-enclosure.yaml | 8 +++++
.../go.d/snmp.profiles/default/hpe-msa.yaml | 3 ++
.../snmp.profiles/default/hpe-nimble.yaml | 1 +
.../snmp.profiles/default/huawei-routers.yaml | 2 ++
.../default/huawei-switches.yaml | 2 ++
.../go.d/snmp.profiles/default/huawei.yaml | 3 ++
.../default/ibm-datapower-gateway.yaml | 2 ++
.../default/ibm-lenovo-server.yaml | 8 +++++
.../default/infinera-coriant-groove.yaml | 2 ++
.../snmp.profiles/default/infoblox-ipam.yaml | 1 +
.../default/kyocera-printer.yaml | 2 ++
.../default/meraki-cloud-controller.yaml | 1 +
.../snmp.profiles/default/nasuni-filer.yaml | 1 +
.../default/netgear-readynas.yaml | 3 ++
.../default/opengear-console-manager.yaml | 2 ++
.../opengear-infrastructure-manager.yaml | 1 +
.../go.d/snmp.profiles/default/peplink.yaml | 3 ++
.../default/raritan-dominion.yaml | 2 ++
.../default/servertech-pdu3.yaml | 5 +++
.../default/servertech-pdu4.yaml | 9 +++++
.../default/silverpeak-edgeconnect.yaml | 1 +
.../default/sinetica-eagle-i.yaml | 2 ++
.../default/sophos-xgs-firewall.yaml | 1 +
.../default/synology-disk-station.yaml | 3 ++
.../go.d/snmp.profiles/default/tp-link.yaml | 1 +
.../snmp.profiles/default/tripplite-pdu.yaml | 5 +++
.../snmp.profiles/default/tripplite-ups.yaml | 4 +++
.../snmp.profiles/default/ubiquiti-unifi.yaml | 36 +++++++++----------
.../default/vertiv-watchdog.yaml | 18 +++++-----
.../snmp.profiles/default/vmware-esx.yaml | 2 ++
.../snmp.profiles/default/zebra-printer.yaml | 1 +
.../snmp.profiles/default/zyxel-switch.yaml | 1 +
81 files changed, 309 insertions(+), 31 deletions(-)
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/3com-huawei.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/3com-huawei.yaml
index 00f3c9fa5b7489..4db2c15d364197 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/3com-huawei.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/3com-huawei.yaml
@@ -42,6 +42,7 @@ metrics:
constant_value_one: true
description: Fan status indicator
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: fan_num
@@ -63,6 +64,7 @@ metrics:
constant_value_one: true
description: Power status indicator
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: power_num
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-generic.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-generic.yaml
index 6f7e8ccdae3960..a1ae14b72caabf 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-generic.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-generic.yaml
@@ -40,6 +40,7 @@ metrics:
constant_value_one: true
description: FRU power status constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: fru
@@ -124,6 +125,7 @@ metrics:
name: ciscoEnvMonTemperatureStatusValue
description: Current measurement of the testpoint being instrumented
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: temp_state
symbol:
@@ -147,6 +149,7 @@ metrics:
name: ciscoEnvMonSupplyState
description: Current state of the power supply being instrumented
unit: "{power_state}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.13.1.5.1.4
@@ -172,6 +175,7 @@ metrics:
constant_value_one: true
description: Power supply status constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.13.1.5.1.3
@@ -225,6 +229,7 @@ metrics:
constant_value_one: true
description: Fan status constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.13.1.4.1.3
@@ -298,6 +303,7 @@ metrics:
constant_value_one: true
description: Switch info constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: mac_addr
symbol:
@@ -350,6 +356,7 @@ metrics:
constant_value_one: true
description: Fan tray status constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: fru
@@ -404,6 +411,7 @@ metrics:
name: cfwConnectionStatCount
description: Integer value containing the resource statistic count
unit: "{connection_statistic}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.147.1.2.2.2.1.2
@@ -463,6 +471,7 @@ metrics:
name: rttMonLatestRttOperSense
description: Sense code for the completion status of the latest RTT operation
unit: "{sense_code}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.42.1.2.9.1.10
@@ -556,6 +565,7 @@ metrics:
name: rttMonCtrlOperTimeoutOccurred
description: Indicates if a timeout occurred for the RTT operation
unit: "1"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.42.1.2.9.1.10
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-ipsec-flow-monitor.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-ipsec-flow-monitor.yaml
index 7a5f7c8edfe4c8..e2b53b96c140d3 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-ipsec-flow-monitor.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-ipsec-flow-monitor.yaml
@@ -32,6 +32,7 @@ metrics:
name: cikeTunOutDropPkts
description: The total number of packets dropped by this IPsec Phase-1 IKE Tunnel during send processing.
unit: "{packet}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: phase_1_tunnel_index
@@ -99,6 +100,7 @@ metrics:
name: cipSecTunOutEncryptFails
description: The total number of outbound encryption's which ended in failure by this IPsec Phase-2 Tunnel.
unit: "{encryption_failure}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: phase_2_tunnel_index
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-voice.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-voice.yaml
index c6197b3eac9a3c..3b84be2d62115e 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-voice.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-voice.yaml
@@ -187,6 +187,7 @@ metrics:
OID: 1.3.6.1.4.1.9.9.473.1.3.6.1.4
description: Last known status of the enterprise contact center application peripheral interface manager functional component.
unit: "1"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.473.1.3.6.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-wlc.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-wlc.yaml
index fc7e05e11d0be1..38649a8f795a21 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-wlc.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_cisco-wlc.yaml
@@ -31,6 +31,7 @@ metrics:
OID: 1.3.6.1.4.1.14179.2.2.1.1.19
name: bsnApIpAddress
tag: ap_ip_address
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.14179.2.2.1.1.6
name: bsnAPOperationStatus
@@ -93,6 +94,7 @@ metrics:
- start: 0
end: 5
tag: ap_ip_address
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.14179.2.2.2.1.12
name: bsnAPIfOperStatus
@@ -236,6 +238,7 @@ metrics:
OID: 1.3.6.1.4.1.14179.2.1.1.1.2
name: bsnDot11EssSsid
tag: ssid
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.14179.2.1.1.1.6
name: bsnDot11EssAdminStatus
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_dell-rac.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_dell-rac.yaml
index fb47388452eb85..88d02917a8803c 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_dell-rac.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_dell-rac.yaml
@@ -70,6 +70,7 @@ metrics:
symbols:
- name: dell.systemState
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.200.10.1.1
@@ -279,6 +280,7 @@ metrics:
symbols:
- name: dell.physicalDisk
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.5.1.20.130.4.1.4
@@ -336,6 +338,7 @@ metrics:
symbols:
- name: enclosurePowerSupply
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.5.1.20.130.9.1.4
@@ -386,6 +389,7 @@ metrics:
symbols:
- name: dell.battery
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.5.1.20.130.15.1.4
@@ -450,6 +454,7 @@ metrics:
symbols:
- name: dell.controller
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.5.1.20.130.1.1.37
@@ -513,6 +518,7 @@ metrics:
symbols:
- name: dell.pCIDevice
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.1100.80.1.5
@@ -560,6 +566,7 @@ metrics:
symbols:
- name: dell.systemSlot
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.1200.10.1.5
@@ -611,6 +618,7 @@ metrics:
symbols:
- name: dell.networkDevice
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.1100.90.1.3
@@ -666,6 +674,7 @@ metrics:
symbols:
- name: dell.systemBIOS
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.300.50.1.1
@@ -798,6 +807,7 @@ metrics:
symbols:
- name: dell.amperageProbe
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.600.30.1.7
@@ -924,6 +934,7 @@ metrics:
symbols:
- name: dell.voltageProbe
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.600.20.1.1
@@ -1052,6 +1063,7 @@ metrics:
OID: 1.3.6.1.4.1.674.10892.5.4.700.12.1.2
name: coolingDeviceIndex
tag: cooling_device_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.700.12.1.7
name: coolingDeviceType
@@ -1102,6 +1114,7 @@ metrics:
OID: 1.3.6.1.4.1.674.10892.5.4.700.20.1.2
name: temperatureProbeIndex
tag: temperature_probe_index
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.700.20.1.7
name: temperatureProbeType
@@ -1208,6 +1221,7 @@ metrics:
symbols:
- name: dell.memoryDevice
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.5.4.1100.50.1.7
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-bgp4.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-bgp4.yaml
index 8f47dbded79b8c..6d4f4356da8487 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-bgp4.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-bgp4.yaml
@@ -72,6 +72,7 @@ metrics:
name: bgpPeerMinASOriginationInterval
description: Time interval in seconds for the MinASOriginationInterval timer
unit: "s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: remote_as
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-entity-sensor.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-entity-sensor.yaml
index 4b15cd5d2e6239..0bd7bed71a71ba 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-entity-sensor.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-entity-sensor.yaml
@@ -8,6 +8,7 @@ metrics:
name: entPhySensorValue
description: The most recent measurement obtained by the agent for this sensor
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.99.1.1.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ip.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ip.yaml
index cdc28470a93ad6..1aaf19eea352d6 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ip.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ip.yaml
@@ -123,6 +123,7 @@ metrics:
name: ipSystemStatsHCOutBcastPkts
description: The number of IP broadcast datagrams transmitted
unit: "{datagram}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: ipversion
@@ -247,6 +248,7 @@ metrics:
name: ipIfStatsHCOutBcastPkts
description: The number of IP broadcast datagrams transmitted
unit: "{datagram}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: ipversion
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-lldp.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-lldp.yaml
index 9b75d7abd701bf..8a78e13bb07908 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-lldp.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-lldp.yaml
@@ -7,6 +7,7 @@ metrics:
- name: lldpRem
constant_value_one: true
metric_tags:
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.0.8802.1.1.2.1.4.1.1.6
name: lldpRemPortIdSubtype
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ospf.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ospf.yaml
index d7527f516bcfa3..8153ddb86d6972 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ospf.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_generic-ospf.yaml
@@ -16,6 +16,7 @@ metrics:
name: ospfNbrLsRetransQLen
description: The current length of the retransmission queue
unit: "{retransmission_queue}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.10.1.3
@@ -45,6 +46,7 @@ metrics:
symbols:
- name: ospfNbr
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.10.1.3
@@ -84,6 +86,7 @@ metrics:
name: ospfVirtNbrLsRetransQLen
description: The current length of the retransmission queue
unit: "{retransmission_queue}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.11.1.3
@@ -113,6 +116,7 @@ metrics:
symbols:
- name: ospfVirtNbr
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.11.1.3
@@ -152,6 +156,7 @@ metrics:
name: ospfIfLsaCount
description: The total number of link-local link state advertisements in this interface's link-local link state database
unit: "{link_state_advertisement}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.7.1.1
@@ -182,6 +187,7 @@ metrics:
symbols:
- name: ospfIf
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.7.1.1
@@ -222,6 +228,7 @@ metrics:
name: ospfVirtIfLsaCount
description: The total number of link-local link state advertisements in this virtual interface's link-local link state database
unit: "{link_state_advertisement}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.11.1.2
@@ -243,6 +250,7 @@ metrics:
symbols:
- name: ospfVirtIf
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.2.1.14.11.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_hp-compaq-health.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_hp-compaq-health.yaml
index 7f06c96b03bcb0..d7d248799e3d70 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_hp-compaq-health.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/_hp-compaq-health.yaml
@@ -232,6 +232,7 @@ metrics:
name: cpqHeFltTolPowerSupplyStatus
description: Status of the fault tolerant power supply
unit: "{instance}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.6.2.9.3.1.1
@@ -253,6 +254,7 @@ metrics:
name: cpqHeFltTolPowerSupplyCapacityMaximum
description: Maximum capacity of the power supply in watts
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.6.2.9.3.1.5
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/a10-thunder.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/a10-thunder.yaml
index 56c25e5507bd6b..1fcbd953a07a46 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/a10-thunder.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/a10-thunder.yaml
@@ -241,6 +241,7 @@ metrics:
OID: 1.3.6.1.4.1.22610.2.4.1.5.9.1.4
description: The fan1's speed
unit: "1.s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axFanName
@@ -267,6 +268,7 @@ metrics:
constant_value_one: true
description: TBD
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axPowerSupplyVoltageDescription
@@ -289,6 +291,7 @@ metrics:
constant_value_one: true
description: TBD
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axPowerSupplyName
@@ -330,6 +333,7 @@ metrics:
constant_value_one: true
description: TBD
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axServerName
@@ -364,6 +368,7 @@ metrics:
metric_type: monotonic_count
description: The number of current L7 requests if applicable
unit: "{request}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axServerStatName
@@ -386,6 +391,7 @@ metrics:
constant_value_one: true
description: TBD
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axServiceGroupName
@@ -431,6 +437,7 @@ metrics:
constant_value_one: true
description: TBD
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axVirtualServerName
@@ -510,6 +517,7 @@ metrics:
metric_type: monotonic_count
description: The number of successful L7 requests if applicable
unit: "{request}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axVirtualServerStatAddress
@@ -544,6 +552,7 @@ metrics:
OID: 1.3.6.1.4.1.22610.2.4.3.4.4.1.1.12
description: Current connections from client side
unit: "{connection}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: axVirtualServerPortStatAddress
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ent.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ent.yaml
index a86a5813f81e09..adf95a0b535c0d 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ent.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ent.yaml
@@ -44,6 +44,7 @@ metrics:
constant_value_one: true
description: Physical entity constant value one
unit: "{physical_entity}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
# MIB: ENTITY-MIB
# table: entPhysicalTable
@@ -280,6 +281,7 @@ metrics:
OID: 1.3.6.1.4.1.6486.801.1.1.1.1.1.2.1.1
description: Current temperature of the physical entity
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
# MIB: ENTITY-MIB
# table: entPhysicalTable
@@ -342,6 +344,7 @@ metrics:
OID: 1.3.6.1.4.1.6486.801.1.1.1.3.1.1.11.1.4
description: Fan speed in revolutions per minute
unit: "1.min"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.6486.801.1.1.1.3.1.1.11.1.2
@@ -371,6 +374,7 @@ metrics:
name: alcatel.ent.alaChasBpsPowerSupplySerialNum
OID: 1.3.6.1.4.1.6486.801.1.1.1.3.1.1.14.4.1.8
tag: ala_chas_bps_power_supply_serial_num
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.6486.801.1.1.1.3.1.1.14.4.1.10
name: alcatel.ent.alaChasBpsPowerSupplyOperStatus
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ind.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ind.yaml
index 8060d4037a27c4..f82d70c545d04f 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ind.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-ind.yaml
@@ -28,6 +28,7 @@ metrics:
symbols:
- name: alcatel.ind.chasEntPhysical
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
# MIB: ENTITY-MIB
# table: entPhysicalTable
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-omni-access-wlc.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-omni-access-wlc.yaml
index ac3cf2f9b92915..36d6e9ba89a0a4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-omni-access-wlc.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/alcatel-lucent-omni-access-wlc.yaml
@@ -37,6 +37,7 @@ metrics:
OID: 1.3.6.1.4.1.14823.2.2.1.1.1.10.1.4
description: Size of the storage filesystem in MB.
unit: "MBy"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.14823.2.2.1.1.1.10.1.2
@@ -85,6 +86,7 @@ metric_tags:
- OID: 1.3.6.1.4.1.14823.2.2.1.1.1.2.0
symbol: wlsxModelName
tag: wlsx_model_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- OID: 1.3.6.1.4.1.14823.2.2.1.1.1.4.0
symbol: wlsxSwitchRole
tag: wlsx_switch_role
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-netbotz.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-netbotz.yaml
index 370a7ab5bcc066..56d75ab5984559 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-netbotz.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-netbotz.yaml
@@ -15,6 +15,7 @@ metrics:
symbols:
- name: netbotz.enclosure
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.2.1.1.7
@@ -64,6 +65,7 @@ metrics:
symbols:
- name: netbotz.dinPort
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.3.1.1.9
@@ -94,6 +96,7 @@ metrics:
symbols:
- name: netbotz.otherPort
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.3.10.1.5
@@ -163,6 +166,7 @@ metrics:
tag: netbotz_temp_sensor_enc_id
description: The id of the physical enclosure containing the sensor
unit: "{enclosure}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.5528.100.4.1.1.1.3
name: tempSensorErrorStatus
@@ -208,6 +212,7 @@ metrics:
tag: netbotz_humi_sensor_enc_id
description: The id of the physical enclosure containing the sensor
unit: "{enclosure}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.5528.100.4.1.2.1.3
name: humiSensorErrorStatus
@@ -293,6 +298,7 @@ metrics:
tag: netbotz_other_numeric_sensor_units
description: The unit of measure for the sensor value
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.5528.100.4.1.10.1.3
name: otherNumericSensorErrorStatus
@@ -313,6 +319,7 @@ metrics:
symbols:
- name: netbotz.doorSwitchSensor
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.4.2.2.1.8
@@ -361,6 +368,7 @@ metrics:
symbols:
- name: netbotz.otherStateSensor
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.4.2.10.1.4
@@ -406,6 +414,7 @@ metrics:
symbols:
- name: netbotz.errorCond
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5528.100.5.1.1.12
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-pdu.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-pdu.yaml
index 2aeae9425fbddf..2cf1d0bfadae78 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-pdu.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc-pdu.yaml
@@ -19,6 +19,7 @@ metrics:
name: powernet.rPDULoadStatusLoad
description: Load status of the PDU
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu_load_status_index
symbol:
@@ -42,6 +43,7 @@ metrics:
name: powernet.rPDUOutletStatusLoad
description: Load status of the PDU outlet
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu_outlet_status_index
symbol:
@@ -92,6 +94,7 @@ metrics:
constant_value_one: true
description: Status of the PDU bank
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu_status_bank_index
symbol:
@@ -119,6 +122,7 @@ metrics:
constant_value_one: true
description: Status of the PDU phase
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu_status_phase_index
symbol:
@@ -146,6 +150,7 @@ metrics:
constant_value_one: true
description: Status of the PDU outlet
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu_status_outlet_index
symbol:
@@ -181,6 +186,7 @@ metrics:
name: powernet.rPDU2SensorTempHumidityStatusRelativeHumidity
description: Relative humidity percentage
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: powernet_r_pdu2_sensor_temp_humidity_status_name
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc_ups.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc_ups.yaml
index 8a260e728011d7..8a2cfa00e3aa03 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc_ups.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/apc_ups.yaml
@@ -232,6 +232,7 @@ metrics:
constant_value_one: true
- OID: 1.3.6.1.4.1.318.1.1.1.12.1.2.1.3
name: upsOutletGroupStatusGroupState
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: outlet_group_name
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/arista-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/arista-switch.yaml
index 2c59cab026215b..cf4032020d47d3 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/arista-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/arista-switch.yaml
@@ -204,6 +204,7 @@ metrics:
OID: 1.3.6.1.4.1.30065.4.1.1.2.1.10
description: "The remote autonomous system number received in the BGP OPEN message."
unit: "1"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: aristaBgp4V2PeerLocalAddr
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-switch.yaml
index c695956b3a0a44..5b1a74bcfb7ad9 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-switch.yaml
@@ -97,6 +97,7 @@ metrics:
constant_value_one: true
description: Fan constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: fan_index
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-wireless-controller.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-wireless-controller.yaml
index 672c4bacfd2483..703cb20a5d0425 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-wireless-controller.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/aruba-wireless-controller.yaml
@@ -81,6 +81,7 @@ metrics:
OID: 1.3.6.1.4.1.14823.2.2.1.1.1.10.1.4
description: Size of the storage filesystem in MB
unit: "MBy"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: sysXStorageType
@@ -106,6 +107,7 @@ metrics:
OID: 1.3.6.1.4.1.14823.2.2.1.1.3.3.1.14
description: Signal to noise ratio for the BSSID
unit: "dB"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: apESSID
@@ -284,6 +286,7 @@ metrics:
OID: 1.3.6.1.4.1.14823.2.2.1.20.1.5.1.4
description: Number of AP hbt GRE tunnels
unit: "{tunnel}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.14823.2.2.1.20.1.1.1.2
@@ -291,7 +294,6 @@ metrics:
table: wlsxHighAvalabilityConfigTable
tag: ha_membership
# The actual index of the table is haProfileName that is not accessible
-
- MIB: WLSX-WLAN-MIB
table:
name: wlsxWlanStationTable
@@ -313,6 +315,7 @@ metrics:
OID: 1.3.6.1.4.1.14823.2.2.1.5.2.2.1.1.17
description: Transmit rate code of the station
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: wlanStaPhyType
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-cajun-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-cajun-switch.yaml
index 55f7cc48ba3477..6f1e2b132823e0 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-cajun-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-cajun-switch.yaml
@@ -32,6 +32,7 @@ metrics:
symbols:
- name: avaya.genPort
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: avaya.genPortId
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-media-gateway.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-media-gateway.yaml
index 63dc17b14dec37..c3973c68868abf 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-media-gateway.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-media-gateway.yaml
@@ -29,6 +29,7 @@ metrics:
symbols:
- name: avaya.avEntPhyChFru
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: avaya_ent_physical_index
@@ -107,6 +108,7 @@ metrics:
OID: 1.3.6.1.4.1.6889.2.9.1.4.5.1.8
description: Five-minute average occupancy of the VoIP engine
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: avaya.cmgVoipCurrentIpAddress
@@ -150,6 +152,7 @@ metrics:
OID: 1.3.6.1.4.1.6889.2.9.1.4.6.1.3
description: Number of channels currently in use at the DSP core
unit: "{channel}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.6889.2.9.1.4.6.1.4
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-nortel-ethernet-routing-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-nortel-ethernet-routing-switch.yaml
index b426fef2cd99a5..a9a23553e623d0 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-nortel-ethernet-routing-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avaya-nortel-ethernet-routing-switch.yaml
@@ -44,6 +44,7 @@ metrics:
symbols:
- name: avaya.s5ChasCom
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: avaya.s5ChasComDescr
@@ -83,7 +84,6 @@ metrics:
12: obsoleted
description: Current operational state of the component or sub-component
unit: "TBD"
-
metric_tags:
- OID: 1.3.6.1.4.1.45.1.6.3.1.5.0
symbol: s5ChasVer
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avocent-acs.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avocent-acs.yaml
index 70c490b362fa6f..a58152e52d0828 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avocent-acs.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/avocent-acs.yaml
@@ -55,6 +55,7 @@ metrics:
constant_value_one: true
description: Serial port constant value one.
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: avocent_acs_serial_port_table_device_name
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/barracuda-cloudgen.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/barracuda-cloudgen.yaml
index aab71c17b08fe0..82663dc5b99fa4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/barracuda-cloudgen.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/barracuda-cloudgen.yaml
@@ -20,6 +20,7 @@ metrics:
constant_value_one: true
description: " "
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10704.1.0.1.1
@@ -136,6 +137,7 @@ metrics:
name: phion.hwSensorValue
description: "Sensor value"
unit: "1"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10704.1.4.1.1
@@ -164,6 +166,7 @@ metrics:
constant_value_one: true
description: " "
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10704.1.6.1.1
@@ -206,6 +209,7 @@ metrics:
constant_value_one: true
description: " "
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10704.1.8.1.1
@@ -240,6 +244,7 @@ metrics:
constant_value_one: true
description: " "
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10704.1.9.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/brocade-fc-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/brocade-fc-switch.yaml
index b871af5d25c50d..c7674085b587f7 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/brocade-fc-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/brocade-fc-switch.yaml
@@ -44,6 +44,7 @@ metrics:
constant_value_one: true
description: Operational status of the FxPort.
unit: "{status}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # fcFeModuleIndex is index #1 of fcFxPortEntry
tag: fc_fe_module_index
@@ -77,6 +78,7 @@ metrics:
constant_value_one: true
description: Physical status of the FxPort.
unit: "{status}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # fcFeModuleIndex is index #1 of fcFxPortEntry
tag: fc_fe_module_index
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/chatsworth_pdu.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/chatsworth_pdu.yaml
index e0e78164a26a91..5c696403670892 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/chatsworth_pdu.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/chatsworth_pdu.yaml
@@ -787,6 +787,7 @@ metrics:
symbols:
- name: cpiEas
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: lock_id
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/checkpoint.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/checkpoint.yaml
index 3133afe09bd7a5..a4773d8dfecede 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/checkpoint.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/checkpoint.yaml
@@ -204,6 +204,7 @@ metrics:
symbols:
- name: fanSpeedSensor
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.2620.1.6.7.8.2.1.6
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower-asa.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower-asa.yaml
index 3888a157142c0a..24290a75cd62a8 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower-asa.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower-asa.yaml
@@ -36,6 +36,7 @@ metrics:
name: cpu.usage
description: The overall CPU busy percentage in the last 1 minute period
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # cpmCPUTotalIndex
tag: cpu
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower.yaml
index 214ecd55229285..89a9ec6e6a80f3 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-firepower.yaml
@@ -73,6 +73,7 @@ metrics:
constant_value_one: true
description: Fan equipment presence indicator
unit: "{fan}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: cfpr_equipment_fan_dn
symbol:
@@ -122,6 +123,7 @@ metrics:
constant_value_one: true
description: Power supply unit equipment presence indicator
unit: "{power_supply}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: cfpr_equipment_psu_dn
symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ironport-email.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ironport-email.yaml
index 781d01831be43f..48a611a0c88e43 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ironport-email.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ironport-email.yaml
@@ -77,6 +77,7 @@ metrics:
constant_value_one: true
description: A table of one or more power supply entries.
unit: "{power_supply}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ironport.powerSupplyName
@@ -157,6 +158,7 @@ metrics:
unit: "s"
description: A table of Feature Key expiration entries.
unit: "{feature_key_expiration_entry}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ironport.keyDescription
@@ -222,6 +224,7 @@ metrics:
constant_value_one: true
description: Unique index for a drive being instrumented in the appliance. This index is for SNMP purposes only; it has no intrinsic value.
unit: "{raid_drive}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ironport.raidID
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-nexus.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-nexus.yaml
index 872002a09da1a6..e198019bf129ff 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-nexus.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-nexus.yaml
@@ -32,6 +32,7 @@ metrics:
name: entSensorValue
description: Most recent measurement seen by the sensor
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.9.9.91.1.1.1.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ucs.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ucs.yaml
index b7bdefff6e447d..424834a8e9a1f9 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ucs.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cisco-ucs.yaml
@@ -20,6 +20,7 @@ metrics:
symbols:
- name: cucsComputeBoard
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsComputeBoardDn
@@ -283,6 +284,7 @@ metrics:
OID: 1.3.6.1.4.1.9.9.719.1.9.35.1.49
description: "Total memory"
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsComputeRackUnitDn
@@ -555,6 +557,7 @@ metrics:
symbols:
- name: cucsEquipmentFan
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsEquipmentFanDn
@@ -718,6 +721,7 @@ metrics:
symbols:
- name: cucsEquipmentPsu
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsEquipmentPsuDn
@@ -891,6 +895,7 @@ metrics:
OID: 1.3.6.1.4.1.9.9.719.1.30.11.1.6
description: "Capacity of the memory unit"
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsMemoryUnitDn
@@ -1109,6 +1114,7 @@ metrics:
symbols:
- name: cucsProcessorUnit
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsProcessorUnitDn
@@ -1317,6 +1323,7 @@ metrics:
OID: 1.3.6.1.4.1.9.9.719.1.45.34.1.26
description: "Write io error count"
unit: "{error}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsStorageFlexFlashCardDn
@@ -1481,6 +1488,7 @@ metrics:
OID: 1.3.6.1.4.1.9.9.719.1.45.36.1.18
description: "Size of the flex flash drive"
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cucsStorageFlexFlashDriveDn
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler-sdx.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler-sdx.yaml
index b2c8fc4f16c5bd..f6eb2382f812aa 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler-sdx.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler-sdx.yaml
@@ -157,6 +157,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.6.3.1.1.11
description: "Memory usage percentage of host"
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5951.6.3.1.1.1
@@ -204,6 +205,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.6.3.2.1.38
description: "Http requests per second"
unit: "1.s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.sdx.nsIpAddressType
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler.yaml
index b267f99ca36b4f..8ee74e2ca44abf 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/citrix-netscaler.yaml
@@ -347,6 +347,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.4.1.3.1.1.7
description: Number of current client connections
unit: "{connection}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.vsvrName
@@ -585,6 +586,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.4.1.3.6.1.5
description: Average time to first byte between NetScaler and server
unit: "ms"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.5951.4.1.3.6.1.1
@@ -647,6 +649,7 @@ metrics:
symbols:
- name: netscaler.serviceGroup
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.svcgrpSvcGroupName
@@ -809,6 +812,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.4.1.3.1.1.70
description: Spillover threshold for the vserver
unit: "{spillover}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.vsvrFullName
@@ -897,6 +901,7 @@ metrics:
OID: 1.3.6.1.4.1.5951.4.1.2.1.1.53
description: Number of active transactions handled by this service including surge queue
unit: "{transaction}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.svcServiceName
@@ -984,6 +989,7 @@ metrics:
symbols:
- name: netscaler.server
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: netscaler.serverName
@@ -1006,6 +1012,6 @@ metrics:
7: up
8: transition_to_out_of_service_down
# metric_tags:
- # - OID: 1.3.6.1.4.1.5951.4.1.1.11.0
- # symbol: sysHardwareVersionDesc
- # tag: netscaler_sys_hardware_version_desc
+# - OID: 1.3.6.1.4.1.5951.4.1.1.11.0
+# symbol: sysHardwareVersionDesc
+# tag: netscaler_sys_hardware_version_desc
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cradlepoint.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cradlepoint.yaml
index ec62534fd5ced6..bade20a8606bdc 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cradlepoint.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cradlepoint.yaml
@@ -84,6 +84,7 @@ metrics:
OID: 1.3.6.1.4.1.20992.1.2.2.1.15
description: The cellular modems RSRQ given in dBm's.
unit: "dBm"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: cradlepoint.mdmDescr
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cyberpower-pdu.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cyberpower-pdu.yaml
index 91a9cb50f7d7bf..91466b8b1a79de 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cyberpower-pdu.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/cyberpower-pdu.yaml
@@ -37,6 +37,7 @@ metrics:
name: cyberpower.ePDULoadStatusPowerFactor
description: "Power factor of the output measured in hundredths"
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.2.3.1.1.1
@@ -58,6 +59,7 @@ metrics:
symbols:
- name: cyberpower.ePDULoadBankConfig
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.2.4.1.1.1
@@ -85,6 +87,7 @@ metrics:
name: cyberpower.ePDUOutletStatusActivePower
description: "Measured Outlet load for an Outlet Monitored Rack PDU in watts"
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.3.5.1.1.1
@@ -152,6 +155,7 @@ metrics:
symbols:
- name: cyberpower.ePDUStatusBank
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.5.2.1.1
@@ -177,6 +181,7 @@ metrics:
symbols:
- name: cyberpower.ePDUStatusPhase
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.5.4.1.1
@@ -202,6 +207,7 @@ metrics:
symbols:
- name: cyberpower.ePDUStatusOutlet
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.3.5.6.1.1
@@ -253,6 +259,7 @@ metrics:
name: cyberpower.ePDU2DeviceStatusPowerFactor
description: "Power factor of the Rack PDU load in hundredths"
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.6.3.4.1.1
@@ -329,6 +336,7 @@ metrics:
name: cyberpower.ePDU2PhaseStatusPeakLoad
description: "Peak current of the Rack PDU phase load in tenths of Amps"
unit: "A"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.6.4.4.1.1
@@ -356,6 +364,7 @@ metrics:
name: cyberpower.ePDU2BankStatusPeakLoad
description: "Peak current of the Rack PDU bank load in tenths of Amps"
unit: "A"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.6.5.4.1.1
@@ -377,6 +386,7 @@ metrics:
symbols:
- name: cyberpower.ePDU2OutletSwitchedStatus
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3808.1.1.6.6.1.4.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-emc-data-domain.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-emc-data-domain.yaml
index f5a6bf502cff88..c3ef4dc7c736b5 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-emc-data-domain.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-emc-data-domain.yaml
@@ -105,6 +105,7 @@ metrics:
constant_value_one: true
description: A table containing entries of PowerModuleEntry
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: powerModuleDescription
@@ -130,6 +131,7 @@ metrics:
OID: 1.3.6.1.4.1.19746.1.1.2.1.1.1.5
description: Current temperature value of the sensor
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: tempSensorDescription
@@ -154,6 +156,7 @@ metrics:
constant_value_one: true
description: A table containing entries of FanPropertiesEntry
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: fanDescription
@@ -317,6 +320,7 @@ metrics:
OID: 1.3.6.1.4.1.19746.1.5.1.1.1.16
description: Number of kilobytes per second sent for replication
unit: "kBy/s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.19746.1.5.1.1.1.1
@@ -331,6 +335,7 @@ metrics:
constant_value_one: true
description: A table containing entries of DiskPropertiesEntry
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: diskPropEnclosureID
@@ -398,6 +403,7 @@ metrics:
OID: 1.3.6.1.4.1.19746.1.6.2.1.1.6
description: Percentage of time disk is busy
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.19746.1.6.2.1.1.7
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-os10.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-os10.yaml
index 86ea04bd53fd54..3b6016c7f3e6a5 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-os10.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-os10.yaml
@@ -59,6 +59,7 @@ metrics:
name: dell.os10CardTemp
description: "Temperature of the card"
unit: "{temperature}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.11000.5000.100.4.1.1.4.1.3
@@ -86,6 +87,7 @@ metrics:
symbols:
- name: dell.os10PowerSupply
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.11000.5000.100.4.1.2.1.1.4
@@ -111,6 +113,7 @@ metrics:
symbols:
- name: dell.os10FanTray
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.11000.5000.100.4.1.2.2.1.4
@@ -136,6 +139,7 @@ metrics:
symbols:
- name: dell.os10Fan
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.11000.5000.100.4.1.2.3.1.7
@@ -161,6 +165,7 @@ metrics:
symbols:
- name: dell.os10bgp4V2Peer
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.11000.5000.200.1.1.2.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-powerconnect.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-powerconnect.yaml
index ef47e1744aedf0..19ddf2114060f9 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-powerconnect.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-powerconnect.yaml
@@ -131,6 +131,7 @@ metrics:
OID: 1.3.6.1.4.1.674.10895.3000.1.2.110.7.1.1.4
description: "Fan speed in revolutions per minute"
unit: "1.min"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dell.envMonFanStatusDescr
@@ -160,6 +161,7 @@ metrics:
OID: 1.3.6.1.4.1.674.10895.3000.1.2.110.7.2.1.6
description: "Average power consumption"
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dell.envMonSupplyStatusDescr
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-poweredge.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-poweredge.yaml
index 1781cbcec6dbbf..644512e6194ba9 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-poweredge.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dell-poweredge.yaml
@@ -94,6 +94,7 @@ metrics:
symbols:
- name: systemState
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.200.10.1.9
@@ -153,6 +154,7 @@ metrics:
name: batteryReading
description: Reading value of the battery
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.600.50.1.1
@@ -215,6 +217,7 @@ metrics:
name: intrusionReading
description: Reading value of the intrusion
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.300.70.1.1
@@ -241,6 +244,7 @@ metrics:
name: voltageProbeReading
description: Reading value of the voltage probe
unit: "V"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.600.20.1.1
@@ -287,6 +291,7 @@ metrics:
name: amperageProbeReading
description: Reading value of the amperage probe
unit: "A"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.600.30.1.7
@@ -343,6 +348,7 @@ metrics:
name: powerSupplyCurrentInputVoltage
description: Current input voltage of the power supply
unit: "V"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.600.12.1.1
@@ -361,6 +367,7 @@ metrics:
name: powerUsageStatus
description: Status of the power usage
unit: "{power_usage}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.600.60.1.1
@@ -387,6 +394,7 @@ metrics:
name: coolingUnitStatus
description: Status of the cooling unit
unit: "{cooling_unit}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.700.10.1.1
@@ -417,6 +425,7 @@ metrics:
name: coolingDeviceDiscreteReading
description: Discrete reading of the cooling device
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.700.12.1.1
@@ -460,6 +469,7 @@ metrics:
name: temperatureProbeReading
description: Reading value of the temperature probe
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.700.20.1.1
@@ -496,6 +506,7 @@ metrics:
name: processorDeviceThreadCount
description: Number of threads in the processor device
unit: "{thread}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.1100.30.1.1
@@ -518,6 +529,7 @@ metrics:
name: processorDeviceStatusReading
description: Reading value of the processor device status
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.1100.32.1.1
@@ -549,6 +561,7 @@ metrics:
name: cacheDeviceCurrentSize
description: Current size of the cache device
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.1100.40.1.1
@@ -579,6 +592,7 @@ metrics:
# (4),-- ECC multibit fault encountered
# (8) -- ECC single bit correction logging disabled
# (16) -- device disabled because of spare activation
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.1100.50.1.1
@@ -616,6 +630,7 @@ metrics:
name: systemSlotStatus
description: Status of the system slot
unit: "{system_slot}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.1200.10.1.8
@@ -634,6 +649,7 @@ metrics:
name: fruInformationStatus
description: Status of the field replaceable unit information
unit: "{fru_information}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.674.10892.1.2000.10.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dlink-dgs-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dlink-dgs-switch.yaml
index 21658ad1fb18d1..4cc273b98e0501 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dlink-dgs-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/dlink-dgs-switch.yaml
@@ -47,6 +47,7 @@ metrics:
OID: 1.3.6.1.4.1.171.14.5.1.4.1.4
description: Used memory size of the entry
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtMemUtilUnitId
@@ -69,6 +70,7 @@ metrics:
OID: 1.3.6.1.4.1.171.14.5.1.1.1.1.4
description: Current measurement of the testpoint
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtEnvTempUnitId
@@ -96,6 +98,7 @@ metrics:
symbols:
- name: dlink.dEntityExtEnvFan
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtEnvFanUnitId
@@ -129,6 +132,7 @@ metrics:
OID: 1.3.6.1.4.1.171.14.5.1.1.3.1.5
description: Maximum power which the power module can supply
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtEnvPowerUnitId
@@ -157,6 +161,7 @@ metrics:
symbols:
- name: dlink.dEntityExtEnvAirFlow
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtEnvAirFlowUnitId
@@ -176,6 +181,7 @@ metrics:
symbols:
- name: dlink.dEntityExtUnit
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: dEntityExtUnitIndex
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/eaton-epdu.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/eaton-epdu.yaml
index 5ff9f3cab7844f..ae61a1b5f0868b 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/eaton-epdu.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/eaton-epdu.yaml
@@ -48,6 +48,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.3.1.1.3
description: Units are 0.1 Hz; divide by ten to get Hz
unit: "Hz"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.3.1.1.10
@@ -69,6 +70,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.3.2.1.3
description: An input voltage measurement value
unit: "mV"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.3.3.1.12
@@ -97,6 +99,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.3.3.1.11
description: Current percent load based on the rated current capacity
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.3.3.1.12
@@ -125,6 +128,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.3.4.1.4
description: An input Watts value
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.3.3.1.12
@@ -137,6 +141,7 @@ metrics:
symbols:
- name: eaton.epdu.group
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.5.1.1.3
@@ -170,6 +175,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.5.3.1.3
description: Units are millivolts
unit: "mV"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.5.1.1.3
@@ -209,6 +215,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.5.4.1.10
description: Current percent load based on the rated current capacity
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.5.1.1.3
@@ -248,6 +255,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.5.5.1.3
description: A group Watts value
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.5.1.1.3
@@ -271,6 +279,7 @@ metrics:
symbols:
- name: eaton.epdu.groupControl
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.5.1.1.3
@@ -305,6 +314,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.6.3.1.2
description: Units are millivolts
unit: "mV"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.6.1.1.3
@@ -333,6 +343,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.6.4.1.10
description: Current percent load based on the rated current capacity
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.6.1.1.3
@@ -361,6 +372,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.6.5.1.3
description: An outlet Watts value
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.6.1.1.3
@@ -375,6 +387,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.7.1.1.4
description: Units are in tenths of a degree according to the scale specified by temperatureScale either Fahrenheit or Celsius Divide by ten to get degrees
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.7.1.1.2
@@ -407,6 +420,7 @@ metrics:
OID: 1.3.6.1.4.1.534.6.6.7.7.2.1.4
description: Units are tenths of a percent relative humidity Divide the value by 10 to get %RH
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.7.2.1.2
@@ -437,6 +451,7 @@ metrics:
symbols:
- name: eaton.epdu.contact
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.534.6.6.7.7.3.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/f5-big-ip.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/f5-big-ip.yaml
index 6fddb6df15392d..5316d5c065c965 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/f5-big-ip.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/f5-big-ip.yaml
@@ -162,6 +162,7 @@ metrics:
name: sysMultiHostCpuIowait
description: The average time spent by the specified processor waiting for external I/O to complete for the associated host
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.3375.2.1.7.5.2.1.3
@@ -328,6 +329,7 @@ metrics:
name: ltmVirtualServConnLimit
description: The maximum number of connections the specified virtual server is allowed to have open at one time
unit: "{connection}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: server
symbol:
@@ -342,6 +344,7 @@ metrics:
# Gauges
- name: ltmVsStatus
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: ltm_vs_status_avail_state
symbol:
@@ -498,6 +501,7 @@ metrics:
# Gauges
- name: ltmNodeAddr
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: node
symbol:
@@ -750,6 +754,7 @@ metrics:
name: ltmPoolMemberSessionStatus
description: The hierarchical status of the session including parent status for the specified pool member
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: pool
symbol:
@@ -768,6 +773,7 @@ metrics:
# Gauges
- name: ltmPoolMember
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: pool
symbol:
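Several of the rows flagged in this file (ltmVsStatus, ltmNodeAddr, ltmPoolMember) follow the `constant_value_one: true` pattern: the symbol emits a constant 1 per table row, so the row exists as a time series while all of its information, including the state, lives in the tags. That is precisely the shape the TODO asks to convert. A hedged sketch of the pattern, with the tag name taken from the ltmVsStatus hunk above and the MIB, table, and OIDs filled in for illustration only:

```yaml
- MIB: F5-BIGIP-LOCAL-MIB                      # illustrative MIB/table wrapper
  table:
    OID: 1.3.6.1.4.1.3375.2.2.10.13.2          # hypothetical table OID
    name: ltmVsStatusTable
  symbols:
    - name: ltmVsStatus
      constant_value_one: true                 # always 1; the tags carry the state
  metric_tags:
    - tag: ltm_vs_status_avail_state           # tag name as in the hunk above
      symbol:
        OID: 1.3.6.1.4.1.3375.2.2.10.13.2.1.2  # hypothetical OID
        name: ltmVsStatusAvailState
```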
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fireeye.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fireeye.yaml
index 2bf338c68b104d..3ac253aac080cc 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fireeye.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fireeye.yaml
@@ -85,6 +85,7 @@ metrics:
constant_value_one: true
description: Physical disk entity
unit: "{physical_disk}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.25597.11.2.1.3.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-appliance.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-appliance.yaml
index fadb8ad800c591..5a213f9638a2f1 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-appliance.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-appliance.yaml
@@ -188,6 +188,7 @@ metrics:
symbols:
- name: fmDevice
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: fmDeviceEntName
@@ -294,6 +295,7 @@ metrics:
OID: 1.3.6.1.4.1.12356.103.7.2.1.3
description: Raid disk size in GB
unit: "GBy"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.12356.103.7.2.1.2
@@ -347,6 +349,7 @@ metrics:
symbols:
- name: fmHaPeer
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: fmHaPeerEntIp
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-fortigate.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-fortigate.yaml
index 6d90725440ed7e..d0e58bbf52cc1d 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-fortigate.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/fortinet-fortigate.yaml
@@ -228,6 +228,7 @@ metrics:
symbols:
- name: fgVirtualDomain
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: virtualdomain_index
symbol:
@@ -270,6 +271,7 @@ metrics:
symbols:
- OID: 1.3.6.1.4.1.12356.101.7.2.1.1.1
name: fgIntfEntVdom
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
# The virtual domain the interface belongs to. This index corresponds to the index used by fgVdTable.
- tag: virtualdomain_index
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-bladesystem-enclosure.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-bladesystem-enclosure.yaml
index 2ae5869454b399..544fdd5448933c 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-bladesystem-enclosure.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-bladesystem-enclosure.yaml
@@ -15,6 +15,7 @@ metrics:
OID: 1.3.6.1.4.1.232.22.2.3.1.2.1.6
description: This is the current temperature sensor reading in degrees celsius
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.3.1.2.1.4
@@ -51,6 +52,7 @@ metrics:
constant_value_one: true
description: Fan presence indicator
unit: "{fan}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.3.1.3.1.5
@@ -106,6 +108,7 @@ metrics:
constant_value_one: true
description: Fuse presence indicator
unit: "{fuse}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.3.1.4.1.4
@@ -140,6 +143,7 @@ metrics:
constant_value_one: true
description: Manager presence indicator
unit: "{manager}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.3.1.6.1.4
@@ -202,6 +206,7 @@ metrics:
constant_value_one: true
description: Power enclosure presence indicator
unit: "{power_enclosure}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.3.3.1.1.3
@@ -268,6 +273,7 @@ metrics:
# - name: cpqRackServerBladeFaultDiagnosticString
# OID: 1.3.6.1.4.1.232.22.2.4.1.1.1.24
# string metric is not supported yet (keep this metric and this comment in profile until it's fixed)
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.4.1.1.1.4
@@ -358,6 +364,7 @@ metrics:
OID: 1.3.6.1.4.1.232.22.2.4.3.1.1.8
description: This is the current temperature sensor reading in degrees celsius
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.4.3.1.1.4
@@ -402,6 +409,7 @@ metrics:
OID: 1.3.6.1.4.1.232.22.2.5.1.1.1.13
description: The current air temperature at the exhaust of the power supply in degrees celsius
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.232.22.2.5.1.1.1.4
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-msa.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-msa.yaml
index e9ca994570be1a..a3c6bd46d073b7 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-msa.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-msa.yaml
@@ -20,6 +20,7 @@ metrics:
# - OID: 1.3.6.1.3.94.1.8.1.6
# name: hpe.fibrechannel.connUnitSensorMessage
# string metric is not supported yet (keep this metric and this comment in profile until it's fixed)
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.3.94.1.8.1.3
@@ -63,6 +64,7 @@ metrics:
symbols:
- name: hpe.fibrechannel.connUnitPort
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.3.94.1.10.1.3
@@ -142,6 +144,7 @@ metrics:
tag: hpe_fibrechannel_conn_unit_port_name
description: String describing the addressed port
unit: "{port}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.3.94.1.10.1.6
name: connUnitPortState
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-nimble.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-nimble.yaml
index 5bef2769611a4c..59fca3155a4838 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-nimble.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/hpe-nimble.yaml
@@ -48,6 +48,7 @@ metrics:
metric_type: monotonic_count
description: Total cumulative number of Write I/Os.
unit: "{write_io}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.37447.1.2.1.3
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-routers.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-routers.yaml
index 1b5b3cbf77482e..fcad7ee8fba8d0 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-routers.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-routers.yaml
@@ -29,6 +29,7 @@ metrics:
OID: 1.3.6.1.4.1.2011.5.25.177.1.1.2.1.7
description: The counter that records the times the remote BGP peer is correctly connected.
unit: "s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: huawei.hwBgpPeerVrfName
@@ -295,6 +296,7 @@ metrics:
symbols:
- name: huawei.hwNatSession
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.2011.5.25.226.2.14.2.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-switches.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-switches.yaml
index 800458849732fd..1a2f0729783aa2 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-switches.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei-switches.yaml
@@ -47,6 +47,7 @@ metrics:
OID: 1.3.6.1.4.1.2011.5.25.183.1.20.1.2
description: Stack member's priority
unit: "{priority}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: huawei.hwMemberStackMacAddress
@@ -81,6 +82,7 @@ metrics:
name: huawei.hwStackPortName
OID: 1.3.6.1.4.1.2011.5.25.183.1.21.1.3
tag: huawei_hw_stack_port_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.2011.5.25.183.1.21.1.5
name: huawei.hwStackPortStatus
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei.yaml
index d5d10fb5ec462f..a5af6651a764f9 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/huawei.yaml
@@ -25,6 +25,7 @@ metrics:
OID: 1.3.6.1.4.1.2011.5.25.31.1.1.1.1.13
description: The voltage for the entity
unit: "V"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: huawei.hwEntityBoardName
@@ -85,6 +86,7 @@ metrics:
OID: 1.3.6.1.4.1.2011.5.25.31.1.1.10.1.5
description: This object indicates the rotation speed in percentage of the full speed of the fan
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: huawei.hwEntityFanSlot
@@ -163,6 +165,7 @@ metrics:
OID: 1.3.6.1.4.1.2011.5.25.155.6.1.15
description: Period in seconds after which this neighbor is declared dead
unit: "s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: huawei.hwOspfv2SelfRouterId
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-datapower-gateway.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-datapower-gateway.yaml
index fde1d391c747ee..b9a507aa954bef 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-datapower-gateway.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-datapower-gateway.yaml
@@ -242,6 +242,7 @@ metrics:
metric_type: monotonic_count
description: Number of log target events pending
unit: "{event}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.dpStatusLogTargetStatusLogTarget
@@ -306,6 +307,7 @@ metrics:
metric_type: monotonic_count
description: Number of transmitted packets dropped
unit: "{packet}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.dpStatusNetworkInterfaceStatusInterfaceType
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-lenovo-server.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-lenovo-server.yaml
index f066ade8aecf56..60c741176e8c41 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-lenovo-server.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ibm-lenovo-server.yaml
@@ -31,6 +31,7 @@ metrics:
OID: 1.3.6.1.4.1.2.3.51.3.1.1.2.1.3
description: The measured temperature.
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.imm.tempDescr
@@ -49,6 +50,7 @@ metrics:
OID: 1.3.6.1.4.1.2.3.51.3.1.2.2.1.3
description: The measured voltage.
unit: "V"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.imm.voltDescr
@@ -68,6 +70,7 @@ metrics:
extract_value: '(\d+)%' # Example value : '74% of maximum'
description: Fan speed expressed in percent of maximum RPM.
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.imm.fanDescr
@@ -84,6 +87,7 @@ metrics:
symbols:
- name: ibm.imm.systemHealthSummary
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibm.imm.systemHealthSummaryDescription
@@ -105,6 +109,7 @@ metrics:
name: ibm.imm.cpuVpdDescription
OID: 1.3.6.1.4.1.2.3.51.3.1.5.20.1.2
tag: ibm_imm_cpu_vpd_description
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.2.3.51.3.1.5.20.1.11
name: ibm.imm.cpuVpdHealthStatus
@@ -121,6 +126,7 @@ metrics:
name: ibm.imm.memoryVpdDescription
OID: 1.3.6.1.4.1.2.3.51.3.1.5.21.1.2
tag: ibm_imm_memory_vpd_description
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.2.3.51.3.1.5.21.1.8
name: ibm.imm.memoryHealthStatus
@@ -137,6 +143,7 @@ metrics:
name: ibm.imm.powerFruName
OID: 1.3.6.1.4.1.2.3.51.3.1.11.2.1.2
tag: ibm_imm_power_fru_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.2.3.51.3.1.11.2.1.6
name: ibm.imm.powerHealthStatus
@@ -153,6 +160,7 @@ metrics:
name: ibm.imm.diskFruName
OID: 1.3.6.1.4.1.2.3.51.3.1.12.2.1.2
tag: ibm_imm_disk_fru_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.2.3.51.3.1.12.2.1.3
name: ibm.imm.diskHealthStatus
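One mechanism worth noting from the fan-speed hunk above: `extract_value` applies a regex to a string-valued OID and charts the first capture group as the number. A sketch of the pattern, keeping the regex and example value shown in the diff but assuming the symbol name, OID, and field placement for illustration:

```yaml
symbols:
  - name: ibm.imm.fanSpeed                 # hypothetical symbol name
    OID: 1.3.6.1.4.1.2.3.51.3.1.3.2.1.3    # hypothetical OID, for illustration only
    extract_value: '(\d+)%'                # '74% of maximum' -> 74
    description: Fan speed expressed in percent of maximum RPM.
    unit: "%"
```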
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infinera-coriant-groove.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infinera-coriant-groove.yaml
index 3c5e2f5749d930..5f9cecc5a28854 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infinera-coriant-groove.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infinera-coriant-groove.yaml
@@ -26,6 +26,7 @@ metrics:
name: coriant.groove.shelfOutletTemperature # Type CoriantTypesTemperature is a string that represents a float in Celcius e.g. 23.8
description: Temperature at the monitoring point
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.42229.1.2.3.1.1.1.2
@@ -89,6 +90,7 @@ metrics:
name: coriant.groove.cardTemperature # Type CoriantTypesTemperature is a string that represents a float in Celcius e.g. 23.8
description: Temperature at the monitoring point
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.42229.1.2.3.3.1.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infoblox-ipam.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infoblox-ipam.yaml
index 8b6e5943b054d1..51093ec24ae73c 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infoblox-ipam.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/infoblox-ipam.yaml
@@ -181,6 +181,7 @@ metrics:
symbols:
- name: ibMemberServiceStatus
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ibServiceName
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/kyocera-printer.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/kyocera-printer.yaml
index 0048bdbec786d8..023b81742226dd 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/kyocera-printer.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/kyocera-printer.yaml
@@ -20,6 +20,7 @@ metrics:
name: kcprtAlertStateCode
description: "Alert state code"
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.1347.43.18.2.1.2
@@ -38,6 +39,7 @@ metrics:
name: kcprtMemoryDeviceUsedSize
description: "Used size of the memory device"
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.1347.43.20.1.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/meraki-cloud-controller.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/meraki-cloud-controller.yaml
index 26c3e948441d8b..f581075e46bc22 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/meraki-cloud-controller.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/meraki-cloud-controller.yaml
@@ -64,6 +64,7 @@ metrics:
constant_value_one: true
description: Meraki dev constant value one
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
# devMac is part of the devTable index
- symbol:
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/nasuni-filer.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/nasuni-filer.yaml
index c2bb7ed9785732..37b16909c136d3 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/nasuni-filer.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/nasuni-filer.yaml
@@ -353,6 +353,7 @@ metrics:
name: nasuni.volumeTableNumFtpdirs
description: Number of FTP directories in volume table
unit: "{directory}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.42040.2.2.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/netgear-readynas.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/netgear-readynas.yaml
index 0b0e751d3f881f..4b5fa953fc2e94 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/netgear-readynas.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/netgear-readynas.yaml
@@ -32,6 +32,7 @@ metrics:
name: netgear.readynasos.diskTemperature
description: Disk temperature
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- tag: netgear_readynasos_disk_id
symbol:
@@ -123,6 +124,7 @@ metrics:
symbol:
OID: 1.3.6.1.4.1.4526.22.7.1.2
name: netgear.readynasos.volumeName
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.4526.22.7.1.4
name: netgear.readynasos.volumeStatus
@@ -149,6 +151,7 @@ metrics:
symbol:
OID: 1.3.6.1.4.1.4526.22.8.1.2
name: netgear.readynasos.psuDesc
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.4526.22.8.1.3
name: netgear.readynasos.psuStatus
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-console-manager.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-console-manager.yaml
index 1d457ee3dc590e..d8f0595a683e53 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-console-manager.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-console-manager.yaml
@@ -24,6 +24,7 @@ metrics:
metric_type: monotonic_count
description: Number of bytes transmitted on the serial port
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # ogSerialPortIndex
tag: og_serial_port_index
@@ -130,6 +131,7 @@ metrics:
metric_type: monotonic_count
description: Number of cell modem events counted
unit: "{event}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # ogCellModemIndex
tag: og_cell_modem_index
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-infrastructure-manager.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-infrastructure-manager.yaml
index 50977948616486..7f44cc0aafc1e4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-infrastructure-manager.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/opengear-infrastructure-manager.yaml
@@ -21,6 +21,7 @@ metrics:
metric_type: monotonic_count
description: Serial port bytes transmitted
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: ogSerialPortStatusPort
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/peplink.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/peplink.yaml
index 32f58eed28c59e..665228841b64c1 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/peplink.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/peplink.yaml
@@ -55,6 +55,7 @@ metrics:
OID: 1.3.6.1.4.1.23695.200.1.1.1.4.1.1.5
description: Device psu percentage
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.23695.200.1.1.1.4.1.1.1
@@ -76,6 +77,7 @@ metrics:
OID: 1.3.6.1.4.1.23695.200.1.1.1.4.2.1.3
description: Device fan speed
unit: "1.s"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.23695.200.1.1.1.4.2.1.1
@@ -97,6 +99,7 @@ metrics:
constant_value_one: true
description: Device power source
unit: "{power_source}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: devicePowerSourceId
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/raritan-dominion.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/raritan-dominion.yaml
index 47a60c97c0f31b..d788cd6de7b6f4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/raritan-dominion.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/raritan-dominion.yaml
@@ -33,6 +33,7 @@ metrics:
constant_value_one: true
description: System power supply presence
unit: "{system_power_supply}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.13742.3.1.3.1.2
@@ -59,6 +60,7 @@ metrics:
OID: 1.3.6.1.4.1.13742.3.1.4.1.4
name: raritan.remotekvm.portDataType
tag: raritan_remotekvm_port_data_type
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.13742.3.1.4.1.5
name: raritan.remotekvm.portDataStatus
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu3.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu3.yaml
index 2e03c485c863e4..b9698894eeb19c 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu3.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu3.yaml
@@ -58,6 +58,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.3.2.1.1.15
description: Tower line frequency
unit: "Hz"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry3.towerID
@@ -127,6 +128,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.3.2.2.1.20
description: Used infeed capacity
unit: "VA"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry3.infeedID
@@ -192,6 +194,7 @@ metrics:
symbols:
- name: servertech.sentry3.outlet
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry3.outletID
@@ -254,6 +257,7 @@ metrics:
symbols:
- name: servertech.sentry3.envMon
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry3.envMonID
@@ -283,6 +287,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.3.2.5.1.10
description: Humidity value from sensor
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry3.tempHumidSensorID
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu4.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu4.yaml
index edc0dc7a0d26db..849ec3d3f4ea72 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu4.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/servertech-pdu4.yaml
@@ -18,6 +18,7 @@ metrics:
constant_value_one: true
description: "Constant value one"
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4UnitID
@@ -112,6 +113,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.3.3.1.12
description: "Out of balance measurement of the input cord"
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4InputCordID
@@ -325,6 +327,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.4.3.1.5
description: "Utilized current of the line"
unit: "A"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4LineID
@@ -437,6 +440,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.5.3.1.13
description: "Energy consumed by the phase"
unit: "Wh"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4PhaseID
@@ -561,6 +565,7 @@ metrics:
constant_value_one: true
description: "Constant value one"
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4OcpID
@@ -637,6 +642,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.7.3.1.5
description: "Utilized current of the branch"
unit: "A"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4BranchID
@@ -733,6 +739,7 @@ metrics:
constant_value_one: true
description: "Constant value one"
unit: "TBD"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4OutletID
@@ -822,6 +829,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.9.3.1.1
description: "Temperature sensor value"
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4TempSensorID
@@ -875,6 +883,7 @@ metrics:
OID: 1.3.6.1.4.1.1718.4.1.10.3.1.1
description: "Humidity sensor value"
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: servertech.sentry4.st4HumidSensorID
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/silverpeak-edgeconnect.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/silverpeak-edgeconnect.yaml
index fc75c6b729cc74..1fb222631ab8ed 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/silverpeak-edgeconnect.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/silverpeak-edgeconnect.yaml
@@ -40,6 +40,7 @@ metrics:
symbols:
- name: silverpeak.mgmt.spsActiveAlarm
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.23867.3.1.1.2.1.1.4
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sinetica-eagle-i.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sinetica-eagle-i.yaml
index f8de070afb1026..30ea5c766e7ca3 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sinetica-eagle-i.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sinetica-eagle-i.yaml
@@ -35,6 +35,7 @@ metrics:
symbols:
- name: hawk.i2.ipCont
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: hawk.i2.ipContName
@@ -69,6 +70,7 @@ metrics:
constant_value_one: true
description: Output constant value one.
unit: "{output}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: hawk.i2.opName
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sophos-xgs-firewall.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sophos-xgs-firewall.yaml
index 87c1e275a68d98..bf339d5acd8bd8 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sophos-xgs-firewall.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/sophos-xgs-firewall.yaml
@@ -105,6 +105,7 @@ metrics:
OID: 1.3.6.1.4.1.2604.5.1.6.1.1.1.1.8
description: "Count of active tunnel"
unit: "{tunnel}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: sfosIPSecVpnConnName
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/synology-disk-station.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/synology-disk-station.yaml
index b451556570e858..ff6748cb459056 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/synology-disk-station.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/synology-disk-station.yaml
@@ -99,6 +99,7 @@ metrics:
OID: 1.3.6.1.4.1.6574.2.1.1.6
description: The temperature of each disk uses Celsius degree
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: synology.diskID
@@ -138,6 +139,7 @@ metrics:
metric_type: monotonic_count
description: The total size of raid
unit: "By"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: synology.raid.raidName
@@ -183,6 +185,7 @@ metrics:
OID: 1.3.6.1.4.1.6574.5.1.1.7
description: SMART attribute threshold value
unit: "1"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: synology.diskSMARTInfoDevName
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tp-link.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tp-link.yaml
index 20f195eab4dd43..38de407fba18d2 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tp-link.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tp-link.yaml
@@ -21,6 +21,7 @@ metrics:
OID: 1.3.6.1.4.1.11863.6.4.1.1.1.1.3
description: CPU utilization in 1 minute
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.11863.6.4.1.1.1.1.4
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-pdu.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-pdu.yaml
index 0ef1029ce975e3..731cbac054a514 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-pdu.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-pdu.yaml
@@ -18,6 +18,7 @@ metrics:
symbols:
- name: tlpDevice
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.850.1.1.1.2.1.4
@@ -105,6 +106,7 @@ metrics:
name: tlpPduDeviceOutputPowerTotal
description: The AC output total power for all circuits.
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # tlpDeviceIndex
tag: tlp_device_index
@@ -145,6 +147,7 @@ metrics:
name: tlpPduOutputFrequency
description: The present output frequency.
unit: "Hz"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # tlpDeviceIndex
tag: tlp_device_index
@@ -202,6 +205,7 @@ metrics:
symbols:
- name: tlpPduBreaker
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # tlpDeviceIndex
tag: tlp_device_index
@@ -228,6 +232,7 @@ metrics:
symbols:
- name: tlpAlarm
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1 # tlpAlarmId
tag: tlp_alarm_id
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-ups.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-ups.yaml
index aba03fb4749730..2f5102c179c8ac 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-ups.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/tripplite-ups.yaml
@@ -118,6 +118,7 @@ metrics:
name: tlUpsInputVoltage
description: The magnitude of the present input voltage
unit: "V"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.850.100.1.3.2.1.1
@@ -147,6 +148,7 @@ metrics:
name: tlUpsOutputCircuitPower
description: The magnitude of the present power in watts
unit: "W"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- index: 1
tag: tl_ups_output_circuit_index
@@ -164,6 +166,7 @@ metrics:
symbols:
- name: tlEnvContact
constant_value_one: true
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.850.101.2.1.1.2
@@ -199,6 +202,7 @@ metrics:
OID: 1.3.6.1.4.1.850.100.1.11.2.1.4
name: tlUpsOutletGroupDesc
tag: tl_ups_outlet_group_desc
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.850.100.1.11.2.1.5
name: tlUpsOutletGroupState
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ubiquiti-unifi.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ubiquiti-unifi.yaml
index a3f665e86528a3..80a4e31be182a4 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ubiquiti-unifi.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/ubiquiti-unifi.yaml
@@ -3,7 +3,7 @@ extends:
- _ubiquiti.yaml
sysobjectid:
- - 1.3.6.1.4.1.41112 # Ubiquiti Networks, Inc.
+ - 1.3.6.1.4.1.41112 # Ubiquiti Networks, Inc.
metrics:
- MIB: FROGFOOT-RESOURCES-MIB
@@ -18,23 +18,23 @@ metrics:
name: memory.free
description: Available physical memory (in KB)
unit: "kBy"
-# TODO: Update once we support matching on specific tag (`match_attributes`)
-# - MIB: FROGFOOT-RESOURCES-MIB
-# table:
-# OID: 1.3.6.1.4.1.10002.1.1.1.4.2
-# name: loadTable
-# symbols:
-# - OID: 1.3.6.1.4.1.10002.1.1.1.4.2.1.3.1
-# name: cpu.usage
-# metric_tags:
-# - index: 1
-# tag: cpu
-# - symbol:
-# OID: 1.3.6.1.4.1.10002.1.1.1.4.2.1.2.1
-# name: loadDescr
-# match_attributes:
-# - 1 Minute Average
-# tag: load_descr
+ # TODO: Update once we support matching on specific tag (`match_attributes`)
+ # - MIB: FROGFOOT-RESOURCES-MIB
+ # table:
+ # OID: 1.3.6.1.4.1.10002.1.1.1.4.2
+ # name: loadTable
+ # symbols:
+ # - OID: 1.3.6.1.4.1.10002.1.1.1.4.2.1.3.1
+ # name: cpu.usage
+ # metric_tags:
+ # - index: 1
+ # tag: cpu
+ # - symbol:
+ # OID: 1.3.6.1.4.1.10002.1.1.1.4.2.1.2.1
+ # name: loadDescr
+ # match_attributes:
+ # - 1 Minute Average
+ # tag: load_descr
- MIB: UBNT-UniFi-MIB
table:
OID: 1.3.6.1.4.1.41112.1.6.1.1
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vertiv-watchdog.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vertiv-watchdog.yaml
index 06b13e18766721..0670bfac17a400 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vertiv-watchdog.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vertiv-watchdog.yaml
@@ -25,19 +25,20 @@ metrics:
name: vertiv.internalDewPoint
description: Internal dew point temperature
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.21239.5.1.2.1.2
name: vertiv.internalSerial
tag: vertiv_internal_serial
-# - symbol:
-# OID: 1.3.6.1.4.1.21239.5.1.1.7.0
-# name: vertiv.temperatureUnits
-# tag: vertiv_temperature_units
-# mapping:
-# 0: fahrenheit
-# 1: celsius
-# TODO : Add this tag back when tagging by scalar symbol is implemented NDM-2247
+ # - symbol:
+ # OID: 1.3.6.1.4.1.21239.5.1.1.7.0
+ # name: vertiv.temperatureUnits
+ # tag: vertiv_temperature_units
+ # mapping:
+ # 0: fahrenheit
+ # 1: celsius
+ # TODO: Add this tag back when tagging by scalar symbol is implemented (NDM-2247)
- symbol:
OID: 1.3.6.1.4.1.21239.5.1.2.1.4
name: vertiv.internalAvail
@@ -56,6 +57,7 @@ metrics:
name: vertiv.tempSensorTemp
description: Temperature measured by the temperature sensor
unit: "Cel"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.21239.5.1.4.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vmware-esx.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vmware-esx.yaml
index 4d6e01957afdb8..9449d5c0698479 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vmware-esx.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/vmware-esx.yaml
@@ -29,6 +29,7 @@ metrics:
name: vmwHbaDeviceName
OID: 1.3.6.1.4.1.6876.3.5.2.1.2
tag: vmw_hba_device_name
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
- symbol:
OID: 1.3.6.1.4.1.6876.3.5.2.1.4
name: vmwHbaStatus
@@ -50,6 +51,7 @@ metrics:
constant_value_one: true
description: "Hardware environment"
unit: "{hardware_environment}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
name: vmwSubsystemType
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zebra-printer.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zebra-printer.yaml
index 9747337f9c2df7..a1dc2b6755b960 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zebra-printer.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zebra-printer.yaml
@@ -36,6 +36,7 @@ metrics:
constant_value_one: true
description: "Number of tracked alerts"
unit: "{alert}"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- symbol:
OID: 1.3.6.1.4.1.10642.10.31.1.2
diff --git a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zyxel-switch.yaml b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zyxel-switch.yaml
index 592747d05e82fb..6db9260452cd4b 100644
--- a/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zyxel-switch.yaml
+++ b/src/go/plugin/go.d/config/go.d/snmp.profiles/default/zyxel-switch.yaml
@@ -26,6 +26,7 @@ metrics:
description: Device memory usage (%)
unit: "%"
+ # TODO: Check out metric_tags with symbols having mappings and/or expressing states/statuses. Need to convert to metrics.
metric_tags:
- OID: 1.3.6.1.4.1.890.1.15.3.1.1.0
symbol: sysSwPlatform
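The repeated `TODO: Check out metric_tags ...` comments added across the profiles above mark tag symbols that carry value mappings or encode states/statuses; the note says these should eventually become metrics in their own right. A minimal sketch of what such a conversion might look like, with an entirely hypothetical MIB, OID, and mapping — the exact schema for mapped status metrics is an assumption here, not something these profiles confirm:

```yaml
# Hypothetical sketch only: a status value that today rides along as a
# metric_tags symbol with a mapping, re-expressed as a first-class metric.
metrics:
  - MIB: EXAMPLE-MIB                     # illustrative, not a real profile
    symbol:
      OID: 1.3.6.1.4.1.99999.1.1.1.1     # made-up OID for the example
      name: example.deviceStatus
    metric_type: gauge
    description: Operational status of the device
    unit: "{status}"
    mapping:                             # assumed syntax: integer states -> labels
      1: ok
      2: degraded
      3: failed
```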
From dcbf9aaad6400fa2d8bcc0fe4637e5d1e3053651 Mon Sep 17 00:00:00 2001
From: Ilya Mashchenko
Date: Fri, 16 May 2025 14:35:48 +0300
Subject: [PATCH 46/51] fix(go.d): sanitize vnode labels before creating vnode
(#20293)
---
src/go/plugin/go.d/agent/module/job.go | 3 +++
1 file changed, 3 insertions(+)
diff --git a/src/go/plugin/go.d/agent/module/job.go b/src/go/plugin/go.d/agent/module/job.go
index a63c1700a2d7a6..48c962c9627e18 100644
--- a/src/go/plugin/go.d/agent/module/job.go
+++ b/src/go/plugin/go.d/agent/module/job.go
@@ -497,6 +497,9 @@ func (j *Job) sendVnodeHostInfo() {
if _, ok := j.vnode.Labels["_hostname"]; !ok {
j.vnode.Labels["_hostname"] = j.vnode.Hostname
}
+ for k, v := range j.vnode.Labels {
+ j.vnode.Labels[k] = lblReplacer.Replace(v)
+ }
j.api.HOSTINFO(netdataapi.HostInfo{
GUID: j.vnode.GUID,
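For reference, `lblReplacer` is a package-level `strings.Replacer` defined elsewhere in `job.go`; its actual replacement pairs are not shown in this hunk. A self-contained sketch of the pattern the patch applies, with an assumed replacement set chosen only to illustrate the idea:

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in for go.d's package-level lblReplacer. The real pairs live in
// job.go and are not part of the hunk above, so these (strip quotes,
// flatten newlines) are assumptions for illustration.
var lblReplacer = strings.NewReplacer(`"`, "", "'", "", "\n", " ", "\r", " ")

func main() {
	labels := map[string]string{
		"_hostname": "db-01\n",
		"zone":      `"eu-west"`,
	}
	// Same loop as the patch: sanitize every label value in place
	// before the vnode host info is emitted.
	for k, v := range labels {
		labels[k] = lblReplacer.Replace(v)
	}
	fmt.Println(labels)
}
```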
From a3233ea864cb7e7b84235dbe0d449f1deb378d50 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 16 May 2025 15:06:55 +0300
Subject: [PATCH 47/51] build(deps): bump k8s.io/client-go from 0.33.0 to
0.33.1 in /src/go (#20295)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
src/go/go.mod | 6 +++---
src/go/go.sum | 12 ++++++------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/src/go/go.mod b/src/go/go.mod
index 5443fe924f961d..5c95136ab3f09f 100644
--- a/src/go/go.mod
+++ b/src/go/go.mod
@@ -58,9 +58,9 @@ require (
gopkg.in/ini.v1 v1.67.0
gopkg.in/rethinkdb/rethinkdb-go.v6 v6.2.2
gopkg.in/yaml.v2 v2.4.0
- k8s.io/api v0.33.0
- k8s.io/apimachinery v0.33.0
- k8s.io/client-go v0.33.0
+ k8s.io/api v0.33.1
+ k8s.io/apimachinery v0.33.1
+ k8s.io/client-go v0.33.1
layeh.com/radius v0.0.0-20190322222518-890bc1058917
)
diff --git a/src/go/go.sum b/src/go/go.sum
index ad79aed7fc7ec4..f4b46fabab6857 100644
--- a/src/go/go.sum
+++ b/src/go/go.sum
@@ -649,12 +649,12 @@ gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gotest.tools/v3 v3.0.3 h1:4AuOwCGf4lLR9u3YOe2awrHygurzhO/HeQ6laiA6Sx0=
gotest.tools/v3 v3.0.3/go.mod h1:Z7Lb0S5l+klDB31fvDQX8ss/FlKDxtlFlw3Oa8Ymbl8=
honnef.co/go/tools v0.0.1-2019.2.3/go.mod h1:a3bituU0lyd329TUQxRnasdCoJDkEUEAqEt0JzvZhAg=
-k8s.io/api v0.33.0 h1:yTgZVn1XEe6opVpP1FylmNrIFWuDqe2H0V8CT5gxfIU=
-k8s.io/api v0.33.0/go.mod h1:CTO61ECK/KU7haa3qq8sarQ0biLq2ju405IZAd9zsiM=
-k8s.io/apimachinery v0.33.0 h1:1a6kHrJxb2hs4t8EE5wuR/WxKDwGN1FKH3JvDtA0CIQ=
-k8s.io/apimachinery v0.33.0/go.mod h1:BHW0YOu7n22fFv/JkYOEfkUYNRN0fj0BlvMFWA7b+SM=
-k8s.io/client-go v0.33.0 h1:UASR0sAYVUzs2kYuKn/ZakZlcs2bEHaizrrHUZg0G98=
-k8s.io/client-go v0.33.0/go.mod h1:kGkd+l/gNGg8GYWAPr0xF1rRKvVWvzh9vmZAMXtaKOg=
+k8s.io/api v0.33.1 h1:tA6Cf3bHnLIrUK4IqEgb2v++/GYUtqiu9sRVk3iBXyw=
+k8s.io/api v0.33.1/go.mod h1:87esjTn9DRSRTD4fWMXamiXxJhpOIREjWOSjsW1kEHw=
+k8s.io/apimachinery v0.33.1 h1:mzqXWV8tW9Rw4VeW9rEkqvnxj59k1ezDUl20tFK/oM4=
+k8s.io/apimachinery v0.33.1/go.mod h1:BHW0YOu7n22fFv/JkYOEfkUYNRN0fj0BlvMFWA7b+SM=
+k8s.io/client-go v0.33.1 h1:ZZV/Ks2g92cyxWkRRnfUDsnhNn28eFpt26aGc8KbXF4=
+k8s.io/client-go v0.33.1/go.mod h1:JAsUrl1ArO7uRVFWfcj6kOomSlCv+JpvIsp6usAGefA=
k8s.io/klog/v2 v2.130.1 h1:n9Xl7H1Xvksem4KFG4PYbdQCQxqc/tTUyrgXaOhHSzk=
k8s.io/klog/v2 v2.130.1/go.mod h1:3Jpz1GvMt720eyJH1ckRHK1EDfpxISzJ7I9OYgaDtPE=
k8s.io/kube-openapi v0.0.0-20250318190949-c8a335a9a2ff h1:/usPimJzUKKu+m+TE36gUyGcf03XZEP0ZIKgKj35LS4=
From 2ac72594f9d6e42f1cfa0d04e0aa980a38c1d683 Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Fri, 16 May 2025 15:14:33 +0300
Subject: [PATCH 48/51] build(deps): bump github.com/prometheus/common from
0.63.0 to 0.64.0 in /src/go (#20296)
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
---
src/go/go.mod | 8 ++++----
src/go/go.sum | 16 ++++++++--------
2 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/src/go/go.mod b/src/go/go.mod
index 5c95136ab3f09f..39d7f9d83cf881 100644
--- a/src/go/go.mod
+++ b/src/go/go.mod
@@ -41,7 +41,7 @@ require (
github.com/miekg/dns v1.1.66
github.com/mitchellh/go-homedir v1.1.0
github.com/prometheus-community/pro-bing v0.7.0
- github.com/prometheus/common v0.63.0
+ github.com/prometheus/common v0.64.0
github.com/prometheus/prometheus v2.55.1+incompatible
github.com/redis/go-redis/v9 v9.8.0
github.com/sijms/go-ora/v2 v2.8.24
@@ -132,7 +132,7 @@ require (
github.com/opentracing/opentracing-go v1.1.0 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 // indirect
- github.com/prometheus/client_model v0.6.1 // indirect
+ github.com/prometheus/client_model v0.6.2 // indirect
github.com/shopspring/decimal v1.4.0 // indirect
github.com/sirupsen/logrus v1.9.3 // indirect
github.com/spf13/cast v1.7.0 // indirect
@@ -153,14 +153,14 @@ require (
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/crypto v0.38.0 // indirect
golang.org/x/mod v0.24.0 // indirect
- golang.org/x/oauth2 v0.27.0 // indirect
+ golang.org/x/oauth2 v0.30.0 // indirect
golang.org/x/sync v0.14.0 // indirect
golang.org/x/sys v0.33.0 // indirect
golang.org/x/term v0.32.0 // indirect
golang.org/x/time v0.9.0 // indirect
golang.org/x/tools v0.32.0 // indirect
golang.zx2c4.com/wireguard v0.0.0-20230325221338-052af4a8072b // indirect
- google.golang.org/protobuf v1.36.5 // indirect
+ google.golang.org/protobuf v1.36.6 // indirect
gopkg.in/cenkalti/backoff.v2 v2.2.1 // indirect
gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
diff --git a/src/go/go.sum b/src/go/go.sum
index f4b46fabab6857..d2c90c8299a23d 100644
--- a/src/go/go.sum
+++ b/src/go/go.sum
@@ -381,10 +381,10 @@ github.com/prometheus-community/pro-bing v0.7.0 h1:KFYFbxC2f2Fp6c+TyxbCOEarf7rbn
github.com/prometheus-community/pro-bing v0.7.0/go.mod h1:Moob9dvlY50Bfq6i88xIwfyw7xLFHH69LUgx9n5zqCE=
github.com/prometheus/client_golang v1.21.0-rc.0 h1:bR+RxBlwcr4q8hXkgSOA/J18j6n0/qH0Gb0DH+8c+RY=
github.com/prometheus/client_golang v1.21.0-rc.0/go.mod h1:U9NM32ykUErtVBxdvD3zfi+EuFkkaBvMb09mIfe0Zgg=
-github.com/prometheus/client_model v0.6.1 h1:ZKSh/rekM+n3CeS952MLRAdFwIKqeY8b62p8ais2e9E=
-github.com/prometheus/client_model v0.6.1/go.mod h1:OrxVMOVHjw3lKMa8+x6HeMGkHMQyHDk9E3jmP2AmGiY=
-github.com/prometheus/common v0.63.0 h1:YR/EIY1o3mEFP/kZCD7iDMnLPlGyuU2Gb3HIcXnA98k=
-github.com/prometheus/common v0.63.0/go.mod h1:VVFF/fBIoToEnWRVkYoXEkq3R3paCoxG9PXP74SnV18=
+github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
+github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
+github.com/prometheus/common v0.64.0 h1:pdZeA+g617P7oGv1CzdTzyeShxAGrTBsolKNOLQPGO4=
+github.com/prometheus/common v0.64.0/go.mod h1:0gZns+BLRQ3V6NdaerOhMbwwRbNh9hkGINtQAsP5GS8=
github.com/prometheus/procfs v0.15.1 h1:YagwOFzUgYfKKHX6Dr+sHT7km/hxC76UB0learggepc=
github.com/prometheus/procfs v0.15.1/go.mod h1:fB45yRUv8NstnjriLhBQLuOUt+WW4BsoGhij/e3PBqk=
github.com/prometheus/prometheus v0.302.0 h1:47EsaoBRroS2ekSyMSOPIjXwYnY/mxoFk0xt2dkFvfI=
@@ -531,8 +531,8 @@ golang.org/x/net v0.0.0-20210405180319-a5a99cb37ef4/go.mod h1:p54w0d4576C0XHj96b
golang.org/x/net v0.0.0-20220722155237-a158d28d115b/go.mod h1:XRhObCWvk6IyKnWLug+ECip1KBveYUHfp+8e9klMJ9c=
golang.org/x/net v0.40.0 h1:79Xs7wF06Gbdcg4kdCCIQArK11Z1hr5POQ6+fIYHNuY=
golang.org/x/net v0.40.0/go.mod h1:y0hY0exeL2Pku80/zKK7tpntoX23cqL3Oa6njdgRtds=
-golang.org/x/oauth2 v0.27.0 h1:da9Vo7/tDv5RH/7nZDz1eMGS/q1Vv1N/7FCrBhI9I3M=
-golang.org/x/oauth2 v0.27.0/go.mod h1:onh5ek6nERTohokkhCD/y2cV4Do3fxFHFuAejCkRWT8=
+golang.org/x/oauth2 v0.30.0 h1:dnDm7JmhM45NNpd8FDDeLhK6FwqbOf4MLCM9zb1BOHI=
+golang.org/x/oauth2 v0.30.0/go.mod h1:B++QgG3ZKulg6sRPGD/mqlHQs5rB3Ml9erfeDY7xKlU=
golang.org/x/sync v0.0.0-20180314180146-1d60e4601c6f/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190423024810-112230192c58/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
golang.org/x/sync v0.0.0-20190911185100-cd5d95a43a6e/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
@@ -615,8 +615,8 @@ google.golang.org/genproto/googleapis/rpc v0.0.0-20250115164207-1a7da9e5054f h1:
google.golang.org/genproto/googleapis/rpc v0.0.0-20250115164207-1a7da9e5054f/go.mod h1:+2Yz8+CLJbIfL9z73EW45avw8Lmge3xVElCP9zEKi50=
google.golang.org/grpc v1.70.0 h1:pWFv03aZoHzlRKHWicjsZytKAiYCtNS0dHbXnIdq7jQ=
google.golang.org/grpc v1.70.0/go.mod h1:ofIJqVKDXx/JiXrwr2IG4/zwdH9txy3IlF40RmcJSQw=
-google.golang.org/protobuf v1.36.5 h1:tPhr+woSbjfYvY6/GPufUoYizxw1cF/yFoxJ2fmpwlM=
-google.golang.org/protobuf v1.36.5/go.mod h1:9fA7Ob0pmnwhb644+1+CVWFRbNajQ6iRojtC/QF5bRE=
+google.golang.org/protobuf v1.36.6 h1:z1NpPI8ku2WgiWnf+t9wTPsn6eP1L7ksHUlkfLvd9xY=
+google.golang.org/protobuf v1.36.6/go.mod h1:jduwjTPXsFjZGTmRluh+L6NjiWu7pchiJ2/5YcXBHnY=
gopkg.in/airbrake/gobrake.v2 v2.0.9/go.mod h1:/h5ZAUhDkGaJfjzjKLSjv6zCL6O0LLBxU4K+aSYdM/U=
gopkg.in/cenkalti/backoff.v2 v2.2.1 h1:eJ9UAg01/HIHG987TwxvnzK2MgxXq97YY6rYDpY9aII=
gopkg.in/cenkalti/backoff.v2 v2.2.1/go.mod h1:S0QdOvT2AlerfSBkp0O+dk+bbIMaNbEmVk876gPCthU=
From 7bd5557e3bdc2657d7f5452256da902bdcc16120 Mon Sep 17 00:00:00 2001
From: Stelios Fragkakis <52996999+stelfrag@users.noreply.github.com>
Date: Fri, 16 May 2025 23:33:41 +0300
Subject: [PATCH 49/51] Minor code adjustments (#20290)
nd_profile contains the configured tiers
---
src/database/rrd-retention.c | 2 +-
src/database/rrd-retention.h | 5 +----
2 files changed, 2 insertions(+), 5 deletions(-)
diff --git a/src/database/rrd-retention.c b/src/database/rrd-retention.c
index b9e183c44df2d1..76cf4714f465a6 100644
--- a/src/database/rrd-retention.c
+++ b/src/database/rrd-retention.c
@@ -37,7 +37,7 @@ RRDSTATS_RETENTION rrdstats_retention_collect(void) {
retention.storage_tiers = nd_profile.storage_tiers;
// Iterate through all available storage tiers
- for(size_t tier = 0; tier < retention.storage_tiers && tier < RRD_MAX_STORAGE_TIERS; tier++) {
+ for(size_t tier = 0; tier < retention.storage_tiers; tier++) {
STORAGE_ENGINE *eng = localhost->db[tier].eng;
if(!eng)
continue;
diff --git a/src/database/rrd-retention.h b/src/database/rrd-retention.h
index 0fc587b6948aa4..87e711cadb27ea 100644
--- a/src/database/rrd-retention.h
+++ b/src/database/rrd-retention.h
@@ -6,9 +6,6 @@
#include "libnetdata/libnetdata.h"
#include "storage-engine.h"
-// Maximum number of storage tiers the system supports
-#define RRD_MAX_STORAGE_TIERS 32
-
// Structure to hold information about each storage tier
typedef struct rrd_storage_tier {
size_t tier; // Tier number
@@ -38,7 +35,7 @@ typedef struct rrd_storage_tier {
// Main structure to hold retention information across all tiers
typedef struct rrdstats_retention {
size_t storage_tiers; // Number of available storage tiers
- RRD_STORAGE_TIER tiers[RRD_MAX_STORAGE_TIERS]; // Array of tier information
+ RRD_STORAGE_TIER tiers[RRD_STORAGE_TIERS]; // Array of tier information
} RRDSTATS_RETENTION;
// Function to collect retention statistics
From c9c131449b9732c00e3fd51f0f35539bebb2ba12 Mon Sep 17 00:00:00 2001
From: netdatabot
Date: Sat, 17 May 2025 00:23:23 +0000
Subject: [PATCH 50/51] [ci skip] Update changelog and version for nightly
build: v2.5.0-50-nightly.
---
CHANGELOG.md | 13 ++++++-------
packaging/version | 2 +-
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 59b727c79f8d95..5f1287ce88d703 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -6,14 +6,20 @@
**Merged pull requests:**
+- build\(deps\): bump github.com/prometheus/common from 0.63.0 to 0.64.0 in /src/go [\#20296](https://github.com/netdata/netdata/pull/20296) ([dependabot[bot]](https://github.com/apps/dependabot))
+- build\(deps\): bump k8s.io/client-go from 0.33.0 to 0.33.1 in /src/go [\#20295](https://github.com/netdata/netdata/pull/20295) ([dependabot[bot]](https://github.com/apps/dependabot))
+- fix\(go.d\): sanitize vnode labels before creating vnode [\#20293](https://github.com/netdata/netdata/pull/20293) ([ilyam8](https://github.com/ilyam8))
+- Minor code adjustments [\#20290](https://github.com/netdata/netdata/pull/20290) ([stelfrag](https://github.com/stelfrag))
- add "unix://" scheme to DOCKER\_HOST in run.sh [\#20286](https://github.com/netdata/netdata/pull/20286) ([ilyam8](https://github.com/ilyam8))
- Regenerate integrations docs [\#20284](https://github.com/netdata/netdata/pull/20284) ([netdatabot](https://github.com/netdatabot))
- Improved StatsD documentation [\#20282](https://github.com/netdata/netdata/pull/20282) ([kanelatechnical](https://github.com/kanelatechnical))
+- Improve agent shutdown [\#20280](https://github.com/netdata/netdata/pull/20280) ([stelfrag](https://github.com/stelfrag))
- Regenerate integrations docs [\#20279](https://github.com/netdata/netdata/pull/20279) ([netdatabot](https://github.com/netdatabot))
- docs: update mssql meta [\#20278](https://github.com/netdata/netdata/pull/20278) ([ilyam8](https://github.com/ilyam8))
- New Windows Metrics \(CPU and Memory\) [\#20277](https://github.com/netdata/netdata/pull/20277) ([thiagoftsm](https://github.com/thiagoftsm))
- chore\(go.d/snmp\): small cleanup snmp profiles code [\#20274](https://github.com/netdata/netdata/pull/20274) ([ilyam8](https://github.com/ilyam8))
- Switch to poll from epoll [\#20273](https://github.com/netdata/netdata/pull/20273) ([stelfrag](https://github.com/stelfrag))
+- comment metric tags that could be metrics [\#20272](https://github.com/netdata/netdata/pull/20272) ([Ancairon](https://github.com/Ancairon))
- build\(deps\): bump golang.org/x/net from 0.39.0 to 0.40.0 in /src/go [\#20270](https://github.com/netdata/netdata/pull/20270) ([dependabot[bot]](https://github.com/apps/dependabot))
- build\(deps\): bump github.com/miekg/dns from 1.1.65 to 1.1.66 in /src/go [\#20268](https://github.com/netdata/netdata/pull/20268) ([dependabot[bot]](https://github.com/apps/dependabot))
- Update Netdata README with improved structure [\#20265](https://github.com/netdata/netdata/pull/20265) ([kanelatechnical](https://github.com/kanelatechnical))
@@ -459,13 +465,6 @@
- fix\(go.d/nvidia\_smi\): handle xml gpu\_power\_readings change [\#19759](https://github.com/netdata/netdata/pull/19759) ([ilyam8](https://github.com/ilyam8))
- status file timings per step [\#19758](https://github.com/netdata/netdata/pull/19758) ([ktsaou](https://github.com/ktsaou))
- improvement\(go.d/sd/snmp\): support device cache ttl 0 [\#19756](https://github.com/netdata/netdata/pull/19756) ([ilyam8](https://github.com/ilyam8))
-- chore\(go.d/sd/snmp\): comment out defaults in snmp.conf [\#19755](https://github.com/netdata/netdata/pull/19755) ([ilyam8](https://github.com/ilyam8))
-- Add documentation outlining how to use custom CA certificates with Netdata. [\#19754](https://github.com/netdata/netdata/pull/19754) ([Ferroin](https://github.com/Ferroin))
-- status file version 8 [\#19753](https://github.com/netdata/netdata/pull/19753) ([ktsaou](https://github.com/ktsaou))
-- status file improvements \(dedup and signal handler use\) [\#19751](https://github.com/netdata/netdata/pull/19751) ([ktsaou](https://github.com/ktsaou))
-- build\(deps\): bump github.com/axiomhq/hyperloglog from 0.2.3 to 0.2.5 in /src/go [\#19750](https://github.com/netdata/netdata/pull/19750) ([dependabot[bot]](https://github.com/apps/dependabot))
-- build\(deps\): bump github.com/likexian/whois from 1.15.5 to 1.15.6 in /src/go [\#19749](https://github.com/netdata/netdata/pull/19749) ([dependabot[bot]](https://github.com/apps/dependabot))
-- build\(deps\): bump go.mongodb.org/mongo-driver from 1.17.2 to 1.17.3 in /src/go [\#19748](https://github.com/netdata/netdata/pull/19748) ([dependabot[bot]](https://github.com/apps/dependabot))
## [v2.2.6](https://github.com/netdata/netdata/tree/v2.2.6) (2025-02-20)
diff --git a/packaging/version b/packaging/version
index 8ec89c43344785..b8953f58e17713 100644
--- a/packaging/version
+++ b/packaging/version
@@ -1 +1 @@
-v2.5.0-43-nightly
+v2.5.0-50-nightly
From fba56c7ab24671583fce638fb6d40d8466e1ebe7 Mon Sep 17 00:00:00 2001
From: Felipe Santos
Date: Sun, 18 May 2025 04:19:40 -0300
Subject: [PATCH 51/51] Fix when docker socket group id points to an existing
group in container (#20288)
Co-authored-by: Ilya Mashchenko
---
packaging/docker/run.sh | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/packaging/docker/run.sh b/packaging/docker/run.sh
index 1b28c5dc7b0624..fd38710108eda2 100755
--- a/packaging/docker/run.sh
+++ b/packaging/docker/run.sh
@@ -17,16 +17,16 @@ function add_netdata_to_proxmox_conf_files_group() {
if ! getent group "${group_guid}" >/dev/null; then
echo "Creating proxmox-etc-pve group with GID ${group_guid}"
- if ! addgroup -g "${group_guid}" "proxmox-etc-pve"; then
+ if ! addgroup --gid "${group_guid}" "proxmox-etc-pve"; then
echo >&2 "Failed to add group proxmox-etc-pve with GID ${group_guid}."
return
fi
fi
- if ! getent group "${group_guid}" | grep -q netdata; then
- echo "Assign netdata user to group ${group_guid}"
- if ! usermod -a -G "${group_guid}" "${DOCKER_USR}"; then
- echo >&2 "Failed to add netdata user to group with GID ${group_guid}."
+ if ! getent group "${group_guid}" | grep -q "${DOCKER_USR}"; then
+ echo "Assigning ${DOCKER_USR} user to group ${group_guid}"
+ if ! usermod --append --groups "${group_guid}" "${DOCKER_USR}"; then
+ echo >&2 "Failed to add ${DOCKER_USR} user to group with GID ${group_guid}."
return
fi
fi
@@ -79,10 +79,10 @@ if [ "${EUID}" -eq 0 ]; then
export DOCKER_HOST
if [ -n "${PGID}" ]; then
- echo "Creating docker group ${PGID}"
- addgroup --gid "${PGID}" "docker" || echo >&2 "Could not add group docker with ID ${PGID}, its already there probably"
- echo "Assign netdata user to docker group ${PGID}"
- usermod --append --groups "docker" "${DOCKER_USR}" || echo >&2 "Could not add netdata user to group docker with ID ${PGID}"
+ echo "Creating docker group with GID ${PGID}"
+ addgroup --gid "${PGID}" "docker" || echo >&2 "Failed to add group docker with GID ${PGID}, probably one already exists."
+ echo "Assigning ${DOCKER_USR} user to group with GID ${PGID}"
+ usermod --append --groups "${PGID}" "${DOCKER_USR}" || echo >&2 "Failed to add ${DOCKER_USR} user to group with GID ${PGID}."
fi
if [ -d "/host/etc/pve" ]; then