# AIStore Observability: Metrics Reference

AIStore (AIS) exposes a comprehensive set of metrics that provide insights into system performance, resource utilization, and operational status. This reference catalogs available metrics with descriptions and usage guidance.

## Table of Contents

- [Prometheus: major changes in v3.26](#prometheus-major-changes-in-v326)
- [Variable labels](#variable-labels)
- [Common metrics: AIS targets and gateways](#common-metrics-ais-targets-and-gateways)
- [Target metrics](#target-metrics)
- [Backend metrics](#backend-metrics)
- [Related Documentation](#related-documentation)

## Prometheus: major changes in v3.26

* The so-called _default_ `go_*` counters and gauges (`go_gc`, `go_memstats`, etc.) are completely gone
* Metrics are now updated directly, in real time
  - Previously: periodically, via the `prometheus.Collector` interface
  - See the related note in `stats/prom.go`
* AIS no longer publishes internally computed latencies and throughputs
* Use `*.ns.total` (nanoseconds) and `*.size` (bytes) metrics to compute latency and throughput, respectively (average latency also requires the matching `*.n` counter)
  - Compute them over user-controlled time intervals; for reference, see the CLI [`performance throughput` and `performance latency`](/docs/cli/performance.md)
  - Note: in Prometheus, the internal `.ns.total` suffix becomes `_ns_total`, and `.size` becomes `_bytes`
* In addition to the total aggregated numbers, there are now separate per-backend metrics for computing latency and throughput
  - For instance, those with the `aws.` prefix
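As a rough illustration of the bullets above, the following sketch derives average GET latency and throughput from two scrapes of the same node taken `interval_s` seconds apart. The `Sample` type, the function names, and all numbers are hypothetical; the field comments name the Prometheus metrics each field is assumed to hold. Counter resets (e.g., a node restart between scrapes) are not handled, and MB here means 10^6 bytes.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """GET counters from one scrape of a single node (hypothetical container)."""
    get_count: int     # value of get_count
    get_ns_total: int  # value of get_ns_total (nanoseconds)
    get_bytes: int     # value of get_bytes


def avg_latency_ms(prev: Sample, curr: Sample) -> float:
    """Average GET latency (ms) between two scrapes: delta time / delta count."""
    dn = curr.get_count - prev.get_count
    if dn <= 0:
        return 0.0  # no GETs completed in the interval
    return (curr.get_ns_total - prev.get_ns_total) / dn / 1e6


def throughput_mbps(prev: Sample, curr: Sample, interval_s: float) -> float:
    """Average GET throughput (MB/s, 1 MB = 10**6 bytes) over the interval."""
    return (curr.get_bytes - prev.get_bytes) / interval_s / 1e6


# Hypothetical scrapes taken 10 seconds apart.
prev = Sample(get_count=1_000, get_ns_total=5_000_000_000, get_bytes=100_000_000)
curr = Sample(get_count=1_500, get_ns_total=7_500_000_000, get_bytes=600_000_000)
print(avg_latency_ms(prev, curr))         # 5.0
print(throughput_mbps(prev, curr, 10.0))  # 50.0
```

With Prometheus itself, the same latency ratio is commonly written as `rate(get_ns_total[5m]) / rate(get_count[5m])`; the exact metric names in a given deployment may carry an additional prefix.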

## Variable labels

Each AIS metric carries `node_id`: a *static label* in Prometheus terminology.

Starting with v3.26, most metrics also carry _variable labels_:

- **Variable Labels:**
  - `bucket`: Name of the associated bucket.
  - `xkind`: Job kind.
  - `mountpath`: [Mountpath](/docs/overview.md#mountpath).

* All I/O metrics now carry the bucket name (or `Cname`, to be precise) as a Prometheus variable label
* All in-cluster writes generated by xactions (jobs) now also carry the xaction label (`xkind`), set to the respective job kind
  - One major side effect is that PUT metrics now reflect more than user PUT requests: they also include job-generated writes
* All GET, PUT, and DELETE errors also carry the bucket label
* All FSHC-related errors (the so-called I/O errors) carry the `mountpath` (i.e., faulty disk) label
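Because most metrics now carry variable labels, per-bucket totals across the cluster come from summing all series that share the same label value. The sketch below does this over Prometheus text-format output. The parser is deliberately simplified (no escaped quotes, timestamps, or label-less samples), and the bucket names, node IDs, and `sum_by_label` helper are hypothetical.

```python
import re
from collections import defaultdict

# Simplified pattern for one exposition-format sample: name{labels} value.
_SAMPLE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_label(text: str, metric: str, label: str) -> dict:
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = _SAMPLE.match(line.strip())
        if m is None or m.group("name") != metric:
            continue  # comment line (# HELP / # TYPE), other metric, or unparsed line
        labels = dict(_LABEL.findall(m.group("labels")))
        totals[labels.get(label, "")] += float(m.group("value"))
    return dict(totals)


# Hypothetical scrape output from two targets.
scrape = """\
# TYPE get_count counter
get_count{bucket="ais://images",node_id="t1"} 10
get_count{bucket="ais://images",node_id="t2"} 5
get_count{bucket="ais://logs",node_id="t1"} 7
"""
print(sum_by_label(scrape, "get_count", "bucket"))  # {'ais://images': 15.0, 'ais://logs': 7.0}
```

In PromQL, the equivalent aggregation is `sum by (bucket) (get_count)`.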

## Common metrics: AIS targets and gateways

- **Request Metrics:**
  - `GetCount`: Total number of executed GET(object) requests.
    - **Variable Labels:** `bucket`
  - `PutCount`: Total number of executed PUT(object) requests.
    - **Variable Labels:** `bucket`, `xkind`
  - `HeadCount`: Total number of executed HEAD(object) requests (currently only remote HEAD).
    - **Variable Labels:** `bucket`
  - `AppendCount`: Total number of executed APPEND(object) requests.
    - **Variable Labels:** `bucket`
  - `DeleteCount`: Total number of executed DELETE(object) requests.
    - **Variable Labels:** `bucket`
  - `RenameCount`: Total number of executed rename(object) requests.
    - **Variable Labels:** `bucket`
  - `ListCount`: Total number of executed list-objects requests.
    - **Variable Labels:** `bucket`

### Common error counters

- **Error Metrics:**
  - `ErrGetCount`: Total number of GET(object) errors.
    - **Variable Labels:** `bucket`
  - `ErrPutCount`: Total number of PUT(object) errors.
    - **Variable Labels:** `bucket`, `xkind`
  - `ErrHeadCount`: Total number of HEAD(object) errors.
    - **Variable Labels:** `bucket`
  - `ErrAppendCount`: Total number of APPEND(object) errors.
    - **Variable Labels:** `bucket`
  - `ErrDeleteCount`: Total number of DELETE(object) errors.
    - **Variable Labels:** `bucket`
  - `ErrRenameCount`: Total number of rename(object) errors.
    - **Variable Labels:** `bucket`
  - `ErrListCount`: Total number of list-objects errors.
    - **Variable Labels:** `bucket`
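The error counters are most useful as rates relative to the matching request counters. This document does not say whether `get_count` also counts failed requests, so the sketch below expresses the result as errors per executed request rather than as a failure percentage. It also treats a decreasing value as a counter reset (for example, after a node restart). The function names and values are hypothetical.

```python
def counter_delta(prev: float, curr: float) -> float:
    """Increase of a monotonic counter between two scrapes.

    A decrease is treated as a counter reset (for example, a node restart);
    the current value is then taken as the increase since the reset.
    """
    return curr - prev if curr >= prev else curr


def error_ratio(prev_err: float, curr_err: float, prev_n: float, curr_n: float) -> float:
    """Errors per executed request over the interval (0.0 if there were no requests)."""
    dn = counter_delta(prev_n, curr_n)
    if dn <= 0:
        return 0.0
    return counter_delta(prev_err, curr_err) / dn


# Hypothetical err_get_count and get_count values from two scrapes.
print(error_ratio(prev_err=2, curr_err=5, prev_n=100, curr_n=400))  # 0.01
```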

### Common latencies

- **Latency Metrics:**
  - `GetLatency`: GET average time (milliseconds) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`
  - `GetLatencyTotal`: GET total cumulative time (nanoseconds).
    - **Variable Labels:** `bucket`
  - `ListLatency`: List-objects average time (milliseconds) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`

For convenience, we also include here a (somewhat redundant) table that summarizes common metrics.

| Internal name | Public name | Internal Type | Description (Prometheus help) | Prometheus labels |
| --- | --- | --- | --- | --- |
| `get.n` | `get_count` | counter | total number of executed GET(object) requests | default |
| `put.n` | `put_count` | counter | total number of executed PUT(object) requests | default |
| `head.n` | `head_count` | counter | total number of executed HEAD(object) requests | default |
| `append.n` | `append_count` | counter | total number of executed APPEND(object) requests | default |
| `del.n` | `del_count` | counter | total number of executed DELETE(object) requests | default |
| `ren.n` | `ren_count` | counter | total number of executed rename(object) requests | default |
| `lst.n` | `lst_count` | counter | total number of executed list-objects requests | default |
| `err.get.n` | `err_get_count` | counter | total number of GET(object) errors | default |
| `err.put.n` | `err_put_count` | counter | total number of PUT(object) errors | default |
| `err.head.n` | `err_head_count` | counter | total number of HEAD(object) errors | default |
| `err.append.n` | `err_append_count` | counter | total number of APPEND(object) errors | default |
| `err.del.n` | `err_del_count` | counter | total number of DELETE(object) errors | default |
| `err.ren.n` | `err_ren_count` | counter | total number of rename(object) errors | default |
| `err.lst.n` | `err_lst_count` | counter | total number of list-objects errors | default |
| `err.http.write.n` | `err_http_write_count` | counter | total number of HTTP write-response errors | default |
| `err.dl.n` | `err_dl_count` | counter | downloader: number of download errors | default |
| `err.put.mirror.n` | `err_put_mirror_count` | counter | number of n-way mirroring errors | default |
| `get.ns` | `get_ms` | latency | GET: average time (milliseconds) over the last periodic.stats_time interval | default |
| `get.ns.total` | `get_ns_total` | total | GET: total cumulative time (nanoseconds) | default |
| `lst.ns` | `lst_ms` | latency | list-objects: average time (milliseconds) over the last periodic.stats_time interval | default |
| `kalive.ns` | `kalive_ms` | latency | in-cluster keep-alive (heartbeat): average time (milliseconds) over the last periodic.stats_time interval | default |
| `up.ns.time` | `uptime` | special | this node's uptime since its startup (seconds) | default |
| `state.flags` | `state_flags` | gauge | bitwise 64-bit value that carries enumerated node-state flags, including warnings and alerts; see https://github.com/NVIDIA/aistore/blob/main/cmn/cos/node_state.go | default |
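The table pairs each internal name with its public (Prometheus) name, and most rows follow a few suffix rules. The sketch below reproduces those rules; they are inferred from the rows in this document, not taken from the AIS source. The `disk.<DISK-NAME>.*` metrics (where the disk name becomes a label), the per-backend metrics, and a few other rows such as `ratelim.retry.get.n` do not follow them. The `public_name` helper is hypothetical.

```python
# Suffix rules inferred from the tables in this document; checked in order,
# so the longer ".ns.total" is matched before ".ns".
_SUFFIX_RULES = [
    (".ns.total", "_ns_total"),
    (".size", "_bytes"),
    (".bps", "_mbps"),
    (".ns", "_ms"),
    (".n", "_count"),
]

# Rows that do not follow the suffix rules.
_OVERRIDES = {"up.ns.time": "uptime"}


def public_name(internal: str) -> str:
    """Map an internal metric name (e.g. "get.n") to its public name (e.g. "get_count")."""
    if internal in _OVERRIDES:
        return _OVERRIDES[internal]
    for suffix, replacement in _SUFFIX_RULES:
        if internal.endswith(suffix):
            return internal[: -len(suffix)].replace(".", "_") + replacement
    return internal.replace(".", "_")


for name in ("get.n", "get.ns.total", "lst.ns", "put.bps", "put.size", "state.flags"):
    print(name, "->", public_name(name))
```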

## Target metrics

- **Out-of-Band Metrics:**
  - `VerChangeCount`: Number of out-of-band updates (by a 3rd party performing remote PUTs from outside this cluster).
    - **Variable Labels:** `bucket`
  - `VerChangeSize`: Total cumulative size (bytes) of objects updated out-of-band across all backends combined.
    - **Variable Labels:** `bucket`
  - `RemoteDeletedDelCount`: Number of out-of-band deletes (by a 3rd party performing remote DELETE(object) from outside this cluster).
    - **Variable Labels:** `bucket`

- **PUT Latency Metrics:**
  - `PutLatency`: PUT average time (milliseconds) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`, `xkind`
  - `PutLatencyTotal`: PUT total cumulative time (nanoseconds).
    - **Variable Labels:** `bucket`, `xkind`

- **HEAD Latency Metrics:**
  - `HeadLatency`: HEAD average time (milliseconds) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`
  - `HeadLatencyTotal`: HEAD total cumulative time (nanoseconds).
    - **Variable Labels:** `bucket`

- **APPEND Latency Metrics:**
  - `AppendLatency`: APPEND average time (milliseconds) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`

- **Throughput Metrics:**
  - `GetThroughput`: GET average throughput (MB/s) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`
  - `PutThroughput`: PUT average throughput (MB/s) over the last periodic.stats_time interval.
    - **Variable Labels:** `bucket`, `xkind`

- **Size Metrics:**
  - `GetSize`: GET total cumulative size (bytes).
    - **Variable Labels:** `bucket`
  - `PutSize`: PUT total cumulative size (bytes).
    - **Variable Labels:** `bucket`, `xkind`

- **Error Metrics:**
  - `ErrPutCksumCount`: Number of PUT checksum errors.
    - **Variable Labels:** `bucket`, `xkind`
  - `ErrFSHCCount`: Number of times the filesystem health checker (FSHC) was triggered by one or more I/O errors.
    - **Variable Labels:** `mountpath`
  - `IOErrGetCount`: Number of GET I/O errors (excluding remote backend and network errors).
    - **Variable Labels:** `bucket`
  - `IOErrDeleteCount`: Number of DELETE(object) I/O errors (excluding remote backend and network errors).
    - **Variable Labels:** `bucket`
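Since the FSHC error counter carries the `mountpath` label, a simple check can flag the specific disks whose counter grew between two scrapes. A minimal sketch, assuming the `err_fshc_count` values have already been collected per `mountpath` label value; the mountpath paths and the helper name are hypothetical. In PromQL, a comparable alert expression is `increase(err_fshc_count[10m]) > 0`.

```python
def mountpaths_with_new_fshc_errors(prev: dict, curr: dict) -> list:
    """Return mountpaths whose err_fshc_count grew between two scrapes.

    `prev` and `curr` map the `mountpath` label value to the counter value.
    A mountpath absent from `prev` is treated as starting from zero.
    A lower value in `curr` (counter reset, e.g. after a node restart) is not
    reported, so errors right after a restart may be missed.
    """
    return sorted(mp for mp, v in curr.items() if v > prev.get(mp, 0))


# Hypothetical per-mountpath counter values from two scrapes.
prev = {"/ais/mp1": 0, "/ais/mp2": 2}
curr = {"/ais/mp1": 0, "/ais/mp2": 3, "/ais/mp3": 1}
print(mountpaths_with_new_fshc_errors(prev, curr))  # ['/ais/mp2', '/ais/mp3']
```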

For convenience, the table below summarizes target metrics.

| Internal name | Public name | Internal Type | Description (Prometheus help) | Prometheus labels |
| --- | --- | --- | --- | --- |
| `disk.<DISK-NAME>.read.bps` | `disk_read_mbps` | computed-bandwidth | read bandwidth (MB/s) | map[disk:`<DISK-NAME>` node_id:`<AIS-NODE-ID>`] |
| `disk.<DISK-NAME>.avg.rsize` | `disk_avg_rsize` | gauge | average read size (bytes) | map[disk:`<DISK-NAME>` node_id:`<AIS-NODE-ID>`] |
| `disk.<DISK-NAME>.write.bps` | `disk_write_mbps` | computed-bandwidth | write bandwidth (MB/s) | map[disk:`<DISK-NAME>` node_id:`<AIS-NODE-ID>`] |
| `disk.<DISK-NAME>.avg.wsize` | `disk_avg_wsize` | gauge | average write size (bytes) | map[disk:`<DISK-NAME>` node_id:`<AIS-NODE-ID>`] |
| `disk.<DISK-NAME>.util` | `disk_util` | gauge | disk utilization (%) | map[disk:`<DISK-NAME>` node_id:`<AIS-NODE-ID>`] |
| `lru.evict.n` | `lru_evict_count` | counter | number of LRU evictions | default |
| `lru.evict.size` | `lru_evict_bytes` | size | total cumulative size (bytes) of LRU evictions | default |
| `cleanup.store.n` | `cleanup_store_count` | counter | space cleanup: number of removed misplaced objects and old work files | default |
| `cleanup.store.size` | `cleanup_store_bytes` | size | space cleanup: total size (bytes) of all removed misplaced objects and old work files (not including removed deleted objects) | default |
| `ver.change.n` | `ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs from outside this cluster) | default |
| `ver.change.size` | `ver_change_bytes` | size | total cumulative size (bytes) of objects that were updated out-of-band across all backends combined | default |
| `remote.deleted.del.n` | `remote_deleted_del_count` | counter | number of out-of-band deletes (by a 3rd party performing remote DELETE(object) from outside this cluster) | default |
| `put.ns` | `put_ms` | latency | PUT: average time (milliseconds) over the last periodic.stats_time interval | default |
| `put.ns.total` | `put_ns_total` | total | PUT: total cumulative time (nanoseconds) | default |
| `append.ns` | `append_ms` | latency | APPEND(object): average time (milliseconds) over the last periodic.stats_time interval | default |
| `get.redir.ns` | `get_redir_ms` | latency | GET: average gateway-to-target HTTP redirect latency (milliseconds) over the last periodic.stats_time interval | default |
| `put.redir.ns` | `put_redir_ms` | latency | PUT: average gateway-to-target HTTP redirect latency (milliseconds) over the last periodic.stats_time interval | default |
| `ratelim.retry.get.n` | `ratelim_retry_get_n` | counter | GET: number of rate-limited retries triggered by remote backends returning 429 and 503 status codes | default |
| `ratelim.retry.get.ns.total` | `ratelim_retry_get_ns_total` | total | GET: total retrying time (nanoseconds) caused by remote backends returning 429 and 503 status codes | default |
| `ratelim.retry.put.n` | `ratelim_retry_put_n` | counter | PUT: number of rate-limited retries triggered by remote backends returning 429 and 503 status codes | default |
| `ratelim.retry.put.ns.total` | `ratelim_retry_put_ns_total` | total | PUT: total retrying time (nanoseconds) caused by remote backends returning 429 and 503 status codes | default |
| `get.bps` | `get_mbps` | bandwidth | GET: average throughput (MB/s) over the last periodic.stats_time interval | default |
| `put.bps` | `put_mbps` | bandwidth | PUT: average throughput (MB/s) over the last periodic.stats_time interval | default |
| `get.size` | `get_bytes` | size | GET: total cumulative size (bytes) | default |
| `put.size` | `put_bytes` | size | PUT: total cumulative size (bytes) | default |
| `err.cksum.n` | `err_cksum_count` | counter | PUT: number of checksum errors | default |
| `err.fshc.n` | `err_fshc_count` | counter | number of times filesystem health checker (FSHC) was triggered by an I/O error or errors | default |
| `err.io.get.n` | `err_io_get_count` | counter | GET: number of I/O errors _not_ including remote backend and network errors | default |
| `err.io.put.n` | `err_io_put_count` | counter | PUT: number of I/O errors _not_ including remote backend and network errors | default |
| `err.io.del.n` | `err_io_del_count` | counter | DELETE(object): number of I/O errors _not_ including remote backend and network errors | default |
| `stream.out.n` | `stream_out_count` | counter | intra-cluster streaming communications: number of sent objects | default |
| `stream.out.size` | `stream_out_bytes` | size | intra-cluster streaming communications: total cumulative size (bytes) of all transmitted objects | default |
| `stream.in.n` | `stream_in_count` | counter | intra-cluster streaming communications: number of received objects | default |
| `stream.in.size` | `stream_in_bytes` | size | intra-cluster streaming communications: total cumulative size (bytes) of all received objects | default |
| `dl.size` | `dl_bytes` | size | total downloaded size (bytes) | default |
| `dl.ns.total` | `dl_ns_total` | total | total downloading time (nanoseconds) | default |
| `dsort.creation.req.n` | `dsort_creation_req_count` | counter | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `dsort.creation.resp.n` | `dsort_creation_resp_count` | counter | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `dsort.creation.resp.ns` | `dsort_creation_resp_ms` | latency | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `dsort.extract.shard.dsk.n` | `dsort_extract_shard_dsk_count` | counter | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `dsort.extract.shard.mem.n` | `dsort_extract_shard_mem_count` | counter | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `dsort.extract.shard.size` | `dsort_extract_shard_bytes` | size | dsort: see https://github.com/NVIDIA/aistore/blob/main/docs/dsort.md#metrics | default |
| `lcache.collision.n` | `lcache_collision_count` | counter | number of LOM cache collisions (core, internal) | default |
| `lcache.evicted.n` | `lcache_evicted_count` | counter | number of LOM cache evictions (core, internal) | default |
| `lcache.flush.cold.n` | `lcache_flush_cold_count` | counter | number of times a LOM from cache was written to stable storage (core, internal) | default |
| `remais.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute remote requests and store, copy, or transform objects | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all remote GET transactions | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `remais.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size (bytes) of objects that were updated out-of-band | map[backend:remais node_id:`<AIS-NODE-ID>`] |
| `gcp.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute remote requests and store, copy, or transform objects | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all remote transactions | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `gcp.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size (bytes) of objects that were updated out-of-band | map[backend:gcp node_id:`<AIS-NODE-ID>`] |
| `aws.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute remote requests and store, copy, or transform objects | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all remote transactions | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `aws.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size (bytes) of objects that were updated out-of-band | map[backend:aws node_id:`<AIS-NODE-ID>`] |
| `azure.get.n` | `remote_get_count` | counter | GET: total number of executed remote requests | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.get.ns.total` | `remote_get_ns_total` | total | GET: total cumulative time (nanoseconds) to execute remote requests and store, copy, or transform objects | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.get.size` | `remote_get_bytes_total` | size | GET: total cumulative size (bytes) of all remote transactions | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.head.n` | `remote_head_count` | counter | HEAD: total number of executed remote requests to a given backend | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.put.n` | `remote_put_count` | counter | PUT: total number of executed remote requests to a given backend | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.put.ns.total` | `remote_put_ns_total` | total | PUT: total cumulative time (nanoseconds) to execute remote requests and store new object versions in-cluster | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.e2e.put.ns.total` | `remote_e2e_put_ns_total` | total | PUT: total end-to-end time (nanoseconds) servicing remote requests; includes: receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.put.size` | `remote_e2e_put_bytes_total` | size | PUT: total cumulative size (bytes) of all PUTs to a given remote backend | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.ver.change.n` | `remote_ver_change_count` | counter | number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster) | map[backend:azure node_id:`<AIS-NODE-ID>`] |
| `azure.ver.change.size` | `remote_ver_change_bytes_total` | size | total cumulative size (bytes) of objects that were updated out-of-band | map[backend:azure node_id:`<AIS-NODE-ID>`] |

## Backend metrics

- **GET Metrics:**
  - `remote_get_count`: Total number of executed remote GET requests.
    - **Variable Labels:** `bucket`
  - `remote_get_ns_total`: Total cumulative time (nanoseconds) to execute remote requests and store, copy, or transform objects.
    - **Variable Labels:** `bucket`
  - `remote_get_bytes_total`: Total cumulative size (bytes) of all remote GET transactions.
    - **Variable Labels:** `bucket`

- **PUT Metrics:**
  - `remote_put_count`: Total number of executed remote PUT requests to a given backend.
    - **Variable Labels:** `bucket`, `xkind`
  - `remote_put_ns_total`: Total cumulative time (nanoseconds) to execute remote PUT requests and store new object versions in-cluster.
    - **Variable Labels:** `bucket`, `xkind`
  - `remote_e2e_put_ns_total`: Total end-to-end time (nanoseconds) servicing remote PUT requests (includes receiving PUT payload, storing it in-cluster, executing remote PUT, finalizing new in-cluster object).
    - **Variable Labels:** `bucket`, `xkind`
  - `remote_e2e_put_bytes_total`: Total cumulative size (bytes) of all PUTs to a given remote backend.
    - **Variable Labels:** `bucket`, `xkind`

- **HEAD Metrics:**
  - `remote_head_count`: Total number of executed remote HEAD requests to a given backend.
    - **Variable Labels:** `bucket`
  - `remote_head_ns_total`: Total cumulative time (nanoseconds) to execute remote HEAD requests.
    - **Variable Labels:** `bucket`

- **Out-of-Band Updates:**
  - `remote_ver_change_count`: Number of out-of-band updates (by a 3rd party performing remote PUTs outside this cluster).
    - **Variable Labels:** `bucket`
  - `remote_ver_change_bytes_total`: Total cumulative size (bytes) of objects that were updated out-of-band.
    - **Variable Labels:** `bucket`
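All backends share the same public metric names (for example, `aws.get.n`, `gcp.get.n`, and `azure.get.n` are all published as `remote_get_count`) and are told apart by the `backend` label shown in the target metrics table. A minimal sketch, assuming the remote GET counters have already been collected per `backend` value from two scrapes taken `interval_s` seconds apart; the `RemoteGet` type, the helper name, and the numbers are hypothetical, and MB means 10^6 bytes.

```python
from dataclasses import dataclass


@dataclass
class RemoteGet:
    """Remote GET counters for one backend from one scrape (hypothetical container)."""
    count: int        # remote_get_count
    ns_total: int     # remote_get_ns_total (nanoseconds)
    bytes_total: int  # remote_get_bytes_total


def per_backend_stats(prev: dict, curr: dict, interval_s: float) -> dict:
    """Return {backend: (avg latency ms, throughput MB/s)} over the interval.

    Backends missing from `prev` (e.g. first seen in this scrape) are skipped.
    """
    out = {}
    for backend, c in curr.items():
        p = prev.get(backend)
        if p is None:
            continue
        dn = c.count - p.count
        lat_ms = (c.ns_total - p.ns_total) / dn / 1e6 if dn > 0 else 0.0
        mbps = (c.bytes_total - p.bytes_total) / interval_s / 1e6
        out[backend] = (lat_ms, mbps)
    return out


# Hypothetical scrapes taken 20 seconds apart.
prev = {"aws": RemoteGet(100, 2_000_000_000, 50_000_000)}
curr = {"aws": RemoteGet(300, 6_000_000_000, 250_000_000)}
print(per_backend_stats(prev, curr, 20.0))  # {'aws': (20.0, 10.0)}
```

In PromQL, a comparable per-backend latency is `sum by (backend) (rate(remote_get_ns_total[5m])) / sum by (backend) (rate(remote_get_count[5m]))`.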

## Related Documentation

| Document | Description |
|----------|-------------|
| [Overview](/docs/monitoring-overview.md) | Introduction to AIS observability |
| [CLI](/docs/monitoring-cli.md) | Command-line monitoring tools |
| [Logs](/docs/monitoring-logs.md) | Log-based observability |
| [Prometheus](/docs/monitoring-prometheus.md) | Configuring Prometheus with AIS |
| [Grafana](/docs/monitoring-grafana.md) | Visualizing AIS metrics with Grafana |
| [Kubernetes](/docs/monitoring-kubernetes.md) | Working with Kubernetes monitoring stacks |
