REST API
(aka slurmrestd)
Nathan Rini
SC'24
What is the Slurm REST API?
Short answer:
Slurm without command line
Questions?
● Exhaustive answers with live demos can be covered during Slurm onsite trainings:
○ Please email [email protected] to set up a training session.
slurmrestd - documentation
● See the following documentation for detailed explanations of the contents discussed in the
slides:
○ REST API Quick Start Guide
○ REST API Reference
○ REST API Client Writing Guide
○ REST API Generated Documentation from OpenAPI Schema
○ REST API Release Notes
○ SLURM Release NEWS
○ SLURM Release Notes
○ REST API Support matrix
We are trying to improve the documentation in every release. If something is missing, please
open a ticket via bugzilla and we can look into documenting it.
Example use cases via CLI
Preparation: Setup shell
● Example bash functions to query and post to slurmrestd via a TCP socket:
#!/bin/bash
function rest_query()
{
    [ -z "$1" ] && echo "$0 {hostname:port} {Query Path}" >&2 && return 1
    # scontrol prints SLURM_JWT=<token>; export it for the curl call below
    export $(unset SLURM_JWT; scontrol token lifespan=10)
    curl -s -H "X-SLURM-USER-TOKEN:${SLURM_JWT}" -X GET "http://${1}/${2}"
}
function rest_post()
{
    [ -z "$1" ] && echo "$0 {hostname:port} {Query Path} {file to post}" >&2 && return 1
    export $(unset SLURM_JWT; scontrol token lifespan=10)
    # Detect the MIME type of the file so Content-Type matches the payload
    mime_type="$(file -i "${3}" | cut -d ' ' -f2 - | tr -d ';')"
    curl -s -H "X-SLURM-USER-TOKEN:${SLURM_JWT}" -H "Content-Type: ${mime_type}" -X POST \
        --data-binary "@${3}" "http://${1}/${2}"
}
export -f rest_query rest_post
● User name is only required to act as a proxy; otherwise the user encoded in the token is used:
-H "X-SLURM-USER-NAME:$(whoami)"
● Make sure to always use `--data-binary` and not `--data` when using curl to avoid the
payload being corrupted by curl’s auto type conversion.
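The same request layout can be sketched in Python using only the standard library. This is a minimal illustration, not part of the original slides: the helper name `build_rest_request` and the token value are hypothetical, and the request is only built, not sent. The header names are the ones used by the bash functions above.

```python
import urllib.request


def build_rest_request(host_port, path, token, payload=None,
                       content_type="application/json", user_name=None):
    """Build (but do not send) a request shaped like rest_query/rest_post.

    Hypothetical helper for illustration only.
    """
    headers = {"X-SLURM-USER-TOKEN": token}
    if user_name is not None:
        # Only needed when acting as a proxy for another user.
        headers["X-SLURM-USER-NAME"] = user_name
    if payload is not None:
        headers["Content-Type"] = content_type
    url = "http://{}/{}".format(host_port, path.lstrip("/"))
    # Passing bytes as data makes urllib use POST and send the bytes unmodified.
    return urllib.request.Request(url, data=payload, headers=headers)


# "example-token" is a placeholder; a real token comes from `scontrol token`.
req = build_rest_request("localhost:8080", "slurm/v0.0.42/jobs", "example-token")
print(req.get_method(), req.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(req)` against a running slurmrestd.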
Query jobs information
● Get state of first Array Job task with state of Job 194_1:
● job.json:
{
"script": "#!/bin/bash\nsrun uptime",
"job": {
"environment": [
"PATH=/bin/:/usr/bin/:/sbin/"
],
"account": "test",
"name": "test slurmrestd job",
"current_working_directory": "/tmp/",
"memory_per_node": {
"set": true,
"number": 100
},
"tasks": 5,
"nodes": "2-10"
}
}
Example Array Job description
● array_job.json:
{
"script": "#!/bin/bash\nsrun uptime",
"job": {
"environment": [
"PATH=/bin/:/usr/bin/:/sbin/"
],
"current_working_directory": "/tmp/",
"account": "test",
"array": "100",
"name": "test slurmrestd array job",
"memory_per_node": {
"set": true,
"number": 100
},
"tasks": 5,
"nodes": "2-10"
}
}
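Since array_job.json differs from job.json only in the `array` and `name` fields, the payload can also be derived in code rather than kept as a second file. The following is a small sketch under that assumption; `as_array_job` is a hypothetical helper, and the field values are copied from the two examples above.

```python
import copy
import json

# Same content as job.json above.
BASE_JOB = {
    "script": "#!/bin/bash\nsrun uptime",
    "job": {
        "environment": ["PATH=/bin/:/usr/bin/:/sbin/"],
        "account": "test",
        "name": "test slurmrestd job",
        "current_working_directory": "/tmp/",
        "memory_per_node": {"set": True, "number": 100},
        "tasks": 5,
        "nodes": "2-10",
    },
}


def as_array_job(payload, array_spec, name):
    """Return a copy of payload with the array-specific fields set."""
    result = copy.deepcopy(payload)  # leave the base description untouched
    result["job"]["array"] = array_spec
    result["job"]["name"] = name
    return result


array_job = as_array_job(BASE_JOB, "100", "test slurmrestd array job")
# Encode to bytes so the body can be posted without further conversion.
body = json.dumps(array_job).encode("utf-8")
print(body.decode("utf-8"))
```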
Example HetJob description
● het_job.json:
{
"script": "#!/bin/bash\nsrun uptime",
"jobs": [
{
"environment": [
"PATH=/bin/:/usr/bin/:/sbin/"
],
"current_working_directory": "/tmp/",
"account": "test",
"name": "test slurmrestd job",
"memory_per_node": {
"set": true,
"number": 100
},
"tasks": 5,
"nodes": "2-10"
},
{
"memory_per_node": {
"set": true,
"number": 15
},
"tasks": 1,
"nodes": "1",
"current_working_directory": "/tmp/",
"environment": [
"PATH=/bin/:/usr/bin/:/sbin/"
]
},
{
"current_working_directory": "/tmp/",
"nodes": "1",
"environment": [
"PATH=/bin/:/usr/bin/:/sbin/"
]
}
]
}
Submit example jobs
● Cancel a job:
● Prerequisite:
○ Install openapi-generator-cli
● Compile and install library for client
● Run python3 in interactive mode and setup environment for all examples:
$ python3
from pprint import pprint
import openapi_client
import subprocess
import os
import re
from openapi_client import ApiClient as Client
from openapi_client import Configuration as Config
c = openapi_client.Configuration(host = "http://localhost:8080/",
access_token = subprocess.run(['scontrol', 'token',
'lifespan=9999'], check=True, capture_output=True,
text=True).stdout.replace('SLURM_JWT=','').replace('\n',''))
slurm = openapi_client.SlurmApi(openapi_client.ApiClient(c))
slurmdb = openapi_client.SlurmdbApi(openapi_client.ApiClient(c))
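The chained `replace()` calls above strip the `SLURM_JWT=` prefix from the `scontrol token` output. A slightly more defensive variant might parse the output line by line and fail loudly when no token is present. This is only a sketch: `parse_slurm_jwt` is a hypothetical helper, and the sample string stands in for real `scontrol` output.

```python
def parse_slurm_jwt(scontrol_output):
    """Extract the token from `scontrol token` output (SLURM_JWT=<token>)."""
    for line in scontrol_output.splitlines():
        # partition() splits on the first '=' only, so the token is kept whole.
        key, sep, value = line.strip().partition("=")
        if sep and key == "SLURM_JWT" and value:
            return value
    raise ValueError("no SLURM_JWT=<token> line found in scontrol output")


# Placeholder text; real output would come from subprocess.run(...).stdout.
token = parse_slurm_jwt("SLURM_JWT=header.payload.signature\n")
print(token)
```

The returned string could then be passed as `access_token` when constructing the client configuration.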
Inspection of generated OpenAPI client
resp = slurm.slurm_v0042_get_jobs()
for job in resp.jobs:
    print(job.job_id)
● Get state of all jobs known to slurmctld:
resp = slurm.slurm_v0042_get_jobs()
for job in resp.jobs:
    for state in job.job_state:
        print(state)
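Because `job_state` is a list of state strings per job, a summary across all jobs is a natural next step. The sketch below works on a plain dictionary shaped like the JSON form of the jobs response (a `jobs` list whose entries carry `job_id` and `job_state`) instead of the generated client objects. The sample data and the helper name `count_job_states` are made up for illustration.

```python
from collections import Counter


def count_job_states(response):
    """Count how often each state string appears across all jobs."""
    counts = Counter()
    for job in response.get("jobs", []):
        # Each job may report more than one state flag at a time.
        counts.update(job.get("job_state", []))
    return counts


# Hypothetical sample data, not output from a real cluster.
sample = {
    "jobs": [
        {"job_id": 192, "job_state": ["RUNNING"]},
        {"job_id": 193, "job_state": ["PENDING"]},
        {"job_id": 194, "job_state": ["RUNNING"]},
    ]
}
for state, count in sorted(count_job_states(sample).items()):
    print(state, count)
```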
Query jobs information
● Cancel a job:
slurm.slurm_v0042_delete_job(job_id='3694')
slurm.slurm_v0042_delete_job(job_id='3694', signal='SIGINT')
Control Jobs
job = JobDesc(tasks=15)
print(slurm.slurm_v0042_post_job(job_id='3697', v0042_job_desc_msg=job))
Slurm CLI: YAML & JSON
JSON and YAML for the command line
● Functionality from slurmrestd has been added to existing CLI commands to provide JSON
and YAML output:
sshare --json
sshare --yaml
● Get state of first Array Job task with state of all jobs known to slurmctld:
● CLI commands
○ --yaml/--json without an argument defaults to latest version (v0.0.42 on 24.11)
[Diagram. Components shown: clients, sshd on login node(s), and slurmctld, slurmdbd and slurmd nodes in the MUNGE security domain. Authentication provided by ssh connections. MUNGE honors user/group as reported by kernel.]
Slurm REST API using only MUNGE and command line
[Diagram. Components shown: client, sshd, slurmdbd, User Workflow Manager, slurmrestd (Inet Mode), shell script (via pipes), slurmd nodes. Authentication provided by ssh connections. MUNGE honors user/group as reported by kernel. Unprivileged client invokes slurmrestd directly via pipes.]
Slurm REST API using JSON web tokens in an existing cluster
[Diagram. Components shown: clients, JWT authentication, MUNGE security domain, slurmrestd, slurmctld, slurmdbd.]
Authenticating Proxy
[Diagram. Components shown: slurmdbd (three instances), JWKS certificate, JWT identity service, JWT authentication, and one slurmrestd each for Cluster A, Cluster B and Cluster C.]
● How to compile
○ Follow normal configuration procedure first
○ slurmrestd will be automatically compiled if all prerequisites are present
■ checking whether to compile slurmrestd... yes
■ checking for slurmrestd default port... 6820
○ Possible to explicitly request slurmrestd
■ ../configure --enable-slurmrestd
● slurmrestd is just another unprivileged binary callable by any user
○ Installed at EPREFIX/sbin/slurmrestd
■ Possible to change install path when calling configure:
● ../configure --prefix=$NEW_INSTALL_PATH
● ../configure --sbindir=$NEW_INSTALL_PATH
● slurmrestd must be able to communicate with slurmctld and slurmdbd via TCP
connections.
slurmrestd - Invoked directly
● Call slurmrestd directly from a shell script or from an in-cluster workflow manager
○ Avoids requiring any new authentication for the cluster
○ Requires that the client handle HTTP communications
● Example (truncated):
$ echo -e 'GET /slurm/v0.0.42/jobs HTTP/1.1\r\n' | slurmrestd
HTTP/1.1 200 OK
Content-Length: 8758
Content-Type: application/json
{
"jobs": [
{
"job_id": 192,
"job_state": [
"RUNNING"
],
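When slurmrestd is invoked through pipes like this, the caller receives the raw HTTP response and has to split it itself. A minimal sketch of that step follows, assuming the response uses standard CRLF line endings and a blank line between headers and body. `parse_http_response` is a hypothetical helper, and the sample bytes are hand-built to resemble the truncated output above.

```python
import json


def parse_http_response(raw):
    """Split a raw HTTP/1.1 response into (status, headers, body bytes)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line separating headers from body")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.1 200 OK
    status = int(lines[0].split(" ", 2)[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


# Hand-built sample, not captured from a real slurmrestd.
sample_body = b'{"jobs": [{"job_id": 192, "job_state": ["RUNNING"]}]}'
sample_raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: " + str(len(sample_body)).encode("ascii") + b"\r\n"
    b"Content-Type: application/json\r\n"
    b"\r\n" + sample_body
)
status, headers, body = parse_http_response(sample_raw)
jobs = json.loads(body)["jobs"]
print(status, headers["content-type"], jobs[0]["job_id"])
```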
slurmrestd - Proxying
● Start daemon listening on IPv4 localhost TCP port 8080, IPv6 localhost TCP port 8080, IPv6
and IPv4 on all interfaces TCP port 8181, streaming Unix socket at /path/to/unix.socket with
Slurm-24.11 content plugins only using JWT authentication for a Slurm-24.11 install.
● Start daemon listening on IPv4 localhost TCP port 8080, IPv6 localhost TCP port 8080, IPv6
and IPv4 on all interfaces TCP port 8181, streaming Unix socket at /path/to/unix.socket with
Slurm-24.05 content plugins only using JWT authentication for a Slurm-24.11 install.
cp $BUILD_PATH/etc/slurmrestd.service /usr/lib/systemd/system/slurmrestd.service
echo 'SLURMRESTD_LISTEN=:8080' > /etc/default/slurmrestd
systemctl daemon-reload
systemctl start slurmrestd
Optimization
slurmrestd: Fast mode Parser (v0.0.40+,23.11+)
● By default, generated JSON/YAML output includes extra whitespace characters to improve
readability.
○ For sites with a large number of requests to slurmrestd, skipping unnecessary
whitespace characters can considerably reduce processing time and network usage.
● Environment variables to activate compact mode:
○ SLURMRESTD_YAML=compact
○ SLURMRESTD_JSON=compact
● Example:
env SLURMRESTD_JSON=compact SLURMRESTD_YAML=compact slurmrestd \
-d v0.0.42 -s slurmdbd,slurmctld $SLURMRESTD_LISTEN
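The effect of dropping readability whitespace can be illustrated with Python's own JSON serializer. This is not slurmrestd's serializer, and the document below is synthetic, so the sketch only shows the general idea: the compact form carries the same data in fewer bytes.

```python
import json

# Synthetic payload shaped loosely like a jobs listing.
doc = {"jobs": [{"job_id": i, "job_state": ["RUNNING"]} for i in range(1000)]}

# Indented output with extra whitespace for readability.
pretty = json.dumps(doc, indent=2)
# Compact output: no indentation and no spaces after separators.
compact = json.dumps(doc, separators=(",", ":"))

saved = 1 - len(compact) / len(pretty)
print("pretty bytes: ", len(pretty))
print("compact bytes:", len(compact))
print("fraction saved: {:.2f}".format(saved))
```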
Client compatibility
slurmrestd - Plugins lifetime matrix
● Unversioned slurmctld and slurmdbd content plugins added in Slurm-23.11 have no planned removal date.
Compatibility Testing
● slurmrestd is currently tested using:
○ Golang codegen
■ Used as client generator for Slinky (Slurm’s Kubernetes project)
○ openapi-generator-cli generated python client
■ Tests use static driver code against generated python clients
■ New test units are required for each data_parser version and the major version
of openapi-generator-cli.
■ Arguably the most popular client generator for OpenAPI due to heritage from
Swagger.
○ curl
■ Direct queries of slurmrestd using hand crafted requests
■ Use of curl for site scripting is not advised
● Breaking changes of existing clients of the same version are considered a bug.
● General goal of reducing changes required for porting to newer versions.
○ Depending on the relevant change(s), requests in prior accepted formats may still be
accepted but with warnings sent to client.
openapi-generator.tech: OpenAPI Standard Compliance
● Clients created by openapi-generator.tech cannot handle, or refuse, unexpected data types
○ In most cases the client will assert, but in others it simply segfaults.
● OpenAPI standard includes oneOf() and anyOf() operators to allow for polymorphism
○ Allows return of null when a field isn’t set.
○ Slurm makes heavy use of polymorphism internally.
■ slurmrestd designed to handle polymorphic formats
● openapi-generator.tech’s generator is not monolithic
○ Uses a plugin based approach to create generators for many languages and some
languages have more than one generator.
○ Clients for each language have varying level of OpenAPI standard support
● openapi-generator.tech generated clients will crash when handed (some) schemas using
oneOf(). All usage of oneOf() has been removed (v0.0.37+) to avoid breaking clients.
To Infinity and… Assert!
● Slurm makes heavy use of Infinity or Unlimited, usually as a way to disable a limit.
● ECMA-404 JSON does not support a value of infinity (or ±infinity or ±NaN)
○ Most JSON parsers actually support infinity
■ Some silently convert to max of the internal type:
$ echo infinity | jq
1.7976931348623157e+308
○ OpenAPI standard does not support (or explicitly ban) use of infinity
■ openapi-generator-cli python client will assert upon receiving infinity
● slurmrestd supports infinity (and NaN which is not used)
○ slurmrestd can automatically convert “inf”, “+inf”, “-inf”, “infinity”, “+infinity”, “-infinity”
string values to OpenAPI number format for inputs.
■ Warnings will still be issued about non-compliance with OpenAPI specification
for such format conversions for any given field.
■ slurmrestd should not output infinity or NaN to avoid breaking clients.
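The same tension shows up in Python's standard `json` module, which emits a non-standard `Infinity` token unless told otherwise. The sketch below shows a client-side helper that accepts the string spellings listed above and a strict serializer that refuses to emit infinity or NaN. `to_number` and `strict_dumps` are hypothetical helpers, and exact (case-sensitive) matching of the strings is an assumption of this sketch, not documented slurmrestd behavior.

```python
import json
import math

# String spellings of infinity taken from the list above.
INFINITY_STRINGS = {"inf", "+inf", "-inf", "infinity", "+infinity", "-infinity"}


def to_number(value):
    """Convert a JSON number or one of the infinity strings to a float."""
    if isinstance(value, str):
        if value not in INFINITY_STRINGS:
            raise ValueError("not a recognized infinity string: {!r}".format(value))
        return -math.inf if value.startswith("-") else math.inf
    return float(value)


def strict_dumps(obj):
    """Serialize to JSON, raising ValueError on infinity or NaN."""
    return json.dumps(obj, allow_nan=False)


print(to_number("-infinity"), to_number(100))
print(json.dumps(math.inf))  # the default serializer emits a non-standard token
try:
    strict_dumps({"limit": math.inf})
except ValueError as err:
    print("refused:", err)
```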
slurmrestd and the non-compliant clients?
● The format and layout of a versioned plugin's schemas are designed to stay consistent across
every Slurm release that ships that plugin version, starting with the release where it was first tagged.
○ A query to v0.0.40 endpoint in Slurm-23.11 should work the same as a query to
v0.0.40 endpoint in Slurm-24.05 and Slurm-24.11.
○ Schema changes during patchset releases are only done to correct breaking issues,
such as ones causing most openapi-generator-cli clients to crash.
○ Schemas between different data_parser versions are not guaranteed to be
compatible and in some cases may be entirely different. Make sure to test clients
when porting between versions.
● OpenAPI Specifications are tagged with the data_parser plugin version and have the same
version stability.
Major release changes
Changes in Slurm-24.05
● Removal of v0.0.38 endpoints.
● Added v0.0.41 endpoints.
● Partial support for gracefully handling soft memory limits (ticket#19899,18406)
● Add easily overridable environment variable SLURMRESTD_LISTEN in systemd unit
slurmrestd.service (ticket#18693)
● Populating “deprecated” fields in OpenAPI schema (ticket#17916)
● Change OpenAPI schema to reduce “$ref” entries with `+prefer_refs` flags to reverse
change (ticket#20378)
● Add `slurmrestd --generate-openapi-spec` arg to allow generating OpenAPI schema
without running daemon or slurm.conf being present (ticket#19303)
● Support running slurmrestd without slurmdbd configured/online (ticket#19899)
● Add comment descriptions to all fields in >=v0.0.41 in OpenAPI schema (ticket#16961)
● Size buffering per kernel hints to reduce memory usage (ticket#19641)
Changes in Slurm-24.11
● Removal of v0.0.39 endpoints. (ticket#18484)
● Added v0.0.42 endpoints. (ticket#18484)
● Removal of all deprecated fields in v0.0.42 endpoints. (ticket#19938)
● Add `GET slurm/v0.0.42/nodes` endpoint (ticket#19745)
● Error with Authentication Failure [401] instead of Internal Server Error [500] (ticket#18516)
● Added support for `slurmrestd -d latest` arg (ticket#20615)
● Added `DataParserParameters` to slurm.conf (ticket#21121)
● Switch to `+prefer_refs` flag as default with `+minimize_refs` flag to allow reverse of
change (ticket#20378)
● New machine-friendly formatting for `scontrol ping --json` and `GET
/slurm/v0.0.42/ping` (ticket#20324)
● New `sacctmgr ping --json` and `GET /slurmdb/v0.0.42/ping` endpoint (Issue#17)
● Latency improvements (ticket#20114)
Planned changes in Slurm-25.05
Any other changes will be announced in Tim Wickberg’s 24.05, 24.11 and Beyond presentation at
the Slurm BOF at 12:15pm - 1:15pm EST in B203.
Questions?