A powerful CLI tool to manage connections, SSH tunnels, GPU monitoring, and AI/ML workloads for your DGX Spark system.
- SSH Connection Management - Quick access to your DGX Spark
- Dynamic Port Forwarding - Create and manage SSH tunnels on the fly
- GPU Monitoring - Real-time GPU status, memory usage, and process tracking
- File Synchronization - Easy rsync-based file transfers
- Configuration Management - Persistent connection settings
- Integrated Playbooks - Run Ollama, vLLM, NVFP4 quantization, and more with simple commands
- Docker Model Runner Integration - Install and drive Docker's DMR (`docker model` CLI) directly on your DGX Spark
- Mutagen-Powered Sync - Create/pause/resume monitorable sync sessions via `dgx mutagen ...`
- Secret & API Key Management - Store HF, W&B, Codex tokens on the DGX with `dgx env ...` and `dgx codex ...`
- Go 1.24+ (for building from source)
- SSH client
- rsync (for file sync)
- Task (optional, for build automation)
# Clone the repository
git clone git@github.com:jwjohns/dgx-spark-cli.git
cd dgx-spark-cli
# Option 1: Using the install script (recommended)
./install.sh
# Option 2: Manual install to ~/.local/bin
go build -o dgx ./cmd/dgx
mkdir -p ~/.local/bin
cp dgx ~/.local/bin/
# Add to PATH if needed:
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
# Option 3: System-wide install
go build -o dgx ./cmd/dgx
sudo cp dgx /usr/local/bin/

Grab the latest release from the GitHub Releases page. Each tag ships macOS (arm64/amd64), Linux (arm64/amd64), and Windows (amd64) archives: extract the dgx (or dgx.exe) binary and place it somewhere on your PATH.
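If you go the prebuilt-binary route, the usual unpack-and-install pattern applies; the archive name below is a placeholder, so copy the exact filename from the Releases page:

# After downloading the archive for your OS/arch from the Releases page
tar -xzf dgx_os_arch.tar.gz         # placeholder filename
install -m 0755 dgx ~/.local/bin/   # or any other directory already on your PATH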
# Update an existing source checkout
cd dgx-spark-cli
./update.sh

Run `dgx config set` to configure your connection. The interactive setup will guide you through:
- Hostname/IP of your DGX Spark
- SSH port (default: 22)
- Username for SSH access
- SSH key - automatically detects existing keys or provides setup instructions
Note: When NVIDIA Sync is installed (macOS, Ubuntu, or Windows), dgx config set pre-loads the host, user, port, and Sync-managed SSH key (e.g., ~/Library/Application Support/NVIDIA/Sync/config/ssh_config on macOS, ~/.local/share/NVIDIA/Sync/config/ssh_config on Ubuntu, %APPDATA%/NVIDIA/Sync/config/ssh_config on Windows). On Arch—or any system without Sync—the wizard falls back to your standard ~/.ssh/id_ed25519 / id_rsa keys and shows you how to generate and upload a key if needed.
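If you need to create and upload a key by hand, the standard OpenSSH workflow below is enough; user@dgx-host is a placeholder for your own username and hostname:

# Generate a new ed25519 key (accept the default path or choose your own)
ssh-keygen -t ed25519 -C "dgx-spark"
# Copy the public key to the DGX so key-based auth works
ssh-copy-id -i ~/.ssh/id_ed25519.pub user@dgx-host
# Confirm passwordless login before pointing dgx at the key
ssh -i ~/.ssh/id_ed25519 user@dgx-host hostname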
Check the connection with `dgx status`, then open a shell with `dgx connect`:

# Open interactive SSH shell
dgx connect
# or
dgx ssh
# Check connection status
dgx status
# Show current configuration
dgx config show

# Create a tunnel: local port 8888 -> remote port 8888
dgx tunnel create 8888:8888 "Jupyter Notebook"
# List active tunnels
dgx tunnel list
# Kill a specific tunnel
dgx tunnel kill <PID>
# Kill all tunnels
dgx tunnel kill-all

# Jupyter Notebook
dgx tunnel create 8888:8888 "Jupyter"
# TensorBoard
dgx tunnel create 6006:6006 "TensorBoard"
# VS Code Server
dgx tunnel create 8080:8080 "VSCode"
# JupyterLab
dgx tunnel create 8889:8889 "JupyterLab"

# Show formatted GPU status
dgx gpu
# Show raw nvidia-smi output
dgx gpu --raw
# Sample output:
# ┌────────────────────────────────────────────────────────────┐
# │ DGX GPU Status                                             │
# ├────────────────────────────────────────────────────────────┤
# │ GPU 0: NVIDIA GB100                                        │
# │   Memory: 2048 MiB / 81920 MiB (Util: 15%)   Temp: 45°C    │
# │   Processes:                                               │
# │     - PID 12345  python train.py  1024 MiB                 │
# ├────────────────────────────────────────────────────────────┤
# └────────────────────────────────────────────────────────────┘

# Prepare Docker + GPU runtime bits
dgx run dmr setup
# Install/upgrade the standalone runner
dgx run dmr install
# Manage models via Docker Hub/Hugging Face/nvcr.io
dgx run dmr pull ai/smollm2:360M-Q4_K_M
dgx run dmr list
dgx run dmr run ai/smollm2:360M-Q4_K_M "Explain quantum computing"
dgx run dmr status
dgx run dmr logs --tail 100
# Update or remove the controller
dgx run dmr update
dgx run dmr uninstall

Use the built-in `dgx exec` and `dgx tunnel` commands when you need custom Docker Model Runner invocations:
# Run any docker model command on the DGX
dgx exec "docker model run ai/smollm2:360M-Q4_K_M 'Explain quantum computing'"
# Inspect health/logs
dgx exec "docker model status"
dgx exec "docker model logs --tail 100"
# Forward the HTTP API (default 12434) to localhost
dgx tunnel create 12434:12434 "Docker Model Runner"
# Shut down the tunnel when finished
dgx tunnel kill <PID>

- Ensure Docker Model Runner is installed on the DGX (follow the Docker docs).
- From your laptop, run `dgx exec "docker model install-runner --gpu auto"` to provision/upgrade the controller.
- Pull a model via `dgx exec "docker model pull ai/smollm2:360M-Q4_K_M"`.
- Forward the API with `dgx tunnel create 12434:12434 "Docker Model Runner"` and access it locally (for example `curl http://localhost:12434/models`).
- Send prompts non-interactively through `dgx exec "docker model run ... 'prompt'"` or open `dgx connect` for interactive chats.
- Close the tunnel with `dgx tunnel kill <PID>` and manage the runner lifecycle (logs, uninstall) using `dgx exec` as needed.
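With the tunnel from the list above in place, you can exercise the forwarded API directly from your laptop. The /models path is the one shown above; the chat-completions path is an assumption based on Docker Model Runner's OpenAI-compatible API and may differ between versions:

# List models through the forwarded port
curl http://localhost:12434/models

# OpenAI-compatible chat completion (endpoint path is an assumption; check the DMR docs)
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2:360M-Q4_K_M", "messages": [{"role": "user", "content": "Explain quantum computing"}]}'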
Check the Docker Model Runner blog, the official docs, and the docker/model-runner repository for full workflows.
Use the built-in helpers to persist secrets on the DGX (they're stored in ~/.config/dgx/env.sh and sourced via ~/.bashrc):
# Hugging Face
dgx env hf-token
dgx env hf-token --value hf_xxx
# Weights & Biases
dgx env wandb
dgx env wandb --value xxx
# OpenAI Codex
dgx codex set-api-key
dgx codex set-api-key --value sk-...

Need to copy an existing Codex login from your laptop?
# Sync ~/.codex up to the DGX
dgx codex import-config --path ~/.codex

After running these commands, reconnect or source ~/.config/dgx/env.sh on the DGX so shells and playbooks see the new values.
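For reference, the generated env.sh is just a shell fragment of export lines. The variable names below are assumptions based on the tools involved (only CODEX_API_KEY is confirmed by the Codex notes that follow), so check the actual file on the DGX:

# ~/.config/dgx/env.sh (illustrative; values redacted)
export HF_TOKEN="hf_xxx"        # assumed variable name for the Hugging Face token
export WANDB_API_KEY="xxx"      # assumed variable name for the Weights & Biases key
export CODEX_API_KEY="sk-..."   # confirmed by the API key flow below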
- API key flow: `dgx codex set-api-key [--value sk-...]` saves `CODEX_API_KEY` and updates the Codex CLI config on the DGX so browser auth isn't needed.
- Browser flow (import): Run `codex login` on your laptop, then `dgx codex import-config --path ~/.codex` to copy the resulting credentials to the DGX.
- Usage reminder: After either flow, run Codex commands on the DGX via `dgx exec "codex <cmd>"` (or inside `dgx connect`).
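File transfers are rsync-based (see the feature list), so a `dgx sync` call maps onto a plain rsync invocation over your configured SSH key and port. A rough sketch of the equivalent command follows; the exact flags the CLI passes are an assumption, and user@dgx-host is a placeholder:

# Approximate hand-rolled equivalent of: dgx sync ./local/path dgx:~/remote/path
rsync -avz --progress \
  -e "ssh -i ~/.ssh/id_ed25519 -p 22" \
  ./local/path/ user@dgx-host:remote/path/   # remote path is relative to the DGX home directory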
# Upload files to DGX
dgx sync ./local/path dgx:~/remote/path
# Download files from DGX
dgx sync dgx:~/remote/path ./local/path
# Sync with delete (removes extraneous files)
dgx sync --delete ./local/path dgx:~/remote/path

# Create a long-lived two-way sync (requires mutagen CLI)
dgx mutagen create ./app dgx:~/app --name app-sync --mode two-way-resolved
# Inspect, pause/resume, monitor, or tear down sessions
dgx mutagen list
dgx mutagen pause app-sync
dgx mutagen resume app-sync
dgx mutagen monitor app-sync
dgx mutagen terminate app-sync
# Apply a Mutagen project file
dgx mutagen project-apply mutagen.yml

Mutagen offers low-latency syncing for large projects. Install it from mutagen.io on your laptop; the DGX agent is deployed automatically over SSH using your configured key/port.
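`dgx mutagen project-apply` expects a Mutagen project file. A minimal sketch, assuming the standard Mutagen project layout with an explicit user@dgx-host endpoint (whether the dgx: shorthand also works inside project files is an assumption worth verifying):

# Write a minimal project file and apply it
cat > mutagen.yml <<'EOF'
sync:
  app-sync:
    alpha: "./app"
    beta: "user@dgx-host:~/app"   # placeholder user/host
    mode: "two-way-resolved"
EOF
dgx mutagen project-apply mutagen.yml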
Run AI/ML workloads with integrated playbook support:
# List available playbooks
dgx playbook list
# Ollama - Local model runner
dgx run ollama install
dgx run ollama pull qwen2.5:32b
dgx run ollama serve
# vLLM - Optimized inference
dgx run vllm pull
dgx run vllm serve meta-llama/Llama-2-7b-hf
# NVFP4 - 4-bit quantization
dgx run nvfp4 setup
dgx run nvfp4 quantize meta-llama/Llama-2-7b-hf
# Execute custom commands
dgx exec docker ps
dgx exec nvidia-smi

Ollama install may prompt for your DGX sudo password so the installer can write to /usr/local.
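Once `dgx run ollama serve` is up, you can reach the Ollama API from your laptop by forwarding its default port (11434); the generate endpoint below is Ollama's standard HTTP API:

# Forward Ollama's default port, then call it locally
dgx tunnel create 11434:11434 "Ollama"
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5:32b", "prompt": "Explain quantum computing", "stream": false}'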
See PLAYBOOKS.md for complete documentation and examples.
# 1. Create tunnel for Jupyter
dgx tunnel create 8888:8888 "Jupyter"
# 2. Connect and start Jupyter on DGX
dgx connect
# On DGX:
jupyter lab --no-browser --port 8888
# 3. Open browser to http://localhost:8888

# Check GPU status
dgx gpu
# Create TensorBoard tunnel if needed
dgx tunnel create 6006:6006 "TensorBoard"
# Upload training code
dgx sync ./my-model dgx:~/experiments/
# Connect and start training
dgx connect

# Create tunnels for common services
dgx tunnel create 8888:8888 "Jupyter"
dgx tunnel create 6006:6006 "TensorBoard"
# Sync code to DGX
dgx sync ./project dgx:~/work/project
# Monitor GPU usage
dgx gpu
# When done, clean up tunnels
dgx tunnel kill-all

Configuration is stored in ~/.config/dgx/config.yaml:
host: dgx-spark.example.com
port: 22
user: username
identity_file: /home/user/.ssh/id_ed25519
tunnels: []

You can edit this file manually or use `dgx config set`. If NVIDIA Sync metadata is present (macOS/Ubuntu/Windows), the CLI seeds this file automatically the first time you run it, so those platforms work without additional prompts while other distros continue to use the standard SSH key locations.
dgx-manager/
├── cmd/dgx/ # Main application entry point
├── internal/
│ ├── config/ # Configuration management
│ ├── ssh/ # SSH client implementation
│ ├── tunnel/ # Tunnel management
│ └── gpu/ # GPU monitoring
├── pkg/types/ # Shared types
├── Taskfile.yaml # Build automation
└── README.md
# Build
task build
# Run tests
task test
# Lint code
task lint
# Format code
task fmt
# Install locally
task install
# Build release binaries
task release

Python tooling is managed with uv; use it whenever you need to run or install Python-based utilities.
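A typical uv workflow for ad-hoc Python utilities (the tool names here are only examples):

# Create an isolated environment and install into it
uv venv
uv pip install ruff
# Or run a tool in a throwaway environment without installing it
uvx ruff check .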
# Verify SSH key permissions
chmod 600 ~/.ssh/id_ed25519
# Test SSH connection manually
ssh -i ~/.ssh/id_ed25519 user@dgx-host
# Check configuration
dgx config show

# List active tunnels
dgx tunnel list
# Kill specific tunnel
dgx tunnel kill <PID>
# Or use a different local port
dgx tunnel create 8889:8888 "Jupyter Alt Port"

# Ensure nvidia-smi is available on DGX
dgx connect
nvidia-smi # Should work
# Try raw output for more details
dgx gpu --raw

Add to your ~/.bashrc or ~/.zshrc:
alias dgx-gpu='dgx gpu'
alias dgx-ssh='dgx connect'
alias dgx-jupyter='dgx tunnel create 8888:8888 "Jupyter"'
alias dgx-tensorboard='dgx tunnel create 6006:6006 "TensorBoard"'

Create a script to set up your common tunnels:
#!/bin/bash
# ~/bin/dgx-setup-tunnels.sh
dgx tunnel create 8888:8888 "Jupyter"
dgx tunnel create 6006:6006 "TensorBoard"
dgx tunnel create 8080:8080 "VSCode"
echo "All tunnels created"
dgx tunnel list

# One-liner to check everything
dgx status && dgx gpu && dgx tunnel list

MIT
Contributions welcome! This is a personal development tool, but feel free to fork and customize it for your needs.