Overview of HPC clusters with enhanced cluster management capabilities

To create the infrastructure for tightly-coupled applications that scale across multiple nodes, you can build a cluster of virtual machine (VM) instances. This guide provides a high-level overview of the key considerations and steps for configuring a cluster of VM instances for high performance computing (HPC) workloads using dense resource allocation.

With H4D, Compute Engine adds support for running massive HPC workloads by treating an entire cluster of VM instances as a single computer. Using topology-aware placement of VMs lets you access many instances within a single networking superblock and minimizes network latency. You can also configure Cloud RDMA on these instances to maximize inter-node communication performance, which is crucial for tightly-coupled HPC workloads.

You create these HPC VM clusters with H4D by reserving blocks of capacity instead of individual resources. Using blocks of capacity for your cluster enables enhanced cluster management capabilities.

HPC clusters with H4D instances can be created either with or without enhanced cluster management capabilities. If you don't require enhanced cluster management capabilities with your H4D HPC cluster, or if you want to create HPC clusters using a machine series other than H4D, then follow the general instructions for creating HPC instances or clusters instead.

Cluster terminology

When working with blocks of capacity, the following terms are used:

Blocks
Multiple sub-blocks interconnect with a non-blocking fabric, providing a high-bandwidth interconnect. Any CPU within the block is reachable in a maximum of two network hops. The system exposes block and sub-block metadata to orchestrators to enable optimal job placement.
Clusters
Multiple blocks interconnect to form a cluster that scales to thousands of CPUs for running large-scale HPC workloads. Each cluster is globally unique. Communication across different blocks adds only one additional hop, maintaining high performance and predictability, even at a massive scale. Cluster-level metadata is also available to orchestrators for intelligent, large-scale job placement.
Cluster Toolkit
An open source tool offered by Google that simplifies the configuration and deployment of clusters that use either Slurm or Google Kubernetes Engine. You use predefined blueprints to build a deployment folder that is based on the blueprint. You can modify blueprints or the deployment folder to customize deployments and your software stack. You then run the Terraform or Packer commands generated by Cluster Toolkit to deploy the cluster.
Dense deployment
A resource request that allocates your compute instance resources physically close to each other to minimize network hops and optimize for the lowest latency.
Network fabric
A network fabric provides high-bandwidth, low-latency connectivity across all blocks and Google Cloud services in a cluster. Jupiter is Google's data center network architecture that leverages software-defined networking and optical circuit switches to evolve the network and optimize its performance.
Node or host
A single physical server machine in the data center. Each host has associated compute resources: CPUs, memory, and network interfaces. The number and configuration of these compute resources depend on the machine family. VM instances are provisioned on top of a physical host.
Orchestrator
An orchestrator automates the management of your clusters. With an orchestrator, you don't have to manage each VM instance in the cluster. An orchestrator, such as Slurm or Google Kubernetes Engine (GKE), handles tasks like job queueing, resource allocation, auto scaling (with GKE), and other day-to-day cluster management tasks.
Sub-blocks
These are foundational units where a group of hosts physically co-locates on a single rack. A Top-of-Rack (ToR) switch connects these hosts, enabling extremely efficient, single-hop communication between any two CPUs within the sub-block. Cloud RDMA facilitates this direct communication.
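The hop counts in these definitions can be expressed as a small toy model. This is an illustration of the topology described above only, not a Google Cloud API; the `Placement` type and its labels are hypothetical.

```python
# Toy model of the cluster topology described above: hosts in the same
# sub-block communicate in a single ToR hop, hosts in the same block in at
# most two hops, and hosts in different blocks of a cluster in at most three.

from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    cluster: str
    block: str
    sub_block: str

def max_hops(a: Placement, b: Placement) -> int:
    """Return the worst-case number of network hops between two hosts."""
    if a.cluster != b.cluster:
        raise ValueError("hosts are not in the same cluster")
    if a.block != b.block:
        return 3   # cross-block traffic adds one hop beyond the block maximum
    if a.sub_block != b.sub_block:
        return 2   # within a block, any two CPUs are at most two hops apart
    return 1       # single Top-of-Rack switch hop within a sub-block

h1 = Placement("c1", "b1", "sb1")
h2 = Placement("c1", "b1", "sb2")
h3 = Placement("c1", "b2", "sb3")
print(max_hops(h1, h1), max_hops(h1, h2), max_hops(h1, h3))  # 1 2 3
```

This is why orchestrators use the exposed block and sub-block metadata for job placement: co-locating a job within one sub-block keeps every pair of hosts at a single hop.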

Overview of cluster creation process with H4D VMs

To create HPC clusters on reserved blocks of capacity, you must complete the following steps:

  1. Review available provisioning models
  2. Choose a consumption option and obtain capacity
  3. Choose a deployment option and orchestrator
  4. Choose the operating system or cluster image
  5. Create your cluster

Provisioning models for VM and cluster creation

When creating VM instances, you can use the provisioning models described in Compute Engine instances provisioning models.

To create tightly-coupled H4D instances, you must use one of the following provisioning models to obtain the necessary resources for creating compute instances:

  • Reservation-bound: you can reserve resources at a discounted price for a future date and duration. At the start of your reservation period, you can use the reserved resources to create VMs or clusters. You have exclusive access to your reserved resources for the reservation period.

  • Flex-start: you can request discounted resources for up to seven days. Compute Engine makes best-effort attempts to schedule the provisioning of your requested resources as soon as they're available. You have exclusive access to your obtained resources for your requested period.

  • Spot: based on availability, you can immediately obtain deeply discounted resources. However, Compute Engine might stop or delete the VM instances at any time to reclaim capacity.

Reservation-bound provisioning model

The reservation-bound provisioning model links your created VM instances to the capacity that you previously reserved. When you reserve capacity, Compute Engine creates an empty reservation. Then, at the reservation start time, the following occurs:

  • Compute Engine adds your reserved resources to the reservation. You have exclusive access to the reserved capacity until the reservation end time.

  • Google Cloud charges you for the reserved capacity until the end of your reservation period, whether you use the capacity or not.

You can then use the reserved resources to create VMs without additional charges. You only pay for resources that aren't included in the reservation, such as disks or IP addresses.

You can reserve resources for any number of VMs, for any duration, starting on a future date. Then, you can use the reserved resources to create and run VMs until the end of the reservation period. If you reserve resources for one year or longer, then you must purchase and attach a resource-based commitment.

You can use reservation-bound provisioning with H4D instances by specifying the reservation-bound provisioning model when creating individual VMs, an HPC cluster, or a group of VMs.
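As a sketch, creating a VM against an existing reservation with the reservation-bound provisioning model might look like the following. The VM name, reservation name, zone, and machine type are placeholders, and the exact flags should be verified against the current gcloud reference for your machine series.

```shell
# Create a VM bound to a previously created reservation (names are placeholders).
gcloud compute instances create my-h4d-vm \
    --zone=us-central1-a \
    --machine-type=h4d-highmem-192-lssd \
    --provisioning-model=RESERVATION_BOUND \
    --reservation-affinity=specific \
    --reservation=my-h4d-reservation
```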

Flex-start provisioning model

To run short-duration workloads that require densely allocated resources, you can request compute resources for up to seven days by using Flex-start. Whenever resources are available, Compute Engine creates your requested number of VMs. You can stop standalone Flex-start VMs, but you can't stop Flex-start VMs that a managed instance group (MIG) creates through resize requests. The Flex-start VMs exist until you delete them, or until Compute Engine deletes the VMs at the end of their run duration.

Flex-start is ideal for workloads that can start at any time. The flex-start provisioning model provisions resources from a secured capacity pool, and the resources are densely allocated to minimize network latency.

When you add Flex-start VMs to a managed instance group (MIG) by using resize requests, the MIG creates the VMs all at once. This approach helps you avoid unnecessary charges for partial capacity that Compute Engine might deliver while you wait for the full capacity needed to start your workload.

You can use Flex-start provisioning with H4D instances with any available deployment option.
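A standalone Flex-start VM request might be sketched as follows. The name, zone, machine type, and run duration are placeholders; check the flag names against the current gcloud reference, because Flex-start options have evolved.

```shell
# Request a Flex-start VM with a bounded run duration (values are placeholders).
# Compute Engine provisions the VM when capacity becomes available and deletes
# it at the end of the requested run duration.
gcloud compute instances create my-flex-vm \
    --zone=us-central1-a \
    --machine-type=h4d-highmem-192-lssd \
    --provisioning-model=FLEX_START \
    --max-run-duration=7d \
    --instance-termination-action=DELETE
```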

Spot provisioning model

To run fault-tolerant workloads, you can obtain compute resources immediately based on availability. You get resources at the lowest price possible. However, Compute Engine might stop or delete the created Spot VMs at any time to reclaim capacity. This process is called preemption.

Spot VMs are ideal for workloads where interruptions are acceptable, such as:

  • Batch processing
  • High performance computing (HPC)
  • Data analytics
  • Continuous integration and continuous deployment (CI/CD)
  • Media encoding

You can use Spot VMs with any machine type, except A4X, X4, and bare metal machine types. Dense allocation depends on resource availability. To help ensure a closer allocation, you can apply a compact placement policy to the Spot VMs.

You can use Spot VMs with any of the available dense deployment options.
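For illustration, a Spot VM request might look like the following. The VM name, zone, and machine type are placeholders; the termination action controls whether Compute Engine stops or deletes the VM on preemption.

```shell
# Create a Spot VM that Compute Engine deletes when it reclaims capacity.
gcloud compute instances create my-spot-vm \
    --zone=us-central1-a \
    --machine-type=h4d-highmem-192-lssd \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE
```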

Choose a consumption option and obtain capacity

Consumption options determine how resources are obtained for your cluster. To create a cluster that uses enhanced cluster management capabilities, you must request blocks of capacity for a dense deployment.

The following table summarizes the key differences between the consumption options for blocks of capacity:

| Consumption option | Future reservations for capacity blocks | Future reservations for up to 90 days (in calendar mode) | Flex-start | Spot |
|---|---|---|---|---|
| Workload characteristics | Long-running, large-scale distributed workloads that require densely allocated resources | Short-duration workloads that require densely allocated resources | Short-duration workloads that require densely allocated resources | Fault-tolerant workloads |
| Lifespan | Any time | Up to 90 days | Up to 7 days | Any time, but subject to preemption |
| Preemptible | No | No | No | Yes |
| Capacity assurance | Very high | Very high | Best effort | Best effort |
| Quota | Check that you have enough quota before creating instances. | No quota is charged. | Preemptible quota is charged. | Preemptible quota is charged. |
| Pricing | | | | |
| Resource allocation | Dense | Dense | Dense | Standard (compact placement policy optional) |
| Provisioning model | Reservation-bound | Reservation-bound | Flex-start | Spot |

The creation method differs by consumption option:

  • Future reservations for capacity blocks: Reserve capacity through your account team. Then, at your chosen date and time, use the reserved capacity to create HPC clusters. See Choose a deployment option.

  • Future reservations in calendar mode: Create a future reservation request in calendar mode. Then, at your chosen date and time, use the reserved capacity to create HPC clusters. See Choose a deployment option.

  • Flex-start: Create VMs. When your requested capacity becomes available, Compute Engine provisions it.

  • Spot: You can immediately create VMs. See Choose a deployment option.

Choose a deployment option

High performance computing (HPC) workloads aggregate computing resources to gain performance greater than that of a single workstation, server, or computer. HPC is used to solve problems in academic research, science, design, simulation, and business intelligence.

For HPC clusters with enhanced cluster management capabilities, choose the H4D machine series. If you plan to use a different machine series, follow the documentation at Create an HPC-ready VM instance instead of using the deployment methods listed on this page.

Some of the available deployment options include the installation and configuration of an orchestrator for enhanced management of the HPC cluster.

For the most appropriate option to create your VMs or clusters for your use case, choose one of the following:

| Option | Use case |
|---|---|
| Cluster Toolkit | You want to use open-source software that simplifies deploying both Slurm and Google Kubernetes Engine (GKE) clusters. Cluster Toolkit is designed to be highly customizable and extensible. |
| GKE | You want maximum flexibility in configuring your Google Kubernetes Engine cluster based on the needs of your workload. To learn more, see Run HPC workloads with H4D. |
| Compute Engine | You want full control of the infrastructure layer so that you can set up your own orchestrator. |

Choose the operating system image

The operating system (OS) image you choose depends on the service you use to deploy your cluster.

  • For clusters on GKE: Use a GKE node image, such as Container-Optimized OS. If you use Cluster Toolkit to deploy your GKE cluster, a Container-Optimized OS image is used by default. For more information about node images, see Node images in the GKE documentation.

  • For clusters on Compute Engine: Use a supported OS image for the H4D machine series.

  • For Slurm clusters: Cluster Toolkit deploys the Slurm cluster with an HPC VM image based on Rocky Linux 8 that is optimized for tightly-coupled HPC workloads.
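As a sketch, creating a standalone Compute Engine VM from the public HPC VM image family might look like the following. The VM name and zone are placeholders; verify the image family and project names against the current HPC VM image documentation.

```shell
# Create a VM from the HPC VM image family (Rocky Linux 8 based),
# assuming the public image project shown here is current.
gcloud compute instances create my-hpc-vm \
    --zone=us-central1-a \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public
```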

Create your HPC cluster

After you review the cluster creation process and make preliminary decisions for your workload, create your cluster by using any of the deployment options.
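To make the Cluster Toolkit workflow concrete, a blueprint is a YAML file along the lines of the following skeleton. The blueprint name, variable values, and module source are illustrative placeholders; consult the Cluster Toolkit documentation and its published H4D and Slurm example blueprints for working configurations.

```yaml
# Illustrative Cluster Toolkit blueprint skeleton (all values are placeholders).
blueprint_name: h4d-hpc-demo

vars:
  project_id: my-project        # replace with your Google Cloud project ID
  deployment_name: h4d-demo
  region: us-central1
  zone: us-central1-a

deployment_groups:
- group: primary
  modules:
  - id: network
    source: modules/network/vpc
```

You would then run the Toolkit binary against the blueprint (for example, a create command to generate the deployment folder, followed by a deploy command that applies it with Terraform), as described in the Cluster Toolkit documentation.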

Enhanced cluster management capabilities for your HPC cluster

When you create H4D instances with densely allocated resources using the deployment methods mentioned in Choose a deployment option, you can use enhanced HPC cluster management capabilities with your instances.

For more information about these capabilities, see Enhanced HPC cluster management with H4D instances.
