Essential HPC Terminology: A Glossary of 46 Terms

Forrest Burt, February 9, 2023

High Performance Computing (HPC) is, at its core, the use of advanced computing resources for a specific purpose. Many tasks across many fields rely on HPC because of their massive computational needs, including data science, chip design, vehicle safety testing, weather simulation, and myriad others. With so many use cases comes a wide range of technologies, applications, paradigms, and best practices, and with them a significant amount of basic terminology. Let’s demystify some of the basics and learn more about what exactly “supercomputing” is.

Note: This is the abridged version of the HPC glossary. For a list of complete definitions, visit our wiki, CIQ’s glossary of industry terms.

Accelerator card, accelerator - A catch-all term for devices that accelerate a specific type of computing. Some of these devices can work with a broad set of applications and workloads, while some are more specialized. For example, GPUs can be used to speed up a broad variety of HPC workloads like those in computational fluid dynamics, molecular dynamics, and AI training, while FPGAs can be used to speed up cryptography, digital signal processing, or even to implement whole custom systems-on-a-chip.

AI training/inference - AI training and inference usually need HPC-type resources to run efficiently, even though, depending on the context, these operations are sometimes more closely associated with enterprise computing.

Application - Software run on HPC infrastructure to do computational jobs, like an AI training framework or a CFD program.

Apptainer - An open source container solution built for HPC. It addresses needs that other containerization solutions haven’t traditionally met, allowing containers to be used effectively on HPC systems. A mature technology and the continuation of the open source side of the original Singularity project, Apptainer is currently a part of the Linux Foundation and is deployed at thousands of sites worldwide.

Batch jobs, batch processing - Computational jobs that a user submits and leaves to run unattended, with no further input expected from the user.

Beowulf cluster, Beowulf-style cluster, Beowulf architecture - A cluster composed of multiple commodity computers, like common rack-mount servers, tower desktops, or, at some points in history, even gaming consoles, all networked together over an interconnect and able to pool their computational power on a single task.

Cluster - A set of computers linked to each other in such a way that their resources can be collectively pooled together on computational jobs. In the case of HPC, the word typically refers to a Beowulf-style cluster for specific industrial computing applications.

Computational fluid dynamics, CFD - The simulation of how fluids, such as air and water, flow. For example, simulating the flow of water through a hydroelectric turbine or the flow of air over a moving car or jet.

Computational job - A single, discrete task that is done on some set of HPC resources. This could be, for example, a data processing step in a large genomics workflow, the force-calculation step of an aerodynamics (CFD) simulation, or a run of AI training.

Compute node, node - A single server (on-prem) or instance (in the cloud) in an HPC cluster acting as a provider of computing resources for computational jobs to run on.

Container - A self-contained package that wraps software and its dependencies together in a portable, standardized way. A container is, quite literally, a standardized vehicle for software in the same way that a metal intermodal shipping container is a standardized vehicle for cargo.

Data science - A catch-all term for analyzing datasets in pursuit of some specific goal. As “big data” has emerged as a concept over the past decade, practices have been developed for meaningfully exploring very large datasets computationally.

Electronic design automation, EDA - A common HPC use case involving the simulation of computer chip designs, circuit boards, or similar items. For example, simulating how the design of a die for a given CPU chip will actually work when powered up.

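Embarrassingly parallel - A computational job in which the same type of calculation is performed over a broad set of data, with each piece processed independently and in parallel on some HPC resources, so that the parallel tasks need little or no communication with each other.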
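
As a minimal sketch, assuming a single machine rather than a cluster, the Python below applies the same hypothetical calculation (`score_item`, a stand-in for real per-item work) to many independent inputs using the standard library’s `ProcessPoolExecutor`. Because no item needs another item’s result, the work can be split across worker processes with no coordination beyond handing out inputs and collecting outputs; on a real HPC system, the same idea would typically be spread across many cores and nodes.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def score_item(x: float) -> float:
    """Hypothetical per-item calculation that depends only on its own input."""
    # Stand-in for real work, such as processing one data file; purely illustrative.
    total = 0.0
    for k in range(1, 10_001):
        total += math.sin(x * k) / k
    return total


def run_in_parallel(inputs, max_workers=None):
    """Apply score_item to every input using a pool of worker processes."""
    # No item needs another item's result, so the pool can hand items to
    # workers in any order with no communication between the workers.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_item, inputs))  # results keep input order


if __name__ == "__main__":
    data = [0.1 * i for i in range(16)]
    parallel = run_in_parallel(data)
    serial = [score_item(x) for x in data]
    # Same answers either way; only how the work is scheduled differs.
    assert parallel == serial
    print(f"processed {len(parallel)} independent items")
```

The `if __name__ == "__main__":` guard matters here: some platforms start worker processes by re-importing the script, and the guard keeps those workers from launching pools of their own.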

Finite element analysis, FEA, finite element method, FEM - A common HPC workload that numerically solves the kinds of differential equations found across a wide variety of fields by dividing a structure or region into many small, simple pieces called elements. FEM problems are found in simulations of structural loads on buildings, the forces on a vehicle in a crash safety test, the movement of biological structures like joints, and many other problems.
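
To make the idea of elements concrete, here is a minimal sketch that assumes the simplest textbook model problem rather than a real structural load: the one-dimensional equation -u''(x) = f on [0, 1] with u(0) = u(1) = 0, a constant load f, and a uniform mesh of linear elements. Each element contributes a small 2x2 stiffness matrix to a global system, which is then solved; the function name `solve_poisson_1d` is hypothetical.

```python
def solve_poisson_1d(n_elements, f=1.0):
    """Linear finite elements for -u''(x) = f on [0, 1] with u(0) = u(1) = 0.

    Assumes a uniform mesh and a constant load f; returns u at every node.
    """
    if n_elements < 2:
        raise ValueError("need at least two elements to have an interior node")
    h = 1.0 / n_elements
    n_nodes = n_elements + 1
    # The global stiffness matrix is tridiagonal, so store only its three diagonals.
    sub = [0.0] * n_nodes   # K[i][i-1]
    main = [0.0] * n_nodes  # K[i][i]
    sup = [0.0] * n_nodes   # K[i][i+1]
    load = [0.0] * n_nodes
    k = 1.0 / h             # element stiffness is (1/h) * [[1, -1], [-1, 1]]
    for e in range(n_elements):
        i, j = e, e + 1     # the two nodes bounding element e
        main[i] += k
        main[j] += k
        sup[i] -= k
        sub[j] -= k
        load[i] += f * h / 2  # integral of f times each node's hat function
        load[j] += f * h / 2
    # Boundary values are zero, so keep only the interior unknowns.
    a, b, c, d = sub[1:-1], main[1:-1], sup[1:-1], load[1:-1]
    m = len(d)
    # Thomas algorithm for tridiagonal systems: forward sweep, then back substitution.
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for r in range(1, m):
        denom = b[r] - a[r] * cp[r - 1]
        cp[r] = c[r] / denom
        dp[r] = (d[r] - a[r] * dp[r - 1]) / denom
    u = [0.0] * m
    u[-1] = dp[-1]
    for r in range(m - 2, -1, -1):
        u[r] = dp[r] - cp[r] * u[r + 1]
    return [0.0] + u + [0.0]


u = solve_poisson_1d(8)
print([round(v, 6) for v in u])
```

For f = 1 the exact solution is u(x) = x(1 - x)/2, and for this 1D problem linear elements reproduce it exactly at the nodes, which makes a handy sanity check. Production FEM codes work in 2D or 3D with sparse solvers, but the assemble-then-solve structure is the same.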

FPGA, field-programmable gate array - A specialized computer component that is essentially a chip full of low-level logic gates that can be reprogrammed to implement algorithms for specific workloads. Alongside their logic gate arrays, FPGAs can also include other components, such as dedicated on-chip memory or small built-in accelerators for certain types of low-level computations.

Fuzzball - Fuzzball is the next generation of high performance computing. An upcoming product from CIQ, Fuzzball leverages the capabilities of enterprise, cloud, and hyperscale computing within the context of HPC.

Fuzzball workflow - A YAML document that defines a set of jobs, volumes, and data movement so that a Fuzzball cluster can parse the workflow document and set up the job environment for execution on a set of compute resources.

Genomic sequencing - The process of determining the order of bases in DNA from a biological sample using laboratory equipment. After initial processing, the output is files containing sequences of bases. Each base is one of the chemicals adenine (A), thymine (T), guanine (G), or cytosine (C); these are the basic building blocks of DNA, pairing A with T and G with C to form base pairs, and sequences of them make up genes.
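
At their simplest, the sequence files produced by sequencing are long strings of the letters A, C, G, and T, and bases on opposite DNA strands pair in fixed ways (A with T, G with C). As a small illustration, assuming clean sequences containing only those four letters (real files often include other symbols, such as N for an unknown base), the Python below counts bases, computes GC content, and builds the reverse complement, i.e. the sequence of the partner strand. All function names are hypothetical.

```python
from collections import Counter

# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def base_counts(seq: str) -> Counter:
    """Count how many times each base letter appears."""
    return Counter(seq.upper())


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    s = seq.upper()
    if not s:
        raise ValueError("empty sequence")
    return (s.count("G") + s.count("C")) / len(s)


def reverse_complement(seq: str) -> str:
    """The partner strand, read in its own 5' to 3' direction."""
    # Raises KeyError on letters other than A/C/G/T (e.g. 'N'), which this sketch does not handle.
    return "".join(PAIRS[b] for b in reversed(seq.upper()))


read = "ATGCGCTA"
print(base_counts(read))
print(gc_content(read))          # 0.5
print(reverse_complement(read))  # TAGCGCAT
```

At HPC scale, the same kinds of per-sequence operations run over billions of reads, which is why genomics pipelines lean so heavily on parallel resources.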

GPU, graphics processing unit, graphics card - A specialized computer component normally used for processing graphics, but widely applicable to a large number of general-purpose computing use cases.

HPC, high performance computing, supercomputing - Using some form of dedicated computing resources to accelerate a computational workload. There’s not generally a “one size fits all” definition of HPC, as the scales it can happen at are very broad.

HPC cluster, supercomputer, also just “cluster” - A Beowulf-style cluster designed specifically to run computational jobs.

HPC interconnect, interconnect, high-speed interconnect - A high-speed, low-latency network that moves data between the nodes of an HPC system fast enough to make effective use of technologies like MPI and accelerator cards across the cluster.

HPC triangle - Generally, an HPC cluster requires three basic resources: fast compute, fast storage, and fast networking. These three components can each bottleneck the others very easily in an HPC cluster, making the design and planning of a supercomputer deployment a complex task. Together, they form an “HPC triangle,” analogous to the “cheap, fast, good” triangle.
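
One way to see how one side of the triangle can hold back the others is a back-of-the-envelope model. The sketch below assumes, hypothetically, that reading data, computing on it, and moving it across the interconnect overlap perfectly, so a job can go no faster than its slowest side. The rates and dataset size are made-up illustrative numbers, not measurements, and the names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TriangleRates:
    """Sustained rates for one hypothetical job, all in GB/s."""
    compute: float  # how fast the compute nodes can process data
    storage: float  # how fast the parallel filesystem can deliver it
    network: float  # how fast the interconnect can move it between nodes


def bottleneck(rates: TriangleRates) -> tuple[str, float]:
    """Return the slowest side of the triangle and its rate."""
    legs = {"compute": rates.compute, "storage": rates.storage, "network": rates.network}
    name = min(legs, key=legs.get)
    return name, legs[name]


def hours_to_process(dataset_gb: float, rates: TriangleRates) -> float:
    # Assumes the three stages overlap perfectly, so the slowest one sets the pace.
    _, rate = bottleneck(rates)
    return dataset_gb / rate / 3600


rates = TriangleRates(compute=40.0, storage=10.0, network=25.0)
print(bottleneck(rates))                 # ('storage', 10.0)
print(hours_to_process(360_000, rates))  # 10.0
```

With these numbers, doubling compute to 80 GB/s would not shorten the run at all, because storage remains the limit; that kind of imbalance is what makes planning a supercomputer deployment tricky.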

HPC workload, workload - A high-level term for what’s done on an HPC cluster. The distinction between a “workload” and a “workflow” is that “workload” is more generically used as a catch-all for the different tasks a given field might be using HPC resources for, while a “workflow” is usually a specific set of codified computational jobs accomplishing one of those specific tasks within a given field.

HTC, high throughput computing, grid computing, similar to “distributed computing” - The use of widely distributed networks of resources to do computational jobs that don’t require massively parallel resources spanning multiple compute nodes or even multiple cores, as is frequently the case with computational jobs in HPC.

Infrastructure, HPC infrastructure - The implementation of the HPC triangle in hardware (and sometimes software) that enables HPC applications to run in parallel over a given set of HPC resources.

Interactive jobs - Computational jobs that a user can interact with directly via some kind of interface. Contrasted with batch jobs, or batch processing, as one of the two primary types of jobs a user may want to run on an HPC cluster.

Jupyter notebook - A popular browser-based tool, used widely across the sciences, that packages written text and code together in a format that other users can easily import, run, edit, and otherwise interact with.

Linux kernel, kernel, the kernel - The low-level code that powers the Linux operating system. All Linux distributions are ultimately based on the same kernel, which is open source; its source code is hosted at kernel.org and mirrored on GitHub at https://github.com/torvalds/linux. The term also refers to the space of a Linux operating system in the kernelspace-userspace model where the operating system runs its privileged system-level operations, like interacting with the computer’s hardware, running drivers, and performing administrative functions.

Machine learning accelerator - Any of several specialized accelerator cards designed specifically to accelerate machine learning calculations.

MPI, “Message Passing Interface” - A standard that defines a set of operations for communicating data between processes, typically one per CPU core, either within a single node or across multiple nodes. Several implementations of the MPI standard exist, such as Open MPI and MPICH, built by different organizations, but at their core, they all provide the same standardized set of operations.
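
The MPI standard itself defines C and Fortran interfaces (other languages use wrapper libraries), so the sketch below is only an analogy built from the Python standard library, not MPI. Threads stand in for ranks, and a hypothetical `ToyComm` class with one inbox queue per rank mimics point-to-point sends and receives. Each rank sums its own slice of the data, and the other ranks send their partial sums to rank 0, which is the pattern that MPI operations such as `MPI_Send`, `MPI_Recv`, and `MPI_Reduce` support. In real MPI programs, ranks are separate processes, often on different nodes, and do not share memory the way these threads do.

```python
import queue
import threading


class ToyComm:
    """Hypothetical stand-in for an MPI communicator: one inbox per rank."""

    def __init__(self, size: int):
        self.size = size
        self.inboxes = [queue.Queue() for _ in range(size)]

    def send(self, obj, dest: int, source: int) -> None:
        # Point-to-point send, loosely analogous to MPI_Send.
        self.inboxes[dest].put((source, obj))

    def recv(self, rank: int):
        # Blocking receive from any source, loosely analogous to MPI_Recv.
        return self.inboxes[rank].get()


def rank_main(comm: ToyComm, rank: int, data: list, out: list) -> None:
    # Each rank works only on its own slice of the data.
    local_sum = sum(data[rank::comm.size])
    if rank == 0:
        # Rank 0 collects the partial sums by message, like a sum reduction.
        total = local_sum
        for _ in range(comm.size - 1):
            _source, value = comm.recv(0)
            total += value
        out.append(total)
    else:
        comm.send(local_sum, dest=0, source=rank)


def parallel_sum(data: list, size: int = 4):
    comm = ToyComm(size)
    out: list = []
    ranks = [threading.Thread(target=rank_main, args=(comm, r, data, out)) for r in range(size)]
    for t in ranks:
        t.start()
    for t in ranks:
        t.join()
    return out[0]


print(parallel_sum(list(range(100))))  # 4950
```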

Node - see “compute node”

Parallel computing, parallelism - Running the calculations of a given computational job across many resources at once, accelerating their completion by allowing them to run “in parallel.” HPC enables parallel computing by providing the resources that large-scale parallel computing needs in order to run efficiently.

Parallel filesystem - A computer filesystem designed to serve many read/write operations in parallel in support of computational jobs in HPC. A parallel filesystem is the storage component of the “HPC triangle.”

Pipeline - The primary processing component of a workflow, i.e., the main sequence of computational jobs some HPC-based task will perform as a part of execution.

RDMA, remote direct memory access - One of the most important technologies underlying HPC, provided by the HPC interconnect. RDMA allows compute nodes to move the data involved in a computational job directly between each other’s memory without involving the operating system running on those nodes.

Research Computing - A department commonly found at many universities that manages their HPC clusters and resources. The department is responsible for the setup and management of both HPC infrastructure and HPC applications in the academic computing environment.

Resources - The hardware used to accomplish computational jobs. This includes compute nodes, along with their CPUs, GPUs, FPGAs, other accelerators, RAM, etc.

Rocky Linux - An open source, community-driven, community-developed Enterprise Linux operating system based on Red Hat Enterprise Linux (RHEL) and acting as one of the successors to the original CentOS Linux distribution.

Scaling out - Expanding a cluster’s capabilities “horizontally” by adding additional nodes like those already existing, thus creating additional resources without implying a change in the capabilities of existing resources. In HPC, this typically means adding additional compute nodes to a supercomputer, which adds new resources without upgrading the existing compute nodes already in the supercomputer.

Scaling up - Expanding a cluster’s capabilities “vertically” by increasing the capabilities of existing components, rather than adding more functional units at the node level as with scaling out. In both cloud and on-prem HPC, this means doing something like upgrading the CPU, RAM, or accelerator card capacity of existing resources; on-prem, this is done by physically upgrading hardware, while in the cloud it is done by moving to larger virtual instances.

Supercomputer - see “HPC cluster”

Supercomputing - see “HPC”

Userspace - The unprivileged part of the Linux operating system where most user applications run. The userspace sits on top of the kernel in the kernelspace-userspace model.

Video transcoding - The process of converting media from one format or codec to another. For example, taking the very large, high-resolution master copy of a movie, which can be multiple terabytes in size, and converting it into formats, resolutions, and codecs that can be played on a mobile phone, a smart TV, an aircraft in-flight entertainment system, etc.

Warewulf - An open source cluster provisioning system for HPC and enterprise computing. Warewulf utilizes container-based node images and iPXE to easily serve a specified operating system configuration out to potentially thousands of nodes at once.

Weather/climate modeling - Modeling the atmosphere to predict upcoming weather patterns or longer-term climate is computationally intensive. At a high level, it involves taking input data about current weather conditions, dividing the atmosphere described by this data into a large, three-dimensional grid of cubes of a certain length on a side, and then simulating how conditions in those cubes evolve over a certain time period.
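
A toy version of the grid-and-time-step idea can be sketched as below. It assumes, purely for illustration, that each cube holds a single temperature value and that the only physics is heat diffusing between face-sharing neighbors; real weather and climate models solve far richer equations for wind, pressure, moisture, and more. The grid size, the diffusion coefficient `alpha`, and the function names are hypothetical.

```python
import math

# Offsets to the six face-sharing neighbors of a cell in a 3D grid.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def make_grid(nx, ny, nz, value=0.0):
    """A 3D grid of cells, indexed grid[i][j][k]."""
    return [[[value for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]


def step(grid, alpha=0.1):
    """Advance one time step of simple diffusion between neighboring cells.

    alpha must stay at or below 1/6 for this explicit 3D scheme to be stable.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    new = make_grid(nx, ny, nz)
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                center = grid[i][j][k]
                exchange = 0.0
                for di, dj, dk in NEIGHBORS:
                    a, b, c = i + di, j + dj, k + dk
                    # Edge cells simply have fewer neighbors (nothing flows out of the domain).
                    if 0 <= a < nx and 0 <= b < ny and 0 <= c < nz:
                        exchange += grid[a][b][c] - center
                new[i][j][k] = center + alpha * exchange
    return new


def total(grid):
    """Sum of all cell values, using fsum to limit rounding error."""
    return math.fsum(v for plane in grid for row in plane for v in row)


grid = make_grid(5, 5, 5)
grid[2][2][2] = 100.0  # one warm cell in the middle of the domain
for _ in range(3):
    grid = step(grid)
print(round(total(grid), 9))  # total heat is unchanged by the time steps
```

Because every exchange between two neighbors adds to one cell exactly what it removes from the other, the total is conserved from step to step. The work per time step grows with the number of cubes, so realistic grids with far more, far smaller cubes are a large part of why this modeling needs HPC.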

Workload - A catch-all term for a related group of tasks you do on an HPC cluster, such as running a genomics mapping workload, a weather modeling workload, an AI-training workload, etc.

Summary

High Performance Computing comes with a large vocabulary. With all the complex use cases and technologies involved, it’s easy to get lost in the sometimes dizzying scales at which HPC systems operate. This guide introduces some of the basic use cases, terms, software paradigms, and technologies that make up the language and practice of High Performance Computing.

Learn more

If you’re interested in more information about how to integrate Apptainer with your HPC operations, keep an eye out for an upcoming article series by CIQ presenting a modern look at containers and HPC in the context of cutting-edge use cases.