High Performance Computing (HPC) is ultimately the leveraging of advanced computing resources for some specific purpose. A wide variety of tasks across many different fields must be done via HPC because of the massive computational needs involved; these include data science, chip design, vehicle safety testing, weather simulations, and a myriad of others. With all these different use cases, it follows that there are many technologies, applications, paradigms, and best practices in HPC, and thus a significant amount of basic terminology around it. Let’s demystify some of the basics and learn more about what exactly “supercomputing” is.
Accelerator card, accelerator – A catch-all term for devices that accelerate a specific type of computing. Some of these devices work with a broad set of applications and workloads, while others are more specialized. For example, GPUs can speed up a wide variety of HPC workloads like those in computational fluid dynamics, molecular dynamics, and AI training, while FPGAs can speed up cryptography and digital signal processing, or even implement whole custom systems-on-a-chip. There are also accelerators made for more specific applications than “general purpose” accelerators like GPUs and FPGAs. These more specialized devices implement algorithms from fields like machine learning, cryptography, or high-frequency trading directly in silicon as an ASIC (application-specific integrated circuit), usually with other design considerations taken for that specific workload so the device runs it as fast as possible compared to more general-purpose accelerators.
AI training/inference – AI training and inference usually have to leverage HPC-type resources in order to run efficiently, even if these operations are sometimes more associated with enterprise computing, depending on the context in which they’re done.
As mentioned below under the “GPU” definition, it frequently takes special accelerator cards like GPUs to effectively train AI, and the most complex models can require dozens, or even thousands, of GPUs to be coordinated at once on a single computational job. This is because AI training ultimately involves a massive network of neurons doing simple calculations all at once, more or less independently, and these networks can be so large as to necessitate pooling many GPUs together to provide both the memory space necessary to fit the entire model as it’s being trained and the compute power necessary to do all the required calculations in parallel.
AI inference can typically utilize smaller scales of GPU resources: the finished model no longer needs to keep track of the large number of variables that are all being tuned individually during training, so less data has to be held in memory and computed at once.
As the needs for computational power in AI training and inference increase, numerous specialized machine learning accelerators are being developed to move beyond the capabilities of GPUs.
Application – Software run on HPC infrastructure to do computational jobs, like an AI training framework or a CFD program. Usually highly sophisticated in many ways and produced by domain experts.
Apptainer – An open-source container solution that unites the world of containers with the world of HPC in ways that other containerization solutions haven’t traditionally addressed. A mature technology and the continuation of the open-source side of the original Singularity project, Apptainer is currently part of the Linux Foundation and is deployed at thousands of sites worldwide. Apptainer is built with a focus on security for the multi-tenant environments commonly found in HPC; supports native integration of hardware/software components common in HPC clusters, like GPUs and MPI stacks, with the container environment and the application inside it; and uses the SIF container format, which has an easy-to-use definition file and produces a single image file that can easily be moved from system to system.
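To make the “definition file” idea concrete, here is a minimal sketch of what an Apptainer definition file can look like; the base image and package chosen here are arbitrary examples, and a real recipe would install whatever your application needs before being built into a SIF image with `apptainer build`:

```
Bootstrap: docker
From: rockylinux:9

%post
    # Commands run inside the container at build time to install dependencies
    dnf -y install python3

%runscript
    # Command executed when the resulting SIF container is run
    python3 --version
```

The resulting single `.sif` file can then be copied to any system with Apptainer installed and run as-is.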
Beowulf cluster, Beowulf-style cluster, Beowulf architecture – A cluster composed of multiple commodity-grade computers, like common rack-mount servers, tower desktops, or even, at some points in history, gaming console systems, all networked together over an interconnect and capable of pooling their computational power on one task. Before the mid-1990s, parallel computing and HPC commonly required an entirely specialized computer architecture designed for that dedicated purpose, but with the advent of faster networks and the beginnings of the MPI standard around that time, developing specialized architectures fell out of favor versus simply wiring together large numbers of commodity-grade computers. Almost all modern supercomputers now follow the Beowulf architecture—they are composed of multiple standard server-type computers networked together over an extremely fast interconnect, with some kind of fast storage attached as well. So, scaling out is as easy as buying more of the same type of commodity compute node servers you already have, while scaling up would look like upgrading the CPUs or RAM, or maybe adding some GPUs, in the servers you already have.
Batch jobs, batch processing – Computational jobs that a user submits and leaves to run without the job expecting additional input from them. Most traditional HPC use cases would be called batch processing jobs—e.g., submitting a genomic sequencing job and waiting for the gene identification results, submitting a CFD simulation and waiting for aerodynamics simulation results, submitting an EDA job and waiting for circuit analysis results, etc.
Cluster – A set of computers linked to each other in such a way that their resources can be collectively pooled together on computational jobs. “Cluster” is not a term exclusive to HPC (there are, for example, Kubernetes clusters that are multiple computers configured to run a Kubernetes instance together), but in the case of HPC, the word typically refers to a Beowulf-style cluster for specific industrial computing applications. Clusters in any form are commonly just called a “cluster” without modifier.
Computational fluid dynamics, CFD – The simulation of air and fluid flows. For example, simulating the flow of water through a hydroelectric turbine or the flow of air over a car or jet in motion. A simulation here may involve inputs like a physical model of the object being simulated along with specifications of different initial physical conditions, which are then iterated over time according to certain CFD equations to produce a completed simulation that can usually be replayed visually through some kind of graphical interface. In HPC, CFD is typically an MPI-based workload, as the calculations across the simulation space all depend on one another—i.e., the workload revolves around computing how all the physical forces in the simulation interact with each other at once.
Computational job – A single, discrete task that is done on some set of HPC resources. This could be, for example, a data processing step of a large genomics workflow, the calculation step for the forces in an aerodynamics (CFD) simulation, or a run of AI training. Pre- and post-processing steps, like setting up directories for a simulation to use or running a Python script to aggregate results into a standard data format, are also considered computational jobs, though depending on how the HPC workflow at hand is organized, these may be distinctly separate computational jobs run at different points rather than one job run all at once. A computational job need not be run with some kind of parallel computing, especially in the pre- or post-processing case, but in many cases a computational job is also a parallel computing task. In that case, the parallelism can be MPI-based or embarrassingly parallel, each with a different purpose and suited to different tasks being accomplished by the underlying application.
Compute node, node – A single server (on-prem) or instance (in the cloud) in an HPC cluster, acting as a provider of computing resources for computational jobs to run on. More generally, a “node” is the same, a single server (on-prem) or instance (in the cloud), but could be part of an HPC cluster, a Kubernetes cluster, etc. That is, “node” is not limited to an HPC cluster the way “compute node” typically is.
Container – A self-contained package that wraps software and its dependencies together in a way that is portable and standardized. Can quite literally be viewed as a standardized vehicle for software in the same way a physical metal intermodal shipping container is a standardized vehicle for cargo. More technically, a container (in Linux) is a packaged userspace that you can deploy and use on top of the Linux kernel your Linux computer is already running through container runtimes like Apptainer or Docker. This “hot-swapping” of the user environment allows you to run applications requiring a specific dependency configuration through the container/container runtime without having to install the application or its dependencies on the compute host itself.
Data science – A catch-all term for leveraging dataset analysis toward some specific goal. As “big data” has become a prominent concept over the past decade, practices have been developed for meaningfully exploring very large datasets computationally. While many parts of data science fall outside of HPC, many others rely on the massive computing power of HPC for all kinds of data analytics tasks. Can be either MPI-based or embarrassingly parallel depending upon the workload within data science being examined.
Electronic design automation, EDA – A common HPC use case involving the simulation of computer chip designs, circuit boards, or similar items. For example, simulating how the design of a die for a given CPU chip will actually work when powered up, or doing the same for an entire circuit board of components like a server motherboard design. Usually an MPI-based workload, as the calculations across the simulation space all depend on one another—i.e., the workload revolves around computing how all the electromagnetic forces in a given simulation interact with each other.
Embarrassingly parallel – Performing the same type of calculation over a broad set of data in parallel on some HPC resources, as part of a computational job. One of the two ways parallel computing is implemented in HPC. An embarrassingly parallel computational job means that you have, for example, thousands of input files containing data in a standard format but with individually different contents. If you need to do the same calculation on each input file, and no file’s results depend on any other file’s results, then you can spread the work out over dozens, hundreds, thousands, or in the largest cases millions of CPU cores at once, gaining performance by processing many files simultaneously instead of one at a time in a non-parallelized configuration.
Embarrassingly parallel workloads are ubiquitous across many fields, and while MPI is used very widely for parallel computing, the embarrassingly parallel paradigm is arguably applied to an even wider set of problems.
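The embarrassingly parallel pattern can be sketched in a few lines of Python using the standard library’s `multiprocessing` module; the samples and the per-sample calculation here are hypothetical stand-ins for real input files and real analyses:

```python
from multiprocessing import Pool

def process_sample(sample):
    """Stand-in for a per-file calculation; each result is fully
    independent of every other input (a toy 'analysis')."""
    return sum(ord(c) for c in sample)

if __name__ == "__main__":
    # Each entry stands in for one independently processable input file.
    samples = ["AGTC", "TTGA", "CCGA", "GATT"]
    with Pool(processes=4) as pool:
        # Pool.map farms the independent tasks out across worker processes.
        results = pool.map(process_sample, samples)
    print(results)  # → [287, 304, 270, 304]
```

Because no task depends on any other, scaling this to thousands of cores is conceptually just a bigger pool and a longer input list.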
Finite element analysis (FEA), finite element method (FEM) – A common HPC workload involving solving certain types of advanced equations across a wide variety of fields. FEM problems are found in simulations of structural loads on buildings, the forces on a vehicle in a crash safety test, the movement of biological structures like joints, and many other areas. Most often an MPI-based workload, as the overall goal, regardless of the underlying application technicalities, is to simulate certain types of forces across a given calculation space, all relative to and interacting with each other.
FPGA, field-programmable gate array – A specialized computer component that is essentially a chip full of low-level computer logic gates that can be reprogrammed to implement algorithms for specific workloads. FPGAs can also include other components with their logic gate array like dedicated on-chip memory or small built-in accelerators for certain types of low-level computations. FPGAs give a way to create an arbitrary reprogrammable chip for some specialized purpose—like video transcoding, signal processing, cryptography, or even the implementation of whole system-on-a-chip architectures. These components are seeing increased use in HPC as of late and are becoming more popular for a wide variety of workloads in a number of fields.
Fuzzball – Fuzzball is the next generation of high performance computing. An upcoming product from CIQ, Fuzzball leverages the capabilities of enterprise, cloud, and hyperscale computing within the context of HPC, allowing for an entirely new way of doing HPC to emerge beyond the Beowulf model—we at CIQ call it “HPC 2.0.”
Fuzzball workflow – A YAML document defining a specific set of jobs, volumes, and data movements that comprise some workload in such a way that a Fuzzball cluster can parse the workflow document and set up the job environment for execution on a set of compute resources.
Genomic sequencing – When genes are sequenced from some biological sample via laboratory equipment, the initially-post-processed output is files with sequences of base pairs. A base pair is a bonded pair of the chemical bases adenine (A), thymine (T), guanine (G), and cytosine (C), with A pairing with T and G with C; base pairs are the basic building blocks of DNA, and sequences of them represent genes. The base pair sequences found in the sample by the laboratory equipment are from an unknown position in the overall genome of the organism the sample came from. To figure out where in the genome these sequences line up, and thus what genes they are, a search of the organism’s genome is done against the sequences in the files generated by the laboratory equipment. This is a computationally intensive task, as it is, in basic form, taking potentially thousands of strings of text like AGTCACTGAGT and matching them against the text of a genome (which is also a long sequence of base pairs, e.g., 3.3 billion base pairs long in the human genome’s case). This is further complicated by the heuristics and advanced searching techniques that have to be used to find genes optimally and efficiently. Genomic sequencing is frequently an embarrassingly parallel workload, as it requires the same analysis to be done over many gene samples, for the most part independently of any other sample.
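As a toy illustration of the matching step, here is a minimal exact-match search in Python. Real aligners (BLAST, BWA, and similar tools) use sophisticated indexes and heuristics rather than this naive scan, and the “genome” below is an invented miniature example:

```python
def align_read(read, genome):
    """Return every position where the read matches the reference exactly.
    A toy stand-in for real heuristic sequence aligners."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)
    return hits

# A tiny mock "reference genome" and a sequenced read of unknown origin.
genome = "TTAGTCACTGAGTCCGAAGTCACTGAGTAA"
read = "AGTCACTGAGT"
print(align_read(read, genome))  # → [2, 17]
```

Scanning 3.3 billion characters this way for thousands of reads shows why both clever algorithms and embarrassingly parallel execution are needed in practice.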
GPU, graphics processing unit, graphics card – A specialized computer component normally used for processing graphics, but widely applicable to a large number of general-purpose computing use cases. GPUs have a large number of small cores (potentially numbering in the thousands) that are good for certain types of graphics-related calculations, for the calculations that underlie AI training and inference, and for a wide array of other computing tasks. The GPU has been one of the most ubiquitously used accelerator cards in HPC for some time, especially as AI has become a more and more prominent use case.
GPUs are preferred over CPUs for AI training/inference because those processes rely on a very large number of simple calculations being done in parallel. There are two points here: first, CPUs typically have 8, 16, 32, 64, or up to 128 or more cores in high-end modern server processors, versus potentially thousands in a GPU; second, CPU cores are highly sophisticated, with a great deal of design attention given to pipelining and other sequential-type operations—much of this complexity is wasted in AI training/inference, where the need is for a very large number of simple calculations done at once. The GPU, with thousands of simple cores that can all run these calculations in parallel, performs them much faster and more efficiently than the CPU, with its much lower core count and cores better suited to more sophisticated operations.
HPC, high performance computing, supercomputing – Using some form of dedicated computing resources to accelerate a computational workload. There’s generally no “one size fits all” definition of HPC, as the scales it can happen at are very broad; the term can encompass a graduate student using a single high-powered cloud instance to accelerate their AI research, small or mid-sized HPC clusters serving researchers and engineers at university campuses or private businesses, or even much larger clusters like the world’s largest found on the Top500. The intention is mostly that some dedicated resources are being used to accelerate some specific computational workload beyond what is accessible with the resources of a typical personal system.
HPC cluster, supercomputer, also just “cluster” – In the modern day, a Beowulf-style cluster designed specifically to run computational jobs. Could be built as a general-purpose cluster designed to support a wide variety of HPC workloads, but could also be built with more specific components for targeted workloads.
HPC interconnect, interconnect, high-speed interconnect – A high-speed network capable of passing data around an HPC system at the speeds necessary to facilitate effective use of things like MPI and accelerator cards around an HPC cluster. While consumer network speeds are typically somewhere around 1 gigabit/second, HPC network speeds in an HPC cluster typically start at 100 gigabits/second and increase from there. HPC interconnects can run at speeds of 100, 200, 400, and very soon, up to 800 gigabits/second, and can provide exotic connections like direct GPU-storage communication in some configurations. There are a number of different interconnect products out there providing these speeds from several vendors. The interconnect is one of the three components of the HPC triangle.
HPC triangle – Generally, an HPC cluster requires three basic resources: fast compute, fast storage, and fast networking. The fast compute is provided by CPUs, GPUs, FPGAs, and other accelerator cards; this is the part of the cluster doing the actual calculations involved in a given computational job. The fast storage is provided by parallel filesystems, storage tiering, and fast underlying storage hardware like SSDs; this is the part of the cluster providing a fast filesystem for a computational job to read/write data to/from. The fast networking is provided by an HPC interconnect; this is the part of the cluster that networks the fast compute all together so the calculation data involved in a given computational job can be effectively communicated across the cluster as necessary, and frequently also connects the fast storage to the fast compute. These three components–compute, storage, and networking–can each bottleneck the others very easily in an HPC cluster, making the design and planning of a supercomputer deployment a complex task. Together, they form an “HPC triangle,” analogous to the “cheap, fast, good” triangle.
HPC workload, workload – A high-level term for what’s done on an HPC cluster. As in, “We’re using our cluster primarily to run computational fluid dynamics workloads, but we have a few other research groups that are running some genomics workloads on it as well.”
The distinction between a “workload” and a “workflow” is that “workload” is more generically used as a catch-all for the different tasks a given field might be using HPC resources for, while a “workflow” is usually a specific set of codified computational jobs accomplishing one of those specific tasks within a given field. So, as with the above example, if “we’re using our cluster primarily to run computational fluid dynamics workloads,” then one may expect to see a wide variety of different computational fluid dynamics workflows running in that environment.
HTC, high throughput computing, grid computing, similar to “distributed computing” – The use of widely-spread networks of resources to do computational jobs that don’t require massively parallel resources spanning multiple compute nodes, or even multiple cores, as is frequently the case with computational jobs in HPC. For example, one of the most well-known HTC grids is the Open Science Grid, which leverages resources worldwide to provide a general-purpose grid computing network used by a variety of institutions. Much of the OSG’s computing capacity powers the analysis of the Large Hadron Collider’s data. Another well-known HTC project is Folding@home, which leverages spare CPU cycles from computers all over the world to do protein folding and has surpassed an exaflop of distributed computing capacity.
Infrastructure, HPC infrastructure – The implementation in hardware (and sometimes software) of the HPC triangle that allows HPC applications to run in parallel over a given set of HPC resources.
Interactive jobs – Computational jobs that a user can interact with directly via some kind of interface. For example, a Jupyter notebook instance running on a compute node in an HPC cluster, in which a user is developing code so they can test it with the resources available on that node. Contrasted with batch jobs (batch processing) as one of the two primary types of jobs a user may want to run on an HPC cluster. Interactive jobs can sometimes create unique resource needs for an HPC cluster, such as the need for a dedicated GPU-enabled compute node for visualization purposes.
Jupyter notebook – A wildly popular browser-based software solution, used ubiquitously across the sciences, that allows written text and code to be packaged together in a format that other users can easily import, run, edit, and generally interact with. Jupyter Notebook installs are commonly packaged in containers so that the dependencies for the code inside the notebook travel with the install, and all a user has to do is start up their Jupyter Notebook server on a compute host with the appropriate resources.
For example, a user may have a container with Python, TensorFlow, and Jupyter Notebook installed in it. They run a Jupyter Notebook server from this container on a local computer they have that has a GPU on it. They can then access this server from their web browser with an address referring to their own machine like localhost:8080, which will present the Jupyter interface in the browser. The user can use the interface to create a notebook, write some TensorFlow code to train an AI model along with the instructions on how to use it in the notebook, run and test the code directly from the browser notebook interface, and then distribute the notebook and container definition to allow others to replicate the results in the same manner on their own resources.
Linux Kernel, kernel, the kernel – Firstly, the low-level code that powers the Linux operating system. All Linux distributions are ultimately based on the same kernel, which is open source and located on GitHub at https://github.com/torvalds/linux. The kernel is developed by a vast number of community contributors both past and current, with discussion around this development primarily done on the long-running Linux kernel mailing list.
Secondly, the space of a Linux operating system, in the kernelspace-userspace model, where the operating system runs its privileged system-level operations like interacting with the computer’s hardware, drivers, and administrative functions. Accessible by default to the root-level account, with sudo being used to provide configurable access to other users.
Containers allow for swapping out the userspace of a given running host without needing to change the running kernel.
Machine learning accelerator – Any of several specialized accelerator cards made specifically to accelerate machine learning calculations. Google has its TPUs (and the edge-focused Coral line), Intel/Habana Labs has Gaudi, and there are others like the Cerebras WSE-2 and the Graphcore Intelligence Processing Unit (IPU). These accelerators generally iterate on the design paradigms of GPUs, but home in on improvements for an environment where AI is the only focus.
MPI, “message passing interface” – A standard that defines a number of different operations for intra- or inter-node communication of data between CPU cores. One of the two primary paradigms of how parallel computing is done in HPC, MPI-based communications allow for a single program to spread itself over potentially dozens, hundreds, thousands, or in the largest cases, millions of CPU cores at once in order to gain a performance increase by having many cores working together on the same overall set of calculations at once. There are a number of different implementations of the MPI standard out there built by different organizations, but at their core, they are all the same set of operations. These operations are things like data transfers between two nodes, data transfers from one node to every other node, data transfers from every node to every other node, etc., to allow for flexibility in supporting the needs of various applications.
There are also exotic forms of MPI (or sometimes, “MPI” only loosely) that provide similar operations for communication directly between GPUs and other devices.
Node – see “compute node”
Parallel computing, parallelism – Running calculations from a given computational job somehow over many resources at once, accelerating their completion by allowing them to run “in parallel.” HPC enables parallel computing by providing the resources that large-scale parallel computing needs in order to run efficiently. There are two primary paradigms for how parallel computing is done in HPC: embarrassingly parallel and MPI. While these both accomplish parallel computing, they are suited for vastly different workloads and have often very different toolsets and methods involved in their individual implementation for accelerating a given workload.
To what extent the efforts to parallelize a given application will speed up that application is dependent upon how much of the application’s overall runtime is spent on the parallelized portions.
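This relationship is formalized by Amdahl’s law: if a fraction p of a program’s runtime can be parallelized across n workers, the best-case overall speedup is 1 / ((1 - p) + p/n). A quick sketch:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: upper bound on speedup when a fraction p of the
    runtime parallelizes perfectly across n workers."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the runtime parallelized, 64 workers give only ~15x,
# because the 5% serial portion dominates at scale.
print(round(amdahl_speedup(0.95, 64), 2))    # → 15.42
print(round(amdahl_speedup(0.95, 1024), 2))  # more workers, diminishing returns
```

This is why shrinking the serial portions of an application often matters as much as adding more nodes.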
Parallel filesystem – A computer filesystem designed to serve many read/write operations in parallel in support of computational jobs in HPC. One of the three components of the “HPC triangle” as the storage component.
Pipeline – The primary processing component of a workflow, i.e., the main sequence of computational jobs some HPC-based task will perform as a part of execution. Generally refers to execution outside of a workflow engine’s control–so, for example, if you have a bunch of scripts you’re running that are coordinated by some master script, you have and are running a pipeline; but if all those scripts, their execution order, necessary resources, container image, etc., are codified with some kind of workflow engine, maybe with a YAML-based DSL, and the execution of this codification is then done by the workflow engine, then you have and are running a workflow.
RDMA, remote direct memory access – One of the most important technologies underlying HPC, and one that the HPC interconnect provides. RDMA allows compute nodes to move data involved in a given computational job directly between each other without involving the OS running on those nodes. In essence, a portion of RAM is set aside by the NICs of each machine so that certain RDMA operations can be done between them without the operating system having to be involved. HPC interconnects, in addition to speed, provide the RDMA capabilities that allow data to pass over the interconnect via RDMA, thus allowing, for example, a massive MPI-based computational job to pass data between its nodes without running a very large amount of OS-level networking code, massively speeding up the application.
Even in the presence of a high-speed interconnect, RDMA is often the component necessary to reach HPC levels of speed: without it, the operating system of each compute node, trying to process so many network requests, will inevitably bog down long before the network’s full speed capabilities are realized.
Research Computing – A department commonly found at many universities that manages the HPC clusters and resources the campus has. Responsible for the setup and management of both HPC infrastructure and HPC applications in the academic computing environment.
Resources – What a computer uses to accomplish computational jobs. The compute nodes, along with the CPUs, GPUs, FPGAs, other accelerators, RAM, etc., that are in the compute nodes, are all generally referred to as “resources” in different contexts.
Rocky Linux – An open-source, community-driven, community-developed Enterprise Linux operating system based on RHEL and acting as one of the successors to the original CentOS Linux distribution. 100% bug-for-bug compatible with RHEL, Rocky Linux is made with a focus on providing a robust, high-quality, community-supported Enterprise Linux operating system, especially for HPC users.
Scaling out – Expanding a cluster’s capabilities “horizontally” by adding additional nodes like those already existing, thus creating additional resources without implying a change in the capabilities of existing resources. In HPC, this typically means adding additional compute nodes to a supercomputer, which adds new resources without upgrading the existing compute nodes already in the supercomputer. In on-prem HPC, this would be racking and connecting a new server in a server rack with existing compute node servers, while in cloud HPC, this would mean spinning up a new instance and networking it with existing compute node instances. It follows that it’s generally easier to scale out in the cloud than it is on-prem.
Scaling up – Expanding a cluster’s capabilities “vertically” by increasing the capabilities of existing components, but without adding more functional units at the node level as with scaling out. In both cloud and on-prem HPC, this means doing something like upgrading the CPU, RAM, or accelerator card capacity of existing resources, though the methods to do this are once again physical versus virtual. In general, cloud instances are more individually static than on-prem resources; while you may swap the CPU of an arbitrary on-prem compute node for a more powerful one or add more accelerator cards to it, these same operations in the cloud typically have to be accomplished by spinning up a larger instance template and transferring the existing workload onto it.
Supercomputer – see “HPC cluster”
Supercomputing – see “HPC”
Userspace – The user-facing code of the Linux operating system, where most user applications are run. Sits on top of the kernel in the kernelspace-userspace model. Containers are, in a high-level way, swappable userspaces that utilize the underlying host’s kernel as the backbone for privileged, system-level, kernelspace execution needs, while the applications and operating system environment presented to the user within the unprivileged userspace can be changed via the container.
Video transcoding – The process of converting media from one format or codec to another. For example, taking the very large, multi-terabyte-size, high-resolution, master copy of a movie and converting it into formats/resolutions/codecs that can be played on a mobile phone, a smart TV, an aircraft in-flight entertainment system, etc. Frequently, in the 2020s, an HPC or HPC-adjacent workload performed on FPGAs or other accelerator cards made specifically for media transcoding.
Warewulf – An open-source cluster provisioning system for HPC and enterprise computing. Warewulf utilizes container-based node images and iPXE to easily serve a specified operating system configuration out to potentially thousands of nodes at once. Warewulf also allows for configuration overlays to be applied to a given node, so it’s possible, for example, to have a base compute node image that you have Warewulf serve for CPU nodes, and then an overlay with GPU settings that gets applied to that base image on your GPU nodes and automatically brings them up with the correct configuration. Warewulf has been used by a wide variety of sites for over 20 years and is continuously improving with a robust development community behind it.
Weather/climate modeling – Climatic modeling in order to make predictions about upcoming weather patterns is computationally intensive. At a high level, it involves taking some input data about the current weather conditions, segmenting the atmosphere in this data into a large, three-dimensional grid of cubes some certain length on a side, and then running simulations of how weather conditions evolve over a certain time period. The data required here often has to be very up to date, necessitating some data transfer setup to bring the latest weather data in regularly. These simulations are extremely complex, and typically take into account factors like air pressure, humidity, sunlight intensity, time of year, and even more granular details such as expected heat rise from cities. Weather modeling is done both for scientific research and for practical purposes; for example, learning more about the Earth’s climate on a global scale relative to industrial growth, or a public power utility running weather models to determine what level of power will be available from renewable energy sources within the coming few days.
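At a very high level, the grid-based simulation loop can be illustrated with a toy one-dimensional diffusion stencil, where each cell’s next value depends on its neighbors. Real weather codes evolve 3D grids with vastly richer physics, so this is only a sketch of the iteration pattern:

```python
def step(grid, alpha=0.1):
    """One explicit time step of a toy 1D diffusion stencil: each cell moves
    toward the average of its neighbors (periodic boundaries). A stand-in
    for the per-cell updates real atmospheric models perform on 3D grids."""
    n = len(grid)
    return [
        grid[i] + alpha * (grid[(i - 1) % n] + grid[(i + 1) % n] - 2 * grid[i])
        for i in range(n)
    ]

# A crude "temperature" field with one hot cell; repeated steps smooth it out.
field = [0.0, 0.0, 10.0, 0.0, 0.0]
for _ in range(50):
    field = step(field)
print(round(sum(field), 6))  # diffusion conserves the total: → 10.0
```

Because every cell’s update needs its neighbors’ values, partitioning such a grid across nodes requires constant boundary exchange, which is exactly the kind of tightly coupled communication MPI and fast interconnects exist to serve.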
Workload – see “HPC workload”
High Performance Computing comes with a large vocabulary describing it. With all the complex use cases and technologies it involves, it’s easy to get lost within the sometimes dizzying scales that HPC systems do work at. This guide introduces some of the basic use cases, terms, software paradigms, and technologies that make up the language and practice of High Performance Computing.
If you’re interested in more information about how to integrate Apptainer with your HPC operations, keep an eye out for an upcoming article series by CIQ presenting a modern look at containers and HPC in the context of cutting-edge use cases.
Forrest Burt is an HPC systems engineer at CIQ, where he works in-depth with containerized HPC and the Fuzzball platform. He was previously an HPC system administrator while a student at Boise State University, supporting campus and national lab researchers on the R2 and Borah clusters while obtaining a B.S. in computer science.