CIQ

Hands on Warewulf: Solving Cluster Provisioning & Management

March 31, 2022

Speakers:

  • Zane Hamilton, Vice President of Sales Engineering, CIQ

  • Gregory Kurtzer, Founder of Rocky Linux, Singularity/Apptainer, Warewulf, CentOS, and CEO of CIQ

  • Michael L. Young, Linux Support Engineer, CIQ


Note: This transcript was created using speech recognition software. While it has been reviewed by human transcribers, it may contain errors.

Full Webinar Transcript:

Zane Hamilton:

Good morning. Good evening. Welcome back to another CIQ webcast. We appreciate you joining us. If you would, like and subscribe to make sure we can stay in touch with you. Today we're going to be talking about Warewulf. I know we talked about this a few weeks back, but we're going to do a little bit of a deeper dive, and today I have Greg and Michael here with me.

Zane Hamilton:

Michael, you’ve been on with us before, right, Michael?

Michael L. Young:

No, this is my first time.

Zane Hamilton:

Introduce yourself.

Michael L. Young:

Okay. My name is Michael Young. I work at CIQ as a Linux support engineer. Since starting to work here, I was tasked with learning Warewulf. I got to know it pretty well and actually have been able to use it with several customers. I'm glad to be here and see what I can do to help people understand what a great tool Warewulf is.

Zane Hamilton:

Greg, to start off, would you give a basic overview of what Warewulf is?

An Overview of Warewulf [01:17]

Gregory Kurtzer:

I would be happy to do that. It started in 2001, when I was tasked with running a bunch of Linux clusters at Berkeley Lab for the Department of Energy. I was tasked with running a few clusters for different groups within Berkeley Lab. I was already completely overwhelmed by the work, as everybody can imagine. When they proposed to me that I needed to be able to maintain clusters of hundreds of nodes, I wanted to figure out if there was a better way of managing those nodes. At that point, there were a few cluster toolkits that existed: Rocks Cluster (not related to Rocky) was one; OSCAR was another one that was highly utilized; Donald Becker was working on one called Scyld at the time. But I felt as though we needed to approach this differently, and I really wanted to focus on the idea of stateless.

What Warewulf is, is a way of administering the operating system of lots of compute resources in an extraordinarily scalable and configurable way. That's what we've been able to create with Warewulf. The latest version leverages the container ecosystem. It makes it so you can take a container or build a container (the way you would normally do it) and import it into Warewulf. Then you can boot a bunch of nodes with this container, and it will actually put that container on the physical bare metal that you want to provision out to. It makes it easy to maintain and very easy to provision large clusters – and also to provision different roles and purposes for your clusters via different container images. That is what Warewulf is.

Zane Hamilton:

Thank you for that overview. I think it's really appropriate that we have Michael here now, because the next question I have is: what makes Warewulf simple to deploy, use, and administer? And since you're new to it – or you were new to it; you aren't now – what made it simple?

Using Warewulf [04:00]

Michael L. Young:

I think what made it simple was being able to install an RPM. And then I actually went through the Getting Started Guide on Warewulf's website. From a system administrator viewpoint, it was pretty quick to grasp all the different services being set up for you. I didn't have to go and configure DHCP; I didn't have to set up TFTP; I didn't have to set up all those services separately. Warewulf did it for me. It was a simple config file: set my IP address on my head node, run a quick wwctl configure all, and boom! Warewulf was up and running. Then it was just learning how to set up my nodes so that they would get configured. It's simple command line arguments that you're just passing into Warewulf. It's doing a lot in the background to make it easy to have your cluster ready. And through iPXE, it boots right up. As soon as your node can get its configuration and get provisioned, your node is running, stateless. That made it very simple.
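
For readers following along, the steps Michael describes map roughly to the commands below; the package name and exact flags are a sketch and may differ between distributions and Warewulf releases.

  # Install Warewulf on the head node (assumes an Enterprise Linux host with a
  # "warewulf" package available in its repositories)
  sudo dnf install -y warewulf

  # Edit /etc/warewulf/warewulf.conf to set the head node's IP address and the
  # provisioning network, then let Warewulf configure DHCP, TFTP, and the
  # other services it needs
  sudo wwctl configure --all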

Zane Hamilton:

Greg, I know that the intent was to make Warewulf simple to deploy, use, and administer. Having someone new come in, such as Michael, go through this process, and now become an expert is impressive for a piece of software that is doing something as complicated as what it is doing. Do you have anything else to add to what Michael had to say? I know there was intent there, when you went down that path of making it simple. I'll let you fill in that blank.

Gregory Kurtzer:

You can't take what Michael says at face value because he is a brilliant engineer. He was able to pick this up easily because he is awesome. With that being said, we were trying to make this as simple as possible. But Michael's just awesome. I'm glad to have him on our team.

Zane Hamilton:

You have decades of experience. How long has Warewulf been around?

The Beginning of Warewulf [06:28]

Gregory Kurtzer:

It has been over 20 years. I founded it in 2001. There are some funny stories about the foundation of it as well, and how much I completely blundered the first implementation of it. But it stabilized pretty quickly, once I fixed the major thing that was blundered. Everyone is probably curious now that I said that. 2001 was before PXE was standard on network interface cards, so we actually started off not on PXE. As a matter of fact, we started on bootable ISO images. You would use your Warewulf control server to create ISO images for each one of your nodes. You'd burn a CD-R, put it into your node's CD-ROM drive, and it knew how to boot from there.

That was the very first version of Warewulf. I went and presented it at Linux World. The feedback that I got was: everyone liked the administration model; they liked how it operated and the tools, but what the heck was I thinking with these CD-ROMs, and isn't there a better way of doing that? One person raised their hand and said, "You know, there's a booth in the back near the expo hall for Etherboot. Go talk to them." I said, "Oh, great feedback. Awesome." Right after the talk, I answered a few questions, then went over to the back of the expo hall and met with the Etherboot team. Etherboot, if you're not familiar, was the predecessor (at least in the open source world) to PXE. They created images that you can flash onto the ROMs of your NICs or boot, like on a floppy drive – because, again, this was 2001 – or a CD: a tiny image that basically loads what is called an Option ROM into the BIOS, which tells the BIOS that this device is actually bootable. It will now show up in your BIOS configuration. You can select your network card as a bootable device, just like a SCSI card. That Option ROM will then give the necessary capability to that network card to do a boot. It worked very similarly to how PXE ended up taking off and moving from there, but that was the predecessor to PXE. We then moved to Etherboot. For a while people had to flash the ROMs on their NICs or get NICs that had special ROMs on them for Etherboot, or just use a floppy drive. For a while, in early Warewulf, you always had a floppy disk that you put in that matched your NIC. That was how you booted before PXE. As soon as PXE started taking off, that was a very easy transition and motivation. We basically moved everything from Etherboot over to PXE, and it's been PXE ever since. Well, it was PXE until we transitioned to iPXE.

Zane Hamilton:

Thank you for that. Do we have support for x86 and ARM?

Gregory Kurtzer:

Yes.

Zane Hamilton:

We should talk about use cases. What are some of the use cases for doing a Warewulf install? What are the different types?

Use Cases [10:07]

Gregory Kurtzer:

So Warewulf was originally designed for high performance computing clusters. The predominant reason for that is that in an HPC cluster, you can have tens, hundreds, or even thousands of compute nodes that are all virtually identical. Every one of them is running the same OS and the same versions – or you hope and want them to be; otherwise you get version creep and you run into other problems.

Zane Hamilton:

That never happens!

Gregory Kurtzer:

That never happens! Not with Warewulf, actually – that never happens. You want everything to be as similar and homogeneous as absolutely possible. Now, you could have groups of nodes that are different – you could have a hundred nodes over here that are running one version of the OS, a hundred nodes over there that are running a different version of the OS, and a hundred nodes somewhere else that might have different configurations, permissions, or access controls. Warewulf was designed for that. Warewulf was designed to be able to bring large amounts of resources together.

Now, in terms of use cases, HPC was the obvious one. It was very easy; it was what Warewulf was originally developed for. But one of the interesting things that happened, I think around 2010, is that I started to hear of people using Warewulf for different purposes. The first one that I heard of was Guitar Center and Musician's Friend using it to power their entire web infrastructure. That was a while ago, and I doubt they're still using it. Musician's Friend and Guitar Center reached out to me, and I said, "Well, if you ever need any help, I could use a Gibson Les Paul." It didn't happen. But there were people using Warewulf for different kinds of use cases, though the majority of it is high performance computing. Michael will be able to talk more definitively on this, but we have now actually even considered and have a proof of concept for being able to spin up Kubernetes clusters and other types of clusters using Warewulf. The fact that we're based on a container image means that anything you can put into a container that you want to run on bare metal, you can pretty much do.

Zane Hamilton:

Michael, I can toss it over to you now. Have you been seeing that as a use case you’re running into? 

Michael L. Young:

Yeah, for one of our customers, we used Warewulf to provision control plane nodes. We're deploying Kubernetes on those control plane nodes and then bringing up our compute nodes. Part of that has to do with another product we have at CIQ called Fuzzball Substrate, so that's where we've been using it. We have a nice tool called IQ, which allows us to deploy from the Warewulf control node to our three control plane nodes, then bring up our compute nodes. They're all running Fuzzball Substrate and immediately start talking with our cluster. Depending on how fast your machine is, in anything from 15 minutes to half an hour, you have a fully functional cluster, ready to go and run some workflows. So that's one of the use cases. I've also been using it a lot for testing. For testing our products, it makes things easy, especially because it's stateless. One nice thing is that I'm able to bring something up, and if it doesn't work, I just reboot the machine and try it again. Then, with the little bit of scripting that you have to do to clean things up, it's made testing and development much easier.

Nodes [14:20]

Gregory Kurtzer:

Michael brings up a really good point, and it's something we didn't talk about, which is the concept of stateless. When we provision nodes in a stateless way, we're basically provisioning them so that when you turn off the computer, the whole concept of what was installed, the operating system, everything goes away. It's like it was never installed. There are several different ways that you can achieve a stateless compute fabric. The way Warewulf does it by default is to actually run it out of system memory. For example, if you have a container image and you want this container image to run on these hundred compute nodes, it makes it very easy to manage if you just turn on those compute nodes and they all get that exact same container image.

If you have one node that is doing something a little bonkers, a little weird, it is very easy to say, "Well, it's not software, because the nodes above and below are running just fine." So you've got some kind of weird hardware issue going on, and you know you can start looking at the hardware to see what's going on there. I can tell you, some of the initial goals of why we created Warewulf and what we were trying to do with it were really to simplify the management of these sorts of systems, and that helps a lot. If your administration model is managing a large number of individually installed servers or workstations, then if you have a thousand nodes in your cluster and each one is installed directly, you have to manage each one directly. There are tools to facilitate that: configuration management, provisioning tools, Kickstart, and things like that. But if each one is physically installed, you cannot just start over. Over time, you can end up with version drift, where you have slightly different versions of different aspects of your cluster. That can be very difficult to track down if you have a very big cluster.

Warewulf does well here because every time you reboot that computer, you can guarantee the image and its integrity as it's being provisioned, making it again very easy to install. Whether you're putting a traditional HPC stack, InfiniBand drivers, GPU drivers, or Lustre file system drivers in there, you can distribute all of that in one container when you provision that container out. It's gonna boot that whole stack. If you provision it out to another cluster, it'll boot that cluster with that exact same stack. If you put that up into Docker Hub or any of the public registries, you can actually share that with other people. This means that for a traditional HPC stack, an OpenHPC stack, it makes it really easy to distribute that system profile to any clusters out there. Then Warewulf will handle the configuration management that you need to be running in order to manage differences between nodes: IP addressing, host names, and service configurations. Warewulf comes with a very small, simple configuration management system that is all templatized. So you can have one file for all nodes that boot, and when a node boots, it will get the template that you've configured for that node.

That could be things like IP addressing. If you have InfiniBand on some nodes, you can define that as well. In this regard, there's a lot you can do with Warewulf. Going back to the original question regarding use cases: you can do traditional HPC; you can do an OpenHPC stack very easily. Michael brought up Fuzzball. This is our HPC 2.0 stack. You can use Warewulf to provision out that entire stack: the compute resources and the management resources, as well as the control plane for that entire cluster. There is a lot you can do with it. It is also operating system neutral. We are closely associated with Rocky Linux, but you don't have to run Rocky Linux with it. You can run any of the enterprise Linux derivatives as well as any of the Debian derivatives. You can run SUSE on it and through it. As a matter of fact, SUSE has been an amazing partner with Warewulf. They're contributing and helping us with the development of Warewulf, and SUSE runs fantastically with Warewulf. Another aspect to consider is just how portable, simple, and lightweight it is. It is a very easy solution to deal with.
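
The templated configuration management Greg describes is handled through Warewulf overlays. As a rough sketch, it looks something like the following; the overlay name, file path, and template variables are illustrative and vary between releases.

  # List the overlays Warewulf ships with and open one of its templates
  wwctl overlay list
  wwctl overlay edit generic /etc/issue.ww

  # Inside a .ww template, per-node values are filled in with Go text/template
  # syntax, along the lines of:
  #     Welcome to {{ .Id }} in cluster {{ .ClusterName }}

  # Rebuild the overlays so nodes pick up the change at their next boot
  wwctl overlay build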

Zane Hamilton:

Yeah. The partnership part is exciting. When you have multiple clusters or several different clusters, can you control all of that from one Warewulf controller? Or do I need a separate Warewulf controller for each type of cluster that is running?

Clusters and Controllers [19:42]

Gregory Kurtzer:

You can control all of it from one Warewulf control surface, as long as you're managing your broadcast domain. Michael mentioned that TFTP and DHCP are required services for PXE. Warewulf will configure that for you. But at some point you're scaling up your clusters so big that you have to start managing your broadcast domain. The last thing you want is storms, loops, or something like that taking out your network when you're dealing with large clusters. The traditional Beowulf model is very flat. Many people say, "We're just going to put that on one gigantic network segment and not manage the broadcast domain." You can do that for 1000-2500 nodes, but at some point, you really need to start dealing with your broadcast domains.

Now there are multiple ways of dealing with your broadcast domains. The first is you create relays between your routed networks, then route between them, and do DHCP relaying. The other way is you have multiple Warewulf control surfaces. Perhaps do one on this logical network and then one on that logical subnet, and you now have two Warewulf servers. This also mitigates some of the scalability factors. Warewulf will scale to thousands of nodes with one Warewulf server, but for the sake of resiliency, you don't want to put many thousands of nodes on one Warewulf server. You are going to need to spread the load out.

Now this is a very long way of answering your question of: can Warewulf do multiple clusters? However you're managing your broadcast domains, and no matter how you're managing your Warewulf infrastructure, Warewulf is designed to be able to have multiple node groups via profiles and different node configurations. You can absolutely separate this into network segments or subnets, managing your broadcast domain between systems, and have Warewulf clusters run all of that. But another way of doing this is to have one big cluster that has different groups of nodes for different purposes: research initiatives, different projects, or PIs within your organization. You can then break apart your nodes into different groups based on these factors; it could also be in terms of hardware vintage.

Many organizations building HPC clusters go into it and say, "This year we've got $3 million. We are going to spend it on capital equipment purchases. We're gonna buy our base cluster." The following year, they have another million bucks and now they're going to add to it. But the node profiles have now changed a little bit, and you might still be able to bulk them together. But at some point, if your organization keeps doing this every year, you are going to have older and newer nodes. They might still be compatible, but maybe you have to put them on different InfiniBand fabrics, because InfiniBand has now changed so much. You have to start managing your network fabric. At some point you're end-of-lifing hardware. And you don't want to have some big parallel MPI job spanning the newest nodes and the oldest ones, because then you're going to get mixed performance. You can only go as fast as the slowest node. Your newest nodes will sit there, not being completely utilized.

Long story short, you could create pseudo partitions within Warewulf based on configuration node groups, profiles, different kernel images, different OS images, and different roles. You could have file system I/O based nodes. You could have your Lustre storage nodes and then your compute nodes over here. There are so many ways you can slice and dice. No matter your cluster infrastructure, Warewulf is flexible.
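
As a rough sketch of the grouping Greg describes, node groups are built out of profiles. The profile name, image name, node range syntax, and the exact flag for assigning profiles below are illustrative and differ between Warewulf releases.

  # Create an additional profile for a group of nodes
  wwctl profile add gpu
  wwctl profile set gpu --container rocky-8-gpu

  # Attach that profile to a subset of nodes on top of the default profile
  wwctl node set n00[01-10] --profile default,gpu

  # Verify what a node in that group will now receive
  wwctl node list -a n0001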

Zane Hamilton:

When you start working with a tool like Warewulf, are there preconfigured cluster images that we can use, or will I have to build them from scratch? Is there something I can install and start using out of the box?

Presets and Tools in Warewulf [24:27]

Gregory Kurtzer:

There are some preconfigured images out there. We post some; we've got some up on Docker Hub for node images. You will find them under the Warewulf organization: there are Rocky, CentOS, and OpenHPC images there. For the most part, you can use almost any image that is up in Docker Hub, which is actually kind of crazy to think about.

Now there are two "gotcha" things you need to watch out for. The first is that when people create the default container images that they put up into Docker Hub, they don't have full boot capability associated with them. Systemd has masked out a bunch of services; they have coreutils that are designed for single-user mode and not multi-user mode, so you have to make a few changes to them. But once you make those changes – and those changes, by the way, are public in Warewulf, so you can see how to make them – once you've made those changes, you will be able to boot almost any image that's in Docker Hub today, actually provision that out to bare metal hardware, and then run that in a stateless way. In terms of turnkey solutions, we provide a few. If you are interested in Warewulf turnkey solutions, we can help you. We have ones that are pre-made, but we can also help tune and create them for customers. We also have OpenHPC and HPC 2.0 – our cloud native, cloud hybrid, federated meta-orchestration platform for performance critical workloads and data.

Configuration Management System [26:19]

Zane Hamilton:

I find integration with other tools very interesting. I know we want Warewulf to do configuration management on its own, but it will also integrate with other configuration management tools, correct?

Gregory Kurtzer:

The configuration management system in Warewulf is designed around the question: how do you provision configuration changes to a system before /sbin/init is called? /sbin/init, in case you aren't familiar, is the parent of all processes. If you do a ps on your system, it is PID 1. Which means: how do you make changes? How do you get configurations onto a node before /sbin/init is called as PID 1? And that's what Warewulf does. Warewulf will make those changes. If you have system configuration changes, network configuration, IP addressing, all of that sort of stuff, it has to be there before systemd starts. And Warewulf will be in charge of that.

You can use it for other things, like dynamic files – your credential files, your /etc/passwd and /etc/group files. You can use it for that to ensure that your users have accounts on the system, and Warewulf is good at that. But if you need anything that's more complex than base-level configuration management, you should take a look at layering Ansible, Puppet, CFEngine, or Chef. Whatever you are most comfortable using, just put that into the node image and configure it in the node image. Then when all the nodes boot, they will not only boot with the Warewulf configuration that you've set through Warewulf – to ensure that everything's there when systemd is called and running – but you can then do further configuration management using whatever tools you prefer. So yes to both.

Zane Hamilton:

We have a question from Jonathan about Warewulf 4. What is the status of OpenHPC support (packaging, documentation, tutorials) for Warewulf 4? 

Warewulf 4 [28:52]

Gregory Kurtzer:

OpenHPC currently supports Warewulf version 3. Warewulf 3 is a bit long in the tooth in terms of age and has not had any active development in quite some time. Everything has now moved over to Warewulf 4, but to your point, OpenHPC has not adopted Warewulf 4 yet. We are currently working with the OpenHPC team. The OpenHPC team has been contributing fixes and patches to Warewulf 4 as it is being prepared to be brought into OpenHPC. It shouldn't be much longer. It is in progress, but not quite there yet. I would anticipate it should get there in the next major release of OpenHPC. I'm sorry I don't recall which release that's going to be, but I have strong confidence that it's going to be in OpenHPC in the near future.

Demo: Configuring a Profile to Set Up Nodes [30:10]

Zane Hamilton:

Michael, can you share the demonstration that you have prepared for us?

Michael L. Young:

So I wanted to show briefly how to initially configure a profile to set up some nodes. I am using three nodes up on a VPS provider. These are not bare metal machines. This provider supports iPXE booting. I have my three nodes here and they're trying to communicate, but I don't have these nodes set up yet. It keeps retrying every minute. We are going to watch these 3 nodes get their configuration through Warewulf and boot up. 

The first thing I want to do is set some defaults in a profile. You can have multiple profiles. I'm going to use the default profile to set some things. I know that on my three nodes that I showed you, I have two interfaces on them. One of the interfaces is set up to be on a private network. That's the private network I'm going to use from Warewulf to manage these three nodes. So one of the first things I'm going to do is set my eth1 network device, because that is on the private network. And I'm going to set a default netmask.

The reason I want to do this is because it will be applied to all my nodes. I can add a new node; that will be there. I do not have to set that now for that node. It's automatically going to be applied to all nodes that I add. That's one of the benefits of having profiles. You can set a profile that can be applied to multiple nodes at once without having to sit there and do that for every single node that I'm bringing up. 
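
The profile commands Michael is describing look roughly like the following; the interface name, netmask, and flag spellings are examples and vary somewhat between Warewulf releases.

  # Set cluster-wide defaults on the built-in "default" profile: the network
  # device the nodes use and the netmask that applies to all of them
  wwctl profile set default --netdev eth1 --netmask 255.255.255.0

  # Show what the default profile now contains
  wwctl profile list -a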

The other thing that I'm going to do is import a container. Notice over here that I have the ability to set a default container. Again, this saves me from having to set a container per node every time I add one.

There will be some use cases where you will need to override that. And you can do that: you can override these settings per node, but if you know that the majority of the nodes are going to have the same container, you can just set that up here. I'm going to go ahead and import this image (which is what Greg mentioned earlier: we have some images ready to go on Docker Hub under the Warewulf namespace). I'm going to bring in the Rocky image. I had already done this before, so you will notice it went pretty quickly; it was cached. But right now I'm bringing in that Rocky Linux container.

Now it's building my container. The next thing I want to do, as soon as it's done building, is set this as my default container. I could have done that in one command line. I probably should have done that to show you, but there's a flag that you can add to make that your default, instead of having two separate steps. I'm going to go ahead and set this container as my default container on my profile. So now, if I list the values that I've just set, I now have my default container; I have my default netmask for that interface.
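
The import step, including the one-command variant Michael mentions, looks roughly like this; the registry path is an example, so check Docker Hub for the current Warewulf image names and tags.

  # Import a node image from Docker Hub, give it a local name, and make it the
  # default container in a single step
  wwctl container import docker://warewulf/rocky rocky-8 --setdefault

  # Or, as a separate step, point the default profile at an imported image
  wwctl profile set default --container rocky-8
  wwctl container list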

The next thing I'm going to do now is something that I know I need to do with the current Rocky image that was brought in. I'm going to enable the network so that when I boot up my nodes, it will automatically bring up the network interfaces. Now that I have the container set up and built here in Warewulf, I can go inside that container. It is as simple as doing a wwctl container exec. Then specify my container name; in this circumstance, my container name is rocky-8. Then do /bin/bash, and now I'm inside my container. From here I will do a simple systemctl enable network.

It's going to create the necessary symlink, and I'm going to exit. Then it will rebuild my container. I don't have time to show this today, but another thing you can do is go into the container, bind in a directory, and install packages. If you wanted to make sure Rocky Linux was up to date, you can update the container by "execing" into it and running a simple YUM or DNF update. Or you can also install packages. I use it for installing our Fuzzball Substrate packages into containers. Once the container has been rebuilt, all you have to do is reboot your node and it will pick up the latest container. So I have that network set.
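
The exec-and-rebuild workflow looks roughly like this; the service to enable depends on the image (older Rocky 8 images use the legacy network service, newer ones NetworkManager), and the package list is a placeholder.

  # Open a shell inside the imported image; when the shell exits, Warewulf
  # rebuilds the container automatically
  wwctl container exec rocky-8 /bin/bash

  # ...then, inside the container:
  systemctl enable network      # or NetworkManager, depending on the image
  dnf -y update                 # keep the image up to date
  dnf -y install <packages>     # placeholder for anything else the nodes need
  exit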

The next thing I'm going to do is set up our kernel. We do a kernel list. We don't have any kernels set up, so I'm gonna just go ahead and do a kernel import. This command is going to import the kernel that's been installed on the Warewulf node. I'm going to run this and it's going to bring in that kernel. You can specify other kernels; I'm just gonna go with the one that's currently running on my Warewulf node. And you noticed I was able to set that as my default. So now when I look at my profile, I have not only my container but also the kernel that's going to boot up on those nodes. That should be it, as far as this setup is concerned.
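
On Warewulf 4.2 the kernel is imported separately, roughly as follows; as Greg notes later, 4.3 and newer can instead boot the kernel shipped inside the container.

  # See which kernels Warewulf knows about, then import the kernel running on
  # the Warewulf server itself and make it the default
  wwctl kernel list
  wwctl kernel import $(uname -r) --setdefault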

Now we want to set up our nodes. We want to see some nodes boot. This is the command we want to run. I already know my network card's MAC address, its hardware address. With the command 'wwctl node add,' I again need to specify which network device we're gonna use; I can set its hardware address, I can set its IP address, and I can name my node however I want. In this case, I named it 'control1.' I'm going to go ahead and add that node. And I didn't want us to miss seeing it pick up that node. So I'm gonna start adding my other nodes here.

This is my second control node and then I'm going to do my other one here. I've added my three nodes. I can take a look here, and you can see all three nodes that I added. You can also see what is being set for them. For instance, you notice they're going to pick up Rocky 8 because I set that in my default profile. They're also going to boot up this kernel. Here is my hardware address. Here is the IP address. Here is the netmask that came from that default profile. Let's see if anything happens. Did I miss something, Greg? It didn't pick it up. Let me try 'wwctl overlay build.' If that doesn't work, I will restart. Of course, when you're live…
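
The node setup Michael runs looks roughly like the following; the MAC and IP addresses are placeholders, and on Warewulf 4.2 a restart of the warewulfd service may be needed before new nodes are recognized.

  # Add each node, pinning it to its MAC address and giving it an IP on the
  # private interface
  wwctl node add control1 --netdev eth1 --hwaddr 52:54:00:aa:bb:01 --ipaddr 10.0.0.11
  wwctl node add control2 --netdev eth1 --hwaddr 52:54:00:aa:bb:02 --ipaddr 10.0.0.12
  wwctl node add control3 --netdev eth1 --hwaddr 52:54:00:aa:bb:03 --ipaddr 10.0.0.13

  # Review everything each node will receive, rebuild the overlays, and (on
  # older releases) restart the Warewulf service
  wwctl node list -a
  wwctl overlay build
  sudo systemctl restart warewulfd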

Zane Hamilton:

Absolutely. Doesn't matter how many times you tested it before, it's going to fail.

Michael L. Young:

Yep.

Zane Hamilton:

Not fail. It's a hiccup.

Michael L. Young:

This is funny because I had it sitting here booting and I thought “this is going to be awesome to show everyone.” Give me some time to reload Warewulf just to make sure.

Zane Hamilton:

While Michael is reloading Warewulf, does anyone have some questions? Please put them in the chat.

Gregory Kurtzer:

Restarting the service. This was fixed in a later version so that you don't have to restart or reload the Warewulf service. Hopefully that will do it. See how it says it's an unknown, unconfigured node.

Michael L. Young:

I'm just gonna speed this up instead of waiting the full minute. Now I can see it's grabbing its image. So that must be it. I'm using Warewulf 4.2 for the demo.

Gregory Kurtzer:

It has been a while since I’ve played with a production release because I am always messing with the development versions. That has actually been optimized to automatically reload the configurations. 

Michael L. Young:

I recall seeing that get fixed. I was going to try to have a dangerous demo here and run the latest development branch. Then I said,  “Uh, maybe not.”

Gregory Kurtzer:

There are a couple of subtle differences in the new version. Obviously this is one of them. Another difference is that the kernel is no longer a required attribute. The kernel is going to be taken from the container itself. The container that you imported actually has the kernel within it. This makes it super nice for us to be able to say, "Well, we're going to include in this container all of the InfiniBand, GPU support, or anything else that you need inside of that entire container." It gives us the ability to package up the entire stack into an OCI container.

Michael L. Young:

That's the extent of what I wanted to show. Obviously, there is more that you can do. I wanted to demo how easy it is once everything is configured – and it doesn't take much time to configure the profile and the nodes – once you have that defined, it's even faster. You just bring up nodes and you're adding their hardware address, in a sense. That way it can provision them and then you're in. You can actually see that the network is up and running here. There is more to it, but since we are short on time, I don't know if there are any other questions?

Node Configurations and IPMI [45:21]

Gregory Kurtzer:

That was an awesome demo. One other facet is that Warewulf will actually auto-discover nodes as well. If you set up a node configuration that does not have a MAC address configured, and you set the discoverable flag in the node config, then when an unknown node joins and checks in, Warewulf will automatically put it in the first available node slot and inject the MAC address into that node. Warewulf will also do IPMI. There's a whole group of power commands: power status, power cycle, power everything, and even serial over LAN (SOL) support inside of Warewulf, so you can integrate directly into IPMI too. Warewulf can even configure your IPMI – or I should say try to configure your IPMI, because many vendors have slightly different versions of IPMI; it's supposed to be a standard, but everybody's adding different kinds of capabilities, and it's not an extraordinarily stable standard. It does the best it can to configure IPMI for you. On most systems it just works out of the box, but it will do the best it can.
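
The discovery and IPMI features Greg mentions map roughly to commands like these; the IPMI flag names are examples and differ slightly between Warewulf releases.

  # Mark a pre-created node entry as discoverable so the first unknown node
  # that checks in claims it and has its MAC address recorded
  wwctl node set control2 --discoverable

  # Store IPMI details per node, then drive power state through Warewulf
  wwctl node set control1 --ipmiaddr 10.0.1.11 --ipmiuser admin --ipmipass secret
  wwctl power status control1
  wwctl power cycle control1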

Michael L. Young:

IPMI works pretty well for one of our customers. They're using it quite a bit. I was even able to set up virtual IPMI on a KVM machine and do some testing that way, bringing up and bringing down KVM machines or guests.

Zane Hamilton:

I think we talked about this last time. I want to make sure we talk about it again. Could you update us on the Warewulf community moving things to APIs?

Warewulf Moving to APIs [47:25]

Gregory Kurtzer:

Right now, Warewulf is not a client-server architecture with regard to its configuration and management. We are changing that. The Warewulf server is going to end up being the entire control plane for the system. The CLI is going to interact with that Warewulf control server over an API. This gives us the ability to add additional features around that. For example, we talked a little bit about the HPC 2.0 stack. Fuzzball is going to be able to send provisioning commands directly to Warewulf without having to shell out. Everything will go over to APIs. This gives us the ability to also create nice graphical interfaces around Warewulf. That process and that work is already underway. We expect to see that around Q3; we should be bringing it out at least as an early beta by about then, and we should see a stable release by the end of Q3.

Zane Hamilton:

Can you remove the kernel parameter from all profiles?

Kernel Parameters [48:53]

Michael L. Young:

I think the question is in regard to the output you saw when I did a node list: there were some default kernel arguments there. Yes, you can actually override that. You can set the kernel arguments that you want for that node. In fact, we have been using the kernel arguments for deploying Kubernetes: on Rocky, we want to set them so that cgroups v2 is enabled. So we can override the kernel arguments. That's definitely configurable there.
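
The cgroups v2 example Michael gives would look roughly like this on an EL8-based image; the exact flag name varies by release, and depending on the release the value may replace rather than append to the default kernel arguments, so carry those over if needed.

  # Override the kernel arguments for one node (or set them on a profile) to
  # switch Rocky 8 nodes to cgroups v2
  wwctl node set control1 --kernelargs "systemd.unified_cgroup_hierarchy=1"
  wwctl node list -a control1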

Gregory Kurtzer:

I would also answer this in a different way as well. I don't know if Nick was referring to the comment that I made regarding the kernel being inside of the container once you move to 4.3. If you're using a kernel inside of the container, you would be able to remove the kernel parameter. I believe, as it currently stands, it will override the kernel that's in the container if you don't remove it. The kernel argument is actually becoming a kernel override argument: if you specify your kernel or kernel override parameter, it will override what is inside of the container. But if there's a kernel inside of your container, you can remove the parameter safely, and Warewulf will boot the kernel that exists inside of that container.

Zane Hamilton:

I think that answered Nick's question. So, last thing before we have to drop off: CIQ value adds. What are we doing to improve Warewulf and make it better?

Improving Warewulf [50:46]

Gregory Kurtzer:

One of the things that we're working on is FIPS compliance. In the future you will see a FIPS compliant version of Warewulf coming out, probably around Q3. This will be a great set of features to have, especially as we move towards an API base. We can then put GUIs and other interfaces on top of that. That compliance is going to be able to guarantee and validate the crypto that we're leveraging, so you know you have a nice and secure system. We're also going to be creating pre-configured node containers, profiles, and turnkey solutions around Warewulf, traditional HPC, HPC 2.0, and Fuzzball.

We are going to be making those available, along with support services around that. That could be anything from helping with solutions architecture, helping with integration, or just support. If you have any sort of issues, you can give us a call. One of the cool things that we're doing with support is focusing on supporting people, not supporting cores, sockets, nodes, or entitlements. That support model is really cool. We become the escalation point for people and teams within organizations. We don't care how many nodes, systems, cloud instances, containers, or VMs somebody has to maintain. That's their job; that's their responsibility. Our responsibility is to make sure that they're successful and that they have the means to get questions answered whenever they need those questions answered. That's our support model, and it is offered not only for Warewulf, but also Singularity, Apptainer, and Rocky Linux. If you have any questions, if this is something you're interested in, please do reach out and contact us. Michael, anything I left out that you wanted to add?

Michael L. Young:

It's open source, so we're active and involved in that community. A lot of the time we're in the Slack channel, here at CIQ, keeping an eye on things, and we help answer questions. If anyone has an issue, we're immediately looking into it ourselves and trying to add value in that way. We are helping everyone to be able to do what they need to get done.

Zane Hamilton:

Nick has one last question: will the APIs be versioned so they don't break integrations across releases?

Gregory Kurtzer:

The APIs are built by good engineers, not me, so yes, it will definitely be versioned.

Zane Hamilton:

Excellent. Thank you for the question. Guys, I think we're actually at the end of the time. I want to thank Greg as always for being here. Michael, thank you for joining; thank you for the demo. Glad it worked out; it was fantastic. I really appreciate the time you put into it. Don’t forget to like and subscribe. We will see you again next week. 

Gregory Kurtzer:

We are hiring; we are scaling the company up. If anybody is interested, please check out our Careers page and reach out to us. We'd love to talk with you.