I want to set up a server cluster which can keep my servers as busy as possible while still giving fair compute time to everyone. I have set up a basic Kubernetes cluster, but the issue is that if a user releases a pod which can parallelize up to, say, 256 cores, while the machines have a maximum of 96 cores each, the workload won't split onto different machines. Instead it will run slowly on 96 cores. I want something which can split the pods onto different machines so that all the cores in the cluster are kept busy.
3 Answers
There's an option to use topologySpreadConstraints, which lets you spread pods across the nodes of the cluster. This also depends on the number of replicas in your workload/deployment.
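For illustration, here is a sketch of a Deployment using topologySpreadConstraints to spread replicas evenly across nodes (the names and image are placeholders, not from your setup):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker                # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      topologySpreadConstraints:
        - maxSkew: 1                            # per-node replica counts may differ by at most 1
          topologyKey: kubernetes.io/hostname   # spread across individual nodes
          whenUnsatisfiable: DoNotSchedule      # hard constraint, not best-effort
          labelSelector:
            matchLabels:
              app: worker
      containers:
        - name: worker
          image: my-worker:latest   # placeholder image
```

With maxSkew: 1 and topologyKey: kubernetes.io/hostname, the scheduler keeps the replica count on any two nodes within one of each other; DoNotSchedule makes that a hard requirement instead of a preference.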
I think you need to let the scheduler distribute the load across the nodes, which is the purpose of a Kubernetes Deployment.
I suggest you either use some old workstation or (better) pinch one machine out of your current Kubernetes cluster and convert it into a one-machine SLURM cluster, just to get things up and running. The fairly typical route is to use an RPM-based distro (nowadays Rocky Linux or AlmaLinux), but Debian should work fine as well. Once sbatch is working, start setting up a few power-user accounts, mount the NFS(?) drives, and install the most commonly used user software (check out Spack, Apptainer and Conda). Give yourself some time to iron out the bare-bones configuration issues before moving on to more advanced topics like cgroups or proper slurmdbd accounting.
Even with 20-minute jobs, make sure these are optimized whenever feasible. Sometimes simply installing pigz, zstd, pypy3, ripgrep or duckdb has an immediate effect.
I want to set up a server cluster which can keep my servers as busy as possible while still giving fair compute time to everyone
That's what the Kubernetes scheduler does.
I have set up a basic Kubernetes cluster, but the issue is that if a user releases a pod which can parallelize up to, say, 256 cores, while the machines have a maximum of 96 cores each, the workload won't split onto different machines
A pod is an atomic unit of work and cannot be split.
I want something which can split the pods onto different machines so that all the cores in the cluster are kept busy.
That is not possible. A pod will be scheduled on a single server and cannot be "split".
I think the word you are looking for is "spread". You want to spread your workload among many servers.
You and the users likely share responsibilities, so you will have to work together to configure both the application and the platform correctly so they work smoothly.
The application(s) running in a pod need to be correctly configured to use only a certain amount of compute. You can either configure the application to use only a certain number of threads or track its CPU time.
On top of that, you MUST configure the pod with resource requests and limits. That way the Kubernetes scheduler has more information to decide where the pod will fit (or not). The Linux kernel (through cgroups) will enforce the CPU limits and throttle the process.
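As a minimal sketch, here is a pod spec with matching CPU requests and limits (the name, image, thread count and sizes are placeholder values); note that the application itself is also told, via OMP_NUM_THREADS here as one common example, not to spawn more threads than the pod is allowed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: compute-job            # hypothetical name
spec:
  containers:
    - name: solver
      image: my-solver:latest  # placeholder image
      env:
        - name: OMP_NUM_THREADS   # keep the app's thread count in line with the CPU limit
          value: "16"
      resources:
        requests:
          cpu: "16"            # the scheduler uses this to pick a node with 16 free cores
          memory: 32Gi
        limits:
          cpu: "16"            # cgroups throttle the pod beyond 16 cores
          memory: 32Gi
```

With the request set, the scheduler will only place the pod on a node with that much spare capacity; with the limit set, a pod that tries to use 256 cores simply gets throttled at 16 instead of starving its neighbours.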
My suggestion is to work with the developers to configure their deployments so that the pods' resource requests fit your hardware platform.
If you want to prevent them from ever deploying a workload that causes problems or malfunctions in terms of resource allocation, you can deploy something like Kyverno or OPA, or even improve your CI/CD pipelines to catch these errors.
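For example, a Kyverno ClusterPolicy along these lines (a minimal sketch; the policy name and message are made up) would reject any pod whose containers omit CPU requests and limits:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cpu-requests-limits   # hypothetical policy name
spec:
  validationFailureAction: Enforce    # reject non-compliant pods at admission
  rules:
    - name: check-cpu
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU requests and limits are required for all containers."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"         # any non-empty value
                  limits:
                    cpu: "?*"
```

Start with validationFailureAction: Audit instead of Enforce if you first want to see what would be blocked without breaking existing deployments.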
As a research point: if you really want a platform that grabs workloads and load-balances them across many servers seamlessly, look at HPC solutions. Your developers will need to be heavily involved, though, to use the proper libraries for this.