How to Make Kubernetes Easy to Use for Thousands of Users

Not being an expert in research and academic publications, I was genuinely excited about the opportunity to contribute to the scientific discourse and share the work we have been doing at CERN. Participating in CHEP 2024 with a paper titled “How to Make Kubernetes Easy to Use for Thousands of Users” felt like an important milestone: transforming daily operational experience into something structured, reviewed, and shared with the wider community.

In the paper, we describe the challenge we face at CERN: Kubernetes is the state-of-the-art platform for deploying applications, but it comes with significant operational complexity and a steep learning curve. At the same time, CERN hosts thousands of active developers who need to deploy applications reliably, without necessarily becoming Kubernetes or DevOps experts. Our goal was therefore not simply to run clusters, but to design an infrastructure that abstracts complexity while preserving flexibility.

The solution we implemented is based on OKD, the community distribution of Kubernetes optimized for multitenancy. Rather than operating a single generic cluster, we designed a multi-cluster architecture tailored to distinct use cases. We identified four main categories: one-click provisioning of popular applications such as Grafana, Discourse, and Nexus; CMS-based website hosting with WordPress and Drupal; simple web hosting for static content or lightweight PHP/Python scripts; and generic application hosting for both monolithic and microservices-based workloads. Each category is backed by clusters configured and optimized for its specific purpose, allowing us to balance isolation, scalability, and operational efficiency.

A key enabler of this model is the use of the Operator SDK. By codifying operational knowledge into Kubernetes Operators, we allow clusters to autonomously handle configuration updates, scaling, and failure recovery. This significantly reduces manual intervention and keeps each special-purpose cluster aligned with its intended workload profile. Operators also power the one-click provisioning experience: users can request a service through a web interface and receive a fully configured, production-ready deployment without interacting directly with Kubernetes primitives.

For developers who want to deploy their own code, we rely on OKD’s Source-to-Image (S2I) workflow. S2I automatically builds a container image from a Git repository by combining the source code with a predefined builder image containing the required runtime and dependencies. This removes the need to write Dockerfiles for common scenarios, accelerating adoption and lowering the barrier to entry. If a Dockerfile is present, it is used instead; alternatively, users can deploy pre-built images from external registries or even use Helm charts when full control is required. Automatic redeployment on new commits further streamlines the development lifecycle.
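
The decision described above (use the Dockerfile if there is one, otherwise fall back to S2I with a suitable builder image) can be sketched in a few lines of Python. This is only an illustration of the selection logic, not how OKD implements it: the marker files, the builder names in `BUILDER_IMAGES`, and the function `choose_build_strategy` are assumptions made for the example.

```python
# Hypothetical mapping from a marker file in the repository to a builder image.
BUILDER_IMAGES = {
    "requirements.txt": "python",
    "package.json": "nodejs",
    "composer.json": "php",
    "pom.xml": "java",
}


def choose_build_strategy(repo_files: set) -> dict:
    """Pick a build strategy from the file names found at the repository root."""
    # A Dockerfile takes precedence over Source-to-Image.
    if "Dockerfile" in repo_files:
        return {"strategy": "docker", "builder": None}

    # Otherwise, look for a marker file that identifies the runtime.
    for marker, builder in BUILDER_IMAGES.items():
        if marker in repo_files:
            return {"strategy": "source", "builder": builder}

    raise ValueError("no Dockerfile and no recognised runtime marker found")
```

For example, a repository containing `app.py` and `requirements.txt` would be built with the `source` strategy and the `python` builder, while adding a `Dockerfile` to the same repository switches it to the `docker` strategy.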

To unify this heterogeneous infrastructure, we developed an in-house portal using Python and React. This portal acts as a single entry point across all clusters, giving users the perception of interacting with one cohesive system. It centralizes application management, simplifies deployment, and exposes the features of each specialized cluster without revealing the underlying complexity. From the user’s perspective, Kubernetes disappears; from an infrastructure perspective, it remains fully available when needed.
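
As a rough sketch of the "single entry point" idea, the portal can be thought of as a thin routing layer: each application belongs to one of the four use-case categories, each category maps to a specialized cluster, and the user sees a single flat list. The code below is illustrative only and is not taken from our portal; the cluster names, the `Application` type, and both functions are hypothetical.

```python
from dataclasses import dataclass

# The four categories come from the use cases described earlier;
# the cluster names are placeholders invented for this example.
CATEGORY_TO_CLUSTER = {
    "one-click-app": "apps-cluster",
    "cms-site": "cms-cluster",
    "simple-web": "static-cluster",
    "generic-app": "paas-cluster",
}


@dataclass(frozen=True)
class Application:
    name: str
    owner: str
    category: str


def cluster_for(app: Application) -> str:
    """Resolve which specialized cluster hosts an application."""
    try:
        return CATEGORY_TO_CLUSTER[app.category]
    except KeyError:
        raise ValueError(f"unknown category: {app.category!r}") from None


def list_user_apps(owner: str, apps: list) -> list:
    """Return one name-sorted view of a user's applications across all clusters."""
    return [
        {"name": a.name, "category": a.category, "cluster": cluster_for(a)}
        for a in sorted(apps, key=lambda a: a.name)
        if a.owner == owner
    ]
```

The user only ever deals with application names and categories; the cluster is an implementation detail that the portal resolves on their behalf.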

Writing this paper allowed me to step back from daily operations and reflect on the architectural principles behind our platform: abstraction without loss of power, automation through codified operational knowledge, and user-centric design at scale. Contributing these ideas to the broader community was both a learning experience and a privilege.

Conference on Computing in High Energy and Nuclear Physics (CHEP) 2024, Poster