Platform Engineering Case Study

Leveraging Kubernetes to enable Self-Service, Observability & Enterprise Readiness at Scale

By focusing the platform team on features and polish instead of firefighting, we enabled application teams to reap the benefits of Kubernetes without needing to know it in detail.

Client: Private Sector - Business Services

About the Client (Private Sector - Business Services)

Our client, a large business services provider with over $1 billion in revenue, set out to develop a platform to modernize their approach to building, deploying, and scaling software, using Kubernetes as the platform foundation. Their capable engineers, diving into Kubernetes for the first time, achieved initial success but quickly ran into the challenges of scaling from an initial deployment to supporting a global enterprise. We needed to address the architectural considerations for scale while training the platform team on best practices for operating and maintaining a Kubernetes-based platform.

Outcomes

  • Platform team can automatically deploy and bootstrap new clusters within minutes
  • Development teams can deploy, test, and promote multiple builds per hour with minimal friction
  • Significantly accelerated development lifecycle by enabling developer self-service and automating promotion between environments
  • Improved platform observability with distributed monitoring across all clusters

Ultimately, we freed up the platform team to focus on features and polish instead of firefighting, while enabling development teams to reap the benefits of Kubernetes without needing to know Kubernetes in detail. We established a foundational example that could be implemented across the client’s other product lines.

Problem | Before working with us

The setup was typical for what we see in organizations beginning their Kubernetes journey. The client had made a first pass at automation, but many manual and error-prone processes remained in place.

The challenge was clear: deploy roughly 25 applications across 10 clusters without it turning into an unorganized mess. The scale they needed to achieve meant any process treating infrastructure as pets instead of cattle would fall short. Furthermore, this was only for a single product line, and they needed a foundational example for other product lines to follow.
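
To make the "cattle, not pets" idea concrete, the following is a minimal sketch of generating one declarative deployment definition per application and cluster from a single template, rather than configuring each by hand. It is an illustration under stated assumptions, not the client's actual setup: the application names, cluster names, repository URL, and directory layout are all hypothetical, and it assumes every application runs on every cluster, which gives an upper bound of 25 × 10 = 250 definitions. The output follows the shape of an Argo CD `Application` resource.

```python
from itertools import product

# Hypothetical names: the case study does not list the real applications or clusters.
APPS = [f"app-{i:02d}" for i in range(1, 26)]          # roughly 25 applications
CLUSTERS = [f"cluster-{i:02d}" for i in range(1, 11)]  # 10 clusters
REPO_URL = "https://git.example.com/platform/manifests.git"  # placeholder URL


def application_manifest(app: str, cluster: str) -> dict:
    """Render one Argo CD Application for an (app, cluster) pair."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"{app}-{cluster}", "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": REPO_URL,
                "path": f"apps/{app}",  # assumed repository layout
                "targetRevision": "main",
            },
            "destination": {"name": cluster, "namespace": app},
            # Automated sync keeps the cluster matching version control.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


def fleet_manifests(apps, clusters):
    """Every app on every cluster: identical template, no hand-tuned snowflakes."""
    return [application_manifest(a, c) for a, c in product(apps, clusters)]


if __name__ == "__main__":
    manifests = fleet_manifests(APPS, CLUSTERS)
    print(len(manifests))  # 250
```

Because every definition comes from the same template, adding an eleventh cluster means adding one entry to a list instead of repeating a manual process 25 times. In practice, Argo CD's ApplicationSet controller can perform this kind of generation natively; the Python here only shows the idea.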

The client also had a rocky relationship with some vendors. Promises around software fell short, costs were rising, and support was lacking. They had accumulated a mix of purchased and experimental open-source software, much of it either antiquated or sitting unused, collecting dust on the shelf.

Beyond technology, there were team challenges. The team lead had prior Kubernetes experience but found themselves juggling architectural decisions, implementation, firefighting existing problems, and training other team members simultaneously. This was an incredibly stressful position, especially when mistakes could easily cost millions of dollars.

Solution | After working with us

We guided the platform team to make sound platform decisions by leaning on almost a decade of experience with Kubernetes in production across dozens of large clients. Our platform engineering approach transformed their operations:

  • Pathway to Production: We architected and implemented a platform that builds application teams' software into container images, leverages GitOps to continuously sync platform manifests directly from version control, and fully automates promotion between environments.

  • Distributed Monitoring: We deployed distributed monitoring that was automatically installed alongside new clusters. This helped development teams troubleshoot tough problems and narrow them down to root causes. Dashboards and alerting were exposed to development teams for self-monitoring.

  • Software Rationalization: We evaluated which software (whether purchased or open source) was worth keeping, created a plan to polish the keepers, and helped decommission or replace the rest. This reduced overall platform complexity while saving licensing costs.

  • Platform Team Consulting: We rapidly upskilled the platform engineering team by walking them through architectural decision-making and cross-pollinating knowledge so that every member developed well-rounded Kubernetes foundations. This reduced the decision-making burden on the team lead and helped the team balance feature development with high-priority issues.

The result was a modern, scalable platform that enabled the organization to focus on delivering business value instead of wrestling with infrastructure complexity.
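
As a rough illustration of the automated promotion described under Pathway to Production, the sketch below models the gating rule: a build enters the first environment, and it can only move to the next one after it has been verified in the previous one. This is a simplified model, not the client's pipeline. The stage names, the `PromotionState` class, and the tag `v1.4.2` are hypothetical, and the sketch does not use the API of Kargo or any other tool in the stack.

```python
from dataclasses import dataclass, field

# Hypothetical environment order: the case study does not name the stages.
STAGES = ["dev", "staging", "prod"]


@dataclass
class PromotionState:
    # Desired image tag per stage, as it would be recorded in version control.
    desired: dict = field(default_factory=dict)
    # Tags that have passed verification in each stage.
    verified: dict = field(default_factory=lambda: {s: set() for s in STAGES})

    def deploy(self, tag: str) -> None:
        """A new build always enters at the first stage."""
        self.desired[STAGES[0]] = tag

    def mark_verified(self, stage: str, tag: str) -> None:
        """Record that `tag` passed its checks in `stage`."""
        if self.desired.get(stage) != tag:
            raise ValueError(f"{tag} is not deployed in {stage}")
        self.verified[stage].add(tag)

    def promote(self, tag: str, to_stage: str) -> None:
        """Move `tag` to `to_stage` only if the previous stage verified it."""
        index = STAGES.index(to_stage)
        if index == 0:
            raise ValueError("use deploy() for the first stage")
        previous = STAGES[index - 1]
        if tag not in self.verified[previous]:
            raise ValueError(f"{tag} has not been verified in {previous}")
        self.desired[to_stage] = tag


if __name__ == "__main__":
    state = PromotionState()
    state.deploy("v1.4.2")
    state.mark_verified("dev", "v1.4.2")
    state.promote("v1.4.2", "staging")
    print(state.desired)  # {'dev': 'v1.4.2', 'staging': 'v1.4.2'}
```

In a GitOps setup, the `desired` mapping would live in version control, and the sync tooling would apply each change to the matching environment, so a promotion amounts to a reviewed, auditable commit rather than a manual deployment.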

Services Provided

Our partnership delivered the right expertise at the right time. Our Platform Engineering Services for this project included:

Platform Engineering

We brought a Platform-as-a-Product mindset to our services, treating the infrastructure and platform as customer-facing software for developers. Our engineers helped address not only Day 1 concerns but also the scaling challenges associated with Day 2 and beyond.

Kubernetes Training & Best Practices

We ran workshops and training to upskill both the platform team and application teams around the changes that come with Kubernetes. We dove into best practices for containerizing and scaling applications on the platform, and helped sift through the myriad technologies built on top of Kubernetes to determine which would add value and which would only add complexity.

Kubernetes Migrations

Leaning on our software development backgrounds, we paired with development teams to determine whether issues were platform-specific or application-specific. We also helped address challenges with migrating apps from a bare-metal and VM-based approach to a cloud-native containerized approach.

Client Advocate

We served as a vendor-neutral client advocate in meetings, helping keep focus on enabling the client’s outcomes. We leveraged our prior relationships with vendors to provide feedback on where their software was hitting the mark and where it was falling short.

How we worked together

Working with a distributed and remote workforce, we ran daily sessions on Zoom. We alternated between two primary working methods, mobbing and pairing, and supported them with a few other practices:

  • When context-sharing was important, we would mob together on problems to allow the entire team to understand the solution and underlying decision-making
  • When velocity was more important, we split into separate pairing sessions, typically pairing one client engineer with one of our engineers to encourage knowledge sharing
  • We demoed major features and accomplishments to the wider team to promote broader understanding of the platform
  • We prototyped new ideas in a lab environment that replicated the client’s infrastructure
  • When the team lead needed to firefight, we continued to drive the team and distribute knowledge amongst other members

This collaborative approach not only accelerated results but gave the client true ownership of the platform, ensuring long-term sustainability and return on investment.

Tech Stack Leveraged

  • Platform Engineering
  • Kubernetes
  • Argo CD
  • Crossplane
  • VMware vSphere Kubernetes Service (VKS)
  • Grafana
  • Prometheus
  • Mimir
  • OpenTelemetry
  • Argo Workflows
  • Kargo
  • MinIO
  • Envoy Gateway