Platform Engineering Case Study

Transforming an Unstable Platform and Inexperienced Team into a Self-Sustaining Operation

From reactive firefighting to proactive automation, with capabilities that survived complete team turnover

"You're the first contractor we could actually say did its job. We don't need you anymore."

— Platform Champion and Technical Manager

Client: Federal Systems Integrator - Aerospace & Defense

About the Client (Federal Systems Integrator - Aerospace & Defense)

Our federal systems integrator client had inherited a Tanzu Application Service platform with significant technical debt and an inexperienced team. They lacked training, methodology, and effective management capabilities. Their environment was unstable, inconsistent across classification levels, and heavily dependent on manual UI operations. The challenge was clear: transform a struggling platform into a reliable, automated foundation while building team capabilities from the ground up.

Outcomes

  • Same-day application deployments - developers could request an environment and be running apps in production the same day
  • ~1,000 application instances (AIs) running across AWS GovCloud (800 AIs per environment) and public cloud (200 AIs) with consistent configuration and security posture
  • Major version TAS upgrades completed in a single weekend - work that previously took weeks or months
  • Complete team transformation - the entire team cycled out during the three-year engagement, with each generation going from complete beginners to confident platform operators
  • Sandbox to production deployments in one sprint - achieving rapid integration cycles that were previously impossible
  • Fully automated monthly stemcell patching - unattended pipeline runs from sandbox through production with automated change management integration

This engagement demonstrated that platform transformation is as much about people as it is about technology. By combining rigorous automation with immersive knowledge transfer, we built a self-sustaining operation that continued to improve even as the team evolved. The client’s own words capture it best: “You’re the first contractor we could actually say did its job. We don’t need you anymore.”

Problem | Before working with us

When we arrived, the platform was in critical condition. The sandbox environment was completely dead with expired certificates, making it impossible to test changes safely. Years of manual UI changes had introduced severe configuration drift across environments; no one knew the actual state of anything.

There was no source control and no authoritative configuration. Every change was a leap of faith. The team had no exposure to automation, infrastructure as code, or Agile methodologies. They were operating blind.

Draconian change management policies required 2-3 weeks’ advance notice for any production change, with work restricted to weekends only. This created a painful cycle: changes were rushed, mistakes were made, and trust eroded further.

The underlying AWS infrastructure was inconsistent: a mix of Terraform-managed resources and manual builds accumulated over time. Even basic tasks required navigating this minefield of undocumented configurations.

Solution | After working with us

We started by baselining the current platform deployments, capturing the actual state of every environment into version control. This wasn’t glamorous, but it was essential. For the first time, the team had a source of truth.
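The baselining idea can be illustrated with a minimal sketch: compare the configuration held in version control against what an environment actually reports, and list every difference. The function name, the nested-dictionary shape, and the sample keys are assumptions made for illustration; they are not the tooling used on the engagement.

```python
from typing import Any


def find_drift(baseline: dict[str, Any], actual: dict[str, Any], path: str = "") -> list[str]:
    """Return readable differences between two nested configuration dicts."""
    drift: list[str] = []
    for key in sorted(set(baseline) | set(actual)):
        here = f"{path}.{key}" if path else key
        if key not in actual:
            drift.append(f"{here}: missing from environment")
        elif key not in baseline:
            drift.append(f"{here}: not in baseline")
        elif isinstance(baseline[key], dict) and isinstance(actual[key], dict):
            # Recurse into nested sections so the report points at the exact setting.
            drift.extend(find_drift(baseline[key], actual[key], here))
        elif baseline[key] != actual[key]:
            drift.append(f"{here}: baseline={baseline[key]!r} actual={actual[key]!r}")
    return drift


# Hypothetical sample data: what version control says vs. what the environment reports.
baseline = {"router": {"instances": 3, "timeout": 900}, "diego_cell": {"instances": 10}}
actual = {
    "router": {"instances": 2, "timeout": 900},
    "diego_cell": {"instances": 10},
    "mysql": {"instances": 1},
}

report = find_drift(baseline, actual)
for line in report:
    print(line)
```

An empty report means the environment matches its baseline; anything else is drift to either commit or revert.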

With Platform Automation Toolkit, we built repeatable deployment pipelines for Cloud Foundry / Tanzu. Our sandbox-first workflow meant every change was validated before touching production. We applied the same rigor to AWS infrastructure, achieving Terraform parity across all environments.

The transformation accelerated with our modular pipeline orchestrator, a single YAML configuration that could drive deployments across any environment. Monthly stemcell patching became fully automated, running unattended from sandbox through production with CMDB integration for change tracking.
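As a rough sketch of how one configuration can drive every environment, the snippet below expands a single settings structure into an ordered, sandbox-first plan. The dictionary stands in for the parsed YAML file, and its keys, action names, and the change-record step are hypothetical placeholders rather than the orchestrator's real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    environment: str
    action: str


# Hypothetical stand-in for the parsed YAML file; keys and values are illustrative only.
PIPELINE_CONFIG = {
    "promotion_order": ["sandbox", "production"],
    "actions": ["upload-stemcell", "apply-changes", "smoke-test"],
    "change_record_required": ["production"],
}


def build_plan(config: dict) -> list[Step]:
    """Expand one configuration into an ordered list of per-environment steps."""
    plan: list[Step] = []
    for env in config["promotion_order"]:
        # Environments under change management get a tracking step first.
        if env in config.get("change_record_required", []):
            plan.append(Step(env, "open-change-record"))
        for action in config["actions"]:
            plan.append(Step(env, action))
    return plan


plan = build_plan(PIPELINE_CONFIG)
for step in plan:
    print(f"{step.environment}: {step.action}")
```

Under this assumed layout, adding an environment means adding a name to the promotion order; the steps themselves are defined once.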

Most importantly, we earned trust through consistency. Our demonstrated reliability convinced change management to grant “standard change” approval, eliminating the weeks of advance notice. Weekend-only restrictions were lifted as confidence grew.

For developers, we conducted user interviews to understand their pain points and built streamlined onboarding processes. What once took weeks now happened same-day. Our platform engineering approach treated the platform as a product, with developers as its customers.

Services Provided

Our partnership delivered both technical transformation and lasting capability transfer. We embedded with the client team, ensuring that every task was a learning opportunity. Our services included:

Platform Engineering & SRE

We embedded engineers who worked side-by-side with the client team. Each troubleshooting session, deployment, and incident became a teaching moment. This immersive approach built capabilities that survived complete team turnover.

Infrastructure as Code

We transformed inconsistent AWS infrastructure into fully Terraform-managed resources, ensuring that VPCs, EC2 instances, S3 buckets, NAT Gateways, and RDS databases were version-controlled, reproducible, and auditable.

Automation & Pipeline Development

Our modular Concourse pipelines provided flexibility without complexity. A single YAML configuration drove deployments across environments, with built-in validation gates and automatic rollback capabilities.
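The validation-gate-and-rollback pattern can be sketched in a few lines. This is a generic illustration, not the Concourse pipeline code itself: the three callables and the version strings are hypothetical stand-ins for whatever applies a change, checks platform health, and restores the previous state.

```python
from typing import Callable


def deploy_with_gate(
    apply: Callable[[], None],
    validate: Callable[[], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply a change, run the validation gate, and roll back if the gate fails."""
    apply()
    try:
        healthy = validate()
    except Exception:
        # A validation step that crashes is treated the same as one that fails.
        healthy = False
    if not healthy:
        rollback()
    return healthy


# Hypothetical in-memory "environment" used only to demonstrate the control flow.
state = {"version": "1.0"}


def apply_upgrade() -> None:
    state["version"] = "2.0"


def failing_check() -> bool:
    return False


def restore_previous() -> None:
    state["version"] = "1.0"


succeeded = deploy_with_gate(apply_upgrade, failing_check, restore_previous)
print(succeeded, state["version"])
```

Because the gate sits between apply and promotion, a failed check leaves the environment on its previous version instead of passing a broken change downstream.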

Container Lifecycle Management

We built custom Concourse containers that met strict security compliance requirements, ensuring that each container image was scanned, validated, and maintained on a predictable lifecycle.

Developer Enablement

Through user interviews and iterative improvements, we created onboarding processes that got developers from request to running application in the same day. Reference applications demonstrated buildpack usage and best practices.

Change Management Integration

We didn’t fight the bureaucracy; we earned its trust. By demonstrating consistent, reliable operations, we progressed from emergency-only approvals to standard change status, dramatically reducing deployment friction.

FinOps

We implemented resource sizing optimization and established fiscal accountability practices, ensuring the platform delivered value efficiently while maintaining the performance the mission required.

How we worked together

Our methodology was simple: “I do, we do, you do.” Each task followed this progression:

  • First, we demonstrated while the team observed
  • Then, we worked together with the team taking increasing ownership
  • Finally, the team led while we provided backup and guidance

This approach proved its worth when the entire team cycled out during our engagement. The new team members went through the same progression, and within months they were operating confidently. When that happened again, the pattern held.

We paired daily, solved problems side-by-side, and documented thoroughly. Knowledge didn’t live in our heads; it lived in runbooks, pipelines, and the skills we transferred.

The ultimate validation came when the client told us: “You’re the first contractor we could actually say did its job. We don’t need you anymore.” That’s not losing a customer. That’s mission accomplished.

Tech Stack Leveraged

  • Platform Engineering
  • Cloud Foundry
  • Tanzu Application Service
  • Platform Automation Toolkit
  • Tanzu Ops Manager
  • Terraform
  • Concourse
  • AWS
  • AWS GovCloud