I build and operate resilient, Kubernetes-first platforms that help teams ship reliably and repeatedly. My work sits at the intersection of SRE, DevOps, and Platform Engineering, combining hands-on engineering depth with a focus on operational maturity, developer experience, and pragmatic automation.
I’ve worked across multi-cloud, highly regulated, and Critical National Infrastructure (CNI) environments—helping teams move from manual, fragile workflows to automated delivery with clear guardrails, observability, and supportability.
- Reliability & Platforms: Evolving SRE and platform practices across complex estates (cloud + regulated/CNI)
- Building & Automating: GitOps workflows, pipeline optimisation, and Python-based tooling to reduce toil
- Collaboration: Tight loops with dev teams—platform work only counts if it gets used
- Let’s talk: SLOs/error budgets, Kubernetes/OpenShift ops, CI/CD, GitOps, incident response, observability
- Reducing developer cognitive load with opinionated defaults and self-service workflows
- SRE practices that drive prioritisation: SLIs, SLOs, error budgets
- Kubernetes/OpenShift environments that scale operationally, not just technically
- Incident response → learning via blameless postmortems and systemic fixes
- Automation-first delivery using IaC, GitOps, and reliable pipelines
- Reduced CI/CD pipeline runtime by ~30–40% through optimisation and standardisation
- Delivered “Path to Prod” workflows supporting cloud and regulated/CNI environments
- Enabled safer releases for 40+ apps by tightening promotion, validation, and deployment flows
- Raised observability maturity by focusing on actionable signals over dashboard sprawl
I like platforms that:
- Provide safe defaults and clear golden paths (without blocking edge cases)
- Treat reliability as a product feature
- Reduce toil with automation rather than relying on tribal knowledge
- Keep ownership clear: “self-service” should still be supportable
Pinned repositories generally focus on:
- Platform/infrastructure automation and quality-of-life tooling
- Delivery workflows, GitOps patterns, and operational improvements
- Solving recurring reliability problems once, properly
I write occasionally about reliability engineering, incident response, and platform maturity—often drawing parallels between emergency medicine and operating complex systems under pressure.
- Life Lessons from the Ambulance: How Emergency Medicine Shaped My Approach to IT
- Preparation is Key: A Peek into My Pre-Shift Rituals in Medicine and IT
- The Lifelong Journey of Learning: CPD in Medicine and IT
- Unified Command: Synchronised Communication in Medicine and IT
- From Medical Emergency to IT Outage: My Evolution in Crisis Management
- ❌ Merged PR #3 in AshKapow/GhostieBot
- 💪 Opened PR #3 in AshKapow/GhostieBot
- ❌ Merged PR #2 in AshKapow/GhostieBot