Cutting cloud waste at scale: Akamai saves up to 70% using AI agents orchestrated by Kubernetes

Cloud costs are escalating in this age of generative AI, with enterprises projected to waste $44.5 billion on unnecessary cloud spending this year.



Akamai Technologies faces this problem acutely: it runs a large, complex cloud infrastructure across multiple clouds and must meet stringent security requirements.



To tackle this issue, the cybersecurity and content delivery provider turned to the Kubernetes automation platform Cast AI, whose AI agents help optimize cost, security, and speed across cloud environments.



Ultimately, Cast AI assisted Akamai in reducing cloud costs by 40% to 70%, depending on the workload.



“We required a continuous method to optimize our infrastructure and decrease cloud expenses without compromising performance,” stated Dekel Shavit, senior director of cloud engineering at Akamai. “We handle security events. Delay is not an option. Failure to respond to a security threat in real time is unacceptable.”



Specialized agents that monitor, analyze, and act



Kubernetes manages the infrastructure supporting applications, making deployment, scaling, and management easier, particularly in cloud-native and microservices architectures.



Cast AI integrates with the Kubernetes ecosystem to help customers scale their clusters and workloads, select optimal infrastructure, and manage compute lifecycles. Its core platform, Application Performance Automation (APA), uses specialized agents that continuously monitor, analyze, and act to improve application performance, security, efficiency, and cost.



APA combines machine learning models with reinforcement learning based on historical data and learned patterns, supported by an observability stack and heuristics. It is coupled with infrastructure-as-code tools on multiple clouds, which makes the platform fully automated.



Laurent Gil, co-founder and president of Cast AI, emphasized that automation should stay human-centric, stating that APA complements human decision-making by maintaining human-in-the-middle workflows.



Akamai’s unique challenges



Shavit described Akamai’s complex cloud infrastructure, which powers its content delivery network (CDN) and cybersecurity services for demanding customers and industries while meeting strict service level agreements and performance requirements.



He emphasized the need to balance complexity with cost, mentioning the challenge of scaling cloud capacity to meet sudden spikes in demand without incurring excessive costs.



After considering various optimization strategies, Akamai focused on improving the core infrastructure itself to address the inherent complexity of its business model.



Automatically optimizing the entire Kubernetes infrastructure



Akamai sought a Kubernetes automation platform that could optimize the cost of its core infrastructure in real time across multiple clouds, scaling applications as demand changes without compromising performance.



Before implementing Cast AI, Akamai’s DevOps team tuned Kubernetes workloads manually and infrequently. Given the scale and complexity of its infrastructure, the team missed opportunities to optimize in real time.
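To make the idea of tuning a workload concrete, here is a minimal sketch of one common rightsizing heuristic: set a container's CPU request to a high percentile of its observed usage plus some headroom. This is an illustration only, not Cast AI's or Akamai's method. The function name, the 95th-percentile default, the 15% headroom, and the sample values are all assumptions made for the example.

```python
def recommend_request_m(usage_samples_m, percentile=95, headroom_pct=15):
    """Suggest a CPU request (millicores) from observed usage samples.

    Hypothetical heuristic: take the nearest-rank percentile of usage,
    then add a fixed percentage of headroom, rounding up.
    """
    if not usage_samples_m:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")

    ordered = sorted(usage_samples_m)
    # Nearest-rank percentile: rank = ceil(percentile / 100 * n), 1-indexed.
    rank = -(-percentile * len(ordered) // 100)
    baseline = ordered[rank - 1]
    # Integer ceiling of baseline * (1 + headroom_pct / 100).
    return -(-baseline * (100 + headroom_pct) // 100)


# Made-up usage samples for one container, in millicores.
samples = [120, 150, 130, 400, 160, 140, 155, 135, 145, 150]
print(recommend_request_m(samples))                 # uses the 95th percentile
print(recommend_request_m(samples, percentile=90))  # ignores the single spike
```

Which percentile to use is the real design choice: a higher one protects against spikes at the cost of reserving more capacity, which matters for latency-sensitive workloads such as the security services described above.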



With Cast AI, hundreds of agents continuously fine-tune workloads, implementing features like autoscaling, bin packing, cost-efficient instance selection, workload rightsizing, and Spot instance automation to achieve significant cost savings and performance improvements.
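Of the techniques listed above, bin packing is the easiest to show in a few lines. The sketch below uses the classic first-fit-decreasing heuristic to place pod CPU requests onto as few equal-sized nodes as it can. It is a simplified illustration, not a description of how Cast AI or the Kubernetes scheduler works: it assumes a single resource (CPU), identical nodes, and hypothetical names and values throughout.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical node with a fixed CPU capacity in millicores."""
    capacity_m: int
    pods: list = field(default_factory=list)

    @property
    def used_m(self) -> int:
        return sum(self.pods)


def first_fit_decreasing(pod_requests_m, node_capacity_m):
    """Place each pod on the first node with room, largest pods first."""
    nodes = []
    for request in sorted(pod_requests_m, reverse=True):
        if request > node_capacity_m:
            raise ValueError(f"pod of {request}m does not fit on any node")
        for node in nodes:
            if node.used_m + request <= node.capacity_m:
                node.pods.append(request)
                break
        else:
            # No existing node has room, so open a new one.
            nodes.append(Node(node_capacity_m, [request]))
    return nodes


# Made-up pod CPU requests (millicores) and a 4-core node size.
requests_m = [500, 1500, 2000, 1000, 3000, 250, 750]
packed = first_fit_decreasing(requests_m, node_capacity_m=4000)
for i, node in enumerate(packed, start=1):
    print(f"node {i}: {node.pods} ({node.used_m}m of {node.capacity_m}m)")
```

For these inputs the heuristic fills three nodes, which matches the lower bound of 9,000m of requests divided by 4,000m per node, rounded up. A production system would also have to account for memory, affinity rules, disruption budgets, and mixed instance types.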



By using Cast AI’s automation capabilities, Akamai streamlined its infrastructure management, freeing the team to focus on innovation and customer-facing features.