Alibaba unveils research on tools to cut outages and cloud costs

Alibaba says its low-level software has reduced network outages, cut load balancing costs, and improved SmartNIC performance by shifting workloads to underutilized infrastructure. According to a report by The Register, the company will present its findings in three research papers at the upcoming SIGCOMM conference.

One of the papers introduces ZooRoute, a system designed to keep cloud networks running during failures. Described as a “fast failure recovery service,” ZooRoute provides global bypass, steering traffic around failures, in large-scale cloud networks within seconds. This matters because network failures are common in cloud environments, and the longer recovery takes, the more end users are affected. Alibaba says ZooRoute has run in production for 18 months and has reduced overall outage time by over 92%.
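The general idea of bypassing a failure can be illustrated with a minimal sketch. This is not ZooRoute's algorithm, which the article does not describe; it is a generic breadth-first search that looks for any path avoiding links marked as failed. The function name `find_bypass` and the toy topology are assumptions made for illustration.

```python
from collections import deque


def find_bypass(links, src, dst, failed):
    """Return a shortest hop-count path from src to dst that avoids failed links.

    links and failed are iterables of (a, b) pairs describing undirected links.
    Returns a list of node names, or None if no bypass exists.
    Hypothetical helper for illustration only.
    """
    down = {frozenset(link) for link in failed}

    # Build adjacency lists from the links that are still healthy.
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in down:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Standard BFS, remembering each node's predecessor.
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Toy topology (assumed): two routes from A to D.
links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]
normal_path = find_bypass(links, "A", "D", failed=[])
bypass_path = find_bypass(links, "A", "D", failed=[("B", "D")])
```

In this toy network, losing the B-D link forces traffic onto the longer route through C and E. A production system would also have to detect the failure and push the new route out quickly, which is where the "within seconds" figure comes in.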

Another paper covers Hermes, a system that optimizes layer 7 load balancers in cloud networks. Hermes adds a new scheduling layer, built on eBPF, that prioritizes and distributes traffic more efficiently, significantly reducing CPU imbalances and uneven connection counts. According to Alibaba, this approach has cut worker “hangs” by nearly 100% and reduced the cost of running layer 7 load balancing infrastructure by 19%.
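The kind of scheduling decision described above can be modeled in a few lines. The sketch below is plain Python, not eBPF, and it is not Hermes's actual policy; it simply assumes that each new connection goes to the responsive worker with the lowest CPU load, using connection count as a tie-breaker. The `Worker` fields and the `dispatch` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical view of one load balancer worker process."""
    name: str
    connections: int = 0
    cpu: float = 0.0          # fraction of one core in use, 0.0 to 1.0
    responsive: bool = True   # False models a worker that is hung


def pick_worker(workers):
    """Choose the responsive worker with the lowest CPU, then fewest connections."""
    candidates = [w for w in workers if w.responsive]
    if not candidates:
        raise RuntimeError("no responsive workers available")
    return min(candidates, key=lambda w: (w.cpu, w.connections))


def dispatch(workers):
    """Assign one new connection and return the chosen worker's name."""
    worker = pick_worker(workers)
    worker.connections += 1
    return worker.name


workers = [
    Worker("w0", connections=10, cpu=0.9),
    Worker("w1", connections=3, cpu=0.2),
    Worker("w2", connections=1, cpu=0.1, responsive=False),  # hung, skipped
]
chosen = dispatch(workers)
```

Here the hung worker is skipped even though it looks idle, and the busy one is avoided because of its CPU load. Steering on observed load rather than assigning connections blindly is the kind of change that would reduce the CPU imbalances and uneven connection counts the article mentions.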

The third paper introduces Nezha, a distributed system for balancing workloads across SmartNICs. Nezha monitors usage and redistributes tasks from overloaded SmartNICs to underused ones, which the company says has improved performance and efficiency. Alibaba describes the deployment as cost-effective and says it has eliminated bottlenecks in virtual switches on SmartNICs.
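The monitor-and-redistribute loop described above might look something like the following greedy sketch. It is an illustration under stated assumptions, not Nezha's design: loads are whole-number percentages of a NIC's capacity, the 80% threshold is arbitrary, and the `rebalance` function and NIC names are invented for the example.

```python
def rebalance(nics, high=80):
    """Move tasks off NICs whose total load exceeds `high` percent.

    nics maps a NIC name to a list of task loads (integer percent of capacity).
    The mapping is updated in place. Returns a list of (src, dst, load) moves.
    Hypothetical helper for illustration only.
    """
    utilization = {name: sum(tasks) for name, tasks in nics.items()}
    moves = []

    # Visit the most heavily loaded NICs first.
    for src in sorted(nics, key=lambda n: utilization[n], reverse=True):
        # Try the largest tasks first so fewer moves are needed.
        for task in sorted(nics[src], reverse=True):
            if utilization[src] <= high:
                break
            dst = min(utilization, key=lambda n: utilization[n])
            if dst == src or utilization[dst] + task > high:
                continue  # least-loaded NIC cannot absorb this task; try a smaller one
            nics[src].remove(task)
            nics[dst].append(task)
            utilization[src] -= task
            utilization[dst] += task
            moves.append((src, dst, task))
    return moves


nics = {"nic-a": [50, 30, 15], "nic-b": [10], "nic-c": [20, 20]}
moves = rebalance(nics)
```

In this example `nic-a` starts at 95% and sheds its largest task to `nic-b`, the least-loaded NIC, leaving every NIC under the threshold. A real system would need live measurements and a way to migrate state between devices, neither of which this sketch attempts.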

Taken together, the three papers show Alibaba’s efforts to improve efficiency and reliability in cloud infrastructure. By addressing outages and bottlenecks, providers can improve customer confidence and reduce unnecessary hardware spending. All three approaches are software-based, underscoring how much depends on managing complex cloud networks effectively.
