Researchers at the Korea Advanced Institute of Science and Technology (KAIST) have developed an energy-efficient NPU (neural processing unit) technology that delivered significant performance gains in lab tests. In controlled experiments, their specialized AI chip ran AI models 60% faster while using 44% less electricity than the GPUs commonly used in today's AI systems.
The research, led by Professor Jongse Park from KAIST’s School of Computing in collaboration with HyperAccel Inc., addresses a major challenge in modern AI infrastructure: the high energy and hardware demands of large-scale generative AI models.
Existing systems such as OpenAI's GPT-4 and Google's Gemini 2.5 require substantial memory bandwidth and capacity, leading companies to deploy large numbers of NVIDIA GPUs.
The team's core innovation lies in its approach to the memory bottlenecks of AI infrastructure. Their energy-efficient NPU technology streamlines the inference process while minimizing accuracy loss, a balance that previous solutions have struggled to strike.
Presented at the 2025 International Symposium on Computer Architecture (ISCA 2025) in Tokyo, the research paper titled “Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization” details their comprehensive approach to the problem.
The technology centers on KV cache quantization, which the researchers identified as a major contributor to memory usage in generative AI systems. By optimizing this component, the team can match the performance of traditional GPU-based infrastructure while using fewer NPU devices.
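The idea behind KV cache quantization can be illustrated with a minimal sketch: the key/value tensors cached during generation are stored as low-bit integers plus a per-block scale, shrinking the memory footprint at a small accuracy cost. The function names and the simple 4-bit asymmetric scheme below are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np

def quantize_kv(block, n_bits=4):
    """Asymmetric quantization of one KV-cache block to n_bits unsigned integers."""
    lo, hi = float(block.min()), float(block.max())
    scale = (hi - lo) / (2**n_bits - 1)          # step size covering the block's range
    q = np.round((block - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_kv(q, scale, lo):
    """Recover an approximate float block from its quantized form."""
    return q.astype(np.float32) * scale + lo

# Toy key block: 8 tokens x 64 head dimensions.
kv = np.random.randn(8, 64).astype(np.float32)
q, scale, lo = quantize_kv(kv)
err = np.abs(kv - dequantize_kv(q, scale, lo)).max()   # rounding error, bounded by scale / 2
```

Even unpacked into one byte per value, this scheme roughly halves the cache footprint versus float16; hardware would additionally pack two 4-bit values per byte.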
The energy-efficient NPU technology from KAIST employs a three-pronged quantization algorithm, incorporating threshold-based online-offline hybrid quantization, group-shift quantization, and fused dense-and-sparse encoding. This approach allows the system to integrate seamlessly with existing memory interfaces without requiring changes to operational logic in current NPU architectures.
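A hedged sketch of the dense-and-sparse idea: values whose magnitude exceeds a threshold are kept sparsely at full precision, while the remaining dense values are quantized symmetrically to 4 bits. This is an illustrative reconstruction of the general technique, not Oaken's actual algorithm; in particular, the hand-picked threshold stands in for the offline profiling the paper describes.

```python
import numpy as np

def hybrid_quantize(x, threshold, n_bits=4):
    """Split x into a dense low-bit part and a sparse full-precision outlier part."""
    outlier_mask = np.abs(x) > threshold
    sparse_idx = np.flatnonzero(outlier_mask)            # flat positions of outliers
    sparse_vals = x.flat[sparse_idx].astype(np.float32)  # outliers kept at full precision
    dense = np.where(outlier_mask, 0.0, x)               # outliers removed from dense path
    scale = threshold / (2**(n_bits - 1) - 1)            # symmetric int4 over [-t, t]
    q = np.clip(np.round(dense / scale),
                -(2**(n_bits - 1)), 2**(n_bits - 1) - 1).astype(np.int8)
    return q, scale, sparse_idx, sparse_vals

def hybrid_dequantize(q, scale, sparse_idx, sparse_vals):
    """Reconstruct: dequantize the dense part, then patch the outliers back in."""
    x = q.astype(np.float32) * scale
    x.flat[sparse_idx] = sparse_vals
    return x

x = np.random.randn(16, 8).astype(np.float32)
q, scale, idx, vals = hybrid_quantize(x, threshold=1.5)
restored = hybrid_dequantize(q, scale, idx, vals)
```

Storing outliers separately keeps the quantization scale tight for the bulk of values, which is the motivation for fusing dense and sparse encodings in the first place.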
Furthermore, the hardware architecture incorporates page-level memory management techniques to maximize limited memory bandwidth and capacity utilization. The team also introduced new encoding techniques specifically optimized for quantized KV cache, addressing the unique needs of their approach.
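Page-level KV-cache management can be pictured as an allocator that hands out fixed-size pages and keeps a per-sequence page table, so memory is reclaimed as soon as a sequence finishes. The class below is a hypothetical software sketch of the general idea, familiar from paged-attention serving systems, and not KAIST's hardware design.

```python
class PagedKVAllocator:
    """Toy page-level KV-cache allocator: fixed-size pages, per-sequence page tables."""

    def __init__(self, num_pages, tokens_per_page):
        self.free_pages = list(range(num_pages))
        self.tokens_per_page = tokens_per_page
        self.page_tables = {}   # seq_id -> list of physical page ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append_token(self, seq_id):
        """Reserve a KV slot for one new token, grabbing a fresh page on a boundary."""
        n = self.lengths.get(seq_id, 0)
        if n % self.tokens_per_page == 0:            # current page full (or first token)
            if not self.free_pages:
                raise MemoryError("KV cache exhausted")
            self.page_tables.setdefault(seq_id, []).append(self.free_pages.pop())
        self.lengths[seq_id] = n + 1
        page = self.page_tables[seq_id][n // self.tokens_per_page]
        return page, n % self.tokens_per_page        # physical (page, slot) address

    def release(self, seq_id):
        """Return all pages of a finished sequence to the free pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

The point of the design is that freeing a sequence returns whole pages to the pool, so cache capacity tracks the number of live tokens rather than a worst-case contiguous reservation per sequence.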
The environmental impact of AI infrastructure is a growing concern as generative AI adoption increases. The energy-efficient NPU technology developed by KAIST offers a potential path towards more sustainable AI operations, with 44% lower power consumption compared to current GPU solutions.
While the research represents a significant advancement, widespread implementation will require continued development and collaboration within the industry.
As AI companies strive to balance performance and sustainability, this energy-efficient NPU technology arrives at an opportune moment. It has shown promise for building high-performance, low-power generative AI infrastructure, with potential applications in AI cloud data centers and dynamic AI environments.
Innovations like KAIST’s energy-efficient NPU technology offer hope for a more sustainable future in artificial intelligence computing as the industry grapples with energy consumption concerns.