Alibaba said its new computing pooling system, Aegaeon, reduced the number of Nvidia H20 GPUs needed to serve AI models by 82%, from 1,192 to 213, during tests on its Model Studio platform. The system reportedly enables a single GPU to support up to seven models and lowers latency by 97%.
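
As a quick sanity check on the reported figures, the short Python sketch below recomputes the percentage reduction from the two GPU counts quoted above. The function and variable names are illustrative only and are not taken from Alibaba's materials; the arithmetic is simply the standard percent-change formula.

```python
def percent_reduction(before: int, after: int) -> float:
    """Return the percentage drop going from `before` to `after`."""
    return (before - after) / before * 100


# GPU counts as reported for the Model Studio tests.
gpus_before = 1192
gpus_after = 213

reduction = percent_reduction(gpus_before, gpus_after)
print(f"Reduction in GPUs: {reduction:.1f}%")

# How many of the original GPUs each remaining GPU effectively replaces.
consolidation = gpus_before / gpus_after
print(f"Consolidation ratio: {consolidation:.1f}x")
```

The result rounds to the 82% stated in the report, so the headline percentage is consistent with the two counts.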