In artificial intelligence (AI) engineering, 95% of the time and effort is consumed by data-related workloads. To tackle this challenge, tech giants spend thousands of hours building ...
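This chapter takes an in-depth look at AI compiler optimization techniques for NUMA (Non-Uniform Memory Access) architectures. With the emergence of 200T-parameter-class models, a single compute node can no longer meet compute and memory demands, and NUMA architectures have become the inevitable choice for high-performance computing. Starting from basic NUMA concepts, this chapter discusses in detail affinity settings, local memory ...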