Pipeline parallelism and activation recomputation are among the most widely adopted optimization techniques for scaling DNN training on large accelerator clusters. However, as DNNs grow in complexity and heterogeneity, it becomes increasingly difficult to determine the optimal combination of pipeline partitioning and recomputation strategies. Existing solutions either propose manual optimization approaches that do not scale, or automated approaches that explore only a subset of the optimization possibilities due to search-space explosion. In this paper, we present BMPipe, a bubble-memory co-optimization planner that holistically optimizes computation imbalance, memory underutilization, redundant computation, and scheduling-induced preparation time. At its core, BMPipe uses symbolic representations that unify computation, memory, and bubbles into a single model, which is solved with an ILP-based planner. Using BMPipe, we perform a thorough experimental evaluation in which we train several large, state-of-the-art DNN models on a 16K-NPU cluster. We show that BMPipe achieves up to
BMPipe: Bubble-memory co-optimization strategy planner for very-large DNN training
CLUSTER 2025, IEEE International Conference on Cluster Computing, 2-5 September 2025, Edinburgh, Scotland, UK
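To make the bubble-memory trade-off concrete, below is a minimal, hypothetical sketch of the kind of ILP such a planner could solve, written in Python with PuLP. It is not BMPipe's actual formulation: the block profiles, the 1F1B in-flight activation model, the per-block recomputation decision, and the checkpoint-size fraction are all illustrative assumptions.

# Hypothetical sketch of a bubble-memory co-optimization ILP.
# NOT BMPipe's formulation; all numbers and the memory model are
# illustrative assumptions.
from pulp import (LpProblem, LpMinimize, LpVariable, lpSum,
                  LpBinary, LpContinuous, PULP_CBC_CMD, value)

L, S = 8, 4                      # layer blocks, pipeline stages
fwd = [3, 5, 4, 6, 2, 5, 3, 4]   # per-block forward time (ms, assumed)
bwd = [2 * t for t in fwd]       # backward ~ 2x forward (assumption)
act = [6, 9, 7, 10, 4, 8, 5, 6]  # per-block activation memory (GB, assumed)
wgt = [2, 3, 2, 4, 1, 3, 2, 2]   # per-block weight/optimizer memory (GB)
ckpt = [a * 0.25 for a in act]   # retained memory if recomputed (assumption)
CAP = 40.0                       # per-device memory capacity (GB)
inflight = [S - s for s in range(S)]  # 1F1B in-flight microbatches per stage

prob = LpProblem("bubble_memory_coopt", LpMinimize)
x = {(i, s): LpVariable(f"x_{i}_{s}", cat=LpBinary)    # block i on stage s
     for i in range(L) for s in range(S)}
c = {i: LpVariable(f"c_{i}", cat=LpBinary)             # recompute block i?
     for i in range(L)}
u = {(i, s): LpVariable(f"u_{i}_{s}", cat=LpBinary)    # u = x AND c (linearized)
     for i in range(L) for s in range(S)}
Tmax = LpVariable("Tmax", lowBound=0, cat=LpContinuous)

for i in range(L):
    prob += lpSum(x[i, s] for s in range(S)) == 1      # each block on one stage
for s in range(S):
    prob += lpSum(x[i, s] for i in range(L)) >= 1      # no empty stage
# Contiguity: a block's stage index is non-decreasing, stepping by at most 1.
stage_of = [lpSum(s * x[i, s] for s in range(S)) for i in range(L)]
for i in range(L - 1):
    prob += stage_of[i + 1] >= stage_of[i]
    prob += stage_of[i + 1] <= stage_of[i] + 1
# Linearize u[i,s] = x[i,s] * c[i].
for i in range(L):
    for s in range(S):
        prob += u[i, s] <= x[i, s]
        prob += u[i, s] <= c[i]
        prob += u[i, s] >= x[i, s] + c[i] - 1
for s in range(S):
    # Stage time: fwd + bwd, plus one extra fwd pass for recomputed blocks.
    prob += Tmax >= lpSum(x[i, s] * (fwd[i] + bwd[i]) + u[i, s] * fwd[i]
                          for i in range(L))
    # Memory: weights plus in-flight activations, shrunk to checkpoint
    # size for recomputed blocks.
    prob += (lpSum(x[i, s] * wgt[i] for i in range(L))
             + inflight[s] * lpSum(x[i, s] * act[i]
                                   - u[i, s] * (act[i] - ckpt[i])
                                   for i in range(L))) <= CAP

prob += Tmax  # minimize bottleneck stage time (bubble/imbalance proxy)
prob.solve(PULP_CBC_CMD(msg=False))
for s in range(S):
    blocks = [i for i in range(L) if value(x[i, s]) > 0.5]
    rec = [i for i in blocks if value(c[i]) > 0.5]
    print(f"stage {s}: blocks {blocks}, recomputed {rec}")
print("bottleneck stage time:", value(Tmax))

Under these assumptions, recomputation enters both sides of the trade-off: it inflates a stage's compute time (and hence the bottleneck proxy for pipeline bubbles) while shrinking its in-flight activation footprint, which is what allows the solver to rebalance stages under the memory cap.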
Type:
Conference
City:
Edinburgh
Date:
2025-09-02
Department:
Data Science
Eurecom Ref:
8417
Copyright:
© 2025 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
See also:
Permalink: https://www.eurecom.fr/publication/8417