ManuMatic: Strategy injection for robust automatic hybrid parallelism in distributed DNN training

Wang, Ruiwen; Li, Chong; Wang, Hongxing; Appuswamy, Raja; Yujie, Yuan

NPC 2025, 22nd IFIP International Conference on Network and Parallel Computing, 14-16 November 2025, Nha Trang, Vietnam / Also on Lecture Notes in Computer Science, Vol.16306

Training modern deep neural networks (DNNs) requires hybrid parallelism. Automatic planners search data, tensor/model, and pipeline shardings with cost models, but decisions can drift from runtime optima due to framework/planner decoupling and overlap mis-modeling. We present MANUMATIC, a light-touch planner that lets users pin a few critical operator shardings while automatically deriving globally consistent strategies for the rest. Inside a binary recursive partitioner, MANUMATIC prioritizes pins via an infinite compromise price and decomposes multi-dimensional hints into two-way refinements; when hard constraints are infeasible, a soft-penalty variant applies. The design is profiling-free, preserves D-Rec’s short compilation time, and degenerates to D-Rec when no pins are given. Built atop D-Rec, MANUMATIC delivers consistent speedups without cost-model reengineering: on Mixtral-8$\times$7B, an expert-parallel-aware BMM pin achieves 2.24$\times$ over D-Rec; on Llama3-8B, a sequence-parallel-aware MatMul pin reaches 2.04$\times$; on Qwen2.5-72B, a sequence-parallel-aware MatMul pin combined with BMPipe yields 1.45$\times$ over D-Rec and 1.30$\times$ over an expert plan. These results show that minimal guidance can robustify automatic parallelism while largely preserving automation.
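The pinning mechanism described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `decompose_hint` and `pick_axis`, the axis names, and all cost values are hypothetical, and the per-level cost model is reduced to a plain dictionary. The sketch only assumes what the abstract states: a multi-dimensional hint is broken into two-way splits, a pin is enforced by charging an infinite price to every non-pinned choice, and a finite penalty replaces that price when the hard pin cannot be satisfied.

```python
import math
from typing import Dict, List, Optional

INF = math.inf


def decompose_hint(hint: Dict[str, int]) -> List[str]:
    """Turn a multi-dimensional hint such as {"batch": 4, "hidden": 2}
    into a sequence of two-way splits, one axis name per recursion level.
    Assumption: every factor is a power of two."""
    steps: List[str] = []
    for axis, ways in hint.items():
        if ways < 1 or ways & (ways - 1):
            raise ValueError(f"{axis}: {ways} is not a power of two")
        # A 2^k-way split along one axis becomes k binary splits.
        steps.extend([axis] * (ways.bit_length() - 1))
    return steps


def pick_axis(base_cost: Dict[str, float],
              pinned: Optional[str],
              price: float = INF) -> str:
    """Choose the axis to halve at one recursion level.
    Every axis other than the pinned one pays `price` on top of its
    base cost; price=INF models a hard pin, a finite price a soft one."""
    def total(axis: str) -> float:
        extra = price if pinned is not None and axis != pinned else 0.0
        return base_cost[axis] + extra

    best = min(base_cost, key=total)
    if total(best) == INF:
        raise ValueError("hard pin is infeasible at this level")
    return best


# Hypothetical costs for one operator at one recursion level.
costs = {"batch": 1.0, "hidden": 3.0}
print(decompose_hint({"batch": 4, "hidden": 2}))
print(pick_axis(costs, pinned=None))      # no pin: plain cost minimization
print(pick_axis(costs, pinned="hidden"))  # hard pin overrides the cost model
```

With `pinned=None` the sketch reduces to ordinary cost minimization, mirroring the statement that the planner degenerates to D-Rec when no pins are given. Passing a finite `price` corresponds to the soft-penalty variant: the pin is preferred but can be overridden when the pinned axis cannot be split.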

Type: Conference

City: Nha Trang

Date: 2025-11-14

Department: Data Science

Eurecom Ref: 8487

© Springer. Personal use of this material is permitted. The definitive version of this paper was published in NPC 2025, 22nd IFIP International Conference on Network and Parallel Computing, 14-16 November 2025, Nha Trang, Vietnam / Also on Lecture Notes in Computer Science, Vol.16306 and is available at: https://doi.org/10.1007/978-3-032-10466-3_14