The Ultra-Scale Playbook
Training LLMs on GPU Clusters
Embark on a journey to orchestrate thousands of GPUs and scale LLM training to today's largest compute clusters. Starting with the memory and compute anatomy of model training, we explore five dimensions of parallelism for distributing training efficiently. From there, we dive deeper into how GPUs are designed and how specialised kernels increase training efficiency further.
This book is a great starting point if you want to get into training ever-larger models efficiently at scale!
Details
- Publication Date: Jul 28, 2025
- Language: English
- Category: Computers & Technology
- Copyright: Creative Commons NonCommercial, ShareAlike (CC BY-NC-SA)
- Contributors:
  - Authors: Nouamane Tazi, Ferdinand Mom, Haojun Zhao, Phuc Nguyen, Mohamed Mekkouri, Leandro von Werra, Thomas Wolf
  - Cover design or artwork: Florine Baeriswyl
Specifications
- Pages: 246
- Binding Type: Paperback Perfect Bound
- Interior Color: Color
- Dimensions: Digest (5.5 x 8.5 in / 140 x 216 mm)