Synopsis
Bringing a deep learning project into production at scale is quite challenging. To successfully scale your project, you need a foundational understanding of the full deep learning stack: the knowledge that lies at the intersection of hardware, software, data, and algorithms. Ideal for anyone interested in model development at scale, this book illustrates the complex concepts of the deep learning stack and reinforces them through hands-on exercises. A scaling effort is only beneficial when it is effective and efficient; to that end, author Suneeta Mall explains the intricate concepts, tools, and techniques that will help you scale your model development and training workloads effectively and efficiently. You'll gain a thorough understanding of:
- How your model is decomposed into a computation graph and how your data flows through this graph during training
- How accelerated computing speeds up your training and how to best utilize the hardware resources at your disposal
- How to train your model using distributed training paradigms, such as data, model, pipeline, and hybrid multidimensional parallelism
- How to leverage the PyTorch ecosystem in conjunction with NVIDIA libraries and Triton to scale your model training
- How to debug, monitor, and investigate the bottlenecks that undesirably slow down the scale-out of model training
- How to expedite the training lifecycle and streamline your feedback loop to iterate model development, along with related tricks, tools, and techniques to scale your training workload
- How to apply data-centric techniques to efficiently train your model at scale
- How to select the right tools and techniques for your deep learning project
- Options for managing the compute infrastructure when running at scale
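To give a flavor of the distributed training paradigms the synopsis mentions, the sketch below shows minimal data parallelism with PyTorch's DistributedDataParallel. It is an illustrative assumption, not an excerpt from the book: the toy linear model, synthetic batches, and hyperparameters are placeholders, and it assumes a launch via `torchrun` on one or more GPUs.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Assumes launch via `torchrun --nproc_per_node=N train.py`,
    # which sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Illustrative toy model; any nn.Module works here.
    model = nn.Linear(128, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        # Synthetic batch; in practice a DistributedSampler shards real data.
        x = torch.randn(32, 128, device=local_rank)
        y = torch.randint(0, 10, (32,), device=local_rank)

        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()  # gradients are all-reduced across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Each process holds a full model replica and trains on its own shard of the data; DDP synchronizes gradients during the backward pass, which is the defining trait of the data-parallel paradigm, in contrast to model and pipeline parallelism, which split the model itself across devices.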
LC Classification Number: Q325.73.M34 2024