DISTRIBUTED MACHINE LEARNING
        Machine Learning
            Fall 2024
      Zahra Keshavarz Rezaei
    What is Distributed Machine Learning?
Distributed Machine Learning refers to the process of training machine
learning models using multiple machines or processors simultaneously. This
allows the handling of massive datasets and complex models that a single
machine cannot efficiently process.
Distributed Machine Learning vs. Federated Learning
Types of Distributed Learning:
  Data Parallelism
 Model Parallelism
 Hybrid Parallelism
                 Model Parallelism
Split the model itself (e.g., layers of a neural network) across
different nodes. For example, one node holds the input layers and
passes its activations to another node that holds the hidden layers.
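A minimal sketch of the idea, assuming a toy two-layer network. The Node class, the weights, and pipeline_forward are hypothetical names made up for illustration; both "nodes" run in one process here, whereas a real system would send the activations over the network.

```python
# Toy model parallelism: each hypothetical "node" owns one dense layer
# and only ever sees its own weights.

class Node:
    """Hypothetical worker that holds a single dense layer."""

    def __init__(self, weights, bias, use_relu):
        self.weights = weights      # one row of weights per output unit
        self.bias = bias
        self.use_relu = use_relu

    def forward(self, inputs):
        outputs = []
        for row, b in zip(self.weights, self.bias):
            value = sum(w * x for w, x in zip(row, inputs)) + b
            outputs.append(max(0.0, value) if self.use_relu else value)
        return outputs


# Node A holds the first layer (2 inputs -> 2 hidden units).
node_a = Node(weights=[[1.0, -1.0], [0.5, 0.5]], bias=[0.0, 0.0], use_relu=True)
# Node B holds the second layer (2 hidden units -> 1 output).
node_b = Node(weights=[[2.0, 1.0]], bias=[0.5], use_relu=False)


def pipeline_forward(x):
    hidden = node_a.forward(x)      # computed on node A
    # In a real deployment, `hidden` would be transferred to node B here.
    return node_b.forward(hidden)   # computed on node B


prediction = pipeline_forward([3.0, 1.0])
print(prediction)
```

Only the activations cross the node boundary; neither node needs a copy of the other's weights.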
                Data Parallelism
 Split the training data into smaller subsets. Each worker
(node) processes its subset. Think of each node as working
 on a few puzzle pieces to contribute to the entire picture.
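A minimal sketch of data parallelism, assuming a one-parameter linear model y = w * x with mean squared error. The dataset, learning rate, and worker count are made-up values, and the "workers" are simulated sequentially in one process.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]   # generated by y = 2x
num_workers = 2
# Each worker receives an equal-sized subset of the training data.
shards = [data[i::num_workers] for i in range(num_workers)]

w = 0.0
learning_rate = 0.05
for step in range(100):
    # Each worker computes a gradient on its own shard (in parallel in practice).
    local_grads = [gradient(w, shard) for shard in shards]
    # The local gradients are averaged and applied to the shared weight.
    w -= learning_rate * sum(local_grads) / num_workers

print(round(w, 6))
```

Because the shards have equal size, the averaged gradient equals the gradient over the full dataset, so the result matches single-machine training.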
            Key Algorithms in DML
Stochastic Gradient Descent (SGD)
   Mini-batch SGD distributed across nodes.
Gradient Aggregation
   Aggregates gradients computed by multiple nodes.
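Synchronous vs. Asynchronous Training
   Synchronous: All workers finish the current step and their gradients are
   aggregated before the weights are updated, so every worker sees the same weights.
   Asynchronous: Each worker applies its update as soon as it finishes, without
   waiting for the others, so some gradients may be computed from stale weights.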
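The difference can be sketched with a deterministic, single-process simulation. This is an assumption-laden toy: real asynchronous training has nondeterministic ordering, and train_sync, train_async, the shards, and the learning rate are hypothetical.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


shards = [[(1.0, 2.0), (3.0, 6.0)], [(2.0, 4.0), (4.0, 8.0)]]   # y = 2x
lr = 0.02


def train_sync(steps):
    w = 0.0
    for _ in range(steps):
        # Barrier: every worker computes its gradient from the same weights.
        grads = [gradient(w, shard) for shard in shards]
        # A single update is applied after all gradients have arrived.
        w -= lr * sum(grads) / len(grads)
    return w


def train_async(steps):
    w = 0.0
    for _ in range(steps):
        snapshot = w                     # weights each worker read earlier
        for shard in shards:
            # Each worker applies its update immediately. The second worker's
            # gradient was computed from `snapshot`, which is stale by then.
            w -= lr * gradient(snapshot, shard)
    return w


print(train_sync(200), train_async(200))
```

Both variants reach w close to 2 on this easy problem; with larger learning rates or more workers, stale gradients can make the asynchronous variant less stable.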
            AllReduce Algorithm
A communication pattern used to aggregate gradients across
all workers.
Reduces communication overhead by combining operations
(e.g., summing gradients) during data transfer.
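A sketch of what AllReduce computes, not how an efficient implementation moves the data: every worker contributes a gradient vector and every worker ends up with the element-wise sum. The function name all_reduce_sum and the sample gradients are hypothetical.

```python
def all_reduce_sum(worker_grads):
    """Return, for each worker, the element-wise sum of all workers' gradients."""
    total = [sum(values) for values in zip(*worker_grads)]
    # Every worker receives its own copy of the reduced result.
    return [list(total) for _ in worker_grads]


worker_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = all_reduce_sum(worker_grads)
print(reduced)
```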
            Ring-AllReduce Algorithm
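Bandwidth-optimized version of AllReduce.
Workers are organized in a ring topology.
Each worker sends gradient chunks to one neighbor and receives chunks from
the other in a pipelined fashion.
With N workers, each worker transfers about 2(N-1)/N times the gradient size,
so per-worker traffic stays nearly constant as workers are added, unlike a
naive AllReduce that funnels all gradients through a single node.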
Two phases: Scatter Reduce, then All Gather.
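The two phases can be sketched as a single-process simulation, assuming each worker's gradient is already split into N chunks for N workers. The function ring_all_reduce and the sample values are hypothetical; a real implementation would send the chunks over the network concurrently.

```python
def ring_all_reduce(worker_chunks):
    """worker_chunks[i][c] is chunk c (a list of floats) held by worker i."""
    n = len(worker_chunks)
    chunks = [[list(c) for c in worker] for worker in worker_chunks]

    # Phase 1, scatter reduce: after n - 1 steps, worker i holds the fully
    # summed chunk (i + 1) % n.
    for step in range(n - 1):
        # Build all messages first so that every worker sends "simultaneously".
        messages = []
        for i in range(n):
            c = (i - step) % n
            messages.append(((i + 1) % n, c, list(chunks[i][c])))
        for dest, c, payload in messages:
            chunks[dest][c] = [a + b for a, b in zip(chunks[dest][c], payload)]

    # Phase 2, all gather: the summed chunks travel around the ring until
    # every worker holds all of them.
    for step in range(n - 1):
        messages = []
        for i in range(n):
            c = (i + 1 - step) % n
            messages.append(((i + 1) % n, c, list(chunks[i][c])))
        for dest, c, payload in messages:
            chunks[dest][c] = payload

    return chunks


grads = [
    [[1.0], [2.0], [3.0]],         # worker 0
    [[10.0], [20.0], [30.0]],      # worker 1
    [[100.0], [200.0], [300.0]],   # worker 2
]
result = ring_all_reduce(grads)
print(result[0])
```

In each step a worker sends exactly one chunk to its successor, which is why the per-worker traffic does not grow with the number of workers.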
             Parameter Server Architecture
Dedicated nodes responsible for storing and updating the model
parameters (weights, biases, etc.).
Aggregate gradients from workers and send updated parameters
back.
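A minimal in-process sketch of this pattern. The class ParameterServer and its methods pull, push, and apply_updates are hypothetical names chosen for illustration, not the API of any particular framework, and plain averaged-gradient SGD is assumed as the update rule.

```python
class ParameterServer:
    """Hypothetical server that stores parameters and applies averaged gradients."""

    def __init__(self, params, learning_rate):
        self.params = list(params)
        self.learning_rate = learning_rate
        self._pending = []                 # gradients received since last update

    def pull(self):
        """Workers call this to fetch a copy of the current parameters."""
        return list(self.params)

    def push(self, grads):
        """Workers call this to send a locally computed gradient."""
        self._pending.append(list(grads))

    def apply_updates(self):
        """Average the pending gradients and take one SGD step."""
        if not self._pending:
            return
        n = len(self._pending)
        for j in range(len(self.params)):
            avg = sum(g[j] for g in self._pending) / n
            self.params[j] -= self.learning_rate * avg
        self._pending.clear()


server = ParameterServer([1.0, -1.0], learning_rate=0.5)
server.push([2.0, 0.0])    # gradient from worker 0
server.push([0.0, 4.0])    # gradient from worker 1
server.apply_updates()
updated = server.pull()
print(updated)
```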
                             Frameworks
TensorFlow Distributed: Provides the tf.distribute.Strategy API, with strategies
such as MirroredStrategy and ParameterServerStrategy, for distributing training
across devices and machines.
PyTorch Distributed: Includes the torch.distributed package for
communication between processes and gradient sharing.
Horovod: Open-source library optimized for distributed deep learning with
minimal code changes.
                           Challenges in DML
   Communication Overhead:
Synchronization between nodes (e.g., sharing gradients) can slow down training.
   Fault Tolerance:
Node failures can disrupt training or lead to inconsistencies.
   Data Imbalance:
 Uneven distribution of data across nodes can bias local updates and lead to skewed models.
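One facet of data imbalance, uneven shard sizes, can be sketched as follows. This is an illustrative assumption about what imbalance means here; it does not cover differences in the data distribution itself. The shards and the weighting scheme are made up for the example.

```python
def gradient(w, shard):
    """Gradient of the mean squared error of y = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


# Worker 0 holds one example, worker 1 holds three.
shards = [[(1.0, 2.0)], [(2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]]
all_data = [pair for shard in shards for pair in shard]

w = 0.0
local_grads = [gradient(w, shard) for shard in shards]

# A plain average gives the single example on worker 0 too much influence.
unweighted = sum(local_grads) / len(local_grads)

# Weighting each gradient by its shard size recovers the full-data gradient.
total = sum(len(shard) for shard in shards)
weighted = sum(len(s) * g for s, g in zip(shards, local_grads)) / total

full = gradient(w, all_data)
print(unweighted, weighted, full)
```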
                    Applications
Image Recognition
Language Modeling
Finance
                      Conclusion
DML makes it possible to train on datasets and models too large for a single machine.
Key approaches: Data parallelism, model parallelism, and hybrid methods.
Challenges: Communication overhead, fault tolerance, and data imbalance.
THANK YOU