Python Multiprocessing, Celery or PP for Dynamic Model Applications
Ghazal Tashakor
https://www.linkedin.com/in/tashakor/
Parallel Programming in Python (PP)
Within parallel programming, Python has
built-in and external modules that simplify
implementation.
Parallelizable Problems
1. Obtaining the highest Fibonacci value for multiple inputs (see the sketch below)
2. Crawling the Web
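As an illustration of the first problem, a minimal sketch (the input values are made up, not from the slides) that fans the Fibonacci computation out over a pool of worker processes:

```python
# Sketch: highest Fibonacci value for several inputs, one worker per CPU core.
from multiprocessing import Pool


def fibonacci(n):
    """Return the n-th Fibonacci number iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


if __name__ == "__main__":
    inputs = [10, 20, 30, 40]          # hypothetical input values
    with Pool() as pool:               # defaults to one process per CPU
        results = pool.map(fibonacci, inputs)
    print(dict(zip(inputs, results)))  # {10: 55, 20: 6765, 30: 832040, 40: 102334155}
```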
Python PP Tools
1. The Python Threading Module
2. The Python Multiprocessing Module
3. The Parallel Python Module
4. Celery – A Distributed Task Queue
Disadvantages of Threads
1. Can harm the performance of the application (CPU-bound threads contend for the GIL; see the sketch below)
2. Sharing data is hard and scalability is poor
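A small sketch of the first point, assuming CPython: for CPU-bound work the threads serialize on the GIL, so a thread pool usually gains little over serial execution, while a process pool sidesteps it (exact timings vary by machine):

```python
# Sketch comparing a thread pool and a process pool on CPU-bound work.
# Under CPython's GIL the thread pool typically shows little or no speed-up.
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor


def busy(n):
    """CPU-bound loop: sum of squares up to n."""
    return sum(i * i for i in range(n))


def timed(executor_cls, work):
    start = time.perf_counter()
    with executor_cls(max_workers=4) as ex:
        list(ex.map(busy, work))
    return time.perf_counter() - start


if __name__ == "__main__":
    work = [2_000_000] * 8
    print("threads:   %.2fs" % timed(ThreadPoolExecutor, work))
    print("processes: %.2fs" % timed(ProcessPoolExecutor, work))
```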
Data Buffering in the Multiprocessing Module
Although multiprocessing and ProcessPoolExecutor let the user solve these parallelizable problems without writing explicit synchronization code (Locks, for instance), internally those mechanisms are still used: data is transported through buffers and pipes to accomplish the communication between processes.
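A minimal sketch of that idea with ProcessPoolExecutor: no Lock appears in user code, yet the results still travel back to the parent through the executor's internal queues and pipes:

```python
# Sketch: ProcessPoolExecutor hides the synchronization from the user.
from concurrent.futures import ProcessPoolExecutor, as_completed


def square(n):
    return n * n


if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(square, n) for n in range(10)]
        # Each result is pickled and shipped back over internal call/result
        # queues built on pipes; the executor does the locking for us.
        for future in as_completed(futures):
            print(future.result())
```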
Advantages of using the PP module
1. Automatic detection of the number of processors to improve load balancing
2. The number of processors allocated can be changed at runtime
3. Load balancing at runtime
4. Auto-discovery of resources throughout the network (see the sketch below)
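A sketch of the Parallel Python (pp) API illustrating points 1, 2 and 4. Note that pp itself targets Python 2; on Python 3 the ppft fork provides essentially the same interface:

```python
# Sketch of the pp API: auto-detected workers, runtime reallocation,
# and network auto-discovery via the "*" entry in ppservers.
import pp


def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


ppservers = ("*",)                           # auto-discover ppserver.py nodes on the network
job_server = pp.Server(ppservers=ppservers)  # ncpus is auto-detected by default
print("Workers detected:", job_server.get_ncpus())

jobs = [job_server.submit(fibonacci, (n,)) for n in (10, 20, 30)]
print([job() for job in jobs])               # calling a job blocks until its result is ready

job_server.set_ncpus(2)                      # allocation can be changed at runtime
```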
Advantages of using Celery
1. An improved version of the multiprocessing pool (the pool as a service)
2. Best for shared arrays
Distributing Tasks with Celery
Celery is a framework that offers mechanisms to ease the difficulties of building distributed systems. It works by distributing units of work (tasks) through messages exchanged among machines interconnected as a network, or among local workers.
A task is the key concept in Celery; any sort of job we need to distribute has to be encapsulated in a task beforehand.
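A minimal sketch of such a task; the Redis broker/backend URL is only an assumption here, and any broker Celery supports works the same way:

```python
# tasks.py -- a minimal Celery sketch; broker/backend URLs are assumptions.
from celery import Celery

app = Celery("tasks",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/0")


@app.task
def fibonacci(n):
    """The unit of work: Celery serializes each call into a message."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```

A worker is then started with "celery -A tasks worker", and a client dispatches work with fibonacci.delay(30); calling .get() on the returned result blocks until some worker has executed the task.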
Parallel Algorithms
1. The divide and conquer technique
2. Data decomposition (we used it to create low-cost user threads, which is not realistic in practice)
3. Decomposing tasks with a pipeline
4. Processing and mapping
Data Decomposition problem
We used it to solve the matrix problem, where each operation needed to reach the final result was executed by a single worker, and every worker executed the same number of operations. In the real world there is an asymmetry between the number of workers and the quantity of data being decomposed, and this asymmetry directly affects the performance of the solution.
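A sketch of row-wise data decomposition for a small matrix product; each worker computes one output row, and the matrices are illustrative only. With more rows than workers, the pool has to schedule them, which is where the asymmetry described above shows up:

```python
# Sketch: decompose a matrix multiplication by rows across a process pool.
from multiprocessing import Pool


def row_times_matrix(args):
    """Compute one row of A*B given (row_of_A, B)."""
    row, matrix_b = args
    return [sum(a * b for a, b in zip(row, col)) for col in zip(*matrix_b)]


if __name__ == "__main__":
    matrix_a = [[1, 2], [3, 4], [5, 6]]
    matrix_b = [[7, 8], [9, 10]]
    with Pool() as pool:
        product = pool.map(row_times_matrix, [(row, matrix_b) for row in matrix_a])
    print(product)  # [[25, 28], [57, 64], [89, 100]]
```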
Processing and mapping solution
1. Identify the tasks that require data exchange
2. Group the tasks that communicate constantly into a single worker; this can enhance performance (see the sketch below)
This holds when there is a heavy load of data communication, since grouping reduces the overhead of exchanging information among the tasks.
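One common form of this grouping in the standard library is the chunksize argument of Pool.map, which hands each worker a batch of tasks in a single message instead of one message per task; a minimal sketch:

```python
# Sketch: batching tasks per worker to cut communication overhead.
from multiprocessing import Pool


def process(item):
    return item * item


if __name__ == "__main__":
    items = list(range(100_000))
    with Pool(processes=4) as pool:
        # chunksize=1    -> one round-trip per item (maximal overhead)
        # chunksize=5000 -> only 20 round-trips in total for 100 000 items
        results = pool.map(process, items, chunksize=5000)
    print(results[:5])
```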
Resources/Books
1. Parallel Programming with Python – Jan Palach, June 25, 2014
2. Distributed Computing with Python – Francesco Pierfederici, April 2016
3. Parallel, Distributed Scripting with Python – P. Miller, Center for Applied Scientific Computing
