[Draft Version] Performance improvements in Multi-Processor systems
1- Although current approaches are quite effective for single-processor systems, in many
cases they do not apply to multi-core systems and have to be revised to effectively
utilize the benefits of Chip Multi-Processors (CMP).
2- In recent years, researchers have directed their attention to on-chip memory
components such as Scratch Pad Memory (SPM), in addition to hardware-controlled
on-chip caches, in order to meet very tight constraints, especially in embedded systems.
SPM is a software-managed on-chip SRAM with guaranteed fast access time. The advantages
of SPM include power/energy efficiency, reduced cost, better performance, and real-time
predictability.
3- Performance improvement examples:
a. Memory space: reducing the data memory space consumption of embedded applications.
The idea is to reduce memory space requirements by performing extra re-computations
instead of indiscriminately storing all intermediate results in memory.
b. Power improvement:
c. Throughput:
d. Execution time: run-time performance vs storage performance
e. Example of a system specification: multiple heterogeneous processors and a shared
memory. The processor cores are assumed to be embedded in a single die and connected
by a shared bus. Communication between the processors and the memory system is
conducted via the memory interface and the bus structure.
In the above structure, two processor cores (CPUs) are embedded in a single chip. A
graph-based model, called the task graph, is used to describe the system specification.
A task graph is a directed acyclic graph in which the vertices represent the tasks and
the edges represent the dependencies among these tasks. A directed edge between two
tasks captures the fact that there is a data transfer between them. In any scheduling
approach, we have to make sure that a task is scheduled only after all its input data are
available from its predecessors. The task graph below illustrates the trade-off between
performance and memory consumption:
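As a minimal sketch of the task-graph model and its precedence rule, the Python code below builds a small graph and derives an order in which every task runs only after all of its predecessors. The four tasks, their edges, and the names TASK_GRAPH and precedence_order are hypothetical and chosen only for illustration; they do not come from the original figure.

```python
from collections import deque

# Hypothetical task graph: each key is a task, each value lists the successor
# tasks that consume its output (one directed edge per data transfer).
TASK_GRAPH = {
    "T1": ["T2", "T3"],
    "T2": ["T4"],
    "T3": ["T4"],
    "T4": [],
}


def precedence_order(graph):
    """Return a task order in which every task follows all its predecessors."""
    # Count how many inputs each task is still waiting for.
    pending_inputs = {task: 0 for task in graph}
    for successors in graph.values():
        for succ in successors:
            pending_inputs[succ] += 1

    # Tasks with no pending inputs are ready to be scheduled.
    ready = deque(sorted(t for t, n in pending_inputs.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Finishing a task makes its output available to each successor.
        for succ in graph[task]:
            pending_inputs[succ] -= 1
            if pending_inputs[succ] == 0:
                ready.append(succ)

    if len(order) != len(graph):
        raise ValueError("task graph contains a cycle")
    return order


order = precedence_order(TASK_GRAPH)
print(order)  # ['T1', 'T2', 'T3', 'T4']
```

T2 and T3 depend only on T1, so on a two-core system they could run in parallel once T1 has finished; T4 has to wait for both.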
Task scheduling (adaptive scheduler for real-time multiprocessor systems): in a
real-time system, every task in the system must complete in time. There are two types of
real-time system: hard and soft. In a hard real-time system, tasks have to complete by
their deadlines; in a soft real-time system, a missed deadline is tolerable and tasks just
need to complete as soon as possible. If a hard or soft real-time system has predictable
behavior and the necessary attributes of the tasks are known in advance, then the
system can be analyzed with a schedulability test and we can develop a scheduling
algorithm that meets the real-time requirement. For example, if we know the deadlines, we
can use earliest-deadline-first (EDF) scheduling on an embedded single-processor platform.
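The EDF policy mentioned above can be sketched as a small simulation. This is only an illustrative model under stated assumptions: time advances in unit slots, preemption is free, execution times are positive integers, and the job set (names A, B, C with their release times, execution times and deadlines) is hypothetical. The function name edf_schedule is likewise an assumption.

```python
import heapq


def edf_schedule(jobs):
    """Simulate preemptive EDF on one processor in unit time slots.

    jobs: list of (name, release_time, execution_time, absolute_deadline).
    Returns (timeline, missed): timeline[t] is the job run in slot t (None
    when idle); missed lists the jobs that finished after their deadline.
    """
    pending = sorted(jobs, key=lambda j: j[1])  # ordered by release time
    ready = []  # min-heap keyed by absolute deadline
    timeline, missed = [], []
    t, i = 0, 0
    while i < len(pending) or ready:
        # Release every job whose release time has arrived.
        while i < len(pending) and pending[i][1] <= t:
            name, _, exec_time, deadline = pending[i]
            heapq.heappush(ready, (deadline, name, exec_time))
            i += 1
        if not ready:
            timeline.append(None)  # processor idles for this slot
            t += 1
            continue
        # Run the ready job with the earliest deadline for one slot.
        deadline, name, remaining = heapq.heappop(ready)
        timeline.append(name)
        t += 1
        remaining -= 1
        if remaining > 0:
            heapq.heappush(ready, (deadline, name, remaining))
        elif t > deadline:
            missed.append(name)
    return timeline, missed


# Hypothetical job set: (name, release, execution time, absolute deadline).
timeline, missed = edf_schedule([("A", 0, 2, 6), ("B", 1, 2, 4), ("C", 2, 1, 8)])
print(timeline)  # ['A', 'B', 'B', 'A', 'C']
print(missed)    # []
```

Job B arrives at time 1 with an earlier deadline than A, so it preempts A; A resumes once B is done, and all three jobs meet their deadlines.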
Scheduling algorithms for embedded single-processor platforms have been studied in
detail, but for embedded multiprocessor platforms there are still some open
problems. For example, the assignment of tasks to processors is an NP-complete
problem. Therefore, we must rely on heuristics. However, these heuristics cannot
guarantee that an allocation is feasible. There are two kinds of
multiprocessor scheduling: partitioned scheduling and global scheduling. In partitioned
scheduling, once a task is allocated to a processor, all of its instances are executed
exclusively on that processor. In global scheduling, any instance of a task can be
executed on any processor, or even be preempted and moved to a different processor
before it is completed. Hence the proposal: an adaptive scheduler based on global
scheduling.
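To illustrate why allocation heuristics give no guarantee in partitioned scheduling, here is a minimal first-fit sketch. It rests on several assumptions not stated in the text: tasks are periodic with deadlines equal to their periods, each processor runs EDF locally (so a processor can be loaded up to 100% utilization), and utilizations are given in whole percent. The task set and the name first_fit_partition are hypothetical.

```python
def first_fit_partition(utilizations, num_processors, capacity=100):
    """Assign tasks to processors with the first-fit heuristic.

    utilizations: per-task processor utilization in percent (hypothetical data).
    Returns a list with one task-index list per processor, or None if some
    task does not fit on any processor.
    """
    load = [0] * num_processors
    assignment = [[] for _ in range(num_processors)]
    for task, u in enumerate(utilizations):
        for p in range(num_processors):
            if load[p] + u <= capacity:
                # First processor with enough spare capacity takes the task.
                load[p] += u
                assignment[p].append(task)
                break
        else:
            return None  # the heuristic found no processor for this task
    return assignment


# The same four tasks, presented to the heuristic in two different orders.
in_given_order = first_fit_partition([40, 40, 60, 60], 2)
largest_first = first_fit_partition([60, 60, 40, 40], 2)
print(in_given_order)  # None
print(largest_first)   # [[0, 2], [1, 3]]
```

In the first order the heuristic packs both 40% tasks onto one processor and then cannot place the second 60% task, even though pairing each 60% task with a 40% task is a valid allocation. Sorting the tasks largest-first finds it here, but no ordering rule works for every task set.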
The multiprocessor System-on-Chip (MPSoC) is a system-on-a-chip (SoC) which uses
multiple processors (see multi-core), usually targeted for embedded applications. It is
used by platforms that contain multiple, usually
heterogeneous, processing elements with specific functionalities reflecting the need of
the expected application domain, a memory hierarchy (often using scratchpad RAM and
DMA) and I/O components. All these components are linked to each other by an on-chip
interconnect. These architectures meet the performance needs of multimedia
applications, telecommunication architectures, network security and other application
domains while limiting the power consumption through the use of specialised processing
elements and architecture. (http://en.wikipedia.org/wiki/MPSoC)
A system on a chip or system on chip (SoC or SOC) is an integrated circuit (IC) that
integrates all components of a computer or other electronic system into a single chip. It
may contain digital, analog, mixed-signal, and often radio-frequency functions, all on a
single chip substrate. SoCs are very common in the mobile electronics market because of
their low power consumption. A typical application is in the area of embedded systems.
(http://en.wikipedia.org/wiki/System_on_a_chip)
A multi-core processor is a single computing component with two or more independent
actual central processing units (called "cores"), which are the units that read and execute
program instructions.[1] The instructions are ordinary CPU instructions such as add, move
data, and branch, but the multiple cores can run multiple instructions at the same time,
increasing overall speed for programs amenable to parallel computing.[2] Manufacturers
typically integrate the cores onto a single integrated circuit die (known as a chip
multiprocessor or CMP), or onto multiple dies in a single chip package.
Hardware/Software partitioning is the problem of dividing an application's computation
into a part that executes as sequential instructions on a microprocessor (the software)
and a part that runs in parallel in hardware such as an FPGA or ASIC, in order to improve
metrics such as performance, cost, power, and size.
Scratchpad Memory: A portion of L1 cache reserved for direct and private usage by the
CPU. Typically, a cache is used to temporarily store copies of data that resides on slower
main memory. However, the CPU can use scratchpad RAM for any purpose, such as
storing instructions or intermediate values.