Processor Architecture
Victor Eijkhout
Fall 2023
1 Eijkhout – Processor Architecture – Fall 2023
Justification
The performance of a parallel code has as one component the
behaviour of the single processor or single-threaded code. In this
section we discuss the basics of how a processor executes
instructions, and how it handles the data these instructions operate on.
2 Eijkhout – Processor Architecture – Fall 2023
Structure of a modern processor
3 Eijkhout – Processor Architecture – Fall 2023
Von Neumann machine
The ideal processor:
(Stored program)
An instruction contains the operation and two operand locations
Processor decodes instruction, gets operands, computes and
writes back the result
Repeat
4 Eijkhout – Processor Architecture – Fall 2023
The actual state of affairs
Single instruction stream versus multiple cores / floating point
units
Single instruction stream versus Instruction Level Parallelism
Unit-time-addressable memory versus large latencies
Modern processors contain lots of magic to make them seem like Von
Neumann machines.
5 Eijkhout – Processor Architecture – Fall 2023
Complexity measures
Traditional: processor speed was paramount. Operation counting.
Nowadays: memory is slower than processors
This course
Study data movement aspects
Algorithm design for processor reality
6 Eijkhout – Processor Architecture – Fall 2023
A first look at a processor
7 Eijkhout – Processor Architecture – Fall 2023
Structure of a core
8 Eijkhout – Processor Architecture – Fall 2023
Motivation for pipelining
An operation consists of several stages.
Addition:
Decoding the instruction operands.
Data fetch into register
Aligning the exponents:
.35 × 10⁻¹ + .6 × 10⁻² becomes .35 × 10⁻¹ + .06 × 10⁻¹
Adding mantissas, giving .41.
Normalizing the result, giving .41 × 10⁻¹.
Storing the result.
⇒ these are the pipeline stages
9 Eijkhout – Processor Architecture – Fall 2023
Pipelining, pictorially
Discrete hardware for each stage:
10 Eijkhout – Processor Architecture – Fall 2023
Analysis
Operation timing:
n operations
ℓ number of stages ⇒ t (n) = nℓτ
τ clock cycle
With pipelining:
t (n) = [s + ℓ + n − 1]τ
where s is a setup cost
⇒ Asymptotic speedup is ℓ
n₁/₂: the value of n for which the speedup is ℓ/2
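A worked instance (assuming ℓ = 6 stages, n = 100 operations, s = 0):
without pipelining: t(100) = 100 · 6 · τ = 600τ
with pipelining: t(100) = (6 + 100 − 1)τ = 105τ
speedup ≈ 5.7, approaching the asymptotic value ℓ = 6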
11 Eijkhout – Processor Architecture – Fall 2023
Applicability of pipelining
Pipelining works for:
vector addition/multiplication
Division/square root maybe pipelined, but much slower
12 Eijkhout – Processor Architecture – Fall 2023
Recurrences
Pipelining does not immediately work:
for (i) {
x[i+1] = a[i]*x[i] + b[i];
}
Transform:
x_{n+2} = a_{n+1} x_{n+1} + b_{n+1}
        = a_{n+1} (a_n x_n + b_n) + b_{n+1}
        = a_{n+1} a_n x_n + a_{n+1} b_n + b_{n+1}
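A minimal sketch of the transformed loop in C (assuming the arrays from the loop
above and an even loop bound n): both statements in the body depend only on x[i],
so they are independent of each other and can be pipelined together.
for (i=0; i<n-1; i+=2) {
  x[i+1] = a[i]*x[i] + b[i];                            /* uses only x[i] */
  x[i+2] = (a[i+1]*a[i])*x[i] + a[i+1]*b[i] + b[i+1];   /* also only x[i] */
}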
13 Eijkhout – Processor Architecture – Fall 2023
Instruction pipeline
Instruction-Level Parallelism: more general notion of independent
instructions
Requires independent instructions
As frequency goes up, pipeline gets longer: more demands on
compiler
14 Eijkhout – Processor Architecture – Fall 2023
Instruction-Level Parallelism
multiple-issue of independent instructions
branch prediction and speculative execution
out-of-order execution
prefetching
Problems: complicated circuitry, hard to maintain performance
15 Eijkhout – Processor Architecture – Fall 2023
Implications
Long pipeline needs many independent instructions:
demands on compiler
Conditionals break the stream of independent instructions
Processor tries to predict branches
branch misprediction penalty:
pipeline needs to be flushed and refilled
avoid conditionals in inner loops!
16 Eijkhout – Processor Architecture – Fall 2023
Instructions
Addition/multiplication: pipelined
Division (and square root): much slower
for ( i )
a[i] = b[i] / c
Can you improve on this?
Fused Multiply-Add (FMA) s += a*b
where can you use this?
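A hedged sketch of one answer (n, and the vectors x, y in the second loop, are
illustrative): replace the repeated division by one reciprocal and pipelined
multiplies; an inner product maps each iteration onto a single FMA.
double cinv = 1./c;            // one division, done once
for (i=0; i<n; i++)
  a[i] = b[i] * cinv;          // pipelined multiplies; rounding can differ
                               //   slightly from true division
double s = 0.;
for (i=0; i<n; i++)
  s += x[i]*y[i];              // each iteration is one fused multiply-add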
17 Eijkhout – Processor Architecture – Fall 2023
Peak performance
Performance is a function of
Clock frequency,
SIMD width
Load/store unit behavior
Floating point capabilities of several processor architectures
DAXPY cycle number for 8 operands
Processor            year   add/mult/fma units    daxpy cycles
                            (count×width)         (arith vs load/store)
MIPS R10000          1996   1×1 + 1×1 + 0         8/24
Alpha EV5            1996   1×1 + 1×1 + 0         8/12
IBM Power5           2004   0 + 0 + 2×1           4/12
AMD Bulldozer        2011   2×2 + 2×2 + 0         2/4
Intel Sandy Bridge   2012   1×4 + 1×4 + 0         2/4
Intel Haswell        2014   0 + 0 + 2×4           1/2
18 Eijkhout – Processor Architecture – Fall 2023
Memory hierarchy: caches, registers, TLB.
19 Eijkhout – Processor Architecture – Fall 2023
The Big Story
DRAM memory is slow, so let’s put small SRAM close to the
processor
This helps if data is reused
Does the algorithm have reuse?
Does the implementation reuse data?
20 Eijkhout – Processor Architecture – Fall 2023
Bandwidth and latency
Important theoretical concept:
latency is delay between request for data and availability
bandwidth is rate at which data arrives thereafter
21 Eijkhout – Processor Architecture – Fall 2023
Memory hierarchy
22 Eijkhout – Processor Architecture – Fall 2023
Registers
23 Eijkhout – Processor Architecture – Fall 2023
Computing out of registers
a := b + c
load the value of b from memory into a register,
load the value of c from memory into another register,
compute the sum and write that into yet another register, and
write the sum value back to the memory location of a.
24 Eijkhout – Processor Architecture – Fall 2023
Register usage
Assembly code
(note: two-operand x86 instruction, in AT&T assembler syntax)
addl %eax, %edx
Registers are named
Can be explicitly addressed by the programmer
. . . as opposed to caches.
Assembly coding or inline assembly (compiler dependent)
. . . but typically generated by compiler
25 Eijkhout – Processor Architecture – Fall 2023
Examples of register usage
1. Resident in register
a := b + c
d := a + e
a stays resident in register, avoid store and load
2. subexpression elimination:
t1 = sin(alpha) * x + cos(alpha) * y;
t2 = -cos(alpha) * x + sin(alpha) * y;
becomes:
s = sin(alpha); c = cos(alpha);
t1 = s * x + c * y;
t2 = -c * x + s * y;
often done by compiler
26 Eijkhout – Processor Architecture – Fall 2023
Caches
27 Eijkhout – Processor Architecture – Fall 2023
Cache basics
Fast SRAM in between memory and registers: mostly serves data
reuse
... = ... x ..... // instruction using x
......... // several instructions not involving x
... = ... x ..... // instruction using x
load x from memory into cache, and from cache into register;
operate on it;
do the intervening instructions;
request x from memory, but since it is still in the cache, load it
from the cache into register; operate on it.
essential concept: data reuse
28 Eijkhout – Processor Architecture – Fall 2023
Cache levels
Levels 1,2,3(,4): L1, L2, etc.
Increasing size, increasing latency, decreasing bandwidth
(Note: L3/L4 can be fairly big; beware benchmarking)
Cache hit / cache miss: one level is consulted, then the next
L1 has separate data / instruction cache, other levels mixed
Caches do not have enough bandwidth to serve the processor:
coding for reuse on all levels.
29 Eijkhout – Processor Architecture – Fall 2023
Cache misses
Compulsory miss: first time data is referenced
Capacity miss: data was in cache, but has been flushed
(overwritten) by LRU policy
Conflict miss: two items get mapped to the same cache location,
even if there are no capacity problems
Invalidation miss: data becomes invalid because of activity of
another core
30 Eijkhout – Processor Architecture – Fall 2023
Cache hits
Data has been requested, used a second time: temporal locality
⇒ Can’t wait too long between uses
(Data can be loaded because it’s close to data requested: spatial
locality. Later.)
31 Eijkhout – Processor Architecture – Fall 2023
Capacity miss
(Why is that last block going where it is going?)
32 Eijkhout – Processor Architecture – Fall 2023
Cache capacity
Loading data multiple times
LRU: oldest item evicted if needed
Reuse if not too much data
for ( lots of times ) // sequential loop
load and process data // probably parallel loop
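As a concrete sketch (array name and sizes illustrative): if the n doubles of x
fit in cache, only the first pass pays for the memory traffic.
for (int k=0; k<K; k++)          // sequential outer loop: K passes
  for (int i=0; i<n; i++)        // each pass touches all of x
    x[i] = 2.3*x[i] + 1.2;       // passes 2..K find x in cache if n*8 bytes fit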
33 Eijkhout – Processor Architecture – Fall 2023
Illustration of capacity
[figure: cache miss fraction and cycles per operation as a function of dataset size]
34 Eijkhout – Processor Architecture – Fall 2023
Replacement policies
What determines where new data goes /
what old data is overwritten?
Least Recently Used (LRU): most common
First-In / First-Out (FIFO): IBM Power4. Not a good idea.
Random Replacement. Sometimes used.
It’s actually more subtle than pure LRU . . .
35 Eijkhout – Processor Architecture – Fall 2023
Cache lines
Memory requests go by byte or word
Memory transfers go by cache line:
typically 64 bytes / 8 double precision numbers
Cache line transfer costs bandwidth
⇒ important to use all elements
36 Eijkhout – Processor Architecture – Fall 2023
Effects of striding
Always 8 numbers transferred
With stride s > 1: 8/s elements used
Loss of efficiency if bandwidth-limited
37 Eijkhout – Processor Architecture – Fall 2023
Cache line use
for (i=0; i<N; i++)
... = ... x[i] ...
for (i=0; i<N; i+=stride)
... = ... x[i] ...
38 Eijkhout – Processor Architecture – Fall 2023
Stride effects
for (i=0,n=0; i<L1WORDS; i++,n+=stride)
array[n] = 2.3*array[n]+1.2;
[figure: cache line utilization and total kcycles as a function of stride]
39 Eijkhout – Processor Architecture – Fall 2023
Cache mapping
Cache is smaller than memory, so we need a mapping scheme
memory address ↦ cache address
Ideal: any address can go anywhere; LRU policy for replacement
pro: optimal; con: slow, expensive to manufacture
Simple: direct mapping by truncating addresses
pro: fast and cheap; con: I’ll show you in a minute
Practical: limited associativity
‘enough but not too much’
40 Eijkhout – Processor Architecture – Fall 2023
Direct mapping
Direct mapping of 32-bit addresses into a 64K cache
Use last number of bits to find cache address
If you traverse an array, a contiguous chunk will be mapped to
cache without conflict.
If (memory) addresses are cache size apart, they get mapped to
the same cache location
41 Eijkhout – Processor Architecture – Fall 2023
Conflicts
Mapping conflicts in a direct-mapped cache.
42 Eijkhout – Processor Architecture – Fall 2023
The problem with direct mapping
real*8 A(8192,3);
do i=1,512
a(i,3) = ( a(i,1)+a(i,2) )/2
end do
In each iteration 3 elements map to the same cache location:
constant overwriting (‘eviction’, cache thrashing):
low performance
43 Eijkhout – Processor Architecture – Fall 2023
Associative cache mapping
Allow each memory address to go to multiple (but not all) cache
addresses; typically 2,4,8
Prevents problems with multiple arrays
Reasonably fast
Often lower associativity for L1 than L2, L3
Associativity         L1   L2
Intel (Woodcrest)      8    8
AMD (Bulldozer)        2    8
44 Eijkhout – Processor Architecture – Fall 2023
Associativity
Associative cache structure
45 Eijkhout – Processor Architecture – Fall 2023
Illustration of associativity
Two caches of 12 elements: direct mapped (left) and 3-way
associative (right)
Direct map: addresses 0 and 12 conflict
Associative: no conflict
46 Eijkhout – Processor Architecture – Fall 2023
Associativity in practice
∀j : y_j = y_j + ∑_{i=1}^{m} x_{i,j}
47 Eijkhout – Processor Architecture – Fall 2023
[figure: the number of L1 cache misses and the number of cycles, for each j]
One remedy
Do not use powers of 2.
[figure: the number of L1 cache misses and the number of cycles for each j;
column accumulation, vector length 4096 + 8]
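A sketch of the remedy in C (the names, M, and the padding of 8 are
illustrative): padding the vector length so that the accumulation terms are no
longer a power of two apart removes the conflict misses.
#define N   4096
#define LD  (N+8)                    /* padded length: not a power of two  */
#define M   8
static double x[M][LD], y[N];
for (int j=0; j<N; j++)
  for (int i=0; i<M; i++)
    y[j] += x[i][j];                 /* the M terms are LD*8 bytes apart,  */
                                     /*   no longer hitting one cache set  */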
48 Eijkhout – Processor Architecture – Fall 2023
Exercise
Write a small cache simulator in your favorite language. Assume a
k-way associative cache of 32 entries and an architecture with 16-bit
addresses.
49 Eijkhout – Processor Architecture – Fall 2023
Exercise: vectorsum
Compare sequential performance to single-threaded OMP
For some problem sizes observe a difference in performance
Use Intel option -qopt-report=3 and inspect the report.
Compare different compilers: Intel 19 behaves differently from Intel 24!
Also try gcc 13.
for ( int iloop=0; iloop<nloops; ++iloop ) {
for ( int i=0; i<vectorsize; ++i ) {
outvec[i] += invec[i]*loopcoeff[iloop];
}
}
Analyze and report
50 Eijkhout – Processor Architecture – Fall 2023
More memory system topics
51 Eijkhout – Processor Architecture – Fall 2023
Bandwidth / latency
Simple model for sending n words:
t = α + βn
Quoted bandwidth figures are always optimistic:
bandwidth shared between cores
not enough bandwidth for all cores:
⇒ speedup less than linear
bandwidth wasted on coherence
NUMA: pulling data from other socket
assumes optimal scheduling of DRAM banks
52 Eijkhout – Processor Architecture – Fall 2023
Prefetch
Do you have to wait for every item from memory?
Memory controller can infer streams: prefetch
Sometimes controllable through assembly, directives, libraries
(AltiVec)
One form of latency hiding
53 Eijkhout – Processor Architecture – Fall 2023
Memory pages
Memory is organized in pages:
Translation between the logical address, as used by the program,
and the physical address in memory
This serves virtual memory and relocatable code
so we need another translation stage.
54 Eijkhout – Processor Architecture – Fall 2023
Page translation: TLB
General page translation: slowish and expensive
Translation Look-aside Buffer (TLB) is a small list of frequently
used pages
Example of spatial locality: items on an already referenced page
are found faster
55 Eijkhout – Processor Architecture – Fall 2023
TLB misses
#define INDEX(i,j,m,n) i+j*m
array = (double*) malloc(m*n*sizeof(double));
/* traversal #2 */
for (i=0; i<m; i++)
for (j=0; j<n; j++)
array[INDEX(i,j,m,n)] = array[INDEX(i,j,m,n)]+1;
56 Eijkhout – Processor Architecture – Fall 2023
TLB hits
#define INDEX(i,j,m,n) i+j*m
array = (double*) malloc(m*n*sizeof(double));
/* traversal #1 */
for (j=0; j<n; j++)
for (i=0; i<m; i++)
array[INDEX(i,j,m,n)] = array[INDEX(i,j,m,n)]+1;
57 Eijkhout – Processor Architecture – Fall 2023
Little’s Law
Item loaded from memory, processed, new item loaded in
response
But this can only happen after latency wait
Items during latency are independent, therefore
Concurrency = Bandwidth × Latency.
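A worked instance, using the latency and bandwidth figures quoted later in this
lecture (80 ns, 51 GB/s):
Concurrency = 51 GB/s × 80 ns ≈ 4080 bytes ≈ 64 cache lines of 64 bytes,
so roughly 64 memory requests need to be in flight to keep the bandwidth busy.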
58 Eijkhout – Processor Architecture – Fall 2023
Multicore issues
59 Eijkhout – Processor Architecture – Fall 2023
Why multicore
Quest for higher performance:
Not enough instruction parallelism for long pipelines
Two cores at half speed more energy-efficient than one at full
speed.
Multicore solution:
More theoretical performance
Burden for parallelism is now on the programmer
60 Eijkhout – Processor Architecture – Fall 2023
Dennard scaling
Scale down feature size by s:
Feature size ∼s
Voltage ∼s
Current ∼s
Frequency ∼ s⁻¹
Miracle conclusion:
Power = V · I ∼ s², power density ∼ 1
Everything gets better, cooling problem stays the same
Opportunity for more components, higher frequency
61 Eijkhout – Processor Architecture – Fall 2023
Dynamic power
Charge q = CV
Work W = qV = CV²
Power = W/time = W · F = CV²F
Two cores at half frequency and voltage:
C_multi = 2C, V_multi = V/2, F_multi = F/2
⇒ P_multi = P/4
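Checking by direct substitution into P = CV²F:
P_multi = (2C) · (V/2)² · (F/2) = 2C · V²/4 · F/2 = CV²F/4 = P/4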
Same computation, less power
62 Eijkhout – Processor Architecture – Fall 2023
Multicore caches
63 Eijkhout – Processor Architecture – Fall 2023
The coherence problem
64 Eijkhout – Processor Architecture – Fall 2023
Cache coherence
Modified-Shared-Invalid (MSI) coherence protocol:
Modified: the cacheline has been modified
Shared: the line is present in at least one cache and is
unmodified.
Invalid: the line is not present, or it is present but a copy in
another cache has been modified.
65 Eijkhout – Processor Architecture – Fall 2023
Coherence issues
Coherence is automatic, so you don’t have to worry about it. . .
. . . except when it saps performance
Beware false sharing
writes to different elements of a cache line
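A minimal OpenMP sketch of false sharing (names and the iteration count are
illustrative, not from the slides): every thread updates its own counter, but
the counters sit in one cache line, which keeps being invalidated in the other
cores.
#include <omp.h>
#define NTHREADS 4
double partial[NTHREADS];            /* adjacent doubles: one cache line   */
int main(void) {
#pragma omp parallel num_threads(NTHREADS)
  {
    int t = omp_get_thread_num();
    for (long i=0; i<100000000L; i++)
      partial[t] += 1.;              /* each write invalidates the line in */
  }                                  /*   the other cores' caches          */
  return 0;
}
/* remedy: pad, e.g. double partial[NTHREADS][8] and update partial[t][0]  */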
66 Eijkhout – Processor Architecture – Fall 2023
Balance analysis
A Sandy Bridge core can absorb 300 GB/s
4 DDR3/1600 channels provide 51 GB/s; the difference has to come
from reuse
It gets worse: latency 80ns, bandwidth 51 GB/s,
Little’s law: parallelism 64 cache lines
However, each core only has 10 line fill buffers,
so we need 6–7 cores to provide the data for one core
Power: cores take 72%, uncore 17%, DRAM 11%.
Core power goes 40% to instruction handling, not arithmetic
Time for a redesign of processors and programming; see my
research presentation
67 Eijkhout – Processor Architecture – Fall 2023
Programming strategies for performance
68 Eijkhout – Processor Architecture – Fall 2023
How much performance is possible?
Performance limited by
Processor peak performance: absolute limit
Bandwidth: linear correlation with performance
Arithmetic intensity: ratio of operations per transfer
If AI high enough: processor-limited
otherwise: bandwidth-limited
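For instance (a sketch for the vector update y[i] += a*x[i], with 8-byte
doubles): 2 operations per iteration versus 24 bytes moved (load x[i], load
y[i], store y[i]) gives AI = 2/24 ≈ 0.08 op/byte, so this kernel is firmly
bandwidth-limited.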
69 Eijkhout – Processor Architecture – Fall 2023
Performance depends on algorithm:
70 Eijkhout – Processor Architecture – Fall 2023
Insufficient utilization of functional units:
71 Eijkhout – Processor Architecture – Fall 2023
Imperfect data transfer:
72 Eijkhout – Processor Architecture – Fall 2023
Spatial and temporal locality
Temporal locality: use an item, then use it again, but now from cache:
efficient because the second transfer is cheaper.
Spatial locality: use an item, then use one ‘close to it’
(for instance from the same cacheline):
efficient because the neighboring item is already in cache,
even though it was not referenced before.
73 Eijkhout – Processor Architecture – Fall 2023
Architecture-aware programming
Cache size: block loops
pipelining and vector instructions: expose streams of instructions
reuse: restructure code (loop merging and splitting, unrolling)
TLB: don’t jump all over memory
associativity: watch out for powers of 2
74 Eijkhout – Processor Architecture – Fall 2023
Loop blocking
Multiple passes over data
for ( k< small bound )
for ( i < N )
x[i] = f( x[i], k, .... )
Block to be cache contained
for ( ii < N; ii+= blocksize )
for ( k< small bound )
for ( i=ii; i<ii+blocksize; i++ )
x[i] = f( x[i], k, .... )
This requires independence of operations
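The blocked version written out as compilable C (blocksize, the pass count nk,
and the update function f are placeholders; assume N is a multiple of
blocksize):
for (int ii=0; ii<N; ii+=blocksize)        /* pick a block of x ...          */
  for (int k=0; k<nk; k++)                 /* ... run all k passes over it   */
    for (int i=ii; i<ii+blocksize; i++)    /* block stays resident in cache  */
      x[i] = f( x[i], k );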
75 Eijkhout – Processor Architecture – Fall 2023
The ultimate in performance programming: DGEMM
Matrix-matrix product C = A · B
∀i ∀j ∀k : c_{ij} += a_{ik} b_{kj}
Three independent loops i, j, k
all three can be blocked, with block indices i′, j′, k′
Many loop permutations, blocking factors to choose
76 Eijkhout – Processor Architecture – Fall 2023
DGEMM variant
Inner products
for ( i )
for ( j )
for ( k )
c[i,j] += a[i,k] * b[k,j]
77 Eijkhout – Processor Architecture – Fall 2023
DGEMM variant
Outer product: rank-1 updates, column of A times row of B
for ( k )
for ( i )
for ( j )
c[i,j] += a[i,k] * b[k,j]
78 Eijkhout – Processor Architecture – Fall 2023
DGEMM variant
Building up rows by linear combinations
for ( i )
for ( k )
for ( j )
c[i,j] += a[i,k] * b[k,j]
Exchanging i, j: building up columns
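The row-building variant as actual C on row-major arrays (a sketch; the
matrices are assumed n × n and preallocated): with j innermost, both c[i][j]
and b[k][j] are traversed with stride 1, so every cacheline is fully used.
for (int i=0; i<n; i++)
  for (int k=0; k<n; k++) {
    double aik = a[i][k];                 /* scalar, reused over the j loop */
    for (int j=0; j<n; j++)
      c[i][j] += aik * b[k][j];           /* unit stride in c and b         */
  }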
79 Eijkhout – Processor Architecture – Fall 2023
Rank 1 updates
C_{∗∗} = ∑_k A_{∗k} B_{k∗}
80 Eijkhout – Processor Architecture – Fall 2023
Matrix-panel multiply
Block of A times ‘sliver’ of B
81 Eijkhout – Processor Architecture – Fall 2023
Inner algorithm
For inner i:
// compute C[i,*] :
for k:
C[i,*] += A[i,k] * B[k,*]
82 Eijkhout – Processor Architecture – Fall 2023
Tuning
For inner i:
// compute C[i,*] :
for k:
C[i,*] += A[i,k] * B[k,*]
C[i,*] stays in register
A[i,k] and B[k,*] stream from L1
blocksize of A for L2 size
A stored by rows to prevent TLB problems
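A hedged sketch of that inner kernel (the width of 4 is illustrative): a few
entries of C[i,*] are kept in scalar variables, which the compiler can hold in
registers, while A[i,k] and B[k,*] stream in.
double c0=0., c1=0., c2=0., c3=0.;        /* C[i][j..j+3] held in registers */
for (int k=0; k<K; k++) {
  double aik = A[i][k];                   /* streams from L1                */
  c0 += aik*B[k][j+0];  c1 += aik*B[k][j+1];
  c2 += aik*B[k][j+2];  c3 += aik*B[k][j+3];
}
C[i][j+0] += c0;  C[i][j+1] += c1;  C[i][j+2] += c2;  C[i][j+3] += c3;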
83 Eijkhout – Processor Architecture – Fall 2023
Cache-oblivious programming
Observation: recursive subdivision will ultimately make a problem
small / well-behaved enough
84 Eijkhout – Processor Architecture – Fall 2023
Cache-oblivious matrix-matrix multiply
( C11 C12 )   ( A11 A12 ) ( B11 B12 )
( C21 C22 ) = ( A21 A22 ) ( B21 B22 )
with C11 = A11 B11 + A12 B21
Recursive approach will be cache contained.
Not as high performance as being cache-aware. . .
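A recursive sketch (square matrices whose size n is a power of two, stored
row-major with leading dimension ld; the cutoff of 32 is illustrative): once
the blocks fit in cache, the base-case product is cache-contained.
void rec_mm( int n, int ld, double *A, double *B, double *C ) {
  if ( n<=32 ) {                                  /* base case fits in cache */
    for (int i=0; i<n; i++)
      for (int k=0; k<n; k++)
        for (int j=0; j<n; j++)
          C[i*ld+j] += A[i*ld+k]*B[k*ld+j];
  } else {
    int h = n/2;                                  /* quadrant offsets        */
    double *A11=A, *A12=A+h, *A21=A+h*ld, *A22=A+h*ld+h;
    double *B11=B, *B12=B+h, *B21=B+h*ld, *B22=B+h*ld+h;
    double *C11=C, *C12=C+h, *C21=C+h*ld, *C22=C+h*ld+h;
    rec_mm(h,ld,A11,B11,C11); rec_mm(h,ld,A12,B21,C11); /* C11 += A11 B11 + A12 B21 */
    rec_mm(h,ld,A11,B12,C12); rec_mm(h,ld,A12,B22,C12);
    rec_mm(h,ld,A21,B11,C21); rec_mm(h,ld,A22,B21,C21);
    rec_mm(h,ld,A21,B12,C22); rec_mm(h,ld,A22,B22,C22);
  }
}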
85 Eijkhout – Processor Architecture – Fall 2023
The power question
86 Eijkhout – Processor Architecture – Fall 2023