CS6461 Computer Architecture

Fall 2016
Morris Lancaster
Adapted from Professor Stephen Kaisler's Slides

Lecture 6 I/O Systems


I/O System Design
A typical program does three basic activities: input, computation, and
output.
Most I/O devices interface to the CPU in a fashion quite similar to
memory.
Many devices appear to the CPU as though they were memory devices.
To output data to the outside world, the CPU simply stores data into a
"memory" location and the data magically appears on some connectors
external to the computer.
Similarly, to input data from some external device, the CPU simply
transfers data from a "memory" location into the CPU;
this "memory" location holds the value found on the pins of some external
connector.
An output port is a device that looks like a memory cell to the computer
but contains connections to the outside world.
An I/O port typically uses a latch rather than a flip-flop to implement the memory cell.
When the CPU writes to the address associated with the latch, the latch device
captures the data and makes it available on a set of wires external to the CPU and
memory system.

10/7/2017 CSCI 6461 Computer Architecture 2


I/O Types

There are three basic forms of input and output that a
typical computer system will use:
I/O-mapped I/O uses special instructions to transfer data
between the computer system and the outside world.
Memory-mapped I/O uses special memory locations in the
normal address space of the CPU to communicate with
real-world devices.
Direct Memory Access (DMA) is a special form of memory-
mapped I/O where the peripheral device reads and writes data in
memory without going through the CPU.



Memory-Mapped I/O

A memory-mapped peripheral device is connected to the CPU's
address and data lines exactly like memory:
Whenever the CPU reads or writes the address associated with the
peripheral device, the CPU transfers data to or from the device.
A certain portion of the processor's address space is mapped to
the device, and communications occur by reading and writing
directly to/from those memory areas.
Memory-mapped I/O is suitable for devices which must move large
quantities of data quickly, such as graphics cards.
Memory-mapped I/O can be used either instead of or, more often, in
combination with traditional registers. For example, graphics cards
still use registers for control information such as setting the video
mode.
A potential problem exists with memory-mapped I/O if a process is
allowed to write directly to the address space used by a
memory-mapped I/O device: a stray write changes the state of the device.
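
The idea can be sketched in a few lines of Python. This is only a toy model, not real hardware access: the class names (Bus, OutputLatch) and the device address 0xFF00 are assumptions made up for this illustration. It shows that an ordinary store to the mapped address reaches the device instead of RAM, and that a read-modify-write sequence works on the port just as it would on memory.

```python
class OutputLatch:
    """Hypothetical output port: remembers the last byte written to it."""

    def __init__(self):
        self.pins = 0          # value visible on the external connector

    def write(self, value):
        self.pins = value

    def read(self):
        return self.pins


class Bus:
    """Toy address decoder: one device register shares the address space with RAM."""

    def __init__(self, ram_size, device_base, device):
        self.ram = bytearray(ram_size)
        self.device_base = device_base   # assumed address, chosen for this sketch
        self.device = device

    def store(self, addr, value):
        if addr == self.device_base:     # address decode selects the device
            self.device.write(value & 0xFF)
        else:
            self.ram[addr] = value & 0xFF

    def load(self, addr):
        if addr == self.device_base:
            return self.device.read()
        return self.ram[addr]


latch = OutputLatch()
bus = Bus(ram_size=0x10000, device_base=0xFF00, device=latch)

bus.store(0x0010, 0x2A)                    # ordinary memory write
bus.store(0xFF00, 0x55)                    # same operation, but it lands on the device
bus.store(0xFF00, bus.load(0xFF00) + 1)    # read-modify-write on the port
```

Note that the RAM cell at 0xFF00 is never reachable in this model; the device has consumed that address.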



Memory-Mapped I/O: Advantages

Principal Advantage: the CPU can use any
instruction that accesses memory to transfer data
between the CPU and a memory-mapped I/O device.
The PDP-11 had no I/O instructions
I/O registers in devices were mapped to memory locations
The MOV instruction is the one most commonly used to
send and receive data from a memory-mapped I/O device,
but any instruction that reads or writes data in memory is
also legal.
For example, if you have an I/O port that is read/write, you
can use the ADD instruction to read the port, add data to the
value read, and then write data back to the port.



Memory-Mapped I/O: Disadvantages

Principal Disadvantage: memory-mapped devices consume addresses in
the memory map.
Generally, the minimum amount of space you can allocate
to a peripheral (or block of related peripherals) is a four
kilobyte page.
A few independent peripherals can wind up consuming a
fair amount of the physical address space.
A typical PC has only a couple dozen such devices, so this
isn't much of a problem.
However, some devices, like video cards, consume a large
chunk of the address space (e.g., some video cards have
32 megabytes of on-board memory that they map into the
memory address space).



I/O Mapped IO

I/O-mapped input/output uses special instructions to access I/O
ports. (Sounds redundant, doesn't it?)
Many CPUs do not provide this type of I/O, although the 80x86
does.
The Intel 80x86 family uses the IN and OUT instructions to provide I/O-
mapped input/output capabilities.
The 80x86 IN and OUT instructions behave somewhat like the MOV
instruction except they transmit their data to and from a special I/O
address space that is distinct from the memory address space.
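The IN and OUT instructions use the following syntax (where al is a
register):
in( port, al );   // or AX or EAX; port is a constant in the range 0..255
out( al, port );  // or AX or EAX; port is a constant in the range 0..255
in( dx, al );     // or AX or EAX; DX holds the port address
out( al, dx );    // or AX or EAX; DX holds the port address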



Intel 80x86

The 80x86 family uses a separate address space for I/O
transfers (conceptually a separate address bus; physically the I/O
space shares the address lines with memory and a control signal
selects between the two).
I/O addresses are only 16 bits wide, so the 80x86 can access a
maximum of 65,536 different bytes in the I/O space.
The first two forms of IN and OUT encode the port address as an eight-bit
constant, so they're actually limited to accessing only the first
256 I/O addresses in this address space.
This makes the instruction shorter (two bytes instead of three).
Unfortunately, most of the interesting peripheral devices
are at addresses above 255, so the first pair of
instructions above are only useful for accessing certain
on-board peripherals in a PC system.



Intel 80x86

To access I/O ports at addresses beyond 255 you must
use the latter two forms of the IN and OUT instructions
above.
These forms require that you load the 16-bit I/O address into the
DX register and use DX as a pointer to the specified I/O address.
For example, to write a byte to the I/O address $378, you would
use an instruction sequence like the following:
mov( $378, dx );
out( al, dx );
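
The difference between the two forms can be modeled in Python. This is a simulation only, under the assumption that a byte array stands in for the 64K I/O space; the class and method names (PortSpace, out_imm8, out_dx, in_dx) are hypothetical and are not real instructions or library calls.

```python
IO_SPACE_SIZE = 1 << 16        # 16-bit port addresses: 65,536 ports


class PortSpace:
    """Toy model of an I/O address space that is separate from memory."""

    def __init__(self):
        self.ports = bytearray(IO_SPACE_SIZE)

    def out_imm8(self, value, port):
        """Short form: the port number is an 8-bit constant in the instruction."""
        if not 0 <= port <= 0xFF:
            raise ValueError("immediate form only reaches ports 0..255")
        self.ports[port] = value & 0xFF

    def out_dx(self, value, dx):
        """Long form: the 16-bit port number is taken from DX."""
        self.ports[dx & 0xFFFF] = value & 0xFF

    def in_dx(self, dx):
        """Read one byte from the port whose number is in DX."""
        return self.ports[dx & 0xFFFF]


io = PortSpace()
io.out_dx(0x41, 0x378)          # like: mov( $378, dx ); out( al, dx );

try:
    io.out_imm8(0x41, 0x378)    # port $378 does not fit in 8 bits
    short_form_ok = True
except ValueError:
    short_form_ok = False
```

In this model the short form fails for port $378, which is why the DX form is needed for most peripherals.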



I/O Mapped IO

Principal Advantage: peripheral devices mapped to the I/O
address space do not consume space in the memory address
space.
This allows you to fully expand the memory address
space with RAM or other memory.
On the other hand, you cannot use arbitrary memory
instructions to access peripherals in the I/O address
space.
You can only use the IN and OUT instructions.



I/O Mapped IO

Principal Disadvantage: the I/O address space is quite small.
Although most peripheral devices only use a couple of I/O
addresses (and most use fewer than 16 I/O addresses), a
few devices, like video display cards, can occupy millions of
different I/O locations (e.g., three bytes for each pixel on the
screen).
As noted earlier, some video display cards have 32
MBytes of dual-ported RAM on board.
Clearly we cannot easily map this many locations into
the 64K I/O address space.
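
A quick calculation makes the mismatch concrete. The 1024 x 768 resolution below is an assumption picked for illustration; the three bytes per pixel, the 32 MBytes of video RAM, and the 64K I/O space come from the slide.

```python
IO_SPACE_BYTES = 2 ** 16                  # 64K I/O address space

# Assumed display mode for this sketch: 1024 x 768 at three bytes per pixel.
width, height, bytes_per_pixel = 1024, 768, 3
framebuffer_bytes = width * height * bytes_per_pixel

video_ram_bytes = 32 * 2 ** 20            # 32 MBytes of on-board RAM

print(framebuffer_bytes)                      # 2359296
print(framebuffer_bytes / IO_SPACE_BYTES)     # 36.0
print(video_ram_bytes // IO_SPACE_BYTES)      # 512
```

Even this modest frame buffer is 36 times larger than the whole I/O space, and the full 32 MBytes is 512 times larger.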



I/O Mapped IO

Polling



Direct Memory Access - I

For very high-speed I/O devices the CPU may be too slow
when processing this data a byte (or word or double word) at a
time.
Such devices generally have an interface to the CPU/Memory
bus so they can directly read and write memory.
This is known as direct memory access since the peripheral
device accesses memory directly, without using the CPU as an
intermediary.
This often allows the I/O operation to proceed in parallel with
other CPU operations, thereby increasing the overall speed of
the system.
Note, however, that the CPU and DMA device cannot both use
the address and data busses at the same time (in most
machines).



Direct Memory Access - II

Concurrent processing only occurs if the CPU:
has a cache,
is executing code, and
is accessing data found in the cache (so the bus is free).
Even if the CPU must halt and wait for the DMA
operation to complete, the I/O is still much faster,
since many of the bus operations during I/O-mapped or
memory-mapped I/O consist of instruction fetches or
I/O port accesses which are not present during DMA
operations.



Direct Memory Access - III
A typical DMA controller consists of a pair of counters and other
circuitry that interfaces with memory and the peripheral device.
One of the counters serves as an address register. This counter
supplies an address on the address bus for each transfer.
The second counter specifies the number of transfers to complete.
Each time the peripheral device wants to transfer data to or from
memory, it sends a signal to the DMA controller.
The DMA controller places the value of the address counter on
the address bus.
At the same time, the peripheral device places data on the data
bus (if this is an input operation) or reads data from the data
bus (if this is an output operation).
After a successful data transfer, the DMA controller increments
its address register and decrements the transfer counter.
This process repeats until the transfer counter decrements to
zero.
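
The two-counter behavior described above can be sketched as a small Python model. It is a simplification under stated assumptions: a bytearray stands in for memory, only the input direction (device to memory) is modeled, and the names DMAController and request are hypothetical.

```python
class DMAController:
    """Toy DMA controller: an address counter plus a transfer counter."""

    def __init__(self, memory, start_address, count):
        self.memory = memory
        self.address = start_address    # address counter (drives the address bus)
        self.remaining = count          # transfer counter

    def request(self, data_byte):
        """The device signals one transfer; returns False once the count is used up."""
        if self.remaining == 0:
            return False
        self.memory[self.address] = data_byte   # device places data on the data bus
        self.address += 1                       # increment the address register
        self.remaining -= 1                     # decrement the transfer counter
        return True


memory = bytearray(16)
dma = DMAController(memory, start_address=4, count=3)

# The device offers five bytes, but the controller was programmed for three.
accepted = [dma.request(byte) for byte in b"ABCDE"]
```

The CPU does not appear anywhere in the transfer loop; it only sets up the two counters beforehand.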



How Does DMA Work?



Interrupt-Driven System
Interrupts allow devices to notify the CPU when they have data
to transfer or when an operation is complete, allowing the CPU
to perform other duties when no I/O transfers need its
immediate attention.
The CPU has an interrupt-request line that is sensed after
every instruction.
A device's controller raises an interrupt by asserting a signal on
the interrupt request line.
The CPU then performs a state save, and transfers control to
the interrupt handler routine at a fixed address in memory.
The CPU catches the interrupt and dispatches the interrupt
handler.
The interrupt handler determines the cause of the interrupt,
performs the necessary processing, performs a state restore,
and executes a return-from-interrupt instruction to return
control from the handler to the program whose state was restored.
The interrupt handler clears the interrupt by servicing the device.
Note that the state restored does not need to be the same state as
the one that was saved when the interrupt went off.
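
The sequence above (sense the request line after every instruction, save state, run the handler, restore state) can be sketched as a toy interpreter loop. Everything here is an assumption for illustration: the "instructions" are Python callables, the state is a single accumulator, and this sketch always restores the same state that it saved, which is the simple case.

```python
def run(program, interrupt_at, handler):
    """Execute `program`; check for a pending interrupt after each instruction."""
    state = {"acc": 0}
    log = []
    for pc, instruction in enumerate(program):
        instruction(state)
        if pc in interrupt_at:        # interrupt-request line is asserted
            saved = dict(state)       # state save
            handler(state, log)       # transfer control to the handler
            state = saved             # state restore, then return from interrupt
    return state, log


def device_handler(state, log):
    """Hypothetical handler: it clobbers the accumulator while servicing the device."""
    state["acc"] = -1
    log.append("device serviced")


increment = lambda s: s.update(acc=s["acc"] + 1)
final_state, log = run([increment] * 3, interrupt_at={1}, handler=device_handler)
```

The interrupted program still ends with acc equal to 3, because the handler's changes are discarded by the state restore.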



Interrupt-Driven System



Interrupts vs. Traps

I/O devices and the CPU can execute concurrently.


Each device controller is in charge of a particular device type.
Each device controller has a local buffer.
CPU moves data from/to main memory to/from local buffers.
I/O is from the device to local buffer of controller.
Device controller informs CPU that it has finished its operation by
causing an interrupt.
The OS determines which type of interrupt has occurred by:
polling
vectored interrupt system
Separate segments of code determine what action should be taken
for each type of interrupt
A trap is a software-generated interrupt caused by:
A user program error (e.g. division by zero)
A user request for a service by the OS


Some Types of Intel Interrupts



I/O Processors

An I/O processor (IOP) is a separate microprocessor-based system
that handles all I/O operations
CPU sends I/O request to IOP
IOP converts this into a set of instructions specific to the
device to be accessed.
IOP sends instructions to the device controller which
retrieves or stores the data
Device controller reads data or stores data in the
specified memory locations.
IOP interrupts CPU when done.



I/O Processors



Another (Sort Of) Axiom: It's Also About the Buses!!

Types of Buses:
processor-memory buses: short and high speed
I/O buses: long and have many devices connected to them
backplane buses: allow processors, memory, and I/O devices to
coexist on a single bus

(Old Style: All On One Bus)



Modern Computer System

Note: Most modern systems now have two busses: a
CPU-Memory bus and a Memory-High Speed Peripheral
Devices bus.



Bus Issues

A bus is a set of control lines and a set of data lines
control lines indicate what type of info is contained on the data
lines
A bus transaction: two parts
send address and command
receive or send the data
I/O - from the perspective of the processor
input - data from device to memory where processor can read it
output - data to a device from memory where processor wrote it
A bus creates a communication bottleneck
Limited by physical factors: length of bus, number of devices
Lots of devices increases propagation delays
But, it supports versatility since new devices can be added
easily



Synchronous/Asynchronous Buses

Synchronous - clock in control lines
can be implemented easily in finite state machine
every device on bus must run at the same clock rate
because of clock skew, the bus is either short and fast, or long and
slow
processor/memory usually synchronous

Asynchronous - not clocked
can accommodate wide variety of devices
handshaking protocol: read-request, data-ready,
acknowledge
handshaking slow, so can use synchronizer
asynchronous scales better with technology changes and
can support a wider variety of device response speeds, but
with increased overhead

A (Historical) Survey of Buses - I

ISA: Industry Standard Architecture


Old technology ran at 8 MHz, 16 bits transfer
42.4 Mbits/s maximum transfer rate
Rare today, replaced by EISA
MCA: Micro-Channel Architecture
IBM's attempt to compete with the ISA and seize the standard
32-bits transfer
Automatically configured cards (Plug-n-Play)
Not compatible with ISA
EISA: Extended Industry Standard Architecture
32 bits transfer
8.33 MHz cycle
Backward compatible with ISA

A (Historical) Survey of Buses - II
VESA: Video Electronic Standards Association
32-bit bus
Found mostly on Intel 80486 machines
Relied on the 486 processor to function
So, people started switching to the PCI bus
PCI: Peripheral Component Interconnect
Up to 33 MHz and 132 MBytes/s (about 1 Gbit/s) peak transfer rate
32-bit architecture
Synchronous or Asynchronous
Plug-n-Play became reality
Ran (at that time) at half the speed of the 66 MHz system bus
PCI-X: PCI Extended
Ran at 133 MHz
64-bit transfer w/ maximum throughput of about 1 GByte/s
Developed for Fibre Channel, Gigabit Ethernet, and Ultra3 SCSI



A (Historical) Survey of Buses - III

AGP: Accelerated Graphics Port


A high speed 133 MHz PCI port
Used for high-speed 3D Graphics processors
Dedicated to a single device
IDE: Integrated Drive Electronics
Good performance at cheap cost
Mostly used for high-speed hard disks
Other names: ATA, ATAPI, EIDE, ...
SCSI: Small Computer Systems Interface
Mostly used for slower speed peripherals, but also for hard
disks
Speed varied: 80 - 640 Mbits/s
Later updated to Fast SCSI, Ultra SCSI, etc.



A (Historical) Survey of Buses - IV

USB: Universal Serial Bus


Hot plug-n-play
Standard connectivity for many personal computer systems
today
Up to 127 devices chained together
Up to 480 Mbits/s (Version 2.0)
Firewire (IEEE 1394):
Developed by Apple
400 Mbits/s transfer rate
Hot plug-n-play
Other types of ports: IBM Parallel Port, Game Ports,
etc.



SCSI

A set of standards for physically connecting and transferring data
between computers and peripheral devices.
SCSI is an intelligent, peripheral, buffered, peer to peer interface.
Every device attaches to the SCSI bus in a similar manner by hiding
the complexity of the physical format.
SCSI standard defines command sets for specific peripheral device
types.
Must have at least one host, but may have more; up to 16 devices
per bus in later versions.
See http://www.scsita.org/terms-and-terminology.html



SCSI Bus Comparison

http://www.scsita.org/terms-and-terminology.html

How Do Buses Fit In?

[Figure: Pentium 4 system organization]
Pentium 4 processor
System bus (800 MHz, 6.4 GB/sec)
Memory controller hub (north bridge) 82875P
Main memory DIMMs: two DDR 400 channels (3.2 GB/sec each)
Graphics output: AGP 8X (2.1 GB/sec)
1 Gbit Ethernet: CSA (0.266 GB/sec)
Link between the two hubs (266 MB/sec)
I/O controller hub (south bridge) 82801EB
Disk: two Serial ATA ports (150 MB/sec each)
CD/DVD and Tape: two Parallel ATA ports (100 MB/sec each)
Stereo (surround-sound): AC/97 (1 MB/sec)
10/100 Mbit Ethernet (20 MB/sec)
USB 2.0 (60 MB/sec)
PCI bus (132 MB/sec)



Bus Issues



Who Cares About I/O?

Well:
CPU Performance increased 50% - 100% per year, now
slowing to 25-50% or less
Memory Performance increasing slowly, and slower than
CPU speed by an order of magnitude
I/O Performance limited by mechanical delays (< 5% per
year performance increase in I/Os per second or MBytes per
second)
Amdahl's Law: System speed-up is limited by the
slowest part!
10% I/O & 10x CPU => 5x Performance (lose 50%)
10% I/O & 100x CPU => 10x Performance (lose 90%)
So, a 100% speed-up in CPU performance buys you
little if you can't get the data on and off the disk
much faster or to and from the CPU much faster.
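
The two speed-up figures above can be checked with a short calculation. This sketch assumes the usual form of Amdahl's Law, in which the I/O fraction of the run time is not sped up at all and only the CPU fraction improves; the function name amdahl_speedup is made up for this example.

```python
def amdahl_speedup(io_fraction, cpu_speedup):
    """Overall speed-up when only the CPU-bound part of the run time gets faster."""
    cpu_fraction = 1.0 - io_fraction
    return 1.0 / (io_fraction + cpu_fraction / cpu_speedup)


for cpu_speedup in (10, 100):
    overall = amdahl_speedup(0.10, cpu_speedup)
    print(cpu_speedup, round(overall, 2))
# 10 5.26
# 100 9.17
```

So the slide's "5x" and "10x" are rounded values, and no CPU speed-up can push the overall figure past 10x while 10% of the time is still spent on I/O.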

So, who cares? WE DO!!

I/O bottleneck:
Diminishing fraction of time in CPU
Diminishing value of faster CPUs
I/O Design is an important, but neglected topic in
computer architecture and even more neglected in the
study of the design and operation of operating systems.



I/O Systems

The I/O system is shared by multiple programs using the processor


I/O systems often use interrupts to communicate information about
I/O operations - interrupts cause transfer to OS mode
Low level control of an I/O device is complex:
it requires managing a set of concurrent events
the requirements for the correct device control are often very detailed
OS guarantees I/O rights/security (disk read/write access)
OS provides abstractions for accessing devices
OS handles interrupts
OS makes I/O fair
OS needs ability to command I/O devices
Device must be able to tell the OS when an I/O operation is done or has failed



(Sort of) Axiom: It's all about the disks!!

We want fast disk accesses: fast swaps for virtual
memory, fast file reads, fast file writes
Magnetic Disks:
Floppy disks (8 in, 5.25 in, 3.5 in)
Hard disks (various sizes)
Rotating platter coated with magnetic surface and moveable
read/write head to access the disk
Some disks are beginning to feature one read/write head per
track (Note: Magnetic drums used to have this feature)
In a magnetic disk, storage is nonvolatile (unless you
experience a head crash!!)



What's a Hard Disk?

Hard disk platter is metal or glass


can be larger because it is rigid
has higher density because it can be controlled more precisely
hard disk has higher data rate because it spins faster
Hard disks
Collection of platters (1-15) each of which has two recordable disk surfaces
Stack is usually rotated at 3600, 5400, or 7200 RPM; high-performance disks spin faster
Diameter between 1-8 inches (used to be 24 inches)
Each disk surface has concentric circles - called tracks
1000-5000 tracks per surface
Track divided into sectors that contain data
64-200 sectors per track
Disk heads move together
Cylinder - all the tracks under the heads at a given point on all surfaces



Hard Disks



Hard Disks: Timing

seek - position head over proper track


seek time (min, max, avg)
rotational delay/latency - after the head has reached the correct
track, time to wait for the desired sector to rotate
under the read/write head
average wait is half way around disk (~8.3 ms at 3600 RPM)
smaller diameter can spin at higher rates with less power
consumption
transfer time - time to transfer a block of bits (sector)
- function of sector size and transfer rate (2-15 MB/sec)
disk controller - handles control of disk and transfer
between disk and memory
controller time - overhead because of disk controller



Hard Disk Parameters

Note: This is old data, but to give you an idea



Hard Disk Performance



Hard Disk Performance: Example

So, assume:
3600 RPM => 60 RPS
Avg seek time ~ 9 msecs
100 sectors/track; 512 bytes/sector
t_controller + t_queuing ~ 1 msec
So, what is the average time to read one sector?
Rate of transfer = 100 sectors/trk * 512 bytes/sector * 60 RPS =>
3.07 MBytes/s
t_transfer = 512 bytes / 3.07 MBytes/s ~ 0.17 msecs
t_rotation = 0.5 / 60 RPS ~ 8.3 msecs
t_disk = 9 msecs + 8.3 msecs + 0.17 msecs + 1 msec ~ 18.5 msecs
So, the time to transfer is the smallest component of the I/O
time!!
Note: t_queuing gets longer the more requests are issued against
the disk
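
The same arithmetic can be reproduced in a few lines of Python, using only the parameters assumed in this example (3600 RPM, 9 msec average seek, 100 sectors per track, 512 bytes per sector, 1 msec of controller plus queuing overhead). The function name disk_access_ms is hypothetical.

```python
def disk_access_ms(rpm, seek_ms, sectors_per_track, overhead_ms):
    """Average time to read one sector; returns (total, rotation, transfer) in msec."""
    revolution_ms = 60_000 / rpm                       # time for one full rotation
    rotation_ms = revolution_ms / 2                    # on average, wait half a turn
    transfer_ms = revolution_ms / sectors_per_track    # one sector passes under the head
    total_ms = seek_ms + rotation_ms + transfer_ms + overhead_ms
    return total_ms, rotation_ms, transfer_ms


total, rotation, transfer = disk_access_ms(
    rpm=3600, seek_ms=9.0, sectors_per_track=100, overhead_ms=1.0
)
transfer_rate = 100 * 512 * (3600 // 60)   # bytes per second off one track

print(round(rotation, 2), round(transfer, 2), round(total, 2))   # 8.33 0.17 18.5
print(transfer_rate)                                             # 3072000
```

Seek and rotational delay together account for more than 17 of the 18.5 msecs, which is why the transfer itself is the smallest component.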

I/O System: Queuing



Design Based on Anticipated Workload

Data mining + supercomputing


Large files, sequential reads
Raw data transfer rate is most important
Transaction processing
Large files, but many small requests and random access
I/Os per second is most important
Timesharing File Systems
Small files, sequential access usually
I/Os per second is most important



Hard Disks: Examples (circa 2000)



I/O Design (H&P Example 3rd Ed.)



I/O Design - I



I/O Design - II
