Curtis Roads
with John Strawn, Curtis Abbott, John Gordon, and Philip Greenspun
The Computer Music Tutorial
The MIT Press
Cambridge, Massachusetts
London, England
© 1996 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any
electronic or mechanical means (including photocopying, recording, or information
storage and retrieval) without permission in writing from the publisher.
This book was set in Times Roman by Asco Trade Typesetting Ltd., Hong Kong
and was printed and bound in the United States of America.
Library of Congress Cataloging-in-Publication Data
Roads, Curtis.
The computer music tutorial / Curtis Roads ... [et al.].
p. cm.
Includes bibliographical references and index.
ISBN 0-262-68082-3 (paper)
1. Computer music—Instruction and study. 2. Computer
composition. I. Title.
MT56.R6 1995
780'.285—dc20 94-19027
CIP
Contents
Foreword: New Music and Science ix
John Chowning
Preface
Acknowledgments xix
1 Digital Audio Concepts 5
with John Strawn
2 Music Systems Programming 49
Curtis Abbott
3 Introduction to Digital Sound Synthesis 85
with John Strawn
4 Sampling and Additive Synthesis 115
5 Multiple Wavetable, Wave Terrain, Granular, and Subtractive
Synthesis 157
6 Modulation Synthesis 213
7 Physical Modeling and Formant Synthesis 263
8 Waveform Segment, Graphic, and Stochastic Synthesis 317
III Mixing and Signal Processing 347
Overview to Part III 349
9 Sound Mixing 353
10 Basic Concepts of Signal Processing 387
11 Sound Spatialization and Reverberation 449
IV Sound Analysis 493
12 Pitch and Rhythm Recognition 497
13 Spectrum Analysis 533
V The Musician's Interface 611
14 Musical Input Devices 617
15 Performance Software 659
16 Music Editors 703
17 Music Languages 783
18 Algorithmic Composition Systems 819
19 Representations and Strategies for Algorithmic Composition 853
VI Internals and Interconnections 911
Overview to Part VI 913
20 Internals of Digital Signal Processors 915
21 MIDI 969
22 System Interconnections 1017
VII Psychoacoustics 1049
Overview to Part VII 1051
23 Psychoacoustics in Computer Music 1053
John W. Gordon
Appendix 1071
Fourier Analysis 1073
with Philip Greenspun
Foreword: New Music and Science
With the use of computers and digital devices, the processes of music com-
position and its production have become intertwined with the scientific and
technical resources of society to a greater extent than ever before. Through
extensive application of computers in the generation and processing of
sound and the composition of music from levels of the microformal to the
macroformal, composers, from creative necessity, have provoked a robust
interdependence between domains of scientific and musical thought. Not
only have science and technology enriched contemporary music, but the
converse is also true: problems of particular musical importance in some
cases suggest or pose directly problems of scientific and technological im-
portance, as well. Each having their own motivations, music and science
depend on one another and in so doing define a unique relationship to their
mutual benefit.
The use of technology in music is not new; however, it has reached a
new level of pertinence with the rapid development of computer systems.
Modern computer systems encompass concepts that extend far beyond
those that are intrinsic to the physical machines themselves. One of the
distinctive attributes of computing is programmability and hence program-
ming languages. High-level programming languages, representing centuries
of thought about thinking, are the means by which computers become ac-
cessible to diverse disciplines.
Programming involves mental processes and rigorous attention to detail
not unlike those involved in composition. Thus, it is not surprising that
composers were the first artists to make substantive use of computers. There
were compelling reasons to integrate some essential scientific knowledge
and concepts into the musical consciousness and to gain competence in
areas which are seemingly foreign to music. Two reasons were (and are)
particularly compelling: (1) the generality of sound synthesis by computer,
and (2) the power of programming in relation to the musical structure and
the process of composition.
Sound Synthesis
Although the traditional musical instruments constitute a rich sound space
indeed, it has been many decades since composers’ imaginations have con-
jured up sounds based on the interpolation and extrapolation of those
found in nature but which are not realizable with acoustical or analog
electronic instruments. A loudspeaker controlled by a computer is the most
general synthesis medium in existence. Any sound, from the simplest to the
most complex, that can be produced through a loudspeaker can be synthe-
sized with this medium. This generality of computer synthesis implies an
extraordinarily larger sound space, which has an obvious attraction to com-
posers. This is because computer sound synthesis is the bridge between that
which can be imagined and that which can be heard.
With the elimination of constraints imposed by the medium on sound
production, there nonetheless remains an enormous barrier which the com-
poser must overcome in order to make use of this potential. That barrier is
one of lack of knowledge—knowledge that is required for the composer to
be able to effectively instruct the computer in the synthesis process. To some
extent this technical knowledge relates to computers; this is rather easily
acquired. But it mostly has to do with the physical description and percep-
tual correlates of sound. Curiously, the knowledge required does not exist,
for the most part, in those areas of scientific inquiry where one would most
expect to find it, that is, physical acoustics and psychobiology, for these
disciplines often provide either inexact or no data at those levels of detail
with which a composer is ultimately most concerned. In the past, scientific
data and conclusions were used to try to replicate natural sounds as a way
of gaining information about sound in general. Musicians and musician-
scientists were quick to point out that most of the conclusions and data were
insufficient. The synthesis of sounds which approach in aural complexity the
simplest natural sound demands detailed knowledge about the temporal
evolution of the various components of the sound.
Physics, psychology, computer science, and mathematics have, however,
provided powerful tools and concepts. When these concepts are integrated
with musical knowledge and aural sensitivity, they allow musicians, scien-
tists, and technicians, working together, to carve out new concepts and
physical and psychophysical descriptions of sound at levels of detail that are
of use to the composer in meeting the exacting requirements of the ear and
imagination.
As this book shows, some results have emerged: There is a much deeper
understanding of timbre, and composers have a much richer sound palette
with which to work; new efficient synthesis techniques have been discovered
and developed that are based upon modeling the perceptual attributes of
sound rather than the physical attributes; powerful programs have been
developed for the purposes of editing and mixing synthesized and/or digi-
tally recorded sound; experiments in perceptual fusion have led to novel
and musically useful research in sound source identification and auditory
images; finally, special purpose computer-synthesizers are being designed
and built. These real-time performance systems incorporate many advances
in knowledge and technique.
Programming and Composition
Because one of the fundamental assumptions in designing a computer pro-
gramming language is generality, the range of practical applications of any
given high-level language is enormous and obviously includes music. Pro-
grams have been written in a variety of programming languages for various
musical purposes. Those that have been most useful and with which
composers have gained the most experience are programs for the synthesis
and processing of sound and programs that translate musical specifications
of a piece of music into physical specifications required by the synthesis
program.
The gaining of some competence at programming can be rewarding to a
composer as it is the key to a general understanding of computer systems.
Although systems are composed of programs of great complexity and writ-
ten using techniques not easily learned by nonspecialists, programming abil-
ity enables the composer to understand the overall workings of a system to
the extent required for its effective use. Programming ability also gives the
composer a certain independence at those levels of computing where inde-
pendence is most desirable—synthesis. Similar to the case in traditional
orchestration, the choices made in the synthesis of tones, having to do with
timbre and microarticulation, are often highly subjective. The process is
greatly enhanced by the ability of the composer to alter synthesis algorithms
freely.
The programming of musical structure is another opportunity which
programming competence can provide. To the extent that compositional
processes can be formulated in a more or less precise manner they may be
implemented in the form of a program. A musical structure that is based
upon some iterative process, for example, might be appropriately realized
by means of programming.
But there is a less tangible effect of programming competence which
results from the contact of the composer with the concepts of a program-
ming language. While the function a program is to perform can influence
the choice of language in which the program is written, it is also true that a
programming language can influence the conception of a program's func-
tion. In a more general sense, programming concepts can suggest functions
that might not occur to one outside of the context of programming. This is
of signal importance in music composition, since the integration of pro-
gramming concepts into the musical imagination can extend the boundaries
of the imagination itself. That is, the language is not simply a tool with
which some preconceived task or function can be accomplished; it is an
extensive basis of structure with which the imagination can interact, as well.
Although computer synthesis of sound involves physical and psycho-
physical concepts derived from the analysis of natural sounds, when joined
with higher-level programming of musical structure the implications extend
far beyond timbre. Unlike the condition that exists in composition for tradi-
tional instruments where the relation of vibrational modes of an instrument
is largely beyond compositional influence, computer synthesis allows for the
composition of music’s microstructure.
In the context of computing, then, the microstructure of music is not
necessarily of predetermined form—associated with a specific articulation
of a particular instrument. Rather, it can be subjected to the same thought
processes and be as freely determined in the imagination of the composer as
every other aspect of the work.
John Chowning
Preface
Music changes: new forms appear in infinite variety, and reinterpretations
infuse freshness into old genres. Waves of musical cultures overlap, diffus-
ing new stylistic resonances. Techniques for playing and composing music
meander with these waves. Bound with the incessant redevelopment in
music-making is an ongoing evolution in music technology. For every music
there is a family of instruments, so that today we have hundreds of instru-
ments to choose from, even if we restrict ourselves to the acoustic ones.
In the twentieth century, electronics turned the stream of instrument
design into a boiling rapids. Electrification transformed the guitar, bass,
piano, organ, and drum (machine) into the folk instruments of industrial
society. Analog synthesizers expanded the musical sound palette and
launched a round of experimentation with sound materials. But analog
synthesizers were limited by a lack of programmability, precision, memory,
and intelligence. By virtue of these capabilities, the digital computer pro-
vides an expanded set of brushes and implements for manipulating sound
color. It can listen, analyze, and respond to musical gestures in sophisti-
cated ways. It lets musicians edit music or compose according to logical
rules and print the results in music notation. It can teach interactively and
demonstrate all aspects of music with sound and images. New musical ap-
plications continue to spin out of computer music research.
In the wake of ongoing change, musicians confront the challenge of un-
derstanding the possibilities of the medium and keeping up with new devel-
opments. The Computer Music Tutorial addresses the need for a standard
and comprehensive text of basic information on the theory and practice of
computer music. As a complement to the reference volumes Foundations of
Computer Music (MIT Press, 1985) and The Music Machine (MIT Press,
1989), this book provides the essential background necessary for advanced
exploration of the computer music field. While Foundations of Computer
Music and The Music Machine are anthologies, this textbook contains all
new material directed toward teaching purposes.
Intended Audience
The intended audience for this book is not only music students but also
engineers and scientists seeking an orientation to computer music. Many
sections of this volume open technical “black boxes,” revealing the inner
workings of software and hardware mechanisms. Why is technical informa-
tion relevant to the musician? Our goal is not to turn musicians into engi-
neers but to make them better informed and more skillful users of music
technology. Technically naive musicians sometimes have unduly narrow
concepts of the possibilities of this rapidly evolving medium; they may im-
port conceptual limitations of bygone epochs into a domain where such
restrictions no longer apply. For want of basic information, they may waste
time dabbling, not knowing how to translate intuitions into practical re-
sults. Thus one aim of this book is to impart a sense of independence to the
many musicians who will eventually set up and manage a home or institu-
tional computer music studio.
For some musicians, the descriptions herein will serve as an introduction
to specialized technical study. A few will push the field forward with new
technical advances. This should not surprise anyone who has followed the
evolution of this field. History shows time and again that some of the most
significant advances in music technology have been conceived by technically
informed musicians.
Interdisciplinary Spirit
The knowledge base of computer music draws from composition, acoustics,
psychoacoustics, physics, signal processing, synthesis, perfor-
mance, computer science, and electrical engineering. Thus, a well-rounded
pedagogy in computer music must reflect an interdisciplinary spirit. In this
book, musical applications motivate the presentation of technical concepts,
and the discussion of technical procedures is interspersed with commentary
on their musical significance.
Heritage
One goal of our work has been to convey an awareness of the heritage of
computer music. Overview and background sections place the current
The second part focuses on digital sound synthesis. Chapters 3 through 8
cover the major synthesis methods, including both experimental and com-
mercially available methods.
Part III, Mixing and Signal Processing, contains four chapters that de-
mystify these sometimes arcane subjects, including sound mixing, filtering,
delay effects, reverberation, and spatial manipulation.
Analysis of sound, the subject of part IV, is on the ascendancy, being key
to many musical applications such as sound transformation, interactive
performance, and music transcription. Chapters 12 and 13 cover the analy-
sis of pitch, rhythm, and spectrum by computer.
Part V addresses the important subject of the musician’s interface for
computer music systems. The physical devices manipulated by a performer
are the subject of chapter 14, while chapter 15 deals with the software that
interprets a performer's gestures. Chapter 16 is a survey of music editing
systems. Music languages are the subject of chapter 17. The last two chap-
ters in part V introduce the universe of algorithmic composition methods
and representations.
Part VI opens the lid of computer music systems, beginning with an
examination of the internals of digital signal processors in chapter 20.
Chapter 21 discusses the popular MIDI interface protocol, while chapter 22
looks at interconnections between computers, input devices, and digital
signal processing hardware.
The seventh part contains a single chapter on psychoacoustics by John
Gordon, which deals with the instrument of listening—human perception.
Knowledge of the basic concepts in psychoacoustics can help in several
aspects of computer music, including sound design, mixing, and interpreting
the output of signal analysis programs.
The final part of the book is a technical appendix introducing readers to
the history, mathematics, and overall design of Fourier analysis, in par-
ticular the fast Fourier transform—a ubiquitous tool in computer music
systems.
Notwithstanding the broad scope of this book, it was impossible to com-
press the art of composition into a single part. Instead, there are
many citations to composers and musical practices interwoven with techni-
cal discussions. Chapters 18 and 19 present the technical principles behind
algorithmic composition, but this is only one facet of a vast—indeed open-
ended—discipline, and is not necessarily meant to typify computer music
composition as a whole.
We have surveyed composition practices in other publications. Compos-
ers and the Computer focuses on several musicians (Roads 1985a). During
my tenure as editor of Computer Music Journal, we published many reviews
of compositions, interviews with, and articles by composers. These include
a “Symposium on Composition,” with fourteen composers participating
(Roads 1986a), and a special issue on composition, Computer Music Journal
5(4) 1981. Some of these articles were reprinted in a widely available text,
The Music Machine (MIT Press 1989). Issue 11(1) 1987 featured micro-
tonality in computer music composition. Many other periodicals and books
contain informative articles on compositional issues in electronic and com-
puter music.
References and Index
In a tutorial volume that covers many topics, it is essential to supply point-
ers for further study. This book contains extensive citations and a reference
list of more than 1300 entries compiled at the back of the volume. As a
further service to readers, we have invested much time to ensure that both
the name and subject indexes are comprehensive.
Mathematics and Coding Style
Since this Tutorial is addressed primarily to a musical audience, we chose to
present technical ideas in an informal style. The book uses as little mathe-
matical notation as possible. It keeps code examples brief. When mathemat-
ical notation is needed, it is presented with operators, precedence relations,
and groupings specified explicitly for readability. This is important because
the idioms of traditional mathematical notation are sometimes cryptic at
first glance, or incomplete as algorithmic descriptions. For the same
reasons, the book usually uses long variable names instead of the single-
character variables favored in proofs. With the exception of a few simple
examples, code examples are presented in a Pascal-like pseudocode
for readability.
Appendix A presents advanced material and denser mathematical formal-
ism. For this reason we fall back on traditional mathematical notation.
Corrections and Comments Invited
In the first edition of a large book covering a new field, there will inevitably
be errors. We welcome corrections and comments, and we are always seek-
ing further historical information. Please address your comments to the
author in care of The MIT Press, 55 Hayward Street, Cambridge, Massa-
chusetts 02142.
Acknowledgments
This book was written over a period of many years. I wrote the first draft
from 1980 to 1986, while serving as Research Associate in computer music
at the Massachusetts Institute of Technology and Editor of Computer Music
Journal for The MIT Press. I am grateful to many friends for their assis-
tance during the period of revisions that followed.
Major sections of part III (Mixing and Signal Processing) and part IV
(Sound Analysis) were added during a 1988 stay as Visiting Professor in the
Department of Physics at the Università di Napoli Federico II, thanks to an
invitation by Professor Aldo Piccialli. I am deeply grateful to Professor
Piccialli for his detailed comments on chapter 13 (Spectrum Analysis) and
appendix A (Fourier Analysis), and for his generous counsel on the theory
of signal processing.
Valuable feedback on part II (Sound Synthesis) came from composition
students in the Department of Music at Harvard University, where I taught
in 1989, thanks to Professor Ivan Tcherepnin. I thank Professors Conrad
Cummings and Gary Nelson for the opportunity to teach at the Oberlin
Conservatory of Music in 1990, where I presented much of the book in
lecture form, leading to clarifications in the writing.
During spare moments I worked on part V (The Musician’s Interface) in
Tokyo at the Center for Computer Music and Music Technology, Kunitachi
College of Music, in 1991, thanks to the center's director Cornelia Colyer,
Kunitachi chairman Bin Ebisawa, and a commission for a composition
from the Japan Ministry of Culture. I presented the first courses based on
the completed text in 1993 and 1994 at Les Ateliers UPIC, thanks to Gerard
Pape and Iannis Xenakis, and the Music Department of the University of
Paris VIII, thanks to Professor Horacio Vaggione.
John Strawn, formerly my editorial colleague at Computer Music Journal,
contributed substantially to this project for several years. In between his
duties as a doctoral student at Stanford University, he wrote parts of chap-
ters 1 and 3. Later, he reviewed drafts of most chapters with characteristic
thoroughness. Throughout this marathon effort, John was consulted on
myriad details via electronic mail. I am grateful to him for sharing his wide
musical and technical knowledge and sharp wit.
Curtis Abbott and John Gordon kindly contributed two fine chapters
that I am very pleased to include in the book. I would also like to thank
Philip Greenspun of the MIT Department of Electrical Engineering and
Computer Science. Philip wrote a six-page text that served as the skeleton
for the central part of appendix A and carefully reviewed the draft.
Many kind individuals helped by supplying information, documentation,
photographs, or by reading chapter drafts. I am profoundly indebted to
these generous people for their numerous suggestions, criticisms, and con-
tributions to this book: Jean-Marie Adrien, Jim Aiken, Clarence Barlow,
François Bayle, James Beauchamp, Paul Berg, Nicola Bernardini, Peter
Beyls, Jack Biswell, Thom Blum, Richard Boulanger, David Bristow,
William Buxton, Wendy Carlos, René Caussé, Xavier Chabot, John
Chowning, Cornelia Colyer, K. Conklin, Conrad Cummings, James
Dashow, Philippe Depalle, Mark Dolson, Giovanni De Poli, Gerhard
Eckel, William Eldridge, Gianpaolo Evangelista, Ayshe Farman-Farmaian,
Adrian Freed, Christopher Fry, Guy Garnett, John W. Gordon, Kurt
Hebel, Henkjan Honing, Gottfried Michael Koenig, Paul Lansky, Otto
Laske, David Lewin, D. Gareth Loy, Max V. Mathews, Stephen McAdams,
Dennis Miller, Diego Minciacchi, Bernard Mont-Reynaud, Robert Moog,
F. R. Moore, James A. Moorer, Peter Nye, Robert J. Owens, Alan Peevers,
Aldo Piccialli, Stephen Pope, Edward L. Poulin, Miller Puckette, François
Reveillon, Thomas Rhea, Jean-Claude Risset, Craig Roads, Xavier Rodet,
Joseph Rothstein, William Schottstaedt, Marie-Hélène Serra, John Snell,
John Stautner, Morton Subotnick, Martha Swetzoff, Stan Tempelaars,
Daniel Teruggi, Irène Thanos, Barry Truax, Alvise Vidolin, Dean Wallraff,
David Waxman, Erling Wold, and Iannis Xenakis.
I would also like to express my thanks to the staff of The MIT Press
Journals—Janet Fisher, manager—publishers of Computer Music Journal.
This work would have been nigh impossible without their backing over the
past fourteen years.
I will always be grateful to Frank Urbanowski, Director of The MIT
Press, and Executive Editor Terry Ehling for their extraordinarily patient
and kind support of this project. I am also indebted to David Anderson,
Sandra Minkkinen, Deborah Cantor-Adams, and Chris Malloy for their
fine editing and production labors.
This book is dedicated to my mother, Marjorie Roads.
Fundamental Concepts
Overview to Part I
Once upon a time—not too long ago—digital audio recording, synthesis,
processing, and playback were the privilege of laboratory specialists. Today
they are nearly as commonplace as television; virtually all computers are
equipped for digital audio. Digital audio, the subject of chapter 1, is central
to computer music. The sample—nothing more than a number—is the
atom of sound. Theory says that we can construct any sound emitted by a
loudspeaker by means of a series of samples that trace the pattern of a
sound waveform over time. But theory becomes reality only when strict
technical conditions concerning sampling rate and sample width are met. If
the sampling rate is too low, the result is a sound that is either muffled or
polluted by distortion. Sample width refers to the size of the digital word
used to represent a sample; if it is too small, the sound is literally chopped
by noise.
Chapter 2 introduces the art of programming. Knowing how to program
is the key to doing something really new in computer music. Thus a famil-
iarity with programming concepts is an essential topic for the student.
Organization of Part I
Part I introduces basic concepts in digital audio and programming that are
developed throughout the rest of the book. Chapters 1 and 2 cover a great
deal of material in summary form. Their goal is to convey a sense of the
scope of these fields as they have evolved, and to prepare readers for the
many chapters to follow.
The first chapter, by John Strawn and Curtis Roads, covers such basic
topics as the history of digital recording, the sampling theorem, aliasing,
phase correction, quantization, dither, audio converters, oversampling, and
digital audio formats. Portions of chapter 1 were originally published in
Keyboard magazine but have been extensively revised for this book.Part 1 Fundamental Concepts
Chapter 2, “Music Systems Programming” by Curtis Abbott, is an intro-
duction to the art of programming from a master practitioner. The author
traces the development of programming languages and the elements of pro-
gramming style. He summarizes the basic concepts of programming lan-
guages, their control and data structures, and describes the fundamentals of
object-oriented programming.
1 Digital Audio Concepts
with John Strawn
Background: History of Digital Audio Recording
Experimental Digital Recording
Digital Sound for the Public
Digital Sound for Musicians
Digital Multitrack Recording
Basics of Sound Signals
Frequency and Amplitude
Time-domain Representation
Frequency-domain Representation
Phase
Importance of Phase
Analog Representations of Sound
Digital Representations of Sound
Analog-to-digital Conversion
Binary Numbers
Digital-to-analog Conversion
Digital Audio Recording versus MIDI Recording
Sampling
Reconstruction of the Analog Signal
Aliasing (Foldover)
The Sampling Theorem
Ideal Sampling Frequency
The merger of digital audio recording with computer music technology
creates a supple and powerful artistic medium. This chapter introduces the
history and technology of digital audio recording and playback. After
studying this introduction, you should be familiar with the basic vocabulary
and concepts of digital audio. In the interest of brevity we condense topics
that are large specialities unto themselves; for more literature sources see
D. Davis (1988, 1992).
Background: History of Digital Audio Recording
Sound recording has a rich history, beginning with Thomas Edison and
Emile Berliner’s experiments in the 1870s, and marked by V. Poulsen’s
Telegraphone magnetic wire recorder of 1898 (Read and Welch 1976). Early
audio recording was a mechanical process (figure 1.1).
Although the invention of the triode vacuum tube in 1906 launched the
era of electronics, electronically produced records did not become practical
until 1924 (Keller 1981). Figure 1.2 depicts one of the horn-loaded loud-
speakers typical in the 1920s.
Optical sound recording on film was first demonstrated in 1922 (Ristow
1993). Sound recording on tape coated with powdered magnetized material
was developed in the 1930s in Germany (figure 1.3), but did not reach the
rest of the world until after World War II. The German Magnetophon tape
Figure 1.1 Sound waves collected by the large cone over the piano were transduced
into vibrations of a cutting stylus, piercing a rotating wax cylinder.
Figure 1.2 Amplion loudspeaker, as advertised in 1925.
the sampling theorem, which specifies the relation between the sampling rate
and the audio bandwidth (see the section on the sampling theorem later
in this chapter). This theorem is also called the Nyquist theorem after the
work of Harry Nyquist of Bell Telephone Laboratories (Nyquist 1928),
but another form of this theorem was first stated in 1841 by the French
mathematician Augustin Louis Cauchy (1789-1857). The British researcher
A. Reeves developed the first patented pulse-code-modulation (PCM) system
for transmission of messages in “amplitude-dichotomized, time-quantized”
(digital) form (Reeves 1938; Licklider 1950; Black 1953). Even today, digital
recording is sometimes called “PCM recording.” The development of infor-
mation theory contributed to the understanding of digital audio transmis-
sion (Shannon 1948). Solving the difficult problems of converting between
analog signals and digital signals took decades, and is still being improved.
(We describe the conversion processes later.)
In the late 1950s, Max Mathews and his group at Bell Telephone La-
boratories generated the first synthetic sounds from a digital computer.
The samples were written by the computer to expensive and bulky reel-to-
reel computer tape storage drives. The production of sound from the num-
bers was a separate process of playing back the tape through a custom-built
12-bit vacuum tube “digital-to-sound converter” developed by the Epsco
Corporation (Roads 1980; see also chapter 3).
Hamming, Huffman, and Gilbert originated the theory of digital error
correction in the 1950s and 1960s. Later, Sato, Blesser, Stockham, and Doi
made contributions to error correction that resulted in the first practical
systems for digital audio recording. The first dedicated one-channel digital
audio recorder (based on a videotape mechanism) was demonstrated by the
NHK, the Japan broadcasting company (Nakajima et al. 1983). Soon there-
after, Denon developed an improved version (figure 1.4), and the race began
to bring digital audio recorders to market (Iwamura et al. 1973).
By 1977 the first commercial recording system came to market, the Sony
PCM-1 processor, designed to encode 13-bit digital audio signals onto Sony
Beta format videocassette recorders. Within a year this was displaced by
16-bit PCM encoders such as the Sony PCM-1600 (Nakajima et al. 1978).
At this point product development split along two lines: professional and
“consumer” units, although a real mass market for this type of digital re-
cording never materialized. The professional Sony PCM-1610 and 1630
became the standards for compact disc (CD) mastering, while Sony PCM-
F1-compatible systems (also called EIAJ systems, for Electronics Industry
Association of Japan) became a de facto standard for low-cost digital audio
recording on videocassette. These standards continued throughout the
1980s.
Figure 1.4 [caption fragment]: on a 1-inch videotape recorder (on the right).
The Audio Engineering Society established two standard sampling fre-
quencies in 1985: 44.1 and 48 KHz. They revised their specification in 1992
(Audio Engineering Society 1992a, 1992b). (A 32 KHz sampling frequency
for broadcast purposes also exists.) Meanwhile, a few companies developed
higher-resolution digital recorders capable of encoding more than sixteen
bits at higher sampling rates. For example, a version of Mitsubishi’s X-86
reel-to-reel digital tape recorder encoded 20 bits at a 96 KHz sampling
frequency (Mitsubishi 1986). A variety of high-resolution recorders are now
available.
Digital Sound for the Public
Digital sound first reached the general public in 1982 by means of the com-
pact disc (CD) format, a 12-cm optical disc read by a laser (figure 1.5). The
CD format was developed jointly by the Philips and Sony corporations
after years of development. It was a tremendous commercial success, selling
over 1.35 million players and tens of millions of discs within two years
(Pohlmann 1989). Since then a variety of products have been derived from
Figure 1.6 3M 32-track digital tape recorder, introduced in 1978.
Basics of Sound Signals
This section introduces the basic concepts and terminology for describing
sound signals, including frequency, amplitude, and phase.
Frequency and Amplitude
Sound reaches listeners’ ears after being transmitted through air from a
source. Listeners hear sound because the air pressure is changing slightly in
their ears. If the pressure varies according to a repeating pattern we say the
sound has a periodic waveform. If there is no discernible pattern it is called
noise. In between these two extremes is a vast domain of quasi-periodic and
quasi-noisy sounds.
One repetition of a periodic waveform is called a cycle, and the fundamen-
tal frequency of the waveform is the number of cycles that occur per second.
As the length of the cycle—called the wavelength or period—increases, the
frequency in cycles per second decreases, and vice versa. In the rest of this
book we substitute Hz for “cycles per second” in accordance with standard
acoustical terminology. (Hz is an abbreviation for Hertz, named after the
German physicist Heinrich Hertz.)
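The reciprocal relation between frequency and period is easy to check numerically. Here is a minimal Python sketch (the function name is ours, for illustration):

    def period_seconds(frequency_hz):
        # The period of one cycle is the reciprocal of the frequency.
        return 1.0 / frequency_hz

    print(period_seconds(440.0))   # about 0.00227: a cycle of a 440 Hz tone lasts ~2.27 ms
    print(1.0 / 0.001)             # a period of 1 ms corresponds to 1000.0 Hz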
Figure 1.7 Studer D820-48 DASH digital multitrack recorder introduced in 1991
with a retail price of about $270,000.
Time-domain Representation
A simple method of depicting sound waveforms is to draw them in the form
of a graph of air pressure versus time (figure 1.8). This is called a time-
domain representation. When the curved line is near the bottom of the
graph, then the air pressure is lower, and when the curve is near the top of
the graph, the air pressure has increased. The amplitude of the waveform is
the amount of air pressure change; we can measure amplitude as the vertical
distance from the zero pressure point to the highest (or lowest) points of a
given waveform segment.16
Figure 1.8 Time-domain representation of a signal. The vertical dimension shows
the air pressure. When the curved line is near the top of the graph, the air pressure
is greater. Below the solid horizontal line, the air pressure is reduced. Atmospheric
pressure variations heard as sound can occur quickly; for musical sounds, this entire
graph might last no more than one-thousandth of a second (1 ms).
An acoustic instrument creates sound by emitting vibrations that change
the air pressure around the instrument. A loudspeaker creates sound by
moving back and forth according to voltage changes in an electronic signal.
When the loudspeaker moves “in” from its position at rest, then the air
pressure decreases. As the loudspeaker moves “out,” the air pressure near
the loudspeaker is raised. To create an audible sound these in/out vibrations
must occur at a frequency in the range of about 20 to 20,000 Hz.
Frequency-domain Representation
Besides the fundamental frequency, there can be many frequencies present
in a waveform. A frequency-domain or spectrum representation shows the
frequency content of a sound. The individual frequency components of the
spectrum can be referred to as harmonics or partials. Harmonic frequencies
are simple integer multiples of the fundamental frequency. Assuming a
Figure 1.9 Time-domain and frequency-domain representations of four signals. (a) Time-
domain view of one cycle of a sine wave. (b) Spectrum of the one frequency compo-
nent in a sine wave. (c) Time-domain view of one cycle of a sawtooth waveform.
(d) Spectrum showing the exponentially decreasing frequency content of a sawtooth
wave. (e) Time-domain view of one cycle of a complex waveform. Although the
waveform looks complex, when it is repeated over and over its sound is actually
simple—like a thin reed organ sound. (f) The spectrum of waveform (e) shows
that it is dominated by a few frequencies. (g) A random noise waveform. (h) If the
waveform is constantly changing (each cycle is different from the last cycle) then we
hear noise. The frequency content of noise is very complex. In this case the analysis
extracted 252 frequencies. This snapshot does not reveal how their amplitudes are
constantly changing over time.
fundamental or first harmonic of 440 Hz, its second harmonic is 880 Hz,
its third harmonic is 1320 Hz, and so on. More generally, any frequency
component can be called a partial, whether or not it is an integer multiple
of a fundamental. Indeed, many sounds have no particular fundamental
frequency.
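As a quick illustration of these integer multiples, a few lines of Python (assuming the 440 Hz fundamental used above):

    fundamental_hz = 440.0
    for harmonic_number in range(1, 5):
        # Each harmonic is a simple integer multiple of the fundamental.
        print(harmonic_number, fundamental_hz * harmonic_number)
    # prints: 1 440.0, 2 880.0, 3 1320.0, 4 1760.0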
The frequency content of a waveform can be displayed in many ways. A
standard way is to plot each partial as a line along an x-axis. The height of
each line indicates the strength (or amplitude) of each frequency compo-
nent. The purest signal is a sine waveform, so named because it can be
calculated using trigonometric formulae for the sine of an angle. (Appendix
A explains this derivation.) A pure sine wave represents just one frequency
component, or one line in a spectrum. Figure 1.9 depicts the time-domain
and frequency-domain representations of several waveforms. Notice that
the spectrum plots are labeled “Harmonics” on their horizontal axis, since
the analysis algorithm assumes that its input is exactly one period of the
fundamental of a periodic waveform. In the case of the noise signal in figure
1.9g, this assumption is not valid, so we relabel the partials as “frequency
components.”
Phase
The starting point of a periodic waveform on the y or amplitude axis is its
initial phase. For example, a typical sine wave starts at the amplitude point
0 and completes its cycle at 0. If we displace the starting point by π/2 on the
horizontal axis (or 90 degrees) then the sinusoidal wave starts and ends at 1
on the amplitude axis. By convention this is called a cosine wave. In effect,
a cosine is equivalent to a sine wave that is phase shifted by 90 degrees
(figure 1.10).
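This 90-degree relation can be verified numerically. The following Python fragment (our illustration, not from the text) confirms that cos(x) equals sin(x + π/2) at a few arbitrary points:

    import math

    phase_shift = math.pi / 2   # a quarter cycle, i.e., 90 degrees in radians
    for step in range(4):
        angle = step * 0.7
        # A cosine is a sine shifted ahead by a quarter cycle; both columns match.
        print(math.cos(angle), math.sin(angle + phase_shift))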
Figure 1.10 A sine waveform is equivalent to a cosine waveform that has been
delayed or phase shifted slightly.
Figure 1.11 The effects of phase inversion. (b) is a phase-inverted copy of (a). If the
two waveforms are added together, they sum to zero (c).
When two signals start at the same point they are said to be in phase or
phase aligned. This contrasts to a signal that is slightly delayed with respect
to another signal, in which the two signals are out of phase. When a signal
A is the exact opposite phase of another signal B (i.e., it is 180 degrees out
of phase, so that for every positive value in signal A there is a corresponding
negative value for signal B), we say that B has reversed polarity with respect
to A. We could also say that B is a phase-inverted copy of A. Figure 1.11
portrays the effect when two signals in inverse phase relationship sum.
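A minimal numerical sketch of this cancellation in Python (the eight-sample sine cycle is an arbitrary illustrative choice):

    import math

    num_samples = 8
    signal_a = [math.sin(2 * math.pi * n / num_samples) for n in range(num_samples)]
    signal_b = [-sample for sample in signal_a]   # reversed polarity: 180 degrees out of phase
    mixed = [a + b for a, b in zip(signal_a, signal_b)]
    print(mixed)   # all zeros: the signal and its phase-inverted copy cancel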
Importance of Phase
It is sometimes said that phase is insignificant to the human ear, because
two signals that are exactly the same except for their initial phase are diffi-
cult to distinguish. Actually, research indicates that 180-degree differences
in absolute phase or polarity can be distinguished by some people under
laboratory conditions (Greiner and Melton 1991). But even apart from this
special case, phase is an important concept for several reasons. Every filter
uses phase shifts to alter signals. A filter phase shifts a signal (by delaying its
input for a short time) and then combines the phase-shifted version with the
original signal to create frequency-dependent phase cancellation effects that
Figure 1.12 The analog audio chain, starting from an analog waveform trans-
duced from the grooves of a phonograph record to a voltage sent to a preamplifier,
amplifier, loudspeaker, and projected into the air.
contain a continuous-time representation of the sound stored in the record.
As the needle glides through the groove, the needle moves back and forth in
lateral motion. This lateral motion is then changed into voltage, which is
amplified and eventually reaches the loudspeaker.
Analog reproduction of sound has been taken to a high level in recent
years, but there are fundamental limitations associated with analog record-
ing. When you copy an analog recording onto another analog recorder,
the copy is never as good as the original. This is because the analog record-
ing process always adds noise. For a first-generation or original recording,
this noise may not be objectionable. But as we continue with three or four
generations, making copies of copies, more of the original recording is lost
to noise. In contrast, digital technology can create any number of genera-
tions of perfect (noise-free) clones of an original recording, as we show later.
Digital Representations of Sound
In essence, generating or reproducing digital sound involves converting a
string of numbers into one of the time-varying changes that we have been
discussing. If these numbers can be turned into voltages, then the voltages
can be amplified and fed to a loudspeaker to produce the sound.
This section introduces the most basic concepts associated with digital
signals, including the conversion of signals into binary numbers, compari-
son of audio data with MIDI data, sampling, aliasing, quantization, and
dither.
Analog-to-digital Conversion
Let us look at the process of digitally recording sound and then playing it
back. Rather than the continuous-time signals of the analog world, a digital
recorder handles discrete-time signals. Figure 1.13 diagrams the digital au-
dio recording and playback process. In this figure, a microphone transduces
air pressure variations into electrical voltages, and the voltages are passed
through a wire to the analog-to-digital converter, commonly abbreviated
ADC (pronounced “A DC”). This device converts the voltages into a string
of binary numbers at each period of the sample clock. The binary numbers
are stored in a digital recording medium—a type of memory.
Binary Numbers
In contrast to decimal (or base ten) numbers, which use the ten digits 0-9,
binary (or base two) numbers use only two digits, 0 and 1. The term bit is an
abbreviation of binary digit. Table 1.1 lists some binary numbers and their
decimal equivalents. There are various ways of indicating negative numbers
in binary. In many computers the leftmost bit is interpreted as a sign indica-
tor, with a 0 indicating a positive number, and a 1 indicating a negative
number. (Real decimal or floating-point numbers can also be represented in
binary. See chapter 20 for more on floating-point numbers in digital audio
signal processing.)
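As one concrete illustration (assuming two's complement encoding, a common convention, though as noted above conventions differ between machines), this Python sketch prints the 16-bit patterns for a positive and a negative sample value:

    def to_16bit_pattern(sample_value):
        # Mask to 16 bits and format as a binary string (two's complement assumed).
        return format(sample_value & 0xFFFF, '016b')

    print(to_16bit_pattern(5))    # 0000000000000101 (leftmost bit 0: positive)
    print(to_16bit_pattern(-5))   # 1111111111111011 (leftmost bit 1: negative)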
The way a bit is physically encoded in a recording medium depends on
the properties of that medium. On a digital audio tape recorder, for exam-
ple, a 1 might be represented by a positive magnetic charge, while a 0 is
indicated by the absence of such a charge. This is different from an analog
Figure 1.13 Overview of digital recording and playback.
Table 1.1 Binary numbers and their decimal equivalents

Binary              Decimal
0                   0
1                   1
10                  2
11                  3
100                 4
1000                8
10000               16
100000              32
1111111111111111    65535
tape recording, in which the signal is represented as a continuously varying
charge. On an optical medium, binary data might be encoded as variations
in the reflectance at a particular location.
Digital-to-analog Conversion
Figure 1.14 depicts the result of converting an audio signal (a) into a digital
signal (b). When the listener wants to hear the sound again, the numbers are
read one-by-one from the digital storage and passed through a digital-to-
analog converter, abbreviated DAC (pronounced “dack”). This device, driven
by a sample clock, changes the stream of numbers into a series of voltage
levels. From here the process is the same as shown in figure 1.13; that is, the
series of voltage levels are lowpass filtered into a continuous-time waveform
(figure 1.14), amplified, and routed to a loudspeaker, whose vibration
causes the air pressure to change. Voila, the signal sounds again.
In summary, we can change a sound in the air into a string of binary
numbers that can be stored digitally. The central component in this conver-
sion process is the ADC. When we want to hear the sound again, a DAC
can change those numbers back into sound.
Digital Audio Recording versus MIDI Recording
This final point may clear up any confusion: the string of numbers gener-
ated by the ADC is not related to MIDI data. (MIDI is the Musical
Instrument Digital Interface specification—a widely used protocol for con-
trol of digital music systems; see chapter 21.) Both digital audio recorders
and MIDI sequencers are digital and can record multiple “tracks,” but they
differ in the amount and type of information that each one handles.
Figure 1.14 Analog and digital representations of a signal. (a) Analog sine wave-
form. The horizontal bar below the wave indicates one period or cycle. (b) Sampled
version of the sine waveform in (a), as it might appear at the output of an ADC.
Each vertical bar represents one sample. Each sample is stored in memory as a
number that represents the height of the vertical bar. One period is represented
by fifteen samples. (c) Reconstruction of the sampled version of the waveform in (b).
Roughly speaking, the tops of the samples are connected by the lowpass smoothing
filter to form the waveform that eventually reaches the listener’s ear.
When a MIDI sequencer records a human performance on a keyboard,
only a relatively small amount of control information is actually transmitted
from the keyboard to the sequencer. MIDI does not transmit the sampled
waveform of the sound. For each note, the sequencer records only the start
time and ending time, its pitch, and the amplitude at the beginning of the
note. If this information is transmitted back to the synthesizer on which it
was originally played, this causes the synthesizer to play the sound as it did
before, like a piano roll recording. If the musician plays four quarter notes
at a tempo of 60 beats per minute on a MIDI synthesizer, just sixteen pieces
of information capture this 4-second sound (four starts, ends, pitches, and
amplitudes).
By contrast, if we record the same sound with a microphone connected to
a digital audio tape recorder set to a sampling frequency of 44.1 KHz,
352,800 pieces of information (in the form of audio samples) are recorded
for the same sound (44,100 x 2 channels x 4 seconds). The storage require-
ments of digital audio recording are large. Using 16-bit samples, it takes
over 700,000 bytes to store a 4-second sound. This is 44,100 times more data
than is stored by MIDI.
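The arithmetic behind these figures can be restated in a few lines of Python, using the quantities given in the text:

    sampling_rate_hz = 44100
    channels = 2
    duration_seconds = 4
    bytes_per_sample = 2  # 16 bits

    audio_samples = sampling_rate_hz * channels * duration_seconds
    print(audio_samples)                     # 352800 samples
    print(audio_samples * bytes_per_sample)  # 705600 bytes, i.e., over 700,000

    midi_pieces = 4 * 4  # four notes x (start, end, pitch, amplitude)
    print(audio_samples * bytes_per_sample // midi_pieces)  # 44100 times more data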
Because of the tiny amount of data it handles, an advantage of MIDI
sequence recording is low cost. For example, a 48-track MIDI sequence
recorder program running on a small computer might cost less than $100
and handle 4000 bytes/second. In contrast, a 48-track digital tape recorder
costs tens of thousands of dollars and handles more than 4.6 Mbytes of
audio information per second—over a thousand times the data rate of
MIDI.
The advantage of a digital audio recording is that it can capture any
sound that can be recorded by a microphone, including the human voice.
MIDI sequence recording is limited to recording control signals that indi-
cate the start, end, pitch, and amplitude of a series of note events. If you
plug the MIDI cable from the sequencer into a synthesizer that is not the
same as the synthesizer on which the original sequence was played, the
resulting sound may change radically.
Sampling
The digital signal shown in figure 1.14b is significantly different from the
original analog signal shown in figure 1.14a. First, the digital signal is de-
fined only at certain points in time. This happens because the signal has
been sampled at certain times. Each vertical bar in figure 1.14b represents
one sample of the original signal. The samples are stored as binary numbers;
the higher the bar in figure 1.14b, the larger the number.
The number of bits used to represent each sample determines both
the noise level and the amplitude range that can be handled by the sys-
tem. A compact disc uses a 16-bit number to represent a sample, but more
or fewer bits can be used. We return to this subject later in the section on
“quantization.”
The rate at which samples are taken—the sampling frequency—is ex-
pressed in terms of samples per second. This is an important specification of
digital audio systems. It is often called the sampling rate and is expressed in
terms of Hertz. A thousand Hz is abbreviated 1 KHz, so we say: “The
sampling rate of a compact disc recording is 44.1 KHz,” where the “K” is
derived from the metric term “kilo” meaning thousand.
Reconstruction of the Analog Signal
Sampling frequencies around 50 KHz are common in digital audio systems,
although both lower and higher frequencies can also be found. In any case,
50,000 numbers per second is a rapid stream of numbers; it means there are
6,000,000 samples for one minute of stereo sound.
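As a check of this figure in Python:

    samples_per_second = 50000
    print(samples_per_second * 60 * 2)  # 6,000,000 samples per minute of stereo sound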
The digital signal in figure 1.14b does not show the value between the
bars. The duration of a bar is extremely narrow, perhaps lasting only
0.00002 second (two hundred-thousandths of a second). This means that if
the original signal changes “between” bars, the change is not reflected in the
height of a bar, at least until the next sample is taken. In technical terms, we
say that the signal in figure 1.14b is defined at discrete times, each such time
represented by one sample (vertical bar).
Part of the magic of digitized sound is that if the signal is bandlimited, the
DAC and associated hardware can exactly reconstruct the original signal
from these samples! This means that, given certain conditions, the missing
part of the signal “between the samples” can be restored. This happens
when the numbers are passed through the DAC and smoothing filter. The
smoothing filter “connects the dots” between the discrete samples (see the
dotted line in figure 1.14c). Thus, a signal sent to the loudspeaker looks and
sounds like the original signal.
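In the ideal case this reconstruction is the Whittaker-Shannon (sinc) interpolation. The Python sketch below is a simplified model of that theory, not of a practical DAC, which approximates the same result with analog circuitry:

    import math

    def sinc_reconstruct(samples, sampling_period, time):
        # Sum a sinc pulse centered on each sample; exact for bandlimited signals.
        total = 0.0
        for n, sample in enumerate(samples):
            x = (time - n * sampling_period) / sampling_period
            total += sample * (1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x))
        return total

    # A bandlimited test signal: 0.1 cycles per sample, well below the half-rate limit.
    samples = [math.sin(2 * math.pi * 0.1 * n) for n in range(50)]
    print(sinc_reconstruct(samples, 1.0, 10.5))  # close to the true value "between samples"
    print(math.sin(2 * math.pi * 0.1 * 10.5))    # the original signal at that instant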
Aliasing (Foldover)
The process of sampling is not quite as straightforward as it might seem.
Just as an audio amplifier or a loudspeaker can introduce distortion, sam-
pling can play tricks with sound. Figure 1.15 gives an example. Using the
input waveform shown in figure 1.15a, suppose that a sample of this wave-
form is taken at each point in time shown by the vertical bars in figure 1.15b
(each vertical bar creates one sample). As before, the resulting samples
of figure 1.15c are stored as numbers in digital memory. But when we at-
tempt to reconstruct the original waveform, as shown in figure 1.15d, the
result is something completely different.
In order to understand better the problems that can occur with sampling,
we look at what happens when we change the wavelength (the length of one
cycle) of the original signal without changing the length of time between
samples. Figure 1.16a shows a signal with a cycle eight samples long, figure
1.16d shows a cycle two samples long, and figure 1.16g shows a waveform
with eleven cycles per ten samples. This means that one cycle takes longer
than the interval between samples. This relationship could also be expressed
as 11/10 cycles per sample.
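This can be demonstrated numerically. In the Python sketch below (the rates are our illustrative choices), a tone at 11/10 of the sampling rate produces exactly the same samples as a tone at 1/10 of the sampling rate:

    import math

    sampling_rate = 1000.0
    high_frequency = 1100.0                            # 11/10 cycles per sample
    alias_frequency = high_frequency - sampling_rate   # 100.0 Hz, or 1/10 cycles per sample

    for n in range(4):
        t = n / sampling_rate
        print(math.sin(2 * math.pi * high_frequency * t),
              math.sin(2 * math.pi * alias_frequency * t))
    # The two columns agree sample for sample: after sampling, the 1100 Hz
    # tone is indistinguishable from a 100 Hz tone.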
Again, as each of the sets of samples is passed through the DAC and
associated hardware, a signal is reconstructed (figures 1.16c, f, and i) and
sent to the loudspeaker. The signal shown by the dotted line in figure 1.16c