ENG 6
Audio Signals
Bevan Baas, Andre Knoesen
Sound
• Sound is simply changing pressure waves moving through the air
– A microphone converts air pressure into an electrical signal
– A speaker converts an electrical signal into air pressure
– Typical human hearing range: 20 – 20,000 Hertz
• There are two primary ways of recording, storing,
transmitting, and representing audio signals
– Analog waveforms. The exact signal value is stored
for every instant in time.
– Digital waveforms. A rounded signal estimate (e.g., integers [0, 1, 2, …,
256]) is stored for only specific instants in time.
• This method is less susceptible to noise
• Representing sound waves digitally involves sampling, which is a process by
which the amplitude of the sound wave is recorded (or sampled) once every
specified interval.
• For a CD, this is 44.1kHz (44,1000 times each second). If you don't have
enough samples per second, you can no longer reconstruct the original
signal from the samples. This sampling frequency is commonly written as fs
– To accurately sample a signal which has frequencies up to fmax Hz, it must be
sampled by at least fs = 2 * fmax Hz, which is called the Nyquist frequency
2
Sound
• Representing sound in MATLAB is quite easy
• All of the samples of the amplitudes are stored in a one-
dimensional vector
• The only other necessary piece of information is the sampling
rate, which is how many samples are taken each second.
• If you're interested in this, take Signals and Systems or other
Electrical Engineering classes.
1
%make pure tone 0.8
0.6
clear;
fs = 44100; 0.4
len_time = 3;
volume = 0.5; 0.2
t = 1 : (len_time * fs);
len_t = length(t); 0
freq = 440;
w = sin(t/fs*2*pi*freq); -0.2
figure(1); -0.4
plot(w);
-0.6
axis([0 1000 -1.1 1.1]); xlabel('time');
-0.8
sound(w*volume, fs);
-1
3
0 100 200 300 400 500 600 700 800 900 1000
time
Sound
• To emphasize the point that the signal is sampled only at discrete
points in time, use the stem plot
• stem() has the same basic syntax as plot()
%make pure tone 0.8
0.6
clear;
fs = 44100; 0.4
len_time = 3;
volume = 0.5; 0.2
t = 1 : (len_time * fs);
len_t = length(t); 0
freq = 440;
w = sin(t/fs*2*pi*freq); -0.2
figure(1); -0.4
stem(w);
-0.6
axis([0 70 -1.1 1.1]); xlabel('time');
-0.8
sound(w*volume, fs);
-1
4
0 10 20 30 40 50 60 70
time
Sound Analysis
• Many signal properties are best analyzed, modified, or
synthesized in the “frequency domain.” Ex: piano
• x_freq = fft(x_time) Fast Fourier Transform.
Transforms a time-domain signal into the frequency
domain
• The right edge of the plot is the sampling frequency
• In general, the output of the FFT is complex so we often
look at the magnitude of the complex numbers, abs()
x 10
4
%make pure tone 6
clear;
fs = 44100; 5
len_time = 3;
volume = 0.5;
t = 1 : (len_time * fs); 4
len_t = length(t);
freq = 440;
w = sin(t/fs*2*pi*freq); 3
f1 = fft(w);
2
f2 = f1(1:(len_t/2)); % look at left half only
figure(1); plot(abs(f1));
figure(2); plot(abs(f2));
1
%axis([0 1000 -1.1 1.1]);
xlabel('frequency');
0
5
0 2 4 6 8 10 12 14
fsx 10
4
Sound Analysis
• The maximum value of the FFT is the sampling
frequency, fs, so we normally plot only the left
half which shows the signals that can be
analyzed by this sampling frequency (remember
the Nyquist Frequency?)
4
x 10
7
%make pure tone 6
clear;
fs = 44100; 5
len_time = 3;
volume = 0.5;
t = 1 : (len_time * fs); 4
len_t = length(t);
freq = 440;
w = sin(t/fs*2*pi*freq); 3
f1 = fft(w);
2
f2 = f1(1:(len_t/2)); % look at left half only
figure(1); plot(abs(f1));
figure(2); plot(abs(f2));
1
%axis([0 1000 -1.1 1.1]);
xlabel('frequency');
fs/2
0
6
0 1 2 3 4 5 6 7
frequency 4
x 10
Sound Analysis
• x_time = ifft(x_freq)
Inverse Fast Fourier Transform
transforms a freq-domain signal into the
time domain
• In general, the output of the IFFT is
complex so we need to be prepared to
deal with a vector of complex numbers in
the time domain
7
Sound With Harmonics
• Pure tones sound sterile. Richer tones are made by adding in
harmonic frequencies whose frequencies are related by some factor
to the root or fundamental frequency
• When two sin waves are added, the sound changes if the phase
(offset in time) of one of the components is changed
• Ex: double frequency = octave higher
% Generate a tone with harmonics and phase shift
clear;
3
fs = 44100;
len_time = 3;
volume = 0.5; 2
%===== Play a tone with a few harmonics
t = 1 : (len_time * fs); 1
len_t = length(t);
freq = 440;
phase1 = (0:(len_t-1))/len_t*15;
0
phase2 = (0:(len_t-1))/len_t*10;
w = sin(t/fs*2*pi*freq); % root
%w = w + sin(t/fs*2*pi*freq*1.25 + phase1) * 1.0; % major third
%w = w + sin(t/fs*2*pi*freq*1.50 + phase2) * 0.8; % perfect fifth -1
figure(1);
plot(w); axis([0 len_t -3 3]); xlabel('time'); -2
figure(2);
plot(w(1:(freq*5)));
-3
8
sound(w*volume, fs); 0 500 1000 1500 2000 2500
time
Sound With Harmonics
• Taken over a very long sequence, the envelope
of such signals can have interesting patterns
• Ex: 3 seconds @ 44,100 Hz = 132,300 samples
-1
-2
-3
9
0 2 4 6 8 10 12
time 4
x 10
Sound Processing: Example #1
t = linspace(0,2,44100*2+1); % note: this is 44,1000 samples per second,
y = sin(440*2*pi*t)
p=audioplayer(y,44100); % note that we specify 44,100 samples per second
play(p)
10
Music in Matlab
• We can import sound into MATLAB using the wavread( ) function, passing it
just the file name of a wave file: wavread(‘filename')
• Better yet, explicitly ask for the sample frequency too:
[y, fs] = wavread(‘filename’);
• It reads in a vector of all of the y values, the amplitudes. (You don't need the
time values to sample at since they're evenly spaced, you just need to know
how many y points,or SAMPLES, per second were taken).
• wavread() requires a `.wav' file. Wav files are uncompressed sound files,
whereas mp3s are compressed; you can convert between the two using
free software.
0.5
clear; 0.4
[a,fs] = wavread('ga.wav'); 0.3
%[a,fs] = m4aread('SomebodysCrying.m4a');
%a = mp3read('MustBeLove.mp3'); 0.2
len_a = length(a);
fprintf('Sample rate fs = %f samples/sec\n', fs); 0.1
Sample value
fprintf('Number of samples = %i\n', length(a));
fprintf('Seconds of music = %f\n', len_a/fs); 0
%plot waveform -0.1
figure(1); %clf;
plot(a); -0.2
xlabel('Sample number');
ylabel('Sample value'); -0.3
grid on;
%junk = input(' <Press return>'); -0.4
-0.5
0 5 10 11 15
Sample number 5
x 10
Lady Gaga, © 2009 Interscope Records
Playing Music
• Play a waveform on your computer with the sound()
command:
sound(waveform, fs)
• clear playsnd
Stops playing the waveform
• junk = input(' <Press return>');
Handy way to pause execution
0.5
0.4
clear; 0.3
[a,fs] = wavread('ga.wav'); 0.2
%[a,fs] = m4aread('SomebodysCrying.m4a');
%a = mp3read('MustBeLove.mp3'); 0.1
Sample value
len_a = length(a);
fprintf('Sample rate fs = %f samples/sec\n', fs); 0
fprintf('Number of samples = %i\n', length(a));
fprintf('Seconds of music = %f\n', len_a/fs); -0.1
sound(a, fs); % play all -0.2
%sound(a(1:5*fs), fs); % play 1st 5 seconds
junk = input(' <Press return>'); % pause -0.3
clear playsnd; % stop playing music
-0.4
-0.5
0 5 10 12 15
Sample number 5
Lady Gaga, © 2009 Interscope Records x 10
Tricks with Sound Processing
“….. you wanna know what you get when you play a country song backwards? You get
your house back. You get your dog back. You get your best friend Jack back. You get
your truck back. You get your hair back. Ya get your first and second wives back. Your
front porch swing. Your pretty little thing……..” Rascal Flatts
Using vector and matrix functions, we can do quite a bit of sound processing.
First, let's think about how to play a sound file backwards. Quite simply, we just
need to reverse the vector that stores the samples. Noting that wavread gives us
a column vector, we use the flipud function:
% Play backwards
[y f] = wavread('gaga.wav');
p=audioplayer(flipud(y),f)
play(p)
13
Tricks with Sound Processing:
Speed-up
We can also speed up a sound file:
•One way to play a file at double speed is to trick MATLAB into using twice as
many samples per second as it should, effectively crunching all of the data into
half of the time (and thus double the frequency):
% Double Speed, Option 1
[y f] = wavread('gaga.wav');
p=audioplayer(y,f*2);
play(p)
•Similarly, we could use only half of the samples:
% Double Speed, Option 2
[y f] = wavread('gaga.wav');
p=audioplayer(y(1:2:end),f);
play(p)
14
Sound Processing: Fade-out
We can also create fade‐outs very easily:
•To create a `linear' fade, we could multiply the first sample by 1, the
second sample by a little less than one, the middle sample by 0.5, the
second to last sample by just over 0, and the final sample by 0.
•In other words, we'll multiply all of the samples by a vector that looks
something like [1 .999 .998 ... .002 .001 0].
•Since we know that these numbers should be evenly spaced, begin at 1,
and end at 0, we use linspace:
% Fade out
[y f] = wavread('gaga.wav');
y2 = y(1:(5*f)); % Take the first 5 seconds
p=audioplayer(y2.*linspace(1,0,length(y2))',f);
play(p)
15
Echo
• Echo is a repeated waveform added to the
original at a later point in time
• a_echo = [gap_zeros ; a] + [a ; gap_zeros];
echo1 = fs;
gap = zeros(echo1,1);
a2 = [a ; gap] + [gap ; a]; % basic 1 echo
a2 = [a;gap;gap] + [gap;a;gap] + [gap;gap;a]; % basic 2 echo
a2 = [a;gap;gap] + [gap;a;gap]/3 + [gap;gap;a]/9; % faded 2 echo
16
Auto Scaling
• Best audio performance if signal is scaled to
largest magnitude
• Often called automatic gain control
• MaxMagnitude = max(max(a), min(a));
MaxMagnitude = max(max(a), max(-a));
• ScalingFactor = MaxAllowable / MaxMagnitude;
• AudioScaled = Audio * ScalingFactor;
17
Analysis Over Time and Frequency
• In most signals, the frequency content changes over
time
• Plots of frequency vs. time are called spectrograms
• These are extremely useful in many fields such as voice
recognition
• Three things are displayed: time, frequency, intensity
1
0.9
0.8
0.7
0.6
specgram(a);
Frequency
0.5
spectrogram();
0.4
0.3
0.2
0.1
0 18
1 2 3 4 5 6 7
Time 5
x 10
Analysis Over Time and Frequency
• The intensity of a signal at a certain frequency
and time can be shown with color (previous
slide) or height in a 3D figure (below)
19
http://en.wikipedia.org/wiki/Spectrogram
Analysis Over Time and Frequency
• Another common display method is to show
signal intensity on the vertical axis and use a
movie to show changes over time
400
350
400
300
350
250
400 300
200
350 250
150
300 200
100
250 150
50
200 100
0
0 100 200 300 400 500 600 700 800 900 1000
150 50
100 0
0 100 200 300 400 500 600 700 800 900 1000
50
0
20
0 100 200 300 400 500 600 700 800 900 1000
Digital Filtering
• Filtering is often cited as the most common
signal processing function
• Digital filtering can do all the frequency filtering
you are already familiar with:
– Treble
– Bass
– Bass boost
• It can also do many more:
– Notch filter. Remove a range of frequencies such as
noise or a singer’s voice.
– Bandpass filter. Extract a range of frequencies.
21
Digital Filtering
• We will focus on audio signal filtering but always
keep in mind the exact same techniques work on
any signal
– Earthquake waves
– Vibrations in a bridge
– Cardiac EKG
22
Digital Filtering
• Convolution is a very fundamental operation in signal
processing; maybe right after adds, subtracts, and
multiplication
• The most common type of digital filters (FIRs) are
implemented with convolution
• The calculation can be envisioned as a dot product while
the two sequences slide past each other
– When the two sequences do not fully overlap, “pad” either
sequence with zeros as needed—padding zeros does not
change the result
• Example:
x = [1 2 1]; h = [2 3];
out = conv(x,h)
= [2 7 8 3]
23
Digital Filtering
• FIR digital filters are implemented with the
convolution of two sequences:
– The data (think of the music)
– The filter coefficients
• The length of the filter’s output will be the length
of the data + length of the filter coefficients - 1
• The characteristics of the filter are entirely
determined by the coefficient values
24
Digital Filtering
• These coefficients produce a “low-pass”
filter which allows only low frequency
signals to pass through
– 51 coefficients in this case
0.4
0.3
0.2
0.1
-0.1
25
5 10 15 20 25 30 35 40 45 50
Digital Filtering
• This is the “frequency response” of this low-pass filter
– The right edge of the plot is fs/2, which is 44,100/2 = 22,050 Hz if
we are operating common music such as from CDs or MP3s
– The vertical scale is normally drawn on a log scale where every
20 db is a factor of 10 in signal size
• Attenuation or Gain (dB) = 20 * log10(Amplitude1/Amplitude2)
– This low-pass filter passes frequencies below 22,050 * 0.4 =
8820 Hz with no attenuation. On the other hand, signals above
22,050 * 0.5 = 11,025 Hz are passed with at least approximately
50 dB of attenuation—which is a factor of 10^(50/20) = 316.
0
Magnitude (dB)
-20
-40
-60
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 26
Normalized Frequency (×π rad/sample)
fs/2
firpm()
• There are many many ways to design filter coefficients.
Matlab’s firpm() is one of the easiest to use
• coeffs = firpm(N,F,A) returns a length N+1 friendly FIR
filter which has the best approximation to the desired
frequency response described by F and A in some
sense. F is a vector of frequency band edges in pairs, in
ascending order between 0 and 1. 1 corresponds to the
Nyquist frequency or half the sampling frequency. At
least one frequency band must have a non-zero width.
A is a real vector the same size as F which specifies the
desired amplitude of the frequency response of the
resultant filter.
• This was used for the previous example
– coeffs = firpm(50,[0 0.4 0.5 1], [1 1 0 0]);
stem(coeffs);
axis([1 51 -0.15 0.50]);
27
firpm()
• This was used for the previous example
– coeffs = firpm(50,[0 0.4 0.5 1], [1 1 0 0]);
• Example “high-pass” filter
– coeffs = firpm(50,[0 0.6 0.7 1], [0 0 1 1]);
• Example “band-pass” filter
– coeffs = firpm(50,[0 0.3 0.35 0.55 0.6 1], [0 0 1 1 0 0]);
• Example “band-stop” filter
– coeffs = firpm(50,[0 0.3 0.35 0.55 0.6 1], [1 1 0 0 1 1]);
• In general, firpm() can more closely match the requested
frequency and attenuation goals as it is given more
coefficients (a.k.a. “taps”) to use
28
freqz()
• Plotting the response of a filter is very
easy
• freqz(coeffs);
axis([0 1 -70 10]);
• freqz() returns two subfigures in one
figure. Ignore the lower plot for ENG 6
(it is called the “phase response”)
29
Example
• Remove all signals that are below 950 Hz
– High-pass filter
– fs = 44,100 Hz
– fs/2 = 22,050 Hz
– firpm parameter: f=0 0 Hz
f=? 950 Hz
f=? 1000 Hz (reasonable?)
f=1 22,050 Hz
– Corresponding magnitudes
0
0
1
1
– N = Number of coefficients: try a few!
– coeffs = firpm(50,[0 - - 1], [0 0 1 1]);
stem(coeffs);
axis([- - - -]);
30