24/8/2014 SoundLab
Sound Lab: Power Spectra
Background
Higher frequencies of air pressure vibrations on our ears cause us to hear higher sounds. A pure sine wave of vibrations sounds like a pure
musical 'pitch'. Music is all about the combinations of pitches, their rhythms, and the characteristics of individual instruments or voices that make
one pitch sound different depending on who or what is producing it. There are some simple rules, like we hear "the same note" an "octave" higher
if the vibrations are twice as fast. Octave jumps in popular music your parents knew about include Paul jumping up an octave in "I want to hold
your [up an octave] hand" and a nice descending octave in John Williams's "Darth Vader March". There's a lifetime's worth of music theory, so you
should definitely get started now. For this lab, try Pitches in Music.
Objective
Look at some power spectra (PSs) of single notes or chords from a few sound sources to see if the PS representation lets us answer some
simple questions that are often easy for us humans using our ears.
Sound and Matlab Files
The SoundLab Directory has the files for this lab. For some reason Windows thinks they are URLs(!). To get what you need into Matlab, first
open the soundlab directory in your browser to see the filenames, and then either:
1. ALT and left-click the name, which should download it onto the desktop. Then drag and drop it into Matlab's
working directory. Or...
2. Right-click the name, choose 'save link as...', and notice it wants to be saved as .wav or .m, so that's good. Then
just navigate in that same download GUI through testing, Documents, to MATLAB. Save. This will go
more quickly for the next file. Don't know if it makes sense to do several files at once; I couldn't figure out how.
This is boring, right enough.
Preparation
1. This is a team-of-two or individual exercise, you choose. Either way, get organized as follows.
2. Be prepared to listen to sounds from sound files using Matlab's wavplay(). Before the lab if
necessary, bring in earbuds, headset, whatever fits the sound output of lab computers (CB uses a USB
headset, but the headphone output should work too). Also the PCs may have usable speakers. You
probably want to hear what you're analyzing, so get set before the lab. This code should work for
the sound file 'guitAmin.wav' in the working directory:
>> [y, fs] = wavread('guitAmin.wav');   % y: the samples, fs: the sampling rate in Hz
>> wavplay(y, fs);
The shorter wavplay(wavread('guitAmin.wav')); is tempting, but it drops the sampling rate, so
wavplay() falls back on its default of 11025 Hz and the sound comes out slow and low.
3. Read the tutorial or your favorite textbook or on-line resource dealing with the power spectrum. Our
PSs will be one-dimensional vectors, like our sound input (only in frequency space).
4. Listen to the sound files, or some of them, to see what you are dealing with and to give you ideas for
questions to pursue (see next Section).
5. Make a plan. Decide what question you will investigate and what sound files you will use.
6. The sounds and a script we hope you'll find useful are all in the SoundLab Directory. Common self-
respect will lead you to make sure you understand what the script does in detail before you use it.
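A note on step 2: wavread() and wavplay() may not be available in newer Matlab releases (and wavplay() only ever worked on Windows). Here is a minimal sketch of the same read-and-play step using audioread() and sound(), assuming your release has them. The test tone and the file name 'testtone.wav' are made up here so the snippet runs without the lab files.

```matlab
% Hypothetical stand-in file so this runs without the lab's sounds.
fs   = 22050;                       % samples/sec, same as most lab files
t    = (0:fs-1)' / fs;              % one second of time stamps (column)
tone = 0.5 * sin(2*pi*440*t);       % 440 Hz sine, well inside [-1, 1]
audiowrite('testtone.wav', tone, fs);

% Read it back: y is the sample vector, fsRead the sampling rate.
[y, fsRead] = audioread('testtone.wav');

% Play at the file's own rate. Uncomment once headphones are plugged in.
% sound(y, fsRead);
```

Swap 'testtone.wav' for 'guitAmin.wav' (or any other lab file) once it is in your working directory.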
What Question?
The data is single notes and chords produced on different 'instruments', or sources. Some of the following questions
https://www.cs.rochester.edu/~brown/160_10_27_11/assignments/signal_proc/SoundLab.html 1/4
are pretty easy for humans. Others require 'perfect pitch' to identify exactly the pitch of a note or the pitches in a
chord. Humans don't need perfect pitch to tell major from minor chords, but you'll be dealing with computers and
using your visual system to analyze outputs, so it gets more interesting. Also humans may need more than a second of
data to answer these questions, so it's entirely possible that this computational approach can outperform humans with
short inputs.
Following are some questions you might want to answer, ranked in CB's version of "likelihood of success". Keep on
asking and answering until you're out of time.
Given a file with a single source and single pitch, what pitch is the lowest peak in the PS (translate from
its frequency)? Does that match up with the filename?
Do higher-sounding pitches from this source have higher lowest-peaks in the PS? (they "should" but do
they really?).
Compare the character of the harmonics for different sources: are there qualitative differences between
the PS of the pitchpipe and the hum (hint: yes!)?
Thus, can you reliably tell when files with the same note are from different sources? Or tell that files with
different notes are from the same source?
What's the same and different between different vowel sounds on the same pitch? (the "ahhh" "eee" etc.
data files).
Given a file, is more than one pitch (a chord) being played?
Given a major triad chord and one other, can you identify the other chord and justify your reasoning?
Maybe other questions will occur to you.
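For the first question ("translate from its frequency"), here is a rough sketch of turning a peak frequency into the nearest note name. It assumes the even-tempered scale with A = 440 Hz and uses the common octave numbering in which that A is A4; the pitch-frequency chart linked in this lab may number octaves differently. The frequency value is just an example.

```matlab
% Convert a peak frequency (Hz) to the nearest even-tempered note name.
% Assumes A4 = 440 Hz; the frequency below is just an example value.
f = 82.4;                                   % e.g. a guitar's low E
names = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};

n      = round(12 * log2(f / 440));         % half-steps away from A4
midi   = 69 + n;                            % MIDI note number (A4 = 69)
octave = floor(midi / 12) - 1;              % MIDI 60 is C4
name   = names{mod(midi, 12) + 1};          % pitch class within the octave

fprintf('%.1f Hz is closest to %s%d\n', f, name, octave);
```

Rounding to the nearest half-step hides how far off the peak really is, so it is worth also looking at 12*log2(f/440) before rounding.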
Method and Resources
Your method is just to plot and "eyeball" the power spectra of the sound data. Then you can see whether your
powerful visual perceptions can identify characteristics in the frequency domain representation of the signal to answer
your question.
The function in PowSpec.m in the SoundLab Directory is a place to start, and could be all you need. It takes a sound
file name (e.g. 'hiss.wav') and a number saying how much data to read. This number should be a power of 2,
since we're going to FFT the data. I find 2048 is a reasonable size; you could try 4096, I suppose, or 1024. The data
is read from the start of the sound, so 2048 points from files (all but the guitar sounds) digitized at 22050
samples/sec is about 1/10 second.
The guitar files are sampled at 44100 Hz, so 2048 is about 1/20 second. The overtones of the non-guitar sounds are
probably pretty constant, so you'll get a very similar power spectrum from about 1/5 second (size 4096); that is, you'll
be using a longer sample of an unchanging sound to get the same information out, since "more of the same" sound
doesn't tell you anything new about its frequency content. But the guitar sound decays, so my guess is that using a big
size, like 32768 or about 3/4 second, will include more of the later sound, which has less information about what
differentiates it from other similar sounds (see below on overtones). So shorter sizes probably cover a more uniform
stretch of sound, and earlier, shorter stretches are more representative of what makes a sound unique.
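The durations quoted above are just size divided by sampling rate. A quick sketch to check them for other sizes; the last column (the spacing between neighbouring points on the frequency axis of an FFT of that size, at 22050 samples/sec) is an extra added here:

```matlab
sizes = [1024 2048 4096 32768];     % candidate power-of-2 sizes
dur22 = sizes / 22050;              % seconds covered in the non-guitar files
dur44 = sizes / 44100;              % seconds covered in the guitar files
binHz = 22050 ./ sizes;             % frequency step per FFT bin at 22050 Hz

% One line per size; fprintf walks down the columns of the matrix.
fprintf('%6d samples: %.3f s  %.3f s  %.2f Hz/bin\n', ...
        [sizes; dur22; dur44; binHz]);
```

Longer reads give finer frequency steps, which is the trade-off against the "earlier is more representative" argument for the guitar.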
The function reads and plays the sound, plots it, computes the spectrum and plots its right-hand side, from 0 to the
highest frequency in the file. It also returns the vector of frequencies (x-axis of PS plot) and the PS data that was
plotted. These vectors can be used to zoom in on parts of the data. You can use hold to compare plots.
Example: get the sound pipeA.wav and PowSpec.m into your working Matlab directory, then:
>> [freqA, powA] = PowSpec('pipeA.wav', 4096);
% plays, plots twice
>> plot(freqA(1:400), powA(1:400), '-g');
% zooms in on low-frequency part of the spectrum.
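If you want to see the moving parts before opening PowSpec.m, here is a minimal sketch of one common way to get a one-sided power spectrum and its frequency axis. It is not the lab's PowSpec.m, which may scale or plot things differently; the signal is a synthetic stand-in for a note, and the names x, X, pow and freq are chosen here.

```matlab
fs = 22050;  n = 4096;              % sampling rate and power-of-2 size
t  = (0:n-1)' / fs;
x  = sin(2*pi*440*t) + 0.5*sin(2*pi*880*t);   % synthetic stand-in for a note

X    = fft(x);                      % complex spectrum, n points
half = n/2 + 1;                     % keep bins for 0 .. fs/2 only
pow  = abs(X(1:half)).^2 / n;       % power in each kept bin
freq = (0:half-1)' * fs / n;        % frequency of each bin in Hz

[~, k] = max(pow);                  % index of the strongest bin
fprintf('Strongest peak near %.1f Hz\n', freq(k));
plot(freq, pow); xlabel('Hz'); ylabel('power');
```

The peak lands on the bin nearest 440 Hz rather than exactly on it, because the frequency axis only has one point every fs/n Hz.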
The following relation of pitches to frequencies should help you analyze your power spectra:
Pitches and Frequencies
Timbre
Notes from instruments have a complex frequency make-up, which is why glockenspiels don't sound like bassoons
or pitchpipes like guitar strings. We call the difference the timbre of the instrument, and it depends on what the
instrument is made of (wood, brass), how the sound is generated (striking, bowing, single-reed like clarinet,
double-reed like oboe or bassoon, etc.). If we knew more about timbre, then MIDI files, electronic keyboards, or synthetic
voices could be made to sound more like real instruments. But we don't, really -- it's still a mystery. Some of our
questions above are about timbre: can we tell from the power spectrum (or maybe the sound waveform) the
difference between a pitchpipe and a guitar or humming? Note the Wikipedia link just below has a later section on
timbre.
One difference in timbre is the presence and amplitude of overtones. Look at this link: The Harmonic Series. Here
you'll see the "modes" or harmonics of string vibration when both ends of the string are kept from moving. These are
analogous to the modes of a vibrating drumhead we saw two links for when we discussed basis functions.
You see that there's one mode with wavelength twice the string's length (top picture), one whose wavelength equals the
string's length (2nd picture), then ones with wavelengths 2/3, 2/4, etc. of the string's length. In terms of frequencies
therefore, which are inversely proportional to wavelengths, the harmonic frequencies corresponding to a fundamental
frequency f are f, 2f, 3f, 4f,... The overtones occur at equal intervals in frequency space -- they are evenly spaced in the power spectrum!
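To make things interesting, pipes depend on their ends: a pipe open at both ends supports the same full series as the
string, while a pipe closed at one end supports only the modes whose wavelengths are the fundamental pipe wavelength
divided by an odd number (f, 3f, 5f,...).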
For a visualization, follow this link from the Wikipedia article: Interactive Visualization. You can see the overtones
(harmonics) alone or working together. Sounds from real instruments have a weighted mixture of these overtones or
harmonics.
Our hearing of tones follows NOT an arithmetic but a geometric progression. The 'same note' an octave higher is
vibrating twice as fast. The octave frequency series is f, 2f, 4f, 8f, 16f,... This means that the linearly-spaced
harmonics, going up, sound like successively smaller musical intervals from each other. The second harmonic (first overtone) is an
octave above the fundamental, the third harmonic is a fifth (7 half-steps) above that, the fourth harmonic is a fourth (5 half-steps)
above that (two octaves above the original), the next is a major third above that, the next a minor third above that,
etc. (see the musical notation and charts further down in the article). This nice relation of common and pleasing
musical intervals with the (early part of the) harmonic series is what got Pythagoras (circa 500 BC) and some people
before him excited about the powers of the harmonic series.
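The shrinking intervals can be checked with a line of arithmetic: the interval between two frequencies, in even-tempered half-steps, is 12 times the base-2 log of their ratio. A small sketch for the first few harmonics (the variable names are chosen here):

```matlab
k = 1:6;                                   % harmonic numbers
halfSteps = 12 * log2((k+1) ./ k);         % interval from harmonic k up to k+1

% One line per pair of neighbouring harmonics.
fprintf('harmonic %d -> %d: %.2f half-steps\n', [k; k+1; halfSteps]);
```

The first step comes out at exactly 12 half-steps (the octave). The next ones are close to, but not exactly, 7, 5, 4 and 3, which is the gap between the harmonic series and the even-tempered scale.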
That's a little more than we need to know; the bottom line is: one thing to look out for in the power spectra is these
evenly-spaced harmonics which could be generated by the particular sound source involved producing "one note".
If you want to know more, try Wikipedia on "pythagorean overtones", "even-tempered scale", "pythagorean
comma".
Last, the fundamental frequency (the pitch we humans associate with the note) of our recorded sounds is mostly in
octaves 2, 3, 4 of the pitch-frequency chart (except for maybe some of the guitar harmonics: *harm* files). It
seems to us that the "fundamental frequency" (e.g. the 82 Hz of the guitar's 6th-string E) comes through pretty
weakly in some of our sounds. So be open to surprises, that's the fun part.
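When the fundamental is weak or missing, the even spacing of the harmonics still gives it away. A rough sketch, assuming you have read a handful of peak frequencies off a plot; the numbers below are invented for illustration, not measured from the lab files:

```matlab
% Hypothetical peak frequencies (Hz) read off a plot; the fundamental
% near 82 Hz is missing, as can happen with a weak low component.
peaks = [165 247 330 412 494];

spacing = diff(peaks);              % gaps between neighbouring peaks
f0      = median(spacing);          % estimate of the fundamental
harmNum = round(peaks / f0);        % which harmonic each peak would be

fprintf('Estimated fundamental: %.1f Hz\n', f0);
disp(harmNum);
```

The median is used so that one badly read peak does not throw off the estimate. If a harmonic in the middle is missing too, the gaps will not all agree and you will have to look harder.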
Data
Our sound files were made with the above questions in mind. We hope this data can support questions like the
above -- a cursory look shows some is promising but there may be technical problems with some sounds.
Our files are digitized into one channel (not stereo) at 22050 samples per second (44100 for the guitar sounds) and
are 2 seconds long. The function in the SoundLab directory, PowSpec(filename, size), grabs the initial
size elements in the file.
Here are the files (blank entry means no file).
SINGLE PITCHES:
                               SOURCE
PITCH |  PITCHPIPE    HUM         WHISTLE         GUITAR
------------------------------------------------------------------------------
  A   |  pipeA.wav    humA.wav    whistleA.wav    guitA1,A2,A3 (.wav)
  D   |  pipeD.wav    humD.wav    whistleD.wav
  E   |               humE.wav                    guitE1,E2,E3,E4,E3a,E3aharm1,2 (.wav)
  C   |               humC.wav
  G   |               humG.wav
VOICED VOWELS and a CONSONANT
"Ahh" -- voiceA.wav
"Eee" -- avoiceE.wav
"Ooo" -- voiceO.wav
"Uuu" -- voiceU.wav
"Sss" -- hiss.wav
CHORDS:
GUITAR:
  Amajor:   (E, A, E2, A2, C#, E3) -- guitAmaj.wav
  Aminor:   (E, A, E2, A2, C,  E3) -- guitAmin.wav
  Adom7:    (E, A, E2, G,  C#, E3) -- guitA7th.wav
  Aminor7:  (E, A, E2, G,  C,  E3) -- guitAmin7.wav
  A6:       (E, A, E2, A2, C#, F#) -- guitA6th.wav
PITCHPIPE:
  B minor 2nd: (B C) -- pipeBC.wav
The guitar chords show all 6 strings, starting with low E at 82Hz. Pitches are ascending from left to right. E2 and E3
are one and two octaves up from low E, A is the one just above E, A2 an octave above A, up to the F# two half-
steps above E3.
Procedure
Use your eyes (not your ears!) and the temporal and frequency domain representation of the sounds. Eyeball the
plots and try to identify characteristics in them that you could use to answer the question you've picked.
Relax and have fun.
What to Hand In
Ignoring teams, each individual should hand in a piece of paper with legible signature, telling us:
1. What did you learn? (Your question and some working answers would be good, but negative results also
could be interesting).
2. Suggestions for improving this lab.
Want More?
Fact: our ".wav" files are nothing but a long vector of doubles in the interval [-1, 1] (try help wavrecord).
There are several corollaries to this:
1. You can equalize volumes by scaling all ".wav" vectors so their largest (smallest) number is 1 (-1).
2. You can combine sounds by simple vector addition and normalization back to [-1, 1].
3. You can make your own sounds, like square wave, sawtooth wave, sine wave, hiss (random entries),
with a few lines of Matlab.
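The three corollaries really do take only a few lines each. Here is a sketch using only basic Matlab (no toolbox functions); the pitch, the duration and the variable names are arbitrary choices made here:

```matlab
fs = 22050;                          % samples/sec, as in most lab files
t  = (0:2*fs-1)' / fs;               % two seconds of time stamps
f  = 220;                            % pitch in Hz (an A)

sineW   = sin(2*pi*f*t);
squareW = sign(sin(2*pi*f*t));       % +1/-1 square wave (0 at exact crossings)
sawW    = 2*mod(f*t, 1) - 1;         % ramps from -1 up toward 1, f times/sec
hissW   = 2*rand(size(t)) - 1;       % uniform random entries in [-1, 1]

% Corollary 2: add two sounds, then scale back into [-1, 1].
mix = sineW + sawW;
mix = mix / max(abs(mix));           % corollary 1: largest magnitude becomes 1
```

Any of these vectors can be played or fed to a power spectrum just like the samples read from a ".wav" file.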
Wonder about more complex, time-varying sounds and their variation of frequencies through time? You want a
sound spectrogram, which is the first option in the Signal Processing assignment. It's easy. Guitar effects? They're
there too, and more.
Interested in 'real sounds or songs'? Problem is it's hard to find real (uncompressed) .wav files on the web (believe
me I tried). Despite their .wav extension, all I could find were .mp3 compressed files that Matlab can't deal with.
There are converters, I'm told. Or you can record your own.
Last Change: 05/06/2011: CB