From CDOT Wiki
Revision as of 13:14, 12 February 2013 by Maronin (talk | contribs) (Added a useful link for explaining sampling in sound)


Textbook chapter: 4 (though it's rather weak on this subject).

Sound is a wave in the air. Like any wave, it has an amplitude and a frequency. The amplitude controls the loudness and the frequency controls the pitch.

People perceive higher-frequency sounds more easily than low-frequency sounds. So a higher amplitude does not necessarily mean that a sound seems louder; it also depends on the frequency.

Low-frequency sound is also less directional: it's hard to figure out exactly where it's coming from.

Sound frequency is measured in Hertz (Hz). Humans can hear frequencies roughly between 20 Hz and 20 kHz.

You can use Audacity to generate tones of different frequencies to get a feel for what they sound like.
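To make the roles of amplitude and frequency concrete, here is a minimal Python sketch that computes the points of a pure tone. It is not part of the original notes: the function names `tone` and `upward_crossings` and the choice of 8000 points per second are illustrative assumptions.

```python
import math

def tone(frequency_hz, amplitude, duration_s, rate=8000):
    """Return the points of a sine wave as a list of floats.

    `rate` is how many points are computed per second of sound
    (an arbitrary choice for this sketch).
    """
    count = int(duration_s * rate)
    return [amplitude * math.sin(2 * math.pi * frequency_hz * n / rate)
            for n in range(count)]

def upward_crossings(points):
    """Count how many times the wave crosses zero going upward."""
    return sum(1 for a, b in zip(points, points[1:]) if a < 0 <= b)

low = tone(220, 0.5, 1.0)    # quieter, lower-pitched
high = tone(880, 1.0, 1.0)   # louder, higher-pitched

# The amplitude shows up as the height of the wave, the frequency
# as the number of cycles that fit into one second.
print(max(low), max(high))
print(upward_crossings(low), upward_crossings(high))
```

The higher tone completes about four times as many cycles in the same second, while the larger amplitude only changes how tall the wave is.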

Uncompressed Sound

There are various ways of representing waves digitally. A common choice is to represent a wave as a sine formula with some parameters. This doesn't work very well for sound waves because the wave changes so many times each second.

The choice in an audio CD is to "sample" the wave repeatedly and record each sample as a number. An audio CD and a typical uncompressed digital audio file store 44100 samples for each second of audio.

The number of samples per second is also measured in Hz, so the 44100 above is typically written as 44.1 kHz. This measurement is not related to the frequency/pitch of the sound even though it uses the same unit.

44.1 kHz was chosen for raw audio because people cannot hear frequencies above about half that rate, and a wave can be reconstructed from its samples as long as it contains no frequencies above half the sample rate. Listeners therefore cannot tell that the reconstructed wave is different from the original. Some claim that they can distinguish a live performance from an audio CD playback, but even such claims are rare.
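The "half the sample rate" limit can be checked numerically. The sketch below is an illustration, not something from the original notes: it samples a tone above the limit and shows that the samples come out the same as those of a lower tone. The function name `sample` and the 30000 Hz test tone are arbitrary assumptions.

```python
import math

SAMPLE_RATE = 44100  # samples per second, as on an audio CD

def sample(frequency_hz, count, rate=SAMPLE_RATE):
    """Sample a cosine wave of the given frequency `count` times."""
    return [math.cos(2 * math.pi * frequency_hz * n / rate)
            for n in range(count)]

too_high = sample(30000, 200)              # above half the sample rate
alias = sample(SAMPLE_RATE - 30000, 200)   # 14100 Hz, below half

# The two lists agree to within rounding error, so once sampled the
# 30000 Hz tone cannot be told apart from a 14100 Hz tone.
same = all(math.isclose(a, b, abs_tol=1e-9)
           for a, b in zip(too_high, alias))
print(same)
```

Frequencies below half the sample rate have no such double, which is why a 44.1 kHz sample rate is enough for sounds up to roughly 22 kHz.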


WAV and AIFF are basically identical formats, but because WAV has been the format supported by default in Windows, it's much more popular.

Sound is stored completely uncompressed, similar to how it's stored on Audio CDs. The format is not exactly the same though so conversion is necessary to transfer files from/to audio CDs.

Because of their age and because they need no encoding or decoding, they are usually the default formats for audio creation and editing tools.
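Because the samples are stored as they are, the size of a WAV file is easy to predict. The following sketch uses Python's standard `wave` module to write one second of a tone into memory. The 440 Hz tone, mono channel, and 16-bit sample size are assumptions made for illustration.

```python
import io
import math
import wave

RATE = 44100       # samples per second
SAMPLE_BYTES = 2   # 16-bit samples (assumed for this sketch)
CHANNELS = 1       # mono, to keep the sketch short

# One second of a 440 Hz tone as 16-bit signed integers.
frames = bytearray()
for n in range(RATE):
    value = int(16000 * math.sin(2 * math.pi * 440 * n / RATE))
    frames += value.to_bytes(2, "little", signed=True)

buffer = io.BytesIO()  # stands in for a file on disk
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(SAMPLE_BYTES)
    wav.setframerate(RATE)
    wav.writeframes(bytes(frames))

# Everything except a small header is the raw sample data.
data_size = RATE * SAMPLE_BYTES * CHANNELS
header_size = len(buffer.getvalue()) - data_size
print(data_size, header_size)
```

The file is just a short header followed by the sample data, so its size grows in direct proportion to the length of the recording.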


FLAC is a lossless compressed format. A WAV file can be compressed with any generic compression tool, so you can think of FLAC playback as on-the-fly decompression. However, FLAC has been optimised to compress audio better than a generic tool, to allow seeking to random places in the file, and to store audio metadata.

Playing a FLAC file is more CPU-intensive than playing a WAV file, though, so the choice of format is a balance between the space required and decoding speed.

The speed of the encoder also varies. You can create better-compressed FLAC files by spending more time encoding them.

A FLAC file can be converted to any other lossless format such as WAV and back without any loss of audio data.
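Python's standard library has no FLAC encoder, so the sketch below uses `zlib`, a generic lossless compressor, purely as a stand-in. It illustrates two of the properties described above: the encoder can be asked to spend more or less effort, and decompressing always gives back exactly the original bytes. The 441 Hz test tone is an arbitrary assumption.

```python
import math
import zlib

RATE = 44100

# One second of a 441 Hz tone as raw 16-bit samples (a stand-in for
# the audio data inside a WAV file).
raw = b"".join(
    int(16000 * math.sin(2 * math.pi * 441 * n / RATE))
    .to_bytes(2, "little", signed=True)
    for n in range(RATE)
)

fast = zlib.compress(raw, 1)    # least effort
small = zlib.compress(raw, 9)   # most effort

# Lossless: decompressing returns exactly the original bytes,
# whichever effort level was used.
restored_ok = (zlib.decompress(fast) == raw
               and zlib.decompress(small) == raw)
print(len(raw), len(fast), len(small), restored_ok)
```

A pure tone is far more repetitive than real music, so the sizes printed here say little about what FLAC achieves on actual recordings.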

Compressed Sound

An audio CD contains as much data as a regular data CD: 650 or 700 MB. That's used to represent 74 or 80 minutes of sound, which is more than 8 MB per minute. Even with today's large hard drives and fast networks there are good reasons to compress sound: storage space is still an issue given enough audio, and for online streaming such sizes are still unacceptable even with high-speed internet.

Just as with images, there are lossless and lossy compression types.


The MP3 format is lossy but considerably cuts down the amount of disk space needed per second of audio. It first became popular when hard drives were small (about the capacity of an audio CD) and it was the only accessible way to store music on a computer. Later the popularity of the format exploded because of Napster, and these days every digital music player supports the format.

The compression algorithm is pretty complicated; we won't go through it in this course. Basically it relies on the fact that humans can't hear all sounds in all contexts.

OGG is a format essentially equivalent to MP3. It was developed because of the patents surrounding the use of MP3: both OGG decoders (players) and encoders (creation tools) can be developed and distributed without a patent licence. Not all current players support OGG playback. Earlier iPods have been reported to be incapable of playing OGG files simply because their processors were too slow; OGG does require more processing power than MP3 to play.

Like WAV files, MP3/OGG files have a bitrate: the number of bits used for each second of audio. Older encoders were only capable of generating constant bitrate (CBR) files. Newer encoders can generate variable bitrate (VBR) files, where the encoder chooses the bitrate on the fly depending on how much information is in the current sound segment.
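For a constant bitrate the file size follows directly from the bitrate and the duration. The arithmetic below is a sketch under stated assumptions: the uncompressed figure assumes 16-bit stereo samples at 44100 samples per second, and 128000 bits per second is just an example CBR setting, not a value taken from these notes.

```python
def megabytes_per_minute(bits_per_second):
    """Size of one minute of constant-bitrate audio, in MB (10**6 bytes)."""
    return bits_per_second * 60 / 8 / 1_000_000

# Uncompressed audio: 44100 samples/s, 16 bits per sample, 2 channels.
wav_bitrate = 44100 * 16 * 2   # bits per second
mp3_bitrate = 128_000          # assumed example CBR setting

print(wav_bitrate)                          # 1411200
print(megabytes_per_minute(wav_bitrate))    # 10.584
print(megabytes_per_minute(mp3_bitrate))    # 0.96
```

With these example numbers the compressed file is roughly a tenth of the size of the uncompressed one. For a VBR file the same calculation only works with the average bitrate.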

Because (1) the encoding process is much more resource-intensive than decoding and (2) the format is lossy - these formats are usually not used for recording or editing but only for distribution.

Other Compressed Audio Formats

There are lots of them out there, see for a list. One other worth mentioning here is AAC, a format mostly used by Apple products. It has some benefits over MP3 but is roughly as popular as OGG (both are far less used than MP3).

Electronic Sound - MIDI

MIDI is actually much more than just a file format: it's an interchange format that's used in many electronic sound instruments such as synthesizers.

Instead of recording real sound (and often there is no real sound to record in the first place, because the music is electronically generated), the audio is stored as a set of instructions, something like sheet music. The instructions include the instrument type, length and amplitude of each note. It is up to the player to combine (like an orchestra) all those instructions into a single song.
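The idea of "instructions instead of sound" can be sketched in a few lines of Python. This is a toy model, not the real MIDI file format: the `Note` class, its field names, and the `render` function are hypothetical, and every instrument is played as a plain sine wave. The note-number-to-frequency formula assumes standard tuning with note 69 at 440 Hz.

```python
import math
from dataclasses import dataclass

@dataclass
class Note:
    instrument: str    # which instrument should play the note
    pitch: int         # MIDI-style note number (69 is the A at 440 Hz)
    start_s: float     # when the note begins, in seconds
    length_s: float    # how long it lasts, in seconds
    amplitude: float   # loudness, 0.0 to 1.0

def frequency(pitch):
    """Convert a note number to Hz (equal temperament, A = 440 Hz)."""
    return 440.0 * 2 ** ((pitch - 69) / 12)

def render(notes, rate=8000):
    """A toy 'player': combine all instructions into one list of samples."""
    spans = [(int(n.start_s * rate), int(n.length_s * rate)) for n in notes]
    samples = [0.0] * max(first + count for first, count in spans)
    for note, (first, count) in zip(notes, spans):
        step = 2 * math.pi * frequency(note.pitch) / rate
        for i in range(count):
            samples[first + i] += note.amplitude * math.sin(step * i)
    return samples

song = [
    Note("piano", 60, 0.0, 0.5, 0.4),
    Note("piano", 64, 0.5, 0.5, 0.4),
    Note("flute", 67, 0.0, 1.0, 0.3),
]
samples = render(song)
print(len(song), len(samples))
```

Three short instructions expand into thousands of samples once they are played, which is why files of this kind are so small compared to recorded audio.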

MIDI files on computers were much more popular when the internet was slow and hard drives were small, because they are tiny compared even to the best compressed audio file. It was common to hear a MIDI file playing when visiting an early webpage. These days support for MIDI on computers is spotty because it's not a big requirement and sound cards no longer have MIDI hardware on them like they used to.


Most audio formats (but not WAV and AIFF) can have metadata associated with them. This usually includes the artist, album, and track name, the year, and genre.

Most playback software can be used to edit the metadata. Format conversion software usually also copies the metadata into the appropriate fields.

For MP3 files, ID3v1 is the old way of storing metadata inside the file, but it doesn't support Unicode, so around the world it's not very popular. For example, it will not work well even for Latin-based languages such as German (which has some non-English characters). ID3v2 is completely different from ID3v1, and pretty much every player these days (software or hardware) supports it.
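An ID3v1 tag is a fixed 128-byte block at the very end of the file, which makes it easy to read and also shows why it is so limited. The sketch below parses one. The function name `read_id3v1` is hypothetical, and the Latin-1 default is an assumption made only for illustration: the tag itself has no field saying which encoding its bytes use, so a tag written with another encoding would decode to the wrong characters.

```python
import struct

# "TAG", then title, artist, album (30 bytes each), year (4 bytes),
# comment (30 bytes) and a one-byte genre number: 128 bytes in total.
LAYOUT = "3s30s30s30s4s30sB"

def read_id3v1(data, encoding="latin-1"):
    """Parse an ID3v1 tag from the end of an MP3 file's contents.

    Returns a dict, or None if there is no tag. The `encoding`
    default is an assumption of this sketch.
    """
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None
    _, title, artist, album, year, _comment, genre = struct.unpack(LAYOUT, tag)

    def text(field):
        # Fields are fixed-width and padded with zero bytes or spaces.
        return field.split(b"\x00")[0].decode(encoding).strip()

    return {"title": text(title), "artist": text(artist),
            "album": text(album), "year": text(year), "genre": genre}

# Build a fake tag to try the parser on (no real MP3 needed).
fake = struct.pack(LAYOUT, b"TAG", b"Some Title", b"Some Artist",
                   b"Some Album", b"2013", b"", 12)
info = read_id3v1(b"...audio data..." + fake)
print(info)
```

With one byte per character and only 30 bytes per field, there is no room for long titles or for most of the world's writing systems.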

Audio on the web

Unfortunately not all browsers support the same formats, so you need more than one format to have your sound work on multiple browsers.
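One way to offer several formats is to list them all inside a single HTML5 `<audio>` element and let the browser play the first one it supports. The Python sketch below just builds that markup as a string. The function name `audio_tag` and the file names are hypothetical.

```python
from html import escape

# MIME types for the formats discussed in these notes.
MIME_TYPES = {".ogg": "audio/ogg", ".mp3": "audio/mpeg", ".wav": "audio/wav"}

def audio_tag(filenames):
    """Build an HTML5 <audio> element offering several formats.

    The browser goes through the <source> lines in order and plays
    the first one whose format it supports.
    """
    lines = ["<audio controls>"]
    for name in filenames:
        extension = name[name.rfind("."):].lower()
        lines.append('  <source src="{}" type="{}">'.format(
            escape(name), MIME_TYPES[extension]))
    lines.append("  Your browser does not support the audio element.")
    lines.append("</audio>")
    return "\n".join(lines)

markup = audio_tag(["song.ogg", "song.mp3"])
print(markup)
```

The text line inside the element is only shown by browsers that do not understand `<audio>` at all.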


Bring headphones to the next lab.



This is a marked lab. Please submit it using Moodle (Lab4).

We're going to use this song to play with, because there are no copying rules associated with it:


On Windows the version of Audacity available isn't the newest one (it's a challenge to compile), and because there is no package management a few plugins are not available. But we can still do a few neat things with it in the lab. If you have your own Linux box, by all means use that.

Part 1

  • Go to and download and install the newest Windows version.
  • Download the pond-erosa puff ogg file.
  • Open that file in Audacity. On Windows the menu option is "open", on newer versions on Linux it's "import".
  • Export the file as wav and as ogg.
  • Try to export it as mp3. You may get an error saying that lame is not found. Go back to the Audacity page and download and install lame for Audacity for Windows.
  • Export as mp3 as well.
  • Export as flac as well.
  • Make a screenshot showing the folder with all 4 files and Audacity with the original open.
  • Submit that screenshot.

Note that you would normally start with an uncompressed file (wav or flac) and use that to create the compressed files, but we don't have the uncompressed version of this file so we're just going to pretend.

Part 2

We're going to bleep out some words we don't like in the song.

  • Pick a word you don't like (for example "free"), find where it appears in Audacity, and make a note of the times in a text file. The lyrics are on the website, which should make that a little easier.
  • Note: you can select a region of the song and play only that.
  • Add a new track (mono should do)
  • Generate a tone to overlap with the playback of the words you selected earlier. Try not to overlap other parts of the song.
  • Zoom out so the whole song is on one screen, and make a screenshot.
  • Submit your text file and the screenshot.
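What Audacity does when it plays the tone track on top of the song can be sketched in Python as mixing a tone into the samples for each noted time range. This is only an illustration of the idea, not a replacement for the lab steps: the function name `add_bleeps`, the 1000 Hz tone, and the example times are all assumptions.

```python
import math

def add_bleeps(samples, ranges, rate=44100, frequency_hz=1000, amplitude=0.8):
    """Return a copy of `samples` with a tone mixed in over each time range.

    `samples` is a list of floats between -1.0 and 1.0, and `ranges`
    is a list of (start_seconds, end_seconds) pairs, like the times
    noted in the text file.
    """
    result = list(samples)
    for start_s, end_s in ranges:
        first = max(0, int(start_s * rate))
        last = min(len(result), int(end_s * rate))
        for n in range(first, last):
            tone = amplitude * math.sin(2 * math.pi * frequency_hz * n / rate)
            # Mix the tone with the song and keep the result in range.
            result[n] = max(-1.0, min(1.0, result[n] + tone))
    return result

song = [0.0] * (44100 * 3)   # three seconds of silence as a stand-in
bleeped = add_bleeps(song, [(0.5, 0.9), (2.0, 2.3)])
```

Samples outside the listed ranges are left untouched, which corresponds to the instruction not to overlap other parts of the song.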

Compression & browser support

On Linux you can compress things with tar cvzf dest.tar.gz sourcefile. On Windows you'll have to download and install WinRAR (or another app if you prefer).

  • Use the tool to compress every one of your files separately.
  • Make an HTML page with a table with the sizes of the originals and the compressed versions and two more columns.
  • In the third column use the HTML5 <audio> tag to allow the user to play that version of the audio file.
  • In the fourth column make a note of which browsers that worked in. In the lab you can test with Firefox, IE9, Chrome, and Safari.
  • If no audio files work in IE - try to upload your HTML and audio files to matrix and view the page there.
  • Submit the HTML page (without the big files).