From CDOT Wiki
Revision as of 00:15, 26 November 2011 by Andrew (talk | contribs)


Textbook chapter: 4 (though it's rather weak on this subject).

Sound is a wave in the air. Like any wave, it has an amplitude and a frequency. The amplitude controls the loudness and the frequency controls the pitch.

People perceive higher-frequency sounds more easily than low-frequency sounds. So a higher amplitude does not necessarily mean that a sound seems louder; perceived loudness also depends on the frequency.

Low-frequency sound is also less directional: it's hard to figure out exactly where it's coming from.

Sound frequency is measured in Hertz (Hz). Humans can hear frequencies roughly between 20 Hz and 20 kHz.

You can use Audacity to generate tones of different frequencies to get a feel for what they sound like.

Uncompressed Sound

There are various ways of representing waves digitally. The usual choice is to represent a wave as a sine formula with some parameters. This doesn't work very well for sound waves because the wave changes so many times each second.

The choice in an audio CD is to "sample" the wave repeatedly and record each sample as a number. An audio CD and a typical uncompressed digital audio file store 44100 samples for each second of audio.
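To make the idea of sampling concrete, here is a minimal sketch that samples a pure sine tone 44100 times per second and records each sample as a whole number. The function name `sample_sine`, the 440 Hz tone, and the use of 16-bit signed integers are assumptions chosen for illustration; the notes above only state the sample rate.

```python
import math

SAMPLE_RATE = 44100  # samples per second, as on an audio CD


def sample_sine(frequency_hz, seconds, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Sample a sine wave and return the samples as 16-bit signed integers."""
    max_value = 32767  # largest value of a signed 16-bit sample (assumed width)
    count = int(seconds * sample_rate)
    samples = []
    for n in range(count):
        t = n / sample_rate  # time of this sample, in seconds
        value = amplitude * math.sin(2 * math.pi * frequency_hz * t)
        samples.append(int(round(value * max_value)))
    return samples


samples = sample_sine(440, 1.0)
print(len(samples))  # number of samples recorded for one second of audio
print(samples[:5])   # the first few recorded numbers
```

Changing `amplitude` makes the recorded numbers larger or smaller (louder or quieter), while changing `frequency_hz` changes how quickly they rise and fall (the pitch).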

The number of samples per second is also measured in Hz, so the 44100 above is typically written as 44.1kHz. This measurement is not related to the frequency (pitch) of the sound, even though it uses the same unit.

44.1kHz was chosen for raw audio because a sampled wave can only represent frequencies up to half the sample rate (about 22kHz), and people cannot hear frequencies above that. Listeners therefore cannot tell that the reconstructed wave is different from the original. Some claim that they can distinguish a live performance from an audio CD playback, but even such claims are rare.
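The link between the sample rate and the highest frequency that can be captured can be checked with a small experiment. The sketch below (the helper `sample` and the two test frequencies are illustrative assumptions) samples a 30kHz tone, which is above half of 44.1kHz, and compares it with a 14.1kHz tone.

```python
import math

SAMPLE_RATE = 44100


def sample(frequency_hz, count, sample_rate=SAMPLE_RATE):
    """Return `count` samples of a unit-amplitude sine at the given frequency."""
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(count)
    ]


high = sample(30000, 100)   # above half the sample rate
alias = sample(14100, 100)  # 44100 - 30000

# Each sample of the 30kHz tone is the negative of the matching sample of the
# 14.1kHz tone, so once sampled the two tones differ only by being inverted.
largest_difference = max(abs(h + a) for h, a in zip(high, alias))
print(largest_difference)  # a value very close to zero
```

In other words, a tone above half the sample rate is recorded as if it were a lower tone, which is why nothing useful can be stored above that limit.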

Compressed Sound

An audio CD contains as much data as a regular data CD: 650 or 700MB. That's used to represent 74 or 80 minutes of sound, which is more than 8MB per minute. Even with today's large hard drives and fast networks there are good reasons to compress sound: storage space is still an issue given enough audio, and for online streaming such sizes are still unacceptable even with high-speed internet.
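As a rough check of the "more than 8MB per minute" figure, the data rate can also be worked out from the sample rate. This sketch assumes two channels (stereo) and 16-bit samples, which are the usual audio CD parameters but are not stated in these notes.

```python
SAMPLE_RATE = 44100    # samples per second per channel
CHANNELS = 2           # assumption: stereo
BYTES_PER_SAMPLE = 2   # assumption: 16-bit samples

bytes_per_second = SAMPLE_RATE * CHANNELS * BYTES_PER_SAMPLE
bytes_per_minute = bytes_per_second * 60
megabytes_per_minute = bytes_per_minute / 1_000_000

print(bytes_per_second)
print(round(megabytes_per_minute, 2))
```

Under these assumptions the result is a little over 10MB per minute, which is consistent with the estimate above.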

Just as with images there are lossless and lossy compression types.


The MP3 format is lossy but considerably cuts down the amount of disk space needed per second of audio. It first became popular when hard drives were small (a hard drive had about the capacity of an audio CD) and it was the only accessible way to store music on a computer. Later the popularity of the format exploded because of Napster, and these days every digital music player supports the format.

The compression algorithm is pretty complicated and we won't go through it in this course. Basically, it relies on the fact that humans can't hear all sounds in all contexts.

OGG is a format essentially equivalent to MP3 and was developed because of patents surrounding use of MP3. Both OGG decoders (players) and encoders (creating tools) can be developed and distributed without a patent licence. Not all current players support OGG playback. The earlier iPods have been reported to be incapable of playing OGG files simply because their processors were too slow. OGG does require more processing power than MP3 to play.

Like WAV files, MP3/OGG files have a bitrate: the number of bits used to store each second of audio. Older encoders were only capable of generating constant bitrate (CBR) files. Newer encoders can generate variable bitrate (VBR) files, where the encoder chooses the bitrate on the fly depending on how much information is in the current sound segment.
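For a constant bitrate file, the size follows directly from the bitrate and the duration. The sketch below compares a hypothetical 128 kbit/s MP3 with uncompressed audio at 44100 samples per second; the 128 kbit/s figure, the four-minute track, and the stereo 16-bit assumption are illustrative choices, and headers and metadata are ignored.

```python
def cbr_size_bytes(bits_per_second, seconds):
    """Approximate size of a constant bitrate stream, ignoring headers."""
    return bits_per_second * seconds // 8


track_seconds = 4 * 60

mp3_size = cbr_size_bytes(128_000, track_seconds)         # assumed MP3 bitrate
wav_size = cbr_size_bytes(44100 * 2 * 16, track_seconds)  # stereo, 16-bit

print(mp3_size)
print(wav_size)
print(wav_size / mp3_size)  # how many times larger the uncompressed audio is
```

A variable bitrate file cannot be sized this way in advance, because the encoder picks a different bitrate for each segment.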

These formats are usually used only for distribution, not for recording or editing, because (1) the encoding process is much more resource-intensive than decoding and (2) the format is lossy.

Other Compressed Audio Formats

There are lots of them out there, see for a list. One other worth mentioning here is AAC, a format mostly used by Apple products. It has some benefits over MP3 but is roughly as popular as OGG (both are far less used than MP3).

Uncompressed Sound


WAV and AIFF are basically identical formats, but because WAV has been the format supported by default in Windows, it's much more popular.

Sound is stored completely uncompressed, similar to how it's stored on audio CDs. The format is not exactly the same, though, so conversion is necessary to transfer files to or from audio CDs.

Because of their age and because they need no encoding or decoding step, they are usually the default formats for audio creation and editing tools.
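Because the samples are stored as they are, a WAV file is little more than a short header followed by the raw numbers. The following sketch uses Python's standard `wave` module to write one second of a tone into an in-memory WAV file and read it back; the 440 Hz tone, mono channel, and 16-bit sample width are assumptions made for the example.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44100

# One second of a 440 Hz tone as 16-bit mono samples (illustrative values).
samples = [
    int(16000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
raw = struct.pack("<%dh" % len(samples), *samples)  # little-endian 16-bit ints

# Write the samples into an in-memory WAV file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav_out:
    wav_out.setnchannels(1)            # mono
    wav_out.setsampwidth(2)            # bytes per sample
    wav_out.setframerate(SAMPLE_RATE)
    wav_out.writeframes(raw)
wav_bytes = buffer.getvalue()

# Read it back and inspect what the header says.
with wave.open(io.BytesIO(wav_bytes), "rb") as wav_in:
    channels = wav_in.getnchannels()
    sample_width = wav_in.getsampwidth()
    frame_rate = wav_in.getframerate()
    frame_count = wav_in.getnframes()
    stored = wav_in.readframes(frame_count)

print(channels, sample_width, frame_rate, frame_count)
print(stored == raw)                # the samples come back unchanged
print(len(wav_bytes) - len(raw))    # bytes taken up by the header
```

No decoding step is involved: the bytes read back are exactly the bytes that were written.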


FLAC is a lossless compressed format. A WAV file can be compressed with any generic compression tool, so you can think of FLAC playback as on-the-fly decompression. FLAC, however, has been optimised for better audio compression, for seeking to random places in the file, and for storing audio metadata.

Playing a FLAC file is more CPU-intensive than playing a WAV file, though, so which format you choose is a balance between the space required and decoding speed.

The speed of the encoder also varies. You can create better-compressed FLAC files by spending more time encoding them.

A FLAC file can be converted to any other lossless format such as WAV and back without any loss of audio data.
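The meaning of "lossless" can be illustrated with a generic compressor standing in for FLAC. The sketch below compresses raw 16-bit samples with Python's standard `zlib` module and restores them; zlib is not the FLAC algorithm, and the 440 Hz test tone is an assumption chosen only to have some data to compress.

```python
import math
import struct
import zlib

SAMPLE_RATE = 44100

# One second of a 440 Hz tone as raw 16-bit samples (illustrative data).
samples = [
    int(16000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
raw = struct.pack("<%dh" % len(samples), *samples)

compressed = zlib.compress(raw, 9)     # level 9: slowest, smallest output
restored = zlib.decompress(compressed)

print(len(raw), len(compressed))  # size before and after compression
print(restored == raw)            # lossless: every byte comes back
```

A pure tone is very repetitive, so it compresses far better than real music would. The compression level passed to `zlib.compress` plays a similar role to spending more encoding time for a smaller file.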

Electronic Sound


Most audio formats (but not WAV and AIFF) can have metadata associated with them. This usually includes the artist, album, and track name, the year, and genre.

Most playback software can be used to edit the metadata. Format conversion software usually also copies the metadata into the appropriate fields.


audacity gz compress wav, mp3, flac