MP3 (MPEG-1 Audio Layer III) is an audio format from the Moving Picture Experts Group (MPEG) family of media compression standards. An MP3 file can be many times smaller than the digitized audio signal it reproduces.
Let’s look at how these music files manage to stuff all that music into such a small file.
When I was a teenager, the greatest invention in the world (for at least one year, anyway) was the “Walkman”. It was a radio and a tape player that could clip on your belt. It was sooooo small, about the size of a large paperback novel, that you could carry it with you anywhere and listen to your tapes using headphones. Of course, if you wanted to carry your own music, you had to have a pretty hefty case for all of the tapes you might want to listen to while you were out.
Things have definitely changed. Now you can carry hundreds of songs in the player, and the whole thing is the size of a deck of cards, maybe even thinner. The headphones have gotten much better too, but that is a different story.
This article is about how MP3s work. In the late 1990s, file-swapping services and the first portable MP3 players revolutionized music distribution. All of this was based on the new file format. Several years earlier, CDs had come on the scene. CDs were the first widely used digital music format. Before CDs, tapes and records were analog forms for storing music or any other sound. In analog systems, the electrical (or even mechanical) signal picked up by a microphone was recorded directly. You could graph the signal, as it was recorded or played back, as the level of displacement of the speaker. Sound waves result directly from the mechanical displacement of the speaker diaphragm. In early electronic systems the speaker displacement was determined by the strength of the electric signal sent to that speaker. That really hasn’t changed, but the way the signal is stored has changed dramatically.
Digital storage simply means taking samples of the signal and storing numbers to represent the level of the signal at each sample, rather than using an analog method that stores a continuous representation of the strength of the signal over the whole time the signal existed. Numbers can be stored in much less space and read by a digital computer. The storage can be in any form that has two states, since you only need two states to represent either a 0 or a 1. With 0s and 1s you can store almost any number you want, written in binary rather than base ten. Each 0 or 1 is a bit, and 8 bits form a byte. To get enough resolution for audio signals, two bytes are used to represent each signal level, which allows 2 to the 16th power, or 65,536, discrete levels.
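To make the idea of storing numbers instead of the signal itself concrete, here is a minimal Python sketch that samples a sine wave and rounds each sample to one of the 65,536 levels. The function names, the 1 kHz test tone, and the 8 kHz sampling rate are assumptions picked only to give tidy numbers; they are not part of any standard.

```python
import math

BITS_PER_SAMPLE = 16
LEVELS = 2 ** BITS_PER_SAMPLE      # 65,536 discrete levels
MAX_CODE = LEVELS // 2 - 1         # 32767, the largest signed 16-bit value


def quantize(level):
    """Map an analog level in the range -1.0..1.0 to a signed 16-bit integer."""
    clipped = max(-1.0, min(1.0, level))   # anything louder than full scale is clipped
    return round(clipped * MAX_CODE)


def sample_sine(freq_hz, sample_rate, count):
    """Take `count` samples of a full-scale sine wave and quantize each one."""
    return [
        quantize(math.sin(2 * math.pi * freq_hz * n / sample_rate))
        for n in range(count)
    ]


# Illustrative values only: a 1 kHz tone sampled 8,000 times per second.
samples = sample_sine(freq_hz=1000, sample_rate=8000, count=8)
print(samples)
```

Each number in the printed list fits in two bytes, and that list of numbers is all a digital recording stores.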
Now, rather than play back the whole signal, you tell your system the strength of the signal at each sample time. For a faithful representation of the original sound, you need to sample the signal pretty often: 44,100 samples per second per channel. A sampling rate of 44.1 kHz can represent frequencies up to half that rate, about 22 kHz, which is just right for humans, who can hear frequencies as high as 20 kHz or so. That means that each hour of sound on a digital recording such as a CD takes 3600 seconds × 44,100 samples per second × 2 bytes per sample × 2 channels, which gives you 635 Mbytes. That is a huge file! (On the CD that means over 10 billion holes drilled by a laser).
Imagine downloading a song that takes only 3 minutes to play. The file would be about 31 Mbytes, just a “little” too large for most downloading in the late 1990s.
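The arithmetic above is easy to check. This short Python sketch recomputes the uncompressed size from the same four numbers; the constant and function names are just illustrative choices.

```python
SAMPLE_RATE = 44_100       # samples per second, per channel
BYTES_PER_SAMPLE = 2       # 16 bits
CHANNELS = 2               # stereo


def pcm_bytes(seconds):
    """Size in bytes of uncompressed CD-style audio lasting `seconds`."""
    return seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS


def highest_frequency_hz(sample_rate):
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sample_rate / 2


one_hour = pcm_bytes(3600)
print(f"One hour: {one_hour:,} bytes")                         # 635,040,000
print(f"Per second: {pcm_bytes(1):,} bytes")                   # 176,400
print(f"Top frequency: {highest_frequency_hz(SAMPLE_RATE):,.0f} Hz")  # 22,050
```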
MP3 is a compression system for music that reduces the number of bytes that must be stored while still giving you very nearly the same audio signal when you replay it. MP3 is intended to reduce the number of bytes required by a factor of 10 to 14. That reduces our roughly 30 Mbyte song to only about 3 Mbytes, a much more manageable size.
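As a rough check on those figures, the following Python sketch applies the 10x and 14x factors to a 3-minute song and also converts the result to a data rate. The helper names are made up for this example.

```python
SONG_SECONDS = 180
SONG_BYTES = SONG_SECONDS * 44_100 * 2 * 2   # 31,752,000 bytes uncompressed


def compressed_bytes(original_bytes, ratio):
    """Size after compressing by the given factor."""
    return original_bytes / ratio


def bitrate_kbps(size_bytes, seconds):
    """Average data rate in thousands of bits per second."""
    return size_bytes * 8 / seconds / 1000


for ratio in (10, 14):
    size = compressed_bytes(SONG_BYTES, ratio)
    rate = bitrate_kbps(size, SONG_SECONDS)
    print(f"{ratio}x: {size / 1e6:.2f} Mbytes, about {rate:.0f} kbit/s")
# 10x: 3.18 Mbytes, about 141 kbit/s
# 14x: 2.27 Mbytes, about 101 kbit/s
```

Those rates bracket the 128 kbit/s setting that was a common choice for MP3 files.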
Compression, in the case of sound files, is done by taking advantage of some of the characteristics of human hearing. For example, there are certain sounds that the human ear just can’t hear, and there are certain sounds that the human ear hears much better than others. When two sounds are played simultaneously, we usually hear only the louder one. Taking these facts into account, a technique called perceptual noise shaping allows compression of audio files. What this requires is breaking the sound file down into a mathematical representation, comparing that representation to a psychoacoustic model, and then throwing out what doesn’t match. This “breakdown” is mostly accomplished by using a fast Fourier transform (FFT). The FFT provides the spectral strength, showing which frequencies are most important in this file and which ones you don’t have to worry about. Once you have the spectral strengths, you can eliminate any frequency and sound pressure combination that falls outside human hearing, as well as any combination that is just not important to this sound file. You can also pay more attention to the sounds or sound qualities that matter most to human listeners. For example, you may want to be very careful with the frequency range between 1 kHz and 4 kHz, since those are the audio frequencies that humans hear best.
Using this technique, some of the audio has been removed. Fortunately, you probably won’t mind, since the parts removed were the ones that your ear would probably have screened out anyway. Any serious audiophile will, of course, hear the difference, but that’s why MP3 is called “near CD quality” sound. But then, serious audiophiles claim to hear the difference between the earlier analog sound recordings and the new digital ones, noting that the digital managed to lose something.
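Here is a toy Python illustration of the "find the spectral strengths, then throw out what doesn't matter" idea. It is only a sketch under heavy assumptions: it uses a plain discrete Fourier transform (an FFT computes the same result, just faster), and a fixed cutoff at 5% of the strongest component stands in for a psychoacoustic model, which a real encoder bases on how human hearing actually behaves. The test signal, the cutoff, and all names are invented for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Strength of each frequency bin from 0 up to half the sampling rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


SAMPLE_RATE = 8000   # illustrative, chosen so the tones land exactly on bins
N = 64
BIN_HZ = SAMPLE_RATE / N   # each bin is 125 Hz wide

# A loud 1 kHz tone plus a 3 kHz tone at one hundredth of its amplitude.
signal = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + 0.01 * math.sin(2 * math.pi * 3000 * t / SAMPLE_RATE)
    for t in range(N)
]

magnitudes = dft_magnitudes(signal)

# Crude stand-in for the psychoacoustic model: keep only components that
# reach at least 5% of the strongest one.
cutoff = 0.05 * max(magnitudes)
kept_hz = [k * BIN_HZ for k, m in enumerate(magnitudes) if m >= cutoff]
print(kept_hz)   # [1000.0]
```

The quiet 3 kHz tone is dropped and only the loud 1 kHz tone survives, which is the spirit of masking: when two sounds play together, the one you would not notice need not be stored.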
What we have talked about so far is nowhere near enough to get a 10 times compression. By eliminating the sounds you wouldn’t hear anyway (this is referred to as “lossy” compression, since information is lost), you’ve made some reduction in size, but you still need other compression mechanisms to reach the 10 times compression of MP3.
The usual compression methods work quite well to finish the job. These mechanisms include finding redundancy in the file and storing the redundant information only once. For example, in any stream of audio you will find repeating patterns. If a pattern repeats exactly, it can be stored once, and a lookup table is created that allows the file to simply use the number that represents the repeating pattern. Unfortunately, this redundancy method does not work very well for music files on its own. Once the psychoacoustic model has been applied, however, a lossless system like this redundancy reduction method works well. Usually the lossless redundancy reduction method applied to sound files is Huffman coding.
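To show what Huffman coding does, here is a small Python sketch. It gives the values that occur most often the shortest bit patterns, and the original values can be recovered exactly, which is what makes it lossless. One loud assumption: this sketch builds its code table from the data it is given, whereas an MP3 encoder picks from tables fixed in advance by the standard. The sample values and function names are invented for illustration.

```python
import heapq
from collections import Counter


def huffman_table(symbols):
    """Build a {symbol: bit string} table; frequent symbols get shorter codes."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries are (count, tiebreak, partial table); the unique tiebreak
    # keeps Python from ever comparing the dictionaries.
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        count_a, _, table_a = heapq.heappop(heap)   # two least frequent groups
        count_b, _, table_b = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in table_a.items()}
        merged.update({sym: "1" + code for sym, code in table_b.items()})
        heapq.heappush(heap, (count_a + count_b, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode(symbols, table):
    return "".join(table[sym] for sym in symbols)


def decode(bits, table):
    reverse = {code: sym for sym, code in table.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in reverse:      # no code is a prefix of another
            decoded.append(reverse[current])
            current = ""
    return decoded


# Made-up values standing in for audio data after the lossy step:
# mostly zeros, with a few small numbers.
values = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
table = huffman_table(values)
bits = encode(values, table)
print(len(bits), "bits instead of", 2 * len(values))   # 21 bits instead of 24
```

Decoding `bits` with the same table returns the original list unchanged. The saving grows as the data becomes more lopsided, which is why this step pays off only after the psychoacoustic model has reduced much of the signal to a few common values.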
MP3 files have a format consisting of frames of data that have 384, 192, 576 or 1152 samples, depending on the MPEG version and layer (1152 samples per frame is the usual case for an MPEG-1 Layer III file). Each frame has a 32 bit header and side information of 9, 17, or 32 bytes, depending on MPEG version and stereo/mono. The decoder needs this side information in order to interpret and decode the Huffman encoded data.
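To give a feel for that 32 bit header, here is a hedged Python sketch that pulls out the fields that determine the size of the side information. The bit positions and code values follow the commonly documented MPEG audio header layout (an 11-bit sync pattern, then version, layer, and later the channel mode). The function names are invented, the sample headers are hand-built for the example rather than taken from a real file, and the many other header fields (bitrate, sampling rate, padding and so on) are ignored.

```python
MPEG_VERSIONS = {0b00: "MPEG-2.5", 0b10: "MPEG-2", 0b11: "MPEG-1"}   # 0b01 is reserved
LAYERS = {0b01: "Layer III", 0b10: "Layer II", 0b11: "Layer I"}      # 0b00 is reserved
CHANNEL_MODES = {0b00: "stereo", 0b01: "joint stereo",
                 0b10: "dual channel", 0b11: "mono"}


def parse_header(header_bytes):
    """Extract version, layer, and channel mode from a 4-byte frame header."""
    if len(header_bytes) != 4:
        raise ValueError("a frame header is exactly 4 bytes")
    word = int.from_bytes(header_bytes, "big")
    if word >> 21 != 0x7FF:                      # top 11 bits must all be 1
        raise ValueError("missing frame sync pattern")
    version = MPEG_VERSIONS.get((word >> 19) & 0b11)
    layer = LAYERS.get((word >> 17) & 0b11)
    if version is None or layer is None:
        raise ValueError("reserved version or layer value")
    channel_mode = CHANNEL_MODES[(word >> 6) & 0b11]
    return {"version": version, "layer": layer, "channel_mode": channel_mode}


def side_info_bytes(version, channel_mode):
    """Side information size for a Layer III frame: 9, 17, or 32 bytes."""
    mono = channel_mode == "mono"
    if version == "MPEG-1":
        return 17 if mono else 32
    return 9 if mono else 17                     # MPEG-2 and MPEG-2.5


# Hand-built example headers, not read from a real file.
for raw in (bytes.fromhex("FFFB9000"), bytes.fromhex("FFFB90C0"),
            bytes.fromhex("FFF390C0")):
    info = parse_header(raw)
    size = side_info_bytes(info["version"], info["channel_mode"])
    print(info["version"], info["layer"], info["channel_mode"], size)
```

The three example headers give side information sizes of 32, 17, and 9 bytes, matching the three sizes mentioned above.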