Two lectures of material
Part I -- A Brief Overview
Data storage in CD format is not a simple thing. Typically, a user pictures the "1s" and "0s" in the memory of the computer as being directly transferred to "pits" and "bumps" on the CD disk. Unfortunately, it is far from that easy.
To begin with the incoming data is subjected to a series of coding operations. These coding operations add a number of additional parity bits to the data for error detection and correction purposes. The data is also subject to an interleaving process (which means that adjacent data on the disk is not adjacent data from the incoming file).
Additionally, the physical form of the data is changed (EFM coding) to eliminate the possibility of adjacent "1s". (This is done because it is the edges of the pit -- not the pit itself -- that represent l's in the data stream.)
A. Simple error detection and correction codes
Error detection and correction codes are fundamental to the operation of any digital storage system. There are literally thousands of such codes. These codes typically rely on using additional bits (usually called parity bits) to carry the error detection and correction information.
In a simple binary parity check, a parity bit is a single bit that represents whether the total number of "1s" in a particular data stream is even (1) or odd (0). (Modulo two addition).
For example, assume that you are setting a parity bit over all the digits of the following word.
1101 0000
The total number of "1s" is odd, so the parity bit would be 1. The word might then be written as
1101 0000 1
where the last digit is the parity bit.
Even simple binary parity checks can become quite complex if more than one parity bit is used. For example, you may elect to have two parity bits -- one on the first four bits of the word and one on the last four.
1 1 0 1 0 0 0 0 P1 P2 x x x x 1 x x x x 0If enough parity bits are used, then error can not only be detected -- they can also be corrected. For example, consider what happens if you use four parity bits. The first one is on the first four bits, the second one is on the second four bits, the third one is on the 1,2,5,6 bits and the fourth one is on the 2,3,6,7 bits.
1 1 0 1 0 0 0 0 P1 P2 P3 P4 x x x x 1 x x x x 0 x x x x 0 x x x x 1
Now, assume that there was an error in the final bit.
1 2 3 4 5 6 7 8 P1 P2 P3 P4 1 1 0 1 0 0 0 1 1 0 0 1 x x x x 1 x x x x 1 x x x x 0 x x x x 1
Parity bit P1 would agree with the parity bit in the transmitted word, P2 would NOT agree, P3 and P4 would agree. Since P2 is the only parity bit not agreeing with the transmitted word -- then the error must be in the 8th bit.
Unfortunately, the majority of error-detection and correction algorithms used in CD players are not as simple as the binary check codes discussed above. Although an overview of these codes will be presented, in-depth analysis of the codes is beyond the scope of this course. (Interested students should consult more advanced references, such a W. Peterson, Error-Correcting Codes, MIT Press)
B. Simple interleaving
Interleaving is a very simple and powerful idea. To illustrate interleaving, assume that you have a frame consisting of several characters of information,
U N I V E R S I T Y O F W A S H I N G T O N
Assume that you spit on the disk and destroy several of the characters.
R S I T Y O F W A S H I N G T O N
The first word is then very hard to reconstruct! However, you can take the original frame and scramble it as,
U N I V E R S I T Y O F W A S H I N G T O N
O N S T H U G R F S I I O T W N N V E I Y A
Then you can damage it,
U G R F S I I O T W N N V E I Y A
Then you can unscramble it,
U N I V E R I Y O F W A S I G T N
It is much easier to "interpolate" or "guess" the missing letters. (A bit like the later stages of "hangman"!)
C. Concealment
In practice some errors are so large that they cannot be corrected by the error-detection and correction algorithms. Unless these errors are handled by some other means, they can result in audible clicks in the audio output. In order to avoid these clicks, several methods are used to conceal uncorrectable errors:
Interpolation: In this technique, some average is constructed using the valid data around an error. This average is then substituted in for the erroneous data. Since most music (with the possible exception of heavy metal!) is continuous -- this method works well for concealing relatively short errors.
Muting: Muting is a last ditch technique -- as it effectively creates a brief period of silence in the audio train. However, it is not effective to simply set all the binary digits to zero --as this produces exactly the click that we are trying to avoid! Instead, the volume is faded out and then back in again to conceal the error.
D. EFM modulation
EFM means Eight to Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0 to 1 and 1-0 transitions -- thus avoiding small pits. In EFM only those combinations of bits are used in which more than two but less than 10 zeros appear continuously.
For example, a digital 10 given as a binary 0000 1010 is an EFM 1001 0001 0000 00
(See attached table for the complete list of EFM codes[1].)
The use of EFM coding means that pits come in discrete lengths ranging from 3 bits long (often written 3T) to 11 bits long (11T).
As the laser beam scans across these pits, a very distinct RF signal is formed. The shortest wavelength in this signal (highest frequency) is produced by the 3T pits. The longest wavelength in the signal (lowest frequency) is produced by the 11T pits. The zero crossings of the RF signal represent the edges of the pits -- and thus the binary "1s" in the data stream[2]. (Notice that the longer the wavelength, the larger the amplitude of the signal.)
It is common to display the photodetector output on a scope with a conventional trigger. This results in a display where the nine possible frequencies (3T to 11T) all add up on top of each other. This type of display is termed an "eye" pattern and provides valuable information about the various alignment parameters of the CD player. Notice that the relationship between size and wavelength is very distinct in the eye pattern[3].
The RF output is converted to a square wave, and then phase locks a clock with the period T. The CD player then begins to hunt for the characteristic start of frame symbol, which is three transitions separated by 11T. (100000000001000000000010 + 3 merge bits) Then, the player isolates the 33 17T symbols, and then kicks off the 3T merge bits -- leaving the 33 14T active symbols.
PART II -- IEC - 908 ... The BIG Picture
The encoding of digital audio on CD player is governed by IEC 908. This standard is available in the library for your perusal. (Notice that every other page is missing -- this is because the standard is written in both French and English and I took out the French pages!) This information is also covered more generally in Chapter 3 of Ken Pohlman's book The Compact Disk Handbook, (A-R Editions, 1992).
CD players use parity and interleaving techniques to minimize the effects of an error on the disk. In theory, the combination of parity and interleaving in a CD player can detect and correct a burst error of up to 4000 bad bits -- or a physical defect 2.47 mm long. Interpolation can conceal errors up to 13,700 or physical defects up to 8.5 mm long.
The entire error detection and correction algorithm is summarized on the following table.
This is Figure 12 from the IEC 908 standard. This table will be described in more detail below.
The original musical signal is a waveform in time. A sample of this waveform in time is taken and "digitized" into two 16-bit words, one for the left channel and one for the right channel.
For example, a single sample of the musical signal might look like:
L1 = 0111 0000 1010 1000
R1 = 1100 0111 1010 1000
Six samples (six of the left and six of the right for a total of twelve) are taken to form a frame.
L1 R1 L2 R2 L3 R3 L4 R4 L5 R5 L6 R6
The frame is then encoded in the form of 8-bit words. Each 16-bit audio signal turns into two 8-bit words.
L1 LI R1 R1 L2 L2 R2 R2 L3 L3 R3 R3 L4 L4 R4 R4 L5 L5 R5 R5 L6 L6 R6 R6
This gives a grand total of 24 8-bit words. This is column two on the IEC 908 table.
The even words are then delayed by two blocks and the resulting "word" scrambled. This delay and scramble is the first part of the interleaving process.
The resulting 24 byte word (remember, it has an included two block delay -- so some symbols in this word are from blocks two blocks behind) has 4 bytes of parity added. This particular parity is called "Q" parity. Parity errors found in this part of the algorithm are called C1 errors. More on the Q parity later.
Now, the resulting 24 + 4Q = 28 bytes word is interleaved. Each of the 28 bytes is delayed by a different period. Each period is an integral multiple of 4 blocks. So the first byte might be delayed by 4 blocks, the second by 8 blocks, the third by 12 blocks and so on. The interleaving spreads the word over a total of 28 x 4 = 112 blocks.
The resulting 28 byte words are again subjected to a parity operation. This generates four more parity bytes called P bytes which are placed at the end of the 28 bit data word. The word is now a total of 28 + 4 = 32 bytes long. Parity errors found in this part of the algorithm are called C2 errors. More on the P parity later too.
Finally, the another odd-even delay is performed -- but this time by just a single block. Both the P and Q parity bits are inverted (turning the "1s" into "0s") to assist data readout during muting.
An 8-bit subcode is then added to the front end of the word. The subcode specifies such things as the total number of selections on the disk, their length, and so on. More on this later.
Next the data words are converted to EFM format. EFM means Eight to Fourteen Modulation and is an incredibly clever way of reducing errors. The idea is to minimize the number of 0 to 1 and 1-0 transitions -- thus avoiding small pits. In EFM only those combinations of bits are used in which more than two but less than 10 zeros appear continuously.
For example, a digital 10 given as a binary 0000 1010 is an EFM 1001 0001 0000 00
Each frame finally has a 24-bit synchronization word attached to the very front end -- (just for completeness the word is (100000000001000000000010) and each group of 14 symbols is then coupled by three merge bits.
These merge bits are chosen to meet two goals:
1. No adjacent 1's from neighboring EFM encoded words
Remember that there are lots of EFM words which end in "1" -- as one example, all the eight-bit binary words from 128 to 152 end in "1". Similarly, there are EFM words that start in "1". Thus, it is relatively straightforward to have to have adjacent EFM words that create adjacent "1s".
For example -- binary 128 and binary 57
10000000 in EFM is 00111001 in EFM is
01001000100001 10000000001000
2. The digital sum value is kept near zero
Minimizing the digital sum value is just an attempt to keep the average number of "0's" and "1's" about the same. The value of +1 is assigned to the "1" states and the value of -1 is assigned to the "0" states. Then, the value of the merge bit is chosen to maintain the average near zero.
SO! The final frame (which started at 6*16*2 = 192 data bits) now contains:
1 sync word 24 bits 1 subcode signal 14 bits 6*2*2*14 data bits 336 bits 8*14 parity bits 112 bits 34*3 merge bits 102 bits GRAND TOTAL 588 bits
Part III - IEC 908 -- Now for the little details
A. P and Q parity
The eight parity symbols are calculated from the following equations:
Hp . Vp = 0
Hq . Vq = 0
Definitions for H and V are as follows.
V is pretty straightforward, just being the shifted and interleaved data bits in the data word (including the parity bits). However, H is more complex. H is defined on the Galois field GF (28) by the polynomial:
(The 's in the definitions for the H vector come from the field elements of the Galois field.)
Unfortunately, the Galois field of 28 elements of GF (28) defined by
is a set of 255 's.
However, to illustrate the principle, the Galois field of 24 elements of GF (24) formed as the field of polynomials over GF(2) modulo
is given on the next page[4].
0 = 1 = 0001 1 = 1 = 0010 2 = 2 = 0100 3 = 3 = 1000 4 = 1 + 1 = 0011 5 = 2 + 1 = 0110 6 = 3 + 2 = 1100 7 = 3 + 1 + 1 = 1011 8 = 2 + 1 = 0101 9 = 3 + 1 = 1010 10 = 2 + 1 + 1 = 0111 11 = 3 + 2 + 1 = 1110 12 = 3 + 2 + 1 + 1 = 1111 13 = 3 + 2 + 1 = 1101 14 = 3 + 1 = 1001 15 = 1 = 0
B. Subcodes
The 8-bit subcode is a very peculiar creature. Each 588 bit frame has an eight bit subcode. These bits are named P-Q-R-S-T-U-V-W. So, for each 588 bit frame, there is one P bit (not the same as P parity), one Q bit (not the same as Q parity), one R bit, one S bit and so on[5].
Now, the P-Q-R-S-T-U-V-W bits from 98 consecutive frames are collected together. These 98 bits are called a subcoding channel or just channel. Thus, there is a P-channel of 98 bits (no relation to the P parity), a Q-channel of 98 bits (no relation to Q parity), an R-channel of 98 bits and so on.
Unfortunately (just to maximize confusion with the P and Q parity bits) only the P and Q subcode channels are used. The R-W subcode channels are not yet assigned -- being held for later expansion of the standard.
P Channel -- The P channel simply designates the starting and stopping of tracks. Music data is denoted by all zeros, the start flag before the musical selection by 2-3 seconds of "1's". The lead out at the end of the disk is a 2 Hz alternating 1 and 0[6].
Q channel. The Q channel contains the majority of program and timing information. The first two bits (S0 and S1) are synchronization bits. The next four (bits 3-6) are the control bits. Bit 3 controls the number of channels (2 or 4), bit 4 is unassigned, bit 5 is the copy protect and bit 6 is the pre-emphasis bit. The next four bits control the mode (three defined modes). The next 72 bits are data -- and the last 16 are a cyclic redundancy check on the channel data.
Mode 1 -- contains the primary selection timing information. In the lead-in area, this information consists of the number of tracks and the absolute starting time of each track. This information is continually repeated in the lead-in area, and allows the CD player to build the table of contents[7].
In the program and lead-out areas, the Mode 1 information is track number, index numbers within a track, time within a track, and absolute time[8].
Mode 2 contains a catalog number of the disk -- plus a continuation of the absolute time count[9].
Mode 3 contains IRSC codes for identifying each track -- allowing for such things as automatic copyright logging. Mode 3 also contains a continuation o f the absolute time count. Mode 3 is irregularly used at this time[10].
Next: CD/ROM