Tuesday, 28 February 2012

The MP3 Encoding Process Overview



The input to the encoder is a stream of PCM samples.  The samples are split into frames of 1152 samples, and each frame is further divided into two granules of 576 samples.  The frames are sent to both the Fast Fourier Transform (FFT) block and the analysis filterbank.
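The framing step can be sketched in a few lines of Python. This is only an illustration of the frame and granule sizes given above; the function name split_into_frames and the zero-padding of the last frame are assumptions made for the example, not part of any particular encoder.

```python
FRAME_SIZE = 1152    # samples per frame, as stated above
GRANULE_SIZE = 576   # two granules per frame

def split_into_frames(pcm):
    """Split PCM samples into frames, each returned as a list of two granules."""
    frames = []
    for start in range(0, len(pcm), FRAME_SIZE):
        frame = list(pcm[start:start + FRAME_SIZE])
        # Assumption for this sketch: pad an incomplete last frame with silence.
        frame += [0] * (FRAME_SIZE - len(frame))
        frames.append([frame[:GRANULE_SIZE], frame[GRANULE_SIZE:]])
    return frames

# 3000 dummy samples do not divide evenly, so the last frame is padded.
frames = split_into_frames(list(range(3000)))
```

Each entry of frames holds two lists of 576 samples, which are the units the later stages work on.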

The analysis filterbank processes the 576 samples of each granule by splitting them into different frequency bands.  In parallel, the FFT translates the input samples into a frequency spectrum.  The psychoacoustic model uses this frequency information to identify perceptually irrelevant information and to determine the masking thresholds for all frequencies.
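To make the "frequency spectrum" idea concrete, here is a minimal sketch that computes a magnitude spectrum with a naive discrete Fourier transform. A real encoder would use a fast FFT routine and a much larger block; the function name magnitude_spectrum and the 64-sample test tone are assumptions chosen for the example.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive O(N^2) DFT; returns the magnitudes of bins 0..N/2."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc))
    return spectrum

# A pure tone that completes exactly 5 cycles in the block.
N = 64
tone = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]
spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
```

The energy of the tone ends up in a single bin (peak_bin), which is the kind of per-frequency information a psychoacoustic model would start from when estimating masking thresholds.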

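The analysis filterbank splits the signal into 32 subbands.  Because its bandpass filters are not ideal, adjacent subbands overlap in frequency, and alias reduction is performed to compensate for this.  In Layer III the alias reduction is applied to the frequency lines produced by the MDCT stage.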
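Alias reduction is usually described as a set of small "butterfly" rotations across the boundary between two neighbouring subbands. The sketch below assumes the eight constants tabulated in the MPEG-1 Layer III standard and 18 frequency lines per subband; the function name alias_reduce and the sign convention of the rotation are assumptions for illustration and should be checked against the specification before use.

```python
import math

# Alias-reduction constants c_i, as tabulated in the MPEG-1 Layer III standard.
C = [-0.6, -0.535, -0.33, -0.185, -0.095, -0.041, -0.0142, -0.0037]
CS = [1.0 / math.sqrt(1.0 + c * c) for c in C]
CA = [c / math.sqrt(1.0 + c * c) for c in C]

def alias_reduce(lower, upper):
    """Apply 8 butterflies across the boundary between two adjacent subbands.

    lower and upper each hold 18 frequency lines; new lists are returned.
    """
    lower, upper = list(lower), list(upper)
    for i in range(8):
        lo, hi = lower[17 - i], upper[i]
        # Sign convention is an assumption of this sketch.
        lower[17 - i] = lo * CS[i] + hi * CA[i]
        upper[i] = hi * CS[i] - lo * CA[i]
    return lower, upper

band_a = [float(i + 1) for i in range(18)]
band_b = [float(18 - i) for i in range(18)]
new_a, new_b = alias_reduce(band_a, band_b)
```

Because CS[i]**2 + CA[i]**2 equals 1, each butterfly is a pure rotation: it moves energy between the two subbands without changing the total.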

The samples in each of the 32 subbands are transformed from the time domain to the frequency domain using the Modified Discrete Cosine Transform (MDCT).  In the MDCT, windowing is applied to the samples in each subband.  If long windows are used, 18 samples are processed at a time, which is known as a long block.  If short windows are used, 6 samples are processed at a time, which is known as a short block.
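As a rough illustration of the two block sizes, the sketch below implements a direct sine-windowed MDCT. Note that the MDCT uses 50% overlapping windows, so a long block takes 36 input samples (18 new, 18 from the previous block) and yields 18 frequency lines, and a short block takes 12 and yields 6. The function name mdct is an assumption, and the sketch omits the overlap bookkeeping and the special start/stop windows a real encoder needs.

```python
import math

def mdct(block):
    """Sine-windowed MDCT: 2N time samples in, N frequency lines out."""
    two_n = len(block)
    n = two_n // 2
    lines = []
    for k in range(n):
        acc = 0.0
        for t in range(two_n):
            window = math.sin(math.pi / two_n * (t + 0.5))
            acc += (block[t] * window
                    * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5)))
        lines.append(acc)
    return lines

signal = [math.sin(0.3 * t) for t in range(36)]
long_lines = mdct(signal)        # long block: 36 samples in, 18 lines out
short_lines = mdct(signal[:12])  # short block: 12 samples in, 6 lines out
```

Short blocks give better time resolution at the cost of frequency resolution, which is why an encoder switches to them around sharp transients.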

After the samples have been transformed to the frequency domain, the quantizer uses the masking thresholds to determine the number of bits required to encode each sample.
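The link between a masking threshold and the bit budget can be sketched as follows. This is a deliberately simplified model using a uniform quantizer, not the power-law quantizer and nested rate/distortion loops of a real MP3 encoder; quantize_band, allowed_noise and the starting step of 64 are hypothetical names and values chosen for the example.

```python
def quantize_band(lines, allowed_noise, step=64.0):
    """Halve the step size until the mean squared error fits allowed_noise.

    Returns (step, quantized_values). A coarser step means smaller integers
    and therefore fewer bits after entropy coding.
    """
    if allowed_noise <= 0:
        raise ValueError("allowed_noise must be positive")
    while True:
        quantized = [round(x / step) for x in lines]
        noise = sum((x - q * step) ** 2
                    for x, q in zip(lines, quantized)) / len(lines)
        if noise <= allowed_noise:
            return step, quantized
        step /= 2.0

band = [10.0, -3.0, 0.5, 7.25]
loose_step, loose_q = quantize_band(band, allowed_noise=100.0)  # strong masking
tight_step, tight_q = quantize_band(band, allowed_noise=0.1)    # weak masking
```

When the masking threshold is high, a coarse step is acceptable and the quantized values stay small; when it is low, the step must shrink and the values, and hence the bits needed, grow.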

Next, Huffman encoding is performed to compress the data.  Information on how the data is encoded is stored, uncompressed, in the side information, which the decoder uses to reconstruct the signal.  The Huffman-encoded data, scalefactors and side information are combined and stored in the bitstream.
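The principle behind the Huffman stage can be shown with a small sketch that builds a code from symbol counts. MP3 itself selects from predefined Huffman tables rather than building a new code for each file, so this is an illustration of the idea only; huffman_code and the sample data are assumptions made for the example.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return {symbol: bitstring} for a Huffman code built from symbol counts."""
    counts = Counter(symbols)
    if not counts:
        return {}
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

# Quantized spectral values are mostly zero, so zero gets the shortest code.
quantized = [0, 0, 0, 1, 0, -1, 0, 0, 2, 0, 1, 0]
code = huffman_code(quantized)
bitstream = "".join(code[v] for v in quantized)
```

Frequent values receive short codewords and rare values longer ones, so the encoded stream is shorter than a fixed-width encoding of the same values.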


