The input to the encoder is a stream of PCM samples, which is split into frames of 1152
samples. Each frame is further divided
into two granules of 576 samples each. The frames are sent to both the Fast
Fourier Transform (FFT) block and the analysis filterbank.
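
The framing step can be sketched in a few lines of Python. This is only an illustration: the function name `split_into_granules` is made up for this example, and dropping a trailing partial frame is an assumption (a real encoder would more likely pad the input).

```python
FRAME_SIZE = 1152    # PCM samples per frame
GRANULE_SIZE = 576   # PCM samples per granule (two granules per frame)

def split_into_granules(pcm):
    """Split a list of PCM samples into frames of two granules each.

    Assumption: a trailing partial frame is simply dropped.
    """
    frames = []
    for start in range(0, len(pcm) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = pcm[start:start + FRAME_SIZE]
        # First half and second half of the frame are the two granules.
        frames.append([frame[:GRANULE_SIZE], frame[GRANULE_SIZE:]])
    return frames

# Example: 2500 samples give two complete frames (2304 samples);
# the remaining 196 samples are dropped in this sketch.
frames = split_into_granules(list(range(2500)))
```

Each entry of `frames` holds two lists of 576 samples, which are the units the later stages work on.
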
The analysis filterbank processes the 576 samples of each granule by splitting
them into different frequency bands. In
parallel, the FFT translates the input samples into a frequency spectrum. The
psychoacoustic model uses the frequency information from the FFT block to remove
perceptually irrelevant information and to determine the masking thresholds for
all frequencies.
The analysis filterbank outputs 32 subbands. Alias reduction is also performed
to compensate for the non-ideal bandpass filtering in the filterbank.
The samples in the 32 subbands are transformed from the time domain to the frequency
domain using the Modified Discrete Cosine Transform (MDCT). In the MDCT, a window is applied to the samples
in each subband. If long windows are
used, 18 samples are processed at a time, which is known as a long block. If short windows are used, 6 samples are
processed at a time, which is known as a short block.
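
To make the block sizes concrete, here is a direct (slow) MDCT sketch in Python. It assumes the usual 50% overlap, so a long block takes 36 input samples (18 new plus 18 from the previous granule) and produces 18 frequency lines, while a short block takes 12 and produces 6. The function name `mdct` and the plain sine window are choices made for this example; a real encoder also uses transitional window shapes and a fast algorithm.

```python
import math

def mdct(block):
    """Windowed MDCT of one subband block.

    len(block) is 36 for a long block or 12 for a short block;
    the result has half as many values (18 or 6).
    """
    n_in = len(block)
    n_out = n_in // 2
    out = []
    for k in range(n_out):
        acc = 0.0
        for n in range(n_in):
            # Sine window applied to the input samples.
            window = math.sin(math.pi / n_in * (n + 0.5))
            # MDCT basis function for output index k.
            basis = math.cos(
                math.pi / (2 * n_in) * (2 * n + 1 + n_out) * (2 * k + 1)
            )
            acc += block[n] * window * basis
        out.append(acc)
    return out

long_lines = mdct([1.0] * 36)   # 18 frequency lines
short_lines = mdct([1.0] * 12)  # 6 frequency lines
```

With 32 subbands and 18 lines per subband, a long block yields the 576 frequency lines of one granule.
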
After the samples are transformed to the frequency domain, the quantizer
uses the masking thresholds to determine the number of bits that are
required to encode each sample.
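
The link between masking threshold and bit count can be illustrated with a toy calculation. This is not the actual MP3 quantization loop: it assumes a plain uniform quantizer whose noise power is roughly step²/12, and the function name `bits_for_band` and its parameters are hypothetical. The idea is simply that the step size is chosen as large as possible while keeping the quantization noise at or below the masking threshold.

```python
import math

def bits_for_band(peak, mask_threshold):
    """Toy estimate of bits per sample for one band.

    peak           -- largest absolute sample value in the band
    mask_threshold -- allowed noise power from the psychoacoustic model
    Assumption: uniform quantizer, noise power ~= step**2 / 12.
    """
    # Largest step whose noise power stays at or below the threshold.
    step = math.sqrt(12.0 * mask_threshold)
    # Number of levels needed to cover the range [-peak, +peak].
    levels = math.ceil(2.0 * peak / step) + 1
    return max(1, math.ceil(math.log2(levels)))

coarse = bits_for_band(peak=12.0, mask_threshold=0.75)    # strong masking
fine = bits_for_band(peak=12.0, mask_threshold=0.0075)    # weak masking
```

A band with a high masking threshold tolerates more noise and so needs fewer bits than the same band with a low threshold.
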
Next, Huffman encoding is performed to compress the data. Information on how
the data is encoded is stored, uncompressed, in the side information, which will
be used by the decoder. The Huffman-encoded
data, scalefactors and side information are then combined and stored in the
bitstream.