Wednesday, 29 February 2012

The MP3 Frame Layout

The frame consists of five parts: header, CRC (optional), side information, main data, and ancillary data, as shown below.


The Frame Format

Each frame holds 1152 frequency-domain output samples, mono or stereo, and is divided into two granules of 576 samples each. Each granule is made up of 32 subbands of 18 frequency lines each (32 × 18 = 576).

Figure: Frequency lines of an MP3 frame for two audio channels
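The arithmetic behind the granule layout can be sketched in a few lines of Python. The constant names and the helper `line_position` are hypothetical, and the subband-major ordering (all 18 lines of subband 0, then subband 1, and so on) is an assumption made for illustration.

```python
SUBBANDS = 32             # subbands in one granule
LINES_PER_SUBBAND = 18    # frequency lines per subband
GRANULE_SIZE = SUBBANDS * LINES_PER_SUBBAND   # 576 samples
FRAME_SIZE = 2 * GRANULE_SIZE                 # 1152 samples


def line_position(index):
    """Map a flat granule index (0..575) to (subband, line).

    Assumes subband-major ordering, which is an illustrative choice.
    """
    if not 0 <= index < GRANULE_SIZE:
        raise ValueError("index must be in 0..575")
    return divmod(index, LINES_PER_SUBBAND)


if __name__ == "__main__":
    print(GRANULE_SIZE, FRAME_SIZE)
    print(line_position(0), line_position(18), line_position(575))
```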

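The size of the frame depends on the bit rate and the sampling frequency. It can be computed using the formula:

         [(144*bitrate)/sampling frequency] + padding [bytes]

Here the square brackets mean that the quotient is rounded down to a whole number, the bit rate is in bits per second, the sampling frequency is in Hz, and padding is 1 if the padding bit is set and 0 otherwise.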

An MP3 song with a bit rate of 128 kbps, a sampling frequency of 44.1 kHz, and the padding bit set has a frame size of 418 bytes:

 [(144*128000)/44100] + 1 = 417 + 1 = 418

Without padding, the same frame is 417 bytes.
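The frame-size formula is easy to check in code. The sketch below is a minimal illustration, not part of any library; the function name `frame_size_bytes` is hypothetical, and the quotient is rounded down with integer division.

```python
def frame_size_bytes(bitrate_bps, sample_rate_hz, padding=False):
    """Frame size in bytes for an MPEG-1 Layer III frame.

    bitrate_bps    -- bit rate in bits per second (e.g. 128000)
    sample_rate_hz -- sampling frequency in Hz (e.g. 44100)
    padding        -- True if the padding bit is set in the frame header
    """
    # Integer division rounds the quotient down to a whole number of bytes.
    return (144 * bitrate_bps) // sample_rate_hz + (1 if padding else 0)


if __name__ == "__main__":
    print(frame_size_bytes(128000, 44100))                # unpadded frame
    print(frame_size_bytes(128000, 44100, padding=True))  # padded frame
```

For 128 kbps at 44.1 kHz this gives 417 bytes without padding and 418 bytes with padding.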

    

Tuesday, 28 February 2012

The MP3 Encoding Process Overview



The input to the encoder is a stream of PCM samples. The PCM samples are split into frames of 1152 samples, and each frame is further divided into two granules of 576 samples. The frames are sent to both the Fast Fourier Transform (FFT) block and the analysis filterbank.
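The framing step can be sketched as follows for a single channel. The function names are hypothetical, and zero-padding the last, incomplete frame is an assumption made here for illustration; the text does not say how the end of the input is handled.

```python
SAMPLES_PER_FRAME = 1152
SAMPLES_PER_GRANULE = 576


def split_into_frames(pcm):
    """Split a sequence of mono PCM samples into frames of 1152 samples.

    The last frame is zero-padded to full length (an assumption).
    """
    frames = []
    for start in range(0, len(pcm), SAMPLES_PER_FRAME):
        frame = list(pcm[start:start + SAMPLES_PER_FRAME])
        frame += [0] * (SAMPLES_PER_FRAME - len(frame))
        frames.append(frame)
    return frames


def split_into_granules(frame):
    """Split one 1152-sample frame into two granules of 576 samples."""
    return frame[:SAMPLES_PER_GRANULE], frame[SAMPLES_PER_GRANULE:]


if __name__ == "__main__":
    frames = split_into_frames(list(range(3000)))
    first, second = split_into_granules(frames[0])
    print(len(frames), len(first), len(second))
```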

The analysis filterbank splits the 576 samples of each granule into different frequency bands. In parallel, the FFT translates the input samples into a frequency spectrum. The psychoacoustic model uses the frequency information from the FFT block to identify perceptually irrelevant information and to determine the masking thresholds for all frequencies.

The analysis filterbank arranges the samples into 32 subbands. Alias reduction is then performed to compensate for the non-ideal bandpass filtering in the filterbank.

The 32 subband signals are transformed from the time domain to the frequency domain using the Modified Discrete Cosine Transform (MDCT). In the MDCT, a window is applied to the samples in each subband. If long windows are used, 18 samples are processed at a time, which is known as a long block. If short windows are used, 6 samples are processed at a time, which is known as a short block.
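As a rough illustration of the transform step, the sketch below computes a windowed MDCT in plain Python. It assumes a sine window and 50% overlap, so each call takes twice as many input samples as it produces coefficients: 36 samples give the 18 coefficients of a long block, and 12 samples give the 6 coefficients of a short block. The function names are hypothetical, and the direct double loop is chosen for readability rather than speed.

```python
import math


def sine_window(length):
    """Sine window of the given length (assumed window shape)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(block):
    """Windowed MDCT: 2n input samples produce n coefficients.

    block -- 36 samples for a long block, 12 for a short block; the first
             half is assumed to overlap with the previous block.
    """
    two_n = len(block)
    n = two_n // 2
    window = sine_window(two_n)
    coeffs = []
    for k in range(n):
        acc = 0.0
        for i in range(two_n):
            angle = math.pi / n * (i + 0.5 + n / 2) * (k + 0.5)
            acc += window[i] * block[i] * math.cos(angle)
        coeffs.append(acc)
    return coeffs


if __name__ == "__main__":
    long_block = mdct([math.sin(0.3 * i) for i in range(36)])
    short_block = mdct([math.sin(0.3 * i) for i in range(12)])
    print(len(long_block), len(short_block))
```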

After the samples have been transformed to the frequency domain, the quantizer uses the masking thresholds to determine the number of bits required to encode each sample.

Next, Huffman encoding is performed to compress the data. Information on how the data is encoded is stored uncompressed in the side information, which is used by the decoder. The Huffman-encoded data, scalefactors, and side information are combined and stored in the bitstream.



Background of MP3


The MPEG-1 Layer III standard, commonly known as MP3, is a complex audio compression method. The algorithms were approved in 1991 and were standardized as MPEG-1 by the Moving Picture Experts Group (MPEG) in 1992. The audio part was published by ISO/IEC as the international standard ISO/IEC 11172-3 in 1993.

The standard defines Layers I, II, and III. Each successive layer is more complex but requires less transmission bandwidth. Layer III is the most efficient: it codes audio at near-CD quality at a ratio of about 12:1. The table below shows the bit rate and compression ratio of each layer.

Coding method    Bit rate    Ratio
PCM CD audio     1.4 Mbps    1:1
Layer I          384 kbps    4:1
Layer II         192 kbps    8:1
Layer III        128 kbps    12:1

Table: Comparison of bit rate and ratio
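The ratios in the table can be reproduced approximately from the bit rates. The sketch below assumes CD audio is 16-bit stereo PCM sampled at 44.1 kHz (about 1.4 Mbps); the function names are hypothetical. Under that assumption the computed ratios come out slightly below the rounded figures in the table.

```python
CD_SAMPLE_RATE_HZ = 44100   # assumed CD sampling frequency
CD_BITS_PER_SAMPLE = 16     # assumed CD sample width
CD_CHANNELS = 2             # assumed stereo


def pcm_bitrate_bps(sample_rate_hz=CD_SAMPLE_RATE_HZ,
                    bits_per_sample=CD_BITS_PER_SAMPLE,
                    channels=CD_CHANNELS):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(coded_bitrate_bps):
    """Ratio of the CD PCM bit rate to the coded bit rate."""
    return pcm_bitrate_bps() / coded_bitrate_bps


if __name__ == "__main__":
    for name, rate in [("Layer I", 384000),
                       ("Layer II", 192000),
                       ("Layer III", 128000)]:
        print(f"{name}: {compression_ratio(rate):.1f}:1")
```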

 

Wednesday, 8 February 2012

Field Programmable Gate Array (FPGA)



FPGAs are a very popular means of computing and prototyping. They provide great design flexibility, fast turnaround time, and a simpler design flow, which is why an FPGA was chosen as the design environment. In particular, the Altera DE2 board is used as the experiment platform. It hosts a Cyclone II EP2C35 FPGA, SDRAM, SRAM, flash memory, an SD memory card slot, a 24-bit audio codec (WM8731), a VGA codec, LEDs, and other components. The Cyclone II is a low-cost FPGA with a capacity of 33,216 logic elements and 105 M4K RAM blocks. The components used besides the Cyclone II are the SDRAM and SRAM for storing data and instructions, the audio codec for converting PCM samples to analog signals, and the SD card socket for input.