SOPC-based MPEG-4 video player


Abstract: This paper presents an MPEG-4 video player built on the Altera SOPC platform. Computationally intensive modules such as IQ, IDCT, and MC are implemented as Nios II user-defined instructions, realizing real-time decoding of L1-level, QCIF, 25 fps MPEG-4 video under the Simple Profile visual framework, based on the XviD Codec.
Keywords: MPEG-4, video player, SOPC, Nios II

Introduction
One of the key technologies for the practical application of multimedia is resolving the contradiction between the large amount of data produced by digitized video and audio and the limited capacity of digital storage media and communication networks. The solution is compression.

To support low-bit-rate video transmission services, MPEG (Moving Picture Experts Group) introduced the MPEG-4 standard. MPEG-4, which officially became an international standard in 1999, is a video and audio solution suited to low transmission rates, with an emphasis on the interactivity and flexibility of multimedia systems. The MPEG-4 video compression standard provides a highly flexible, "content-based" encoding method that allows the decoder to "decode on demand" and to add objects and information. This flexibility gives MPEG-4 high coding efficiency, content-based scalability, and robustness in error-prone environments.

These features make MPEG-4 ideal for handheld devices with limited storage capacity. However, the techniques involved in MPEG-4 video decoding, Inverse Quantization (IQ), Inverse Discrete Cosine Transform (IDCT), and Motion Compensation (MC), are typical computationally intensive transforms. Real-time video decoding is therefore a major challenge for handheld devices with limited processing power and tight power budgets.

The system uses Nios II user-defined instructions to implement the complex, time-consuming functional modules of MPEG-4 decoding, such as IQ, IDCT, and MC, in hardware on an SOPC platform composed of a Nios II processor and an FPGA, greatly increasing decoding speed. On this basis, starting from the XviD Codec released under the GPL, real-time MPEG-4 decoding at L1 level, QCIF resolution (176×144), and 25 fps is realized under the Simple Profile visual framework, with the output displayed on the LCD via DMA.

1 System Function Description
The system can be divided into four parts: video file access, video decoder, YUV-RGB converter, and LCD control module.

1.1 Video File Access
To play video files, they must first be conveniently stored and read. The MP4 files played by the system are produced by compressing 4:2:0 YUV files with the XviD Codec on a PC, in QCIF format (176×144 resolution) at 25 frames/s. In download mode, MP4 files are written to flash memory via the JTAG interface. In playback mode, the Nios II processor reads the MP4 file from flash memory and sends it to the file buffer pool, from which the decoder reads and decodes it.
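As a rough illustration, refilling the file buffer pool from memory-mapped flash might look like the C sketch below; the flash base address, pool size, and all names are illustrative assumptions, not taken from the actual design.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: the MP4 bitstream sits at a fixed, memory-mapped
 * flash address. Base address, pool size, and names are illustrative. */
#define MP4_FLASH_BASE ((const uint8_t *)0x00800000)
#define FILE_POOL_SIZE 4096

static uint8_t  file_pool[FILE_POOL_SIZE];  /* buffer pool read by the decoder */
static uint32_t flash_pos;                  /* current offset into the MP4 file */

/* Copy the next chunk of the file into the pool; returns bytes copied. */
static uint32_t pool_refill(uint32_t file_len)
{
    uint32_t n = file_len - flash_pos;
    if (n > FILE_POOL_SIZE) n = FILE_POOL_SIZE;
    memcpy(file_pool, MP4_FLASH_BASE + flash_pos, n);
    flash_pos += n;
    return n;
}
```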

1.2 Video Decoder
The video decoder is the core of the system. As shown in Figure 1, it is composed of five modules: entropy decoder, inverse quantization, inverse discrete cosine transform, motion compensation, and video frame buffer.

When decoding, the input bitstream is first entropy decoded, and the frame type is determined from the frame header. For each macroblock, the entropy-decoded coefficients first pass through IQ and then the IDCT to obtain spatial-domain values. For a reference frame (R-Frame), no motion compensation is required, so the transformed result is output directly and also stored in the video frame buffer for subsequent predicted frames (P-Frames) to use in motion compensation. For a predicted frame, the motion vector is first obtained by entropy decoding; after the corresponding reference data is fetched according to the motion vector, the IDCT-transformed prediction difference is added to it to synthesize the final predicted-frame image. The decoded predicted frame is likewise output and stored in the video frame buffer.
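The per-frame flow just described can be summarized by the following structural sketch in C; every type and helper function here is an illustrative placeholder, not an actual XviD Codec identifier.

```c
/* Structural sketch of the decoding flow described above; all types and
 * helpers are illustrative placeholders, not actual XviD identifiers. */
void decode_frame(Bitstream *bs, Frame *ref, Frame *out)
{
    FrameHeader hdr = entropy_decode_header(bs);

    for (int mb = 0; mb < hdr.num_macroblocks; mb++) {
        Block coeffs = entropy_decode_block(bs);   /* entropy (VLC) decode */
        inverse_quantize(&coeffs, hdr.quantiser);  /* IQ                   */
        idct_8x8(&coeffs);                         /* to spatial domain    */

        if (hdr.frame_type == R_FRAME) {
            /* reference frame: no motion compensation needed */
            store_block(out, mb, &coeffs);
        } else {
            /* P-frame: prediction from reference plus IDCT residual */
            MotionVector mv = entropy_decode_mv(bs);
            Block pred = fetch_prediction(ref, mb, mv);
            add_residual(&pred, &coeffs);
            store_block(out, mb, &pred);
        }
    }
    /* the frame just decoded stays in the frame buffer as the
     * reference for subsequent P-frames */
}
```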

If video decoding were implemented purely in software, the amount of computation would be too large to meet real-time requirements. Using Nios II custom instructions, the three main computationally intensive decoding units, IQ, IDCT, and MC, are implemented in hardware logic, trading hardware complexity for real-time decoding.
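From the software side, a Nios II custom instruction is reached through the ALT_CI_<NAME> macros the BSP generates in system.h, or directly through the GCC builtins. A minimal sketch follows, in which the opcode index and operand packing are assumptions:

```c
#include <system.h>  /* BSP-generated; defines ALT_CI_* macros when present */

#define CI_IDCT_N 0  /* hypothetical custom-instruction index */

/* Feed two packed 32-bit operands to the hardware unit and read back a
 * 32-bit result. __builtin_custom_inii is the Nios II GCC builtin for an
 * int(int, int) custom instruction. */
static inline int idct_ci(int a, int b)
{
    return __builtin_custom_inii(CI_IDCT_N, a, b);
}
```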

1.3 YUV-RGB Converter
The YUV-format image produced by the decoder is not suitable for direct display on the LCD. To display the decoded image, the YUV image must first be converted to RGB format. The conversion relationship between the two is as follows:
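The equation itself did not survive extraction. For 8-bit 4:2:0 video, the standard ITU-R BT.601 relationship (the convention XviD-decoded output is normally displayed with) is:

$$\begin{aligned} R &= Y + 1.402\,(V-128) \\ G &= Y - 0.344\,(U-128) - 0.714\,(V-128) \\ B &= Y + 1.772\,(U-128) \end{aligned}$$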

YUV-to-RGB format conversion consumes substantial CPU resources, so the system implements it in hardware logic using a table-lookup method.

1.4 LCD Control Module
A standard VGA LCD display module (640×480 @ 60 Hz) is a progressive-scan device. Scanning is sequential, so the next scan point is predictable and the pixel data to be sent can be queued up as a stream. Using the Nios II Avalon streaming-mode peripheral design method, an Avalon streaming-mode LCD controller can be implemented. A DMA transfer channel is established between the streaming LCD controller and the system SDRAM by the DMA controller, and the pixel data is read and sent out by hardware. Nios II only needs to write to the corresponding area of SDRAM to update the displayed image.
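A minimal sketch of feeding one frame to such a controller with the Nios II HAL DMA API is shown below; the device name and destination address are assumptions, and error handling is reduced to the bare minimum.

```c
#include <sys/alt_dma.h>

#define LCD_DEST_ADDR ((void *)0x80000000) /* hypothetical streaming LCD data port */

static volatile int tx_done;

static void dma_done(void *handle) { tx_done = 1; }  /* HAL completion callback */

int lcd_send_frame(const void *frame_buf, unsigned len)
{
    alt_dma_txchan tx = alt_dma_txchan_open("/dev/dma_0"); /* name from system.h */
    if (tx == NULL) return -1;

    /* stream to a fixed destination: the LCD controller's data port */
    alt_dma_txchan_ioctl(tx, ALT_DMA_TX_ONLY_ON, LCD_DEST_ADDR);

    tx_done = 0;
    if (alt_dma_txchan_send(tx, frame_buf, len, dma_done, NULL) < 0)
        return -1;
    while (!tx_done) ;  /* pixel data now moves without CPU intervention */
    return 0;
}
```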

2 System Design Structure
2.1 System Hardware Structure
The system hardware structure is shown in Figure 2.

To achieve a real-time decoding speed of 25 fps, the four computationally intensive functional units, IDCT, IQ, MC, and YUV-RGB conversion, are all implemented as user-defined instructions.

2.1.1 Inverse Quantization
Inverse quantization of the two-dimensional array of quantized coefficients QF[v][u] produces the reconstructed DCT coefficients. The essence of this process is multiplication by the quantization step size.

The inverse quantization of DC coefficients in intra-coded blocks differs from that of the other (AC) coefficients. The inverse-quantized DC coefficient is obtained by multiplying QF[0][0] by a constant factor intra_dc. intra_dc is related to the coding precision; Table 1 shows the correspondence between the two.

The inverse quantization of AC coefficients uses two weighting matrices, one for intra blocks and one for non-intra blocks. Users may also supply a custom quantization matrix. If QDCT denotes the quantized AC coefficients and DCT the inverse-quantized AC coefficients, the IQ of the AC coefficients is as follows:


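The formula itself was lost in extraction. A plausible reconstruction, assuming the standard MPEG-style inverse quantization that matches the two weighting matrices described above (W denotes the applicable matrix), is:

$$|DCT[v][u]| = \frac{\bigl(2\,|QDCT[v][u]| + k\bigr)\cdot W[v][u]\cdot quantiser\_scale}{16}, \qquad k = \begin{cases} 0 & \text{intra blocks} \\ \operatorname{sign}(QDCT[v][u]) & \text{non-intra blocks} \end{cases}$$

with the sign of DCT[v][u] taken from QDCT[v][u].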
In the formula, quantiser_scale takes one of two sets of values in the range 0 to 112, corresponding to different bitstream control states. However, the XviD Codec version adopted by this system does not implement bitstream rate control, so the value of quantiser_scale is fixed here.

The result of inverse quantization is saturated to the range [-2048, +2047].
IQ is implemented on the FPGA according to the block diagram in Figure 3.
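A software reference model of this IQ step might look as follows (Figure 3 shows the hardware version); the MPEG-style formula is the reconstruction given above, and the mismatch control of the full standard is omitted.

```c
#include <stdint.h>

static int16_t sat12(int32_t v)  /* saturate to [-2048, +2047] */
{
    return (int16_t)(v < -2048 ? -2048 : (v > 2047 ? 2047 : v));
}

/* blk: 64 quantized coefficients in raster order; W: weighting matrix.
 * For intra blocks the DC term uses the constant factor intra_dc. */
void inverse_quantize(int16_t blk[64], const uint8_t W[64],
                      int quantiser_scale, int intra, int intra_dc)
{
    int i = 0;
    if (intra) {
        blk[0] = sat12((int32_t)blk[0] * intra_dc);  /* DC: intra_dc * QF[0][0] */
        i = 1;
    }
    for (; i < 64; i++) {
        if (blk[i] == 0) continue;
        int32_t mag = blk[i] < 0 ? -blk[i] : blk[i];
        int32_t v = (2 * mag + (intra ? 0 : 1)) * W[i] * quantiser_scale / 16;
        blk[i] = sat12(blk[i] < 0 ? -v : v);         /* restore sign, saturate */
    }
}
```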

2.1.2 Inverse Discrete Cosine Transform
The IDCT is the inverse transform of the DCT and restores the matrix of DCT coefficients to spatial-domain values. The IDCT process can be described by the following formula:
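The formula was lost in extraction; the standard 8×8 two-dimensional IDCT definition is:

$$f(x,y) = \frac{1}{4}\sum_{u=0}^{7}\sum_{v=0}^{7} C(u)\,C(v)\,F(u,v)\cos\frac{(2x+1)u\pi}{16}\cos\frac{(2y+1)v\pi}{16}, \qquad C(k)=\begin{cases}1/\sqrt{2} & k=0\\ 1 & k>0\end{cases}$$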


The 8-element input vector [X0, X1, X2, X3, X4, X5, X6, X7] is split into odd-indexed elements [X1, X3, X5, X7] and even-indexed elements [X0, X2, X4, X6], so the 8×8 matrix is replaced by two 4×4 matrices. The even and odd elements are multiplied by these two matrices, respectively, generating two 4-element vectors p and q; adding or subtracting p and q yields the output vector x.

The algorithm can be expressed as the following formula:
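The formula was lost in extraction. In the notation of the preceding paragraph, with A_e and A_o as placeholder names for the two 4×4 cosine matrices, the even-odd decomposition can be written as:

$$p = A_e\,[X_0\ X_2\ X_4\ X_6]^{T}, \qquad q = A_o\,[X_1\ X_3\ X_5\ X_7]^{T}$$

$$x_i = p_i + q_i, \qquad x_{7-i} = p_i - q_i, \qquad i = 0,1,2,3$$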

The 8×8 IDCT algorithm is implemented in hardware on the FPGA according to the structure shown in Figure 4.
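A floating-point software model of the decomposed 1-D transform (applied first to rows, then to columns, for the full 8×8 IDCT) is sketched below; the hardware stores the two cosine matrices as fixed constants rather than computing them.

```c
#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

static double Ae[4][4], Ao[4][4];  /* 4x4 cosine matrices for even/odd parts */

void idct_init(void)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            double c = (j == 0) ? sqrt(0.5) : 1.0;  /* C(u) for u = 2j */
            Ae[i][j] = 0.5 * c * cos((2 * i + 1) * (2 * j) * M_PI / 16.0);
            Ao[i][j] = 0.5 * cos((2 * i + 1) * (2 * j + 1) * M_PI / 16.0);
        }
}

/* One 8-point IDCT: p and q come from the two 4x4 products, and the
 * butterfly p +/- q yields the first and last four outputs. */
void idct_1d(const double X[8], double x[8])
{
    for (int i = 0; i < 4; i++) {
        double p = 0.0, q = 0.0;
        for (int j = 0; j < 4; j++) {
            p += Ae[i][j] * X[2 * j];      /* even-indexed inputs */
            q += Ao[i][j] * X[2 * j + 1];  /* odd-indexed inputs  */
        }
        x[i]     = p + q;
        x[7 - i] = p - q;
    }
}
```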

2.1.3 Motion Compensation
Motion compensation is a computationally heavy but highly regular operation. To realize it, a multi-stage pipeline with multiple parallel arithmetic units is adopted, as shown in Figure 5.

The control of the motion compensation module is complex. In the actual design it is divided into several sub-modules: compensation control, compensation address generation, differential data supply, and compensation operation. These sub-modules are implemented directly in hardware logic and require no Nios II processor intervention during operation. Compensation control governs the entire motion compensation process, providing input, output, and buffer control signals as well as prediction and differential data. Compensation address generation produces the addresses of the prediction data in the frame buffer and the write addresses for the compensation results. Differential data supply receives the IDCT results and buffers them so they are available for compensation at the appropriate time. The compensation operation computes the final predicted data.
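A software model of the compensation operation itself might be the following; integer-pel vectors only, with half-pel interpolation and the pipelined control logic omitted.

```c
#include <stdint.h>

static uint8_t clamp255(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Fetch the prediction from the reference frame at the motion-vector
 * offset, add the IDCT residual, and clamp to the 8-bit pixel range. */
void motion_compensate(uint8_t *cur, const uint8_t *ref, int stride,
                       int mb_x, int mb_y, int mv_x, int mv_y,
                       const int16_t residual[16 * 16])
{
    const uint8_t *pred = ref + (mb_y * 16 + mv_y) * stride + (mb_x * 16 + mv_x);
    uint8_t *dst        = cur + (mb_y * 16) * stride + (mb_x * 16);

    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            dst[y * stride + x] =
                clamp255(pred[y * stride + x] + residual[y * 16 + x]);
}
```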

2.1.4 YUV-RGB Conversion
According to the YUV-to-RGB color-space conversion relationship, the result of each product term is precomputed and stored in ROM. For each input YUV component, the hardware logic generates the access address and performs an addition to obtain the corresponding result. The implementation structure is shown in Figure 6.
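A software model of the table-lookup scheme is sketched below; the tables play the role of the ROMs in Figure 6, and the BT.601 coefficients are the assumption stated in section 1.3.

```c
#include <stdint.h>

static int16_t tab_rv[256], tab_gu[256], tab_gv[256], tab_bu[256];

/* Precompute every product term of the conversion equations, as the
 * hardware does once into ROM. */
void yuv2rgb_init(void)
{
    for (int i = 0; i < 256; i++) {
        int d = i - 128;
        tab_rv[i] = (int16_t)( 1.402 * d);
        tab_gu[i] = (int16_t)(-0.344 * d);
        tab_gv[i] = (int16_t)(-0.714 * d);
        tab_bu[i] = (int16_t)( 1.772 * d);
    }
}

static uint8_t clip(int v) { return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v)); }

/* Per pixel: only table reads and additions remain. */
void yuv2rgb(uint8_t y, uint8_t u, uint8_t v,
             uint8_t *r, uint8_t *g, uint8_t *b)
{
    *r = clip(y + tab_rv[v]);
    *g = clip(y + tab_gu[u] + tab_gv[v]);
    *b = clip(y + tab_bu[u]);
}
```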

2.2 System Software Workflow
The software workflow of this system is shown in Figure 7.

Conclusion

The system is implemented on an SOPC platform based on an Altera FPGA with an embedded soft-core processor. It has low hardware cost, makes extensive use of IP cores, and offers good system scalability.
