Fast Video Encoding using Spatio-Temporal Features and Ensemble Classification

Video coding has evolved over the years, and new compression standards are developed at regular intervals. Recent video codecs such as High Efficiency Video Coding (HEVC) have improved compression efficiency, reducing the bit-rates of encoded streams. This improvement comes at the cost of high computational complexity, which becomes a bottleneck in real-time implementations of these codecs. Reducing this complexity without compromising video quality is a challenge. Another challenge in fast video encoding is the diverse nature of video content. Hence, there is a need for intelligent techniques that reduce the computational complexity of recent video codecs and also adapt to the diverse nature of video data.

The research in this thesis focuses on identifying local and global features, extracted from video data, that can be used to characterize the diverse content of video sequences. Texture variations and motion content in a video sequence are quantified to categorize the sequence as simple or complex. This information is used to develop a content-adaptive fast encoding framework that reduces the computational complexity of HEVC. The proposed framework performs equally well on sequences with simple and complex content, without compromising video quality. It has been tested on a large set of video sequences and shows promising results compared with other recent works.
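As a rough illustration of quantifying texture and motion to label a sequence as simple or complex, the sketch below uses two generic measures: pixel variance for texture and mean absolute frame difference for motion. These measures, the function names, and the thresholds are assumptions chosen for illustration; the abstract does not state which features or decision rules the thesis uses.

```python
from statistics import mean, pvariance

# Frames are assumed to be 2-D lists of luma samples (rows of integers 0..255).

def texture_score(frame):
    """Population variance of all luma samples in one frame (illustrative texture measure)."""
    pixels = [p for row in frame for p in row]
    return pvariance(pixels)

def motion_score(prev_frame, frame):
    """Mean absolute difference between co-located samples of two consecutive frames."""
    diffs = [abs(a - b)
             for row0, row1 in zip(prev_frame, frame)
             for a, b in zip(row0, row1)]
    return mean(diffs)

def classify_sequence(frames, texture_thresh=500.0, motion_thresh=5.0):
    """Label a sequence 'simple' or 'complex'. Needs at least two frames.

    The thresholds are hypothetical placeholders, not values from the thesis.
    """
    texture = mean(texture_score(f) for f in frames)
    motion = mean(motion_score(a, b) for a, b in zip(frames, frames[1:]))
    is_complex = texture > texture_thresh or motion > motion_thresh
    return ("complex" if is_complex else "simple"), texture, motion

# Example: a static, flat sequence versus a flickering checkerboard.
flat = [[[128] * 4 for _ in range(4)] for _ in range(3)]
board = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
inverse = [[255 - p for p in row] for row in board]
print(classify_sequence(flat)[0])
print(classify_sequence([board, inverse, board])[0])
```

A content-adaptive encoder could use such a label to decide how aggressively to prune its mode search, for example by pruning more on simple content. That pairing is also an assumption here.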

This research also focuses on machine-learning-based algorithms for fast encoding, which significantly reduce the complexity of recent video standards such as HEVC while keeping bit-rate and PSNR within limits. Machine-learning-based techniques formulate the encoding process as a classification problem and use features extracted from video data to train classifiers that assist in early predictions during encoding. A large set of spatial and temporal features is extracted from video data, and a systematic approach is applied for optimal feature selection. The resulting optimal features are used to train a Random Forests based ensemble classifier for early selection of prediction modes and coding unit sizes in HEVC. The proposed technique has been evaluated on publicly available video data sets. Experimental results show that the proposed approach significantly reduces the complexity of different profiles in HEVC without compromising video quality, and that it performs better than other existing fast video encoding implementations.
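To make the classification framing concrete, the sketch below trains a small ensemble that votes on whether a coding unit should be split. It is a deliberately simplified stand-in for Random Forests: bagged depth-one trees (decision stumps), each trained on a bootstrap sample with a random feature subset, combined by majority vote. The two features (block variance and a motion measure), the labels, and the training values are hypothetical and are not taken from the thesis.

```python
import random
from collections import Counter

def train_stump(samples, labels, feature_ids):
    """Find the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({s[f] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            # Each side predicts its majority label; an empty right side reuses the left label.
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0] if right else l_lab
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left_label, right_label)

def train_forest(samples, labels, n_trees=25, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    k = max(1, int(n_features ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(samples)) for _ in samples]
        feats = rng.sample(range(n_features), k)
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, sample):
    """Majority vote over all stumps."""
    votes = Counter(l if sample[f] <= t else r for f, t, l, r in forest)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (block variance, motion measure) -> CU decision.
X = [(10, 0.5), (20, 1.0), (15, 0.8), (30, 1.2),
     (400, 6.0), (500, 8.0), (450, 7.0), (600, 9.0)]
y = ["no_split"] * 4 + ["split"] * 4
forest = train_forest(X, y)
print(predict(forest, (5, 0.1)), predict(forest, (1000, 20.0)))
```

In an encoder, a confident early prediction of this kind would let the rate-distortion search skip the coding unit sizes the classifier rules out, which is where the complexity saving comes from.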

The fast encoding methods developed in this research have also been validated on emerging applications of HEVC, such as compression of light field images. A detailed analysis of the different formats used to represent light field images has been carried out to identify the unique aspects that differentiate them from natural videos. These unique features are used for fast coding of light field images in various formats using HEVC. Experimental results show that the proposed technique can be applied to fast coding of light field images without compromising image quality. These results can serve as a benchmark for future research on fast encoding of light field images. Hence, the fast video encoding methods developed in this research can be extended to other applications of image and video coding. These techniques can also be integrated into real-life video encoding solutions to enable the implementation of recent video codecs on embedded hardware platforms with limited processing power and memory.
