Qualcomm Patent | Reference Picture Derivation and Motion Compensation for 360-Degree Video Coding

Publication Number: 20190007679

Publication Date: 2019-01-03

Applicants: Qualcomm

Abstract

This disclosure describes techniques for generating reference frames packed with extended faces from a cubemap projection or adjusted cubemap projection of 360-degree video data. The reference frames packed with the extended faces may be used for inter-prediction of subsequent frames of 360-degree video data.

Background

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video coding techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), ITU-T H.265, also referred to as High Efficiency Video Coding (HEVC), and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video coding techniques.

Video coding techniques include spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as coding tree units (CTUs), coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.
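The temporal (inter-picture) prediction described above can be sketched in a few lines: a block in the current picture is predicted from a motion-shifted block in a reference picture, and only the residual needs to be coded. This is a minimal illustration with toy one-channel frames as nested lists; the function names (`block`, `motion_compensate`, `residual`) are illustrative, not from any codec API.

```python
# Hedged sketch of inter prediction: predict a block from a reference
# frame at a motion-vector offset and compute the residual.

def block(frame, y, x, size):
    """Extract a size x size block with its top-left corner at (y, x)."""
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_compensate(ref, y, x, mv, size):
    """Predict the block at (y, x) from the reference frame
    displaced by motion vector mv = (dy, dx)."""
    return block(ref, y + mv[0], x + mv[1], size)

def residual(cur_block, pred_block):
    """Element-wise difference between current and predicted blocks."""
    return [[c - p for c, p in zip(cr, pr)]
            for cr, pr in zip(cur_block, pred_block)]

# Reference frame: a simple gradient; current frame: the same content
# shifted right by one pixel (pure horizontal motion).
W = H = 8
ref = [[x + y for x in range(W)] for y in range(H)]
cur = [[(x - 1) + y for x in range(W)] for y in range(H)]

pred = motion_compensate(ref, 2, 2, (0, -1), 4)  # mv points 1 px left
res = residual(block(cur, 2, 2, 4), pred)
print(all(v == 0 for row in res for v in row))   # → True
```

Because the motion is purely translational here, the motion-compensated prediction is exact and the residual is all zeros; in real video the residual is small but nonzero and is transformed and entropy-coded.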

More recently, techniques for coding and transmitting 360-degree video, e.g., for virtual reality (VR) applications, have been developed. As a result of recent developments in VR video technology, the video environment experienced by the user has become just as important as the subject of the videos themselves. Such VR video technology may use 360-degree video technology that involves real-time streaming of 360-degree video graphics and/or real-time streaming of 360-degree video from a 360-degree video camera or website to a real-time video display, such as a VR head-mounted display (HMD). A VR HMD allows the user to experience action happening all around them by changing a viewing angle with a turn of the head. In order to create a 360-degree video, a special set of cameras may be used to record all 360 degrees of a scene simultaneously, or multiple views (e.g., video and/or computer-generated images) may be stitched together to form the image.

After video data has been encoded, the video data may be packetized for transmission or storage. The video data may be assembled into a video file conforming to any of a variety of standards, such as the International Organization for Standardization (ISO) base media file format and extensions thereof, such as the AVC file format.

Summary

In general, this disclosure is directed to techniques for encoding and decoding video data. In some examples, this disclosure describes reference picture derivation and motion compensation techniques for 360-degree video coding. In some examples, this disclosure describes techniques for generating reference frames packed with extended faces from a cubemap projection or adjusted cubemap projection of 360-degree video data. The reference frames packed with the extended faces may be used for inter-prediction of subsequent frames of 360-degree video data. By generating reference frames with extended faces, distortion and coding efficiency issues resulting from deformation and discontinuities at the borders between packed faces may be mitigated.
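The core geometric idea can be illustrated concretely: an extended face is produced by sampling the sphere over a field of view wider than the 90 degrees covered by a regular cube face, so the margin pixels carry geometrically correct neighboring content rather than an artificial frame border. In this minimal sketch, an analytic luminance function stands in for the decoded sphere of 360-degree video, and all names (`sphere_sample`, `face_to_dir`, `render_face`) are illustrative assumptions, not the disclosure's actual implementation.

```python
# Hedged sketch of sampling a sphere into a regular and an extended
# cubemap face. extent = 1.0 covers the standard 90-degree face;
# extent > 1.0 reaches past the face edge onto neighboring content.

import math

def sphere_sample(dx, dy, dz):
    """Toy spherical signal: brightness depends only on direction."""
    n = math.sqrt(dx * dx + dy * dy + dz * dz)
    return 128 + 100 * (dz / n)          # varies smoothly over the sphere

def face_to_dir(u, v):
    """Front cube face: (u, v) in [-1, 1] maps to direction (u, v, 1).
    Values with |u| > 1 or |v| > 1 reach past the face edge."""
    return (u, v, 1.0)

def render_face(size, extent):
    """Sample a size x size face covering [-extent, extent] in u and v."""
    img = []
    for j in range(size):
        v = extent * (2.0 * (j + 0.5) / size - 1.0)
        img.append([sphere_sample(*face_to_dir(
            extent * (2.0 * (i + 0.5) / size - 1.0), v))
            for i in range(size)])
    return img

packed = render_face(8, 1.0)     # regular 90-degree face
extended = render_face(12, 1.5)  # same face plus a 2-pixel margin all round
```

With these sampling grids, extended pixel (i + 2, j + 2) covers exactly the same direction as packed pixel (i, j), so the interior of the extended face reproduces the packed face and only the margin is new, correctly projected content.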

In one example, this disclosure describes a method of decoding 360-degree video data, the method comprising receiving an encoded frame of 360-degree video data, the encoded frame of 360-degree video data being arranged in packed faces obtained from a projection of a sphere of the 360-degree video data, decoding the frame of encoded 360-degree video data to obtain a decoded frame of 360-degree video data, the decoded frame of 360-degree video data being arranged in the packed faces, deriving a decoded sphere of 360-degree video data from the decoded frame of 360-degree video data, sampling the decoded sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the decoded frame of 360-degree video data, deriving an extended reference frame from the extended faces, and decoding a subsequent encoded frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.
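The benefit of decoding with an extended reference frame can be shown with a one-dimensional toy: a motion vector near a face border may point past the packed face, where a regular reference has only padded (clamped) samples but the extended reference holds the true neighboring content. One image row stands in for a face here; the names and the clamping-based padding are illustrative assumptions, not the disclosure's exact scheme.

```python
# Hedged sketch: motion compensation at a face border with a regular
# reference (border padding) versus an extended reference.

scene = list(range(100, 130))        # "sphere" content along one row
FACE = slice(10, 18)                 # the 8 samples packed into the face
MARGIN = 2                           # extension on each side

ref_face = scene[FACE]                                    # regular reference
ext_face = scene[FACE.start - MARGIN:FACE.stop + MARGIN]  # extended reference

def predict(ref, offset, pos, size, origin=0):
    """Motion-compensate: copy `size` samples starting at face position
    pos + offset, clamping indices to the reference bounds (padding).
    `origin` is the face coordinate of the reference's first sample."""
    out = []
    for k in range(size):
        idx = pos + offset + k - origin
        idx = max(0, min(len(ref) - 1, idx))   # clamp at the border
        out.append(ref[idx])
    return out

# Current frame: the scene shifted left by 2, i.e. motion vector +2.
cur_face = scene[FACE.start + 2:FACE.stop + 2]
target = cur_face[6:8]               # 2-sample block at the face border

from_regular = predict(ref_face, 2, 6, 2)                   # clamped: wrong
from_extended = predict(ext_face, 2, 6, 2, origin=-MARGIN)  # exact

print(target, from_regular, from_extended)
# → [118, 119] [117, 117] [118, 119]
```

The regular reference repeats its border sample and mispredicts the block, while the extended reference supplies the true samples, which is why an inter-prediction process using the derived extended reference frame can mitigate distortion at packed-face borders.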

In another example, this disclosure describes an apparatus configured to decode 360-degree video data, the apparatus comprising a memory configured to store an encoded frame of 360-degree video data, and one or more processors in communication with the memory, the one or more processors configured to receive the encoded frame of 360-degree video data, the encoded frame of 360-degree video data being arranged in packed faces obtained from a projection of a sphere of the 360-degree video data, decode the frame of encoded 360-degree video data to obtain a decoded frame of 360-degree video data, the decoded frame of 360-degree video data being arranged in the packed faces, derive a decoded sphere of 360-degree video data from the decoded frame of 360-degree video data, sample the decoded sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the decoded frame of 360-degree video data, derive an extended reference frame from the extended faces, and decode a subsequent encoded frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes an apparatus configured to decode 360-degree video data, the apparatus comprising means for receiving an encoded frame of 360-degree video data, the encoded frame of 360-degree video data being arranged in packed faces obtained from a projection of a sphere of the 360-degree video data, means for decoding the frame of encoded 360-degree video data to obtain a decoded frame of 360-degree video data, the decoded frame of 360-degree video data being arranged in the packed faces, means for deriving a decoded sphere of 360-degree video data from the decoded frame of 360-degree video data, means for sampling the decoded sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the decoded frame of 360-degree video data, means for deriving an extended reference frame from the extended faces, and means for decoding a subsequent encoded frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to decode video data to receive an encoded frame of 360-degree video data, the encoded frame of 360-degree video data being arranged in packed faces obtained from a projection of a sphere of the 360-degree video data, decode the frame of encoded 360-degree video data to obtain a decoded frame of 360-degree video data, the decoded frame of 360-degree video data being arranged in the packed faces, derive a decoded sphere of 360-degree video data from the decoded frame of 360-degree video data, sample the decoded sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the decoded frame of 360-degree video data, derive an extended reference frame from the extended faces, and decode a subsequent encoded frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes a method of encoding 360-degree video data, the method comprising receiving a sphere of 360-degree video data, arranging the sphere of 360-degree video data into a frame of packed faces obtained from a projection of the sphere of 360-degree video data, encoding the frame of packed faces to form a frame of encoded 360-degree video data, reconstructing the frame of encoded 360-degree video data to obtain a reconstructed frame of 360-degree video data, the reconstructed frame of 360-degree video data being arranged in the packed faces, deriving a reconstructed sphere of 360-degree video data from the reconstructed frame of 360-degree video data, sampling the reconstructed sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the reconstructed frame of 360-degree video data, deriving an extended reference frame from the extended faces, and encoding a subsequent frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes an apparatus configured to encode 360-degree video data, the apparatus comprising a memory configured to store a sphere of 360-degree video data, and one or more processors in communication with the memory, the one or more processors configured to receive the sphere of 360-degree video data, arrange the sphere of 360-degree video data into a frame of packed faces obtained from a projection of the sphere of 360-degree video data, encode the frame of packed faces to form a frame of encoded 360-degree video data, reconstruct the frame of encoded 360-degree video data to obtain a reconstructed frame of 360-degree video data, the reconstructed frame of 360-degree video data being arranged in the packed faces, derive a reconstructed sphere of 360-degree video data from the reconstructed frame of 360-degree video data, sample the reconstructed sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the reconstructed frame of 360-degree video data, derive an extended reference frame from the extended faces, and encode a subsequent frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes an apparatus configured to encode 360-degree video data, the apparatus comprising means for receiving a sphere of 360-degree video data, means for arranging the sphere of 360-degree video data into a frame of packed faces obtained from a projection of the sphere of 360-degree video data, means for encoding the frame of packed faces to form a frame of encoded 360-degree video data, means for reconstructing the frame of encoded 360-degree video data to obtain a reconstructed frame of 360-degree video data, the reconstructed frame of 360-degree video data being arranged in the packed faces, means for deriving a reconstructed sphere of 360-degree video data from the reconstructed frame of 360-degree video data, means for sampling the reconstructed sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the reconstructed frame of 360-degree video data, means for deriving an extended reference frame from the extended faces, and means for encoding a subsequent frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.

In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to encode video data to receive a sphere of 360-degree video data, arrange the sphere of 360-degree video data into a frame of packed faces obtained from a projection of the sphere of 360-degree video data, encode the frame of packed faces to form a frame of encoded 360-degree video data, reconstruct the frame of encoded 360-degree video data to obtain a reconstructed frame of 360-degree video data, the reconstructed frame of 360-degree video data being arranged in the packed faces, derive a reconstructed sphere of 360-degree video data from the reconstructed frame of 360-degree video data, sample the reconstructed sphere of 360-degree video data to produce extended faces using the projection, wherein the extended faces are larger than the packed faces of the reconstructed frame of 360-degree video data, derive an extended reference frame from the extended faces, and encode a subsequent frame of 360-degree video data using an inter-prediction process and the derived extended reference frame.
