Flexible Transport of 3-D Videos Over Networks Ahmed Hamza Network Systems Lab Simon Fraser University July 15, 2013
Outline • Introduction • State of the Art • 3D Video Representation • 3D Video Coding • Transport Protocols • P2P Streaming • Adaptive 3D Video Streaming • Stereo Video • Multi-view Video • Case Study: DIOMEDES
Introduction • In the near term, popular 3-D media will most likely take the form of stereoscopic and multi-view video. • Transmitting 3-D media, via broadcast or on demand, to end users with varying 3-D display terminals (e.g., TV, laptop, and mobile devices) and bandwidths is one of the biggest challenges in bringing 3-D media to the home and to mobile devices. • Two main platforms for 3-D video delivery: • digital television (DTV) platforms • Internet Protocol (IP) platforms
IP-based Delivery Platforms • IPTV • multimedia services delivered over IP-based managed networks that provide the required level of quality of service (QoS) and experience, security, interactivity, and reliability • WebTV • services offered over Internet connections that support best effort delivery with no QoS guarantees, making them accessible anytime, anywhere as opposed to IPTV
Hybrid DTV-IP Approach • The limited physical bandwidth of the DVB channel constrains the transmission of multi-view video (MVV). • The IP platform is more flexible in terms of bandwidth but is not reliable. • A more recent research direction combines the DVB and IP platforms to deliver MVV and provide a free-view TV/video experience.
Stereoscopic Video • The simplest 3D video data representation • Each of the two captured views is presented to one of the eyes • Can be multiplexed either spatially (passive) or temporally (active) • Temporal multiplexing has the advantage of maintaining the full resolution of each view • Disadvantage: • hardware representation dependency (the acquisition process is tailored to a specific type of display, and the baseline distance between the two cameras is fixed)
Multiplexing Stereo Video • Spatial multiplexing (half the resolution) • Time multiplexing (double the frame rate)
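The two multiplexing options can be sketched in a few lines. This is a minimal illustration, not a production packer: it assumes a side-by-side arrangement for the spatial case (one of several possible layouts), uses plain column decimation without the low-pass filtering a real system would apply, and represents frames as nested lists. The function names `side_by_side` and `frame_sequential` are hypothetical.

```python
def side_by_side(left, right):
    """Spatial multiplexing (assumed side-by-side layout): keep every other
    column of each view and pack both halves into one frame of the original
    width, so each view retains half its horizontal resolution."""
    return [l_row[::2] + r_row[::2] for l_row, r_row in zip(left, right)]


def frame_sequential(left_frames, right_frames):
    """Temporal multiplexing: interleave full-resolution left and right
    frames, which doubles the number of frames per second."""
    out = []
    for l_frame, r_frame in zip(left_frames, right_frames):
        out.extend([l_frame, r_frame])
    return out


# Toy 2x4 frames; the values stand in for pixels.
left = [[1, 2, 3, 4], [5, 6, 7, 8]]
right = [[11, 12, 13, 14], [15, 16, 17, 18]]

packed = side_by_side(left, right)            # same width as one input view
sequence = frame_sequential([left], [right])  # twice as many frames
```

The packed frame has the same dimensions as a single view, which is why spatial multiplexing fits existing 2D delivery chains at the cost of resolution.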
Video Plus Depth • 2D video signal (texture) along with geometry information of the scene (depth map)
Multi-view Plus Depth (MVD) [Figure: views from cameras Cam-6, Cam-3, and Cam-0]
3D Image Warping Ismaël Daribo and Hideo Saito, “A Novel Inpainting-Based Layered Depth Video for 3DTV,” IEEE Transactions on Broadcasting, vol. 57, no. 2, June 2011 Example
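As a rough illustration of how a depth map lets a client synthesize a new view, the sketch below warps a single scanline. It assumes rectified, parallel cameras, so that disparity in pixels is focal length times baseline divided by depth; the shift direction, the function names, and all numbers are hypothetical. Pixels that nothing maps to stay `None`: these are the disocclusion holes that inpainting or an LDV enhancement layer would have to fill.

```python
def disparity(focal_px, baseline, depth):
    """Horizontal shift in pixels for a point at the given depth
    (assumes rectified, parallel cameras)."""
    return focal_px * baseline / depth


def warp_row(colors, depths, focal_px, baseline):
    """Forward-warp one scanline to a virtual viewpoint.
    When two pixels land on the same target, the nearer one wins."""
    width = len(colors)
    out = [None] * width                 # None marks a disocclusion hole
    out_depth = [float("inf")] * width   # depth buffer for the target row
    for x, (color, z) in enumerate(zip(colors, depths)):
        tx = x - round(disparity(focal_px, baseline, z))
        if 0 <= tx < width and z < out_depth[tx]:
            out[tx] = color
            out_depth[tx] = z
    return out


# One near pixel ("c", depth 2) in front of a far background (depth 10).
row = warp_row(["a", "b", "c", "d", "e"],
               [10.0, 10.0, 2.0, 10.0, 10.0],
               focal_px=10.0, baseline=0.4)
```

In this toy case the near pixel moves two positions, covers a background pixel, and leaves a hole where it used to be.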
Layered Depth Video (LDV) • Main layer (central color view and depth map) • Enhancement layer (color and depth occlusions), projected on the central viewpoint
Three-Dimensional Video Coding • 3-D video encoding depends on the transport option and raw video format. • Simulcast encoding: • encode each view and/or depth map independently using a scalable or non-scalable monocular video codec • enables streaming each view over separate channels • clients can request as many views as their 3-D displays require • Dependent encoding: • encode views using MVC to decrease the overall bit rate by exploiting the inter-view redundancies • a special inter-view prediction structure must be employed to enable view-scalable and view-selective adaptive streaming
Multi-view Video Coding (MVC) • Multi-view extension of H.264/AVC • Enables inter-view prediction • Prediction structure is simplified by restricting inter-view prediction to anchor pictures only • Large disparity or different camera calibration affects coding efficiency • Reference MVC software (JMVC) • temporal and view scalability
Multi-view Plus Depth Coding • Independently code views and depth maps • Dependent encoding is also possible • Exploit correlation between texture and depth map • Examples: • sharing the texture video MVs with the depth map • utilizing inter-layer motion prediction tool in SVC
Transport Protocols • Transmission Control Protocol (TCP) • may not be suitable for streaming live video with a strict end-to-end delay constraint • lack of control on delay (retransmissions) • rapidly changing transmission rate (congestion control) • provides good performance when available network bandwidth is about twice the maximum video rate (few seconds pre-roll delay)
Transport Protocols • Datagram Congestion Control Protocol (DCCP) • implements bidirectional unicast connections • both data and acknowledgements can flow in both directions • congestion-controlled, unreliable datagrams • congestion control mechanism selected at connection startup • outperforms TCP under congestion when a video streaming scenario is considered
P2P Streaming • Traditional client-server unicast streaming model is not scalable by nature. • Advantage of P2P solutions • scalable media distribution (reduce the bandwidth requirement of the server by utilizing the network capacity of the clients/peers) • P2P solutions use overlay networks (data are redirected to another peer by the application)
Tree-Based Approach • Efficient for delivering content from the server that is at the top of the tree to peers that are connected to each other in parent–child fashion. • Shortcomings: • ungraceful peer exit leads its descendants to starvation • replicating the content for feeding multiple trees leads to redundancy within the network
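The starvation problem can be made concrete with a small sketch. It assumes a single distribution tree stored as a parent-to-children dictionary; the peer names and the helper `starved_peers` are hypothetical.

```python
def starved_peers(children, departed):
    """Return every descendant of `departed` in a single distribution tree.
    These peers stop receiving data until the tree is repaired."""
    starved = []
    stack = list(children.get(departed, []))
    while stack:
        peer = stack.pop()
        starved.append(peer)
        stack.extend(children.get(peer, []))
    return sorted(starved)


# The server feeds A and B; A feeds C and D; C feeds E.
tree = {"server": ["A", "B"], "A": ["C", "D"], "C": ["E"]}

lost = starved_peers(tree, "A")  # peers cut off if A leaves ungracefully
```

A peer near the root takes most of the tree with it, while a leaf affects nobody, which is why the position of unstable peers matters in tree-based overlays.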
Mesh-Based Approach • Data are distributed over an unstructured network in which each peer can connect to multiple peers. • Increased connectivity alleviates the problem of ungraceful peer exit. • building multiple connections dynamically requires a certain amount of time (initiation interval) • More suitable for applications that may tolerate some initiation interval. • Example: BitTorrent
Adaptive Streaming • A mechanism should exist to estimate the network conditions so as to adapt the video rate accordingly, in order to optimize the received video quality. • Estimation can be performed by • requesting receiver buffer occupancy status (to prevent buffer underflow/overflow) • combining receiver buffer status with bandwidth estimation
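One possible way to combine receiver buffer status with a bandwidth estimate is sketched below. The thresholds, the linear ramp, and the name `target_rate` are all assumptions made for illustration; the slides do not prescribe a specific rule.

```python
def target_rate(bandwidth_kbps, buffer_s, low_s=2.0, high_s=10.0):
    """Scale the bandwidth estimate by how full the receiver buffer is.

    Below `low_s` seconds of buffered video, request only half the
    estimate so the buffer can refill; above `high_s`, use the full
    estimate; in between, ramp linearly. All constants are hypothetical.
    """
    if buffer_s < low_s:
        factor = 0.5
    elif buffer_s > high_s:
        factor = 1.0
    else:
        factor = 0.5 + 0.5 * (buffer_s - low_s) / (high_s - low_s)
    return bandwidth_kbps * factor


nearly_empty = target_rate(2000, buffer_s=1.0)
half_full = target_rate(2000, buffer_s=6.0)
comfortable = target_rate(2000, buffer_s=12.0)
```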
Adaptive Streaming • DCCP + TCP-friendly rate control (TFRC) • TFRC rate calculated by DCCP can be utilized by the sender to estimate the available network rate • When the video is streamed over TCP, an average of the transmission rate can be used to determine the available network bandwidth • Basic method in DASH
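Both estimators mentioned above can be sketched briefly. The first function evaluates the TCP throughput equation used by TFRC (RFC 5348); setting the retransmission timeout to four round-trip times is a common simplification and is an assumption here. The second is a plain exponentially weighted moving average over measured TCP throughput samples, with a hypothetical smoothing factor.

```python
from math import sqrt


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """TCP throughput equation used by TFRC, in bytes per second.

    s: segment size in bytes, rtt: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), b: packets acknowledged per ACK.
    t_rto defaults to 4 * rtt (assumed heuristic).
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


def ewma(samples, alpha=0.2):
    """Smooth a series of throughput samples; alpha is hypothetical."""
    avg = samples[0]
    for x in samples[1:]:
        avg = (1 - alpha) * avg + alpha * x
    return avg


light_loss = tfrc_rate(s=1460, rtt=0.1, p=0.01)
heavy_loss = tfrc_rate(s=1460, rtt=0.1, p=0.05)
smoothed = ewma([3000.0, 3200.0, 1800.0, 2900.0])
```

The allowed rate drops as the loss event rate or the round-trip time grows, which is what makes the result usable as an estimate of the available network rate.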
Video Rate Adaptation Methods • Adapting the video rate to the available bandwidth depends on the encoding characteristics of the views. • One or more views can be encoded multiple times at different bit rates; the sender can then switch between these streams according to the network conditions • Similar to HTTP live streaming • Encoding views once with multiple layers using SVC and switching between these layers • Real-time encoding with source rate control • Difficult with MVV
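The stream-switching option can be sketched as a simple selection rule. The bitrate ladder, the safety margin, and the name `pick_stream` are hypothetical; real players typically add hysteresis and buffer awareness on top of this.

```python
def pick_stream(bitrates_kbps, estimated_kbps, safety=0.8):
    """Pick the highest pre-encoded bitrate that fits within a fraction
    of the estimated bandwidth; fall back to the lowest stream if none
    fits. The 0.8 safety margin is an assumption."""
    budget = estimated_kbps * safety
    feasible = [r for r in bitrates_kbps if r <= budget]
    return max(feasible) if feasible else min(bitrates_kbps)


ladder = [500, 1000, 2000, 4000]  # hypothetical encodings of one view

good_link = pick_stream(ladder, estimated_kbps=3000)
poor_link = pick_stream(ladder, estimated_kbps=300)
```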
Adaptive Stereoscopic Video Streaming • The behavior of the human visual system is another paradigm for QoE-aware rate adaptation. • Exploit the suppression theory • human visual system (HVS) tolerates lack of high-frequency components in one of the views • One of the views may be presented at a lower quality without degrading the 3-D video perception. • Asymmetric quality allocation
Just Noticeable Distortion for Asymmetric Stereo Coding • Asymmetry can be achieved by scaling the quality in one of the views (secondary view) • in spatial, signal-to-noise ratio (SNR) or temporal dimensions • Questions • Which method should be used? • What is the level of asymmetry before observers start noticing visible degradations?
Just Noticeable Distortion for Asymmetric Stereo Coding • Results show that the “just noticeable” threshold PSNR is • 33 dB for the polarized projection display • 31.5 dB for the parallax barrier display
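These thresholds can drive the choice of encoding for the secondary view. The sketch below assumes the reported values are the lowest secondary-view PSNR at which observers do not yet notice degradation; the candidate encodings, the dictionary keys, and the helper name are hypothetical.

```python
# Thresholds reported above, keyed by display type.
JND_PSNR_DB = {"polarized_projection": 33.0, "parallax_barrier": 31.5}


def cheapest_unnoticeable(encodings, display):
    """Pick the lowest-bitrate encoding of the secondary view whose PSNR
    stays at or above the display's threshold.

    encodings: list of (bitrate_kbps, psnr_db) tuples.
    Falls back to the highest-PSNR encoding if none meets the threshold.
    """
    threshold = JND_PSNR_DB[display]
    ok = [e for e in encodings if e[1] >= threshold]
    return min(ok) if ok else max(encodings, key=lambda e: e[1])


candidates = [(400, 30.0), (600, 32.0), (900, 34.0)]  # hypothetical

for_projection = cheapest_unnoticeable(candidates, "polarized_projection")
for_barrier = cheapest_unnoticeable(candidates, "parallax_barrier")
```

With these made-up candidates, the parallax barrier display can accept a cheaper secondary view than the polarized projection display because its threshold is lower.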
Asymmetric Encoding for Adaptive Streaming • Asymmetric Coding at a Fixed Rate Using MVC • Spatial asymmetry • using additional down-sampling steps in the encoding process • Temporal asymmetry • skipping frames from the secondary view • SNR (quality) asymmetry • straightforward compared to other types of asymmetry (encoding quality of a view depends on the quantization parameter used)
Asymmetric MVC Coding Alternating views are coded at high and low quality. Inter-view dependencies should be carefully constructed (predict only from high-quality views).
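One way to lay out such a structure is sketched below. Only two things come from the slide: qualities alternate between views, and inter-view prediction uses high-quality views only. The specific choice of references (neighbouring high-quality views) and the name `asymmetric_plan` are assumptions for illustration, not the structure used in the original work.

```python
def asymmetric_plan(num_views):
    """Assign alternating qualities and pick inter-view references so
    that every reference is a high-quality view (hypothetical layout)."""
    plan = []
    for v in range(num_views):
        high = v % 2 == 0
        if v == 0:
            refs = []                       # base view, no inter-view prediction
        elif high:
            refs = [v - 2]                  # previous high-quality view
        else:
            refs = [v - 1]                  # left neighbour is high quality
            if v + 1 < num_views:
                refs.append(v + 1)          # right neighbour, if it exists
        plan.append({"view": v,
                     "quality": "high" if high else "low",
                     "refs": refs})
    return plan


plan = asymmetric_plan(5)
```

Because low-quality views are never used as references, their degradation does not propagate into the views predicted from others.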
Asymmetric Encoding for Adaptive Streaming • Scalable Asymmetric Coding Using SVC • It is possible to obtain spatial and/or quality scalable right and left views if they are simulcast coded using the SVC standard. • Two encoding options for achieving scalable asymmetric stereoscopic video bitstreams when simulcast coding is used: • encoding both views using SVC • encoding one view with SVC and the other with H.264/AVC
Asymmetric Encoding for Stereoscopic 3D Video • Can be done in two ways: • encode both views using SVC • base layer of each view is encoded with a quality ~32 dB • enhancement layers are encoded at the maximum quality according to channel capacity • only one view (the first) is scalably encoded • second view is encoded using non-scalable H.264/AVC • When the available link capacity is high, the scalable coded view (with the enhancement layer) becomes the high-quality view.
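For the first option (both views encoded with SVC), the sender's layer selection might look like the sketch below. The bit rates, the preference for the left view, and the name `layers_to_send` are hypothetical; the sketch only shows how dropping one enhancement layer yields an asymmetric pair.

```python
def layers_to_send(capacity_kbps, base_kbps, enh_kbps):
    """Always send both base layers, then add enhancement layers one view
    at a time while capacity remains (left view first, by assumption)."""
    sent = {"left": ["base"], "right": ["base"]}
    spare = capacity_kbps - 2 * base_kbps
    for view in ("left", "right"):
        if spare >= enh_kbps:
            sent[view].append("enhancement")
            spare -= enh_kbps
    return sent


high = layers_to_send(4000, base_kbps=1000, enh_kbps=800)    # symmetric, high quality
medium = layers_to_send(3000, base_kbps=1000, enh_kbps=800)  # asymmetric
low = layers_to_send(2000, base_kbps=1000, enh_kbps=800)     # symmetric, base only
```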
Adaptive Multi-view Video Streaming • Straightforward approach: • extend the concept of asymmetric coding to MVV streaming (for a relatively small number of views) • A more efficient (in terms of bandwidth consumption) and flexible (in terms of number of views) approach: • streaming the MVD representation (includes view scalability) • View-selective encoding and interactive streaming of multi-view video • requires computer vision methods for real-time head/gaze tracking, which can be used to limit the number of views transmitted
View Scaling • Discarding one view entirely and falling back to 2D video is not a good choice. • switching from 3D to 2D results in significant viewing discomfort • With multi-view video (MVV) format, view scaling is a possible option • missing view(s) may be outside of the user’s field of view or can be replaced by an artificial view generated at the client side • Challenge • How to determine which view should be discarded for minimum degradation in perceived quality?
QoE-based Adaptation Policy • Subjective tests to evaluate the performance of scaling methods in terms of delivered QoE under different network conditions. • 5-view 3D display at 1920x1200 screen resolution • 12 male and 4 female assessors (7 experts)
QoE-based Adaptation Policy Recommended adaptation policy:
Adaptation-ready Encoding • Introduce a quality difference between adjacent views. • Views that are either transmitted in full or not at all are encoded with H.264/AVC for high coding efficiency. • Views that may take different qualities to achieve asymmetry are encoded using SVC. • Example: • For a five-view display, this can be done efficiently by using SVC for views 2 and 4.
MVV Adaptation Example • High link capacity (4.5 Mbps) • Low link capacity (3.3 Mbps) • Very low link capacity (2.1 Mbps)