Media streaming has been adopted for a variety of applications such as entertainment, visualization, and design. Unlike video/audio streaming, where the content is usually consumed sequentially, 3D applications such as gaming require streaming 3D assets to facilitate client-side interactions such as object manipulation and viewpoint movement. Compared to audio and video streaming, 3D streaming often involves larger data sizes yet demands lower latency, so that rendering quality, resolution, and responsiveness remain sufficient for perceptual comfort. Thus, streaming 3D assets can be even more challenging than streaming audio or video, and existing solutions often suffer from long loading times or limited quality. To address this critical and timely issue, we propose a perceptually optimized progressive 3D streaming method for spatial quality and temporal consistency in immersive interactions. Based on human visual mechanisms in the frequency domain, our model selects and schedules the data to be streamed for optimal spatial-temporal quality. We also train a neural network for our model to accelerate this decision process for real-time client-server applications. We evaluate our method via subjective studies and objective analysis under varying network conditions (from 3G to 5G) and client devices (head-mounted displays (HMDs) and traditional displays), and demonstrate better visual quality and temporal consistency than alternative solutions.