Semantic image and video segmentation are among the most important tasks in computer vision, since they provide a complete and meaningful representation of the environment through a dense, per-pixel classification of a given scene. Recently, Deep Learning, and more precisely Convolutional Neural Networks, have raised semantic segmentation to a new level in terms of performance and generalization capabilities. However, designing Deep Semantic Segmentation models is a complex task, as it may involve application-dependent aspects. In particular, autonomous driving applications require taking into account the robustness-efficiency trade-off, intrinsic limitations (computational and memory bounds, data scarcity), and constraints (real-time inference). In this respect, additional data modalities, such as depth perception for reasoning about the geometry of a scene and temporal cues from videos for exploiting redundancy and consistency, are promising directions that have not yet been explored to their full potential in the literature. In this paper, we survey the most relevant and recent advances in Deep Semantic Segmentation in the context of vision for autonomous vehicles, from three different perspectives: efficiency-oriented model development for real-time operation, RGB-Depth data integration (RGB-D semantic segmentation), and the use of temporal information from videos in temporally-aware models. Our main objective is to provide a comprehensive discussion of the main methods, advantages, limitations, results, and challenges from each perspective, so that the reader can not only get started but also stay up to date with recent advances in this exciting and challenging research field.