Multimodal problems are omnipresent in the real world: autonomous driving, robotic grasping, scene understanding, etc... We draw from the well-developed analysis of similarity to provide an example of a problem where neural networks are trained from different sensors, and where the features extracted from these sensors still carry similar information. More precisely, we demonstrate that for each sensor, the linear combination of the features from the last layer that correlates the most with other sensors corresponds to the classification components of the classification layer.
翻译:在现实世界中,多模式问题无处不在:自主驱动、机器人捕捉、场景理解等等。 我们从对相似性的完善分析中吸取一个实例,说明神经网络从不同传感器接受培训,从这些传感器中提取的特征仍然含有类似信息的问题。 更确切地说,我们证明,对于每个传感器来说,最与其他传感器相关的最后一个层特征的线性组合与分类层的分类组成部分相对应。