We tackle a challenging task: multi-view, multi-modal event detection, which detects events in a wide-area real-world environment using data from distributed cameras and microphones together with their weak labels. In this task, the distributed sensors complement one another to capture events that are difficult to capture with a single sensor, such as a series of actions by people moving through an intricately laid-out room, or communication between people located far apart in a room. For sensors to cooperate effectively in such a situation, the system should be able to exchange information among sensors and combine the information that is useful for identifying events in a complementary manner. As such a mechanism, we propose Transformer-based multi-sensor fusion (MultiTrans), which combines multi-sensor data on the basis of the relationships between features of different viewpoints and modalities. In experiments on a dataset newly collected for this task, our proposed method using MultiTrans improved event detection performance and outperformed the comparative methods.
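To make the idea of combining sensor data "on the basis of the relationships between features" more concrete, the following is a minimal sketch of one generic way such fusion can work: a single scaled dot-product self-attention step over per-sensor feature vectors, followed by mean pooling. It is not the MultiTrans architecture itself, whose details the abstract does not give. The function name `fuse_sensors`, the sensor count, the feature dimension, the random projection matrices, and the pooling step are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fuse_sensors(features, w_q, w_k, w_v):
    """Hypothetical fusion step (not the authors' implementation).

    features: array of shape (num_sensors, dim), one feature vector per
              camera or microphone.
    Returns the fused vector and the (num_sensors, num_sensors) attention map.
    """
    q = features @ w_q  # queries: what each sensor is looking for
    k = features @ w_k  # keys: what each sensor offers
    v = features @ w_v  # values: the information that gets exchanged
    # Pairwise relationships between sensors, scaled as in standard attention.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1
    attended = attn @ v              # each sensor gathers from all sensors
    fused = attended.mean(axis=0)    # assumed pooling into one vector
    return fused, attn


rng = np.random.default_rng(0)
num_sensors, dim = 6, 16  # assumed: e.g. 4 cameras and 2 microphones
features = rng.standard_normal((num_sensors, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

fused, attn = fuse_sensors(features, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Each row of `attn` indicates how strongly one sensor draws on every other sensor, which is the sense in which information is exchanged among viewpoints and modalities in this sketch. In a trained model the projection matrices would be learned, and the fused vector would feed an event classifier trained from the weak labels.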