We introduce a challenging decision-making task that we call active acquisition for multimodal temporal data (A2MT). In many real-world scenarios, input features are not readily available at test time and must instead be acquired at significant cost. With A2MT, we aim to learn agents that actively select which modalities of an input to acquire, trading off acquisition cost and predictive performance. A2MT extends a previous task called active feature acquisition to temporal decision making about high-dimensional inputs. Further, we propose a method based on the Perceiver IO architecture to address A2MT in practice. Our agents are able to solve a novel synthetic scenario requiring practically relevant cross-modal reasoning skills. On two large-scale, real-world datasets, Kinetics-700 and AudioSet, our agents successfully learn cost-reactive acquisition behavior. However, an ablation reveals they are unable to learn adaptive acquisition strategies, emphasizing the difficulty of the task even for state-of-the-art models. Applications of A2MT may be impactful in domains like medicine, robotics, or finance, where modalities differ in acquisition cost and informativeness.
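The trade-off between acquisition cost and predictive performance can be illustrated with a minimal sketch. It assumes a simple additive form in which each time step contributes a prediction reward minus the summed cost of the modalities acquired at that step. The abstract does not specify the actual objective, so the function name `episode_return`, its parameters, and the additive form itself are hypothetical choices made for illustration only.

```python
from typing import Dict, Sequence


def episode_return(
    acquisitions: Sequence[Sequence[str]],
    prediction_rewards: Sequence[float],
    modality_costs: Dict[str, float],
) -> float:
    """Cost-penalized return of one acquisition episode (assumed additive form).

    acquisitions[t] lists the modalities acquired at time step t,
    prediction_rewards[t] is the predictive reward obtained at step t,
    and modality_costs maps each modality name to its per-step cost.
    """
    if len(acquisitions) != len(prediction_rewards):
        raise ValueError("acquisitions and prediction_rewards must have equal length")

    total = 0.0
    for acquired, reward in zip(acquisitions, prediction_rewards):
        # Pay for every modality acquired at this step.
        step_cost = sum(modality_costs[m] for m in acquired)
        total += reward - step_cost
    return total


# Hypothetical costs and rewards, chosen only to show the arithmetic.
costs = {"video": 0.5, "audio": 0.25}
steps = [["video", "audio"], ["audio"], []]
rewards = [1.0, 1.0, 0.5]
result = episode_return(steps, rewards, costs)
```

Under this assumed form, raising a modality's cost lowers the return of any policy that acquires it, which is one way to picture why an agent might acquire less as costs rise.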