Creating a vision pipeline to solve a computer vision task on different datasets is a complex and time-consuming process. Currently, these pipelines are developed with the help of domain experts. Moreover, there is no systematic way to construct a vision pipeline apart from relying on experience, trial and error, or template-based approaches. Because the search space of suitable algorithms for a particular vision task is large, finding a good solution through human exploration requires considerable time and effort. To address these issues, we propose a dynamic, data-driven way to identify an appropriate set of algorithms for building a vision pipeline that achieves the target task. We introduce a Transformer architecture complemented with deep reinforcement learning to recommend algorithms that can be incorporated at different stages of the vision workflow. This system is both robust and adaptive to dynamic changes in the environment. Experimental results further show that our method generalizes well to recommending algorithms that were not used during training, and hence alleviates the need to retrain the system on a new set of algorithms introduced at test time.