Automated Image Data Preprocessing with Deep Reinforcement Learning

Data preparation, i.e., the process of transforming raw data into a format that can be used for training effective machine learning models, is a tedious and time-consuming task. For image data, preprocessing typically involves a sequence of basic transformations such as cropping, filtering, rotating or flipping images. Currently, data scientists decide manually, based on their experience, which transformations to apply to a given image data set and in which order. Besides constituting a bottleneck in real-world data science projects, manual image data preprocessing may yield suboptimal results, because data scientists must rely on intuition or trial and error when exploring the space of possible image transformations and thus may fail to discover the most effective ones. To mitigate the inefficiency and potential ineffectiveness of manual data preprocessing, this paper proposes a deep reinforcement learning framework that automatically discovers the optimal data preprocessing steps for training an image classifier. The framework takes as input sets of labeled images and predefined preprocessing transformations. It jointly learns the classifier and the optimal preprocessing transformations for individual images. Experimental results show that the proposed approach not only improves the accuracy of image classifiers but also makes them substantially more robust to noisy inputs at test time.
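The abstract frames preprocessing as a decision problem: for each image, choose which predefined transformations to apply so that the classifier performs better. The following is a heavily simplified, hypothetical sketch of that idea, not the paper's method. It assumes a fixed toy template classifier (the paper learns the classifier jointly), a made-up set of four transformations, exactly one transformation per image (a one-step, contextual-bandit simplification of reinforcement learning), and a lookup table instead of a deep network. The reward is the classifier's probability for the true label after the transformation; all names (`ACTIONS`, `TEMPLATES`, `PreprocessingAgent`, `train`) are illustrative.

```python
import math
from typing import Callable, Dict, List, Tuple

Image = Tuple[Tuple[int, ...], ...]

# Hypothetical set of predefined preprocessing transformations.
def identity(img: Image) -> Image:
    return img

def flip_h(img: Image) -> Image:
    return tuple(tuple(reversed(row)) for row in img)

def flip_v(img: Image) -> Image:
    return tuple(reversed(img))

def rot90(img: Image) -> Image:
    # Rotate 90 degrees clockwise.
    return tuple(zip(*img[::-1]))

ACTIONS: Dict[str, Callable[[Image], Image]] = {
    "identity": identity, "flip_h": flip_h, "flip_v": flip_v, "rot90": rot90,
}

# Toy FIXED classifier (assumption): softmax over pixel agreement with one
# template per class. The paper instead trains the classifier jointly.
TEMPLATES: List[Image] = [
    ((1, 0, 0), (1, 0, 0), (1, 1, 1)),  # class 0: "L" shape
    ((1, 1, 1), (0, 1, 0), (0, 1, 0)),  # class 1: "T" shape
]

def class_probs(img: Image) -> List[float]:
    scores = [sum(a == b for ra, rb in zip(img, t) for a, b in zip(ra, rb))
              for t in TEMPLATES]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

class PreprocessingAgent:
    """Tabular one-step agent: state = image, action = transformation name."""

    def __init__(self, actions: List[str]) -> None:
        self.actions = actions
        self.q: Dict[Tuple[Image, str], float] = {}
        self.n: Dict[Tuple[Image, str], int] = {}

    def value(self, state: Image, action: str) -> float:
        # Optimistic default of 1.0 (the maximum possible reward) makes the
        # greedy policy try every action once before exploiting.
        return self.q.get((state, action), 1.0)

    def act(self, state: Image) -> str:
        # Greedy choice; ties go to the earliest action in self.actions.
        return max(self.actions, key=lambda a: self.value(state, a))

    def update(self, state: Image, action: str, reward: float) -> None:
        key = (state, action)
        self.n[key] = self.n.get(key, 0) + 1
        old = self.q.get(key, 0.0)
        # Incremental sample average of the observed rewards.
        self.q[key] = old + (reward - old) / self.n[key]

def train(dataset: List[Tuple[Image, int]], episodes: int) -> PreprocessingAgent:
    agent = PreprocessingAgent(list(ACTIONS))
    for _ in range(episodes):
        for img, label in dataset:
            action = agent.act(img)
            reward = class_probs(ACTIONS[action](img))[label]
            agent.update(img, action, reward)
    return agent

if __name__ == "__main__":
    A, B = TEMPLATES
    data = [(flip_h(A), 0), (flip_v(B), 1)]  # distorted training images
    agent = train(data, episodes=5)
    for img, label in data:
        fixed = ACTIONS[agent.act(img)](img)
        print(agent.act(img), round(class_probs(img)[label], 3),
              "->", round(class_probs(fixed)[label], 3))
```

Because the table is keyed on exact images, this sketch cannot generalize to unseen inputs; a deep reinforcement learning agent would instead map image features to action values with a neural network, and a multi-step variant would let the agent chain several transformations before stopping.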

Related content

Data preprocessing

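Data preprocessing refers to the processing applied to data before the main processing stage. For example, before most areal geophysical observation data undergo transformation or enhancement, the irregularly distributed survey network is first interpolated onto a regular grid to facilitate computer processing. Likewise, preprocessing of profile measurements such as seismic data includes vertical stacking, re-sorting, adding trace headers, editing, resampling, and multiplex editing.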
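The gridding step mentioned for geophysical data can be illustrated with inverse distance weighting (IDW), one common way to interpolate scattered observations onto a regular grid. The text does not name an interpolation method, so IDW, the default `power=2.0`, and the function name `idw_grid` are assumptions for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, observed value)

def idw_grid(points: Sequence[Point], xs: Sequence[float], ys: Sequence[float],
             power: float = 2.0) -> List[List[float]]:
    """Hypothetical helper: interpolate scattered observations onto the regular
    grid defined by xs and ys with inverse distance weighting.
    Returns grid[j][i] for the node (xs[i], ys[j]). Requires at least one point."""
    grid: List[List[float]] = []
    for y in ys:
        row: List[float] = []
        for x in xs:
            num = den = 0.0
            exact = None
            for px, py, v in points:
                d2 = (x - px) ** 2 + (y - py) ** 2
                if d2 == 0.0:
                    exact = v  # grid node coincides with an observation
                    break
                w = 1.0 / d2 ** (power / 2)  # weight = 1 / distance**power
                num += w * v
                den += w
            row.append(exact if exact is not None else num / den)
        grid.append(row)
    return grid

if __name__ == "__main__":
    survey = [(0.0, 0.0, 1.0), (2.0, 0.0, 3.0), (1.0, 2.0, 5.0)]
    print(idw_grid(survey, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0, 2.0]))
```

Larger values of `power` make each node follow its nearest observations more closely; production workflows often use kriging or splines instead.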
