This study investigates the development process for artificial intelligence (AI) and machine learning (ML) applications in order to provide the best support environment for it. The main stages of ML development are problem understanding, data management, model building, model deployment, and maintenance. This project focuses on the data management stage and its obstacles, because it is the most important stage of ML development: the accuracy of the final model depends on the data fed into it. The biggest obstacle found at this stage was the lack of sufficient data for model training, especially in fields where data is confidential. The project therefore aimed to build a framework that helps researchers and developers overcome this data shortage during the data management stage. The framework applies several data augmentation techniques to generate new data from the original dataset, which can improve the overall performance of ML applications by increasing the quantity and quality of the data available to the model. The framework was built in Python and performs data augmentation using deep learning techniques.
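The idea of generating new samples from an original dataset can be illustrated with a minimal sketch. It does not reproduce the framework described above, whose techniques rely on deep learning. It swaps in a much simpler method, Gaussian noise jitter on numeric features, purely to show the shape of the operation. The function name `augment_with_jitter`, its parameters, and the toy data are all assumptions made for illustration.

```python
import random


def augment_with_jitter(rows, labels, copies=2, noise_scale=0.05, seed=0):
    """Return the original samples plus `copies` noisy variants of each one.

    Hypothetical helper: each variant adds Gaussian noise whose standard
    deviation is `noise_scale` times that feature's standard deviation.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])

    # Per-feature mean and standard deviation, so noise matches each scale.
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    stds = [
        (sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)) ** 0.5
        for j in range(n_features)
    ]

    # Start from copies of the originals, then append the noisy variants.
    out_rows = [list(r) for r in rows]
    out_labels = list(labels)
    for row, label in zip(rows, labels):
        for _ in range(copies):
            noisy = [x + rng.gauss(0.0, noise_scale * s) for x, s in zip(row, stds)]
            out_rows.append(noisy)
            out_labels.append(label)  # the label is kept unchanged
    return out_rows, out_labels


# Toy dataset (made up for this example): four samples, two features each.
original_rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.2]]
original_labels = [0, 0, 1, 1]

aug_rows, aug_labels = augment_with_jitter(original_rows, original_labels)
print(len(original_rows), "->", len(aug_rows))  # 4 -> 12
```

Jitter of this kind increases only the quantity of data and assumes that small perturbations do not change a sample's label. Whether that assumption holds depends on the dataset.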