Machine learning has been applied to tasks in many domains, including classification, object detection, image segmentation, and natural language analysis. Data labeling has always been one of the most important tasks in machine learning, but labeling large amounts of data is expensive. As a result, researchers have focused on reducing data annotation and labeling costs. Transfer learning was developed and is widely used as an efficient way to lessen the negative impact of limited data, which in turn reduces data preparation cost. Although transferring prior knowledge from a source domain reduces the amount of data needed in a target domain, large amounts of annotated data are still required to build robust models and improve prediction accuracy. Researchers have therefore turned their attention to automatic annotation and labeling. In this survey paper, we review previous techniques that focus on optimized data annotation and labeling for video, audio, and text data.