Machine learning (ML) and deep learning (DL) tasks depend primarily on data. Most ML and DL applications involve supervised learning, which requires labelled data. In the early days of ML, a lack of data was a problem; we are now in the era of big data. Supervised ML algorithms require data that is labelled and of good quality. Labelling requires a large investment of money and time: it may need a skilled person who charges a high fee, as in the medical field, or the data may be so voluminous that many people must be assigned to label it. The amount of data that is sufficient for training therefore needs to be known, so that money and time are not wasted on labelling the entire data set. This paper proposes a strategy that helps label the data alongside an oracle in real time. With balancing on, the model's contribution to labelling is 89 and 81.1 for the furniture type and Intel scene image data sets, respectively. With balancing off, the model's contribution is 83.47 and 78.71 for the furniture type and flower data sets, respectively.