In this research, we analyze the potential of Feature Density (FD) as a way to comparatively estimate machine learning (ML) classifier performance prior to training. The goal of the study is to aid in solving the problem of resource-intensive training of ML models, which is becoming a serious issue due to continuously increasing dataset sizes and the ever-rising popularity of Deep Neural Networks (DNN). The constantly increasing demand for more powerful computational resources is also affecting the environment, as training large-scale ML models causes alarmingly growing amounts of CO2 emissions. Our approach is to optimize the resource-intensive training of ML models for Natural Language Processing to reduce the number of required experiment iterations. We expand on previous attempts at improving classifier training efficiency with FD, while also providing insight into the effectiveness of various linguistically-backed feature preprocessing methods for dialog classification, specifically cyberbullying detection.
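As a rough illustration, Feature Density is commonly defined as the ratio of unique features to the total feature count of a corpus; the sketch below assumes that definition and uses simple whitespace tokens as features, whereas the study itself applies linguistically-backed preprocessing, so this is only a minimal approximation:

```python
def feature_density(documents):
    """Compute Feature Density: unique features / total feature count.

    Each whitespace-separated token is treated as a feature here,
    purely for illustration; the actual feature extraction in the
    study (e.g. lemmas, n-grams) may differ.
    """
    features = [tok for doc in documents for tok in doc.split()]
    if not features:
        return 0.0
    return len(set(features)) / len(features)

# Hypothetical toy corpus for demonstration only.
docs = ["you are a bully", "stop the bullying now", "you stop now"]
print(feature_density(docs))  # 8 unique tokens out of 11 total
```

A lower FD suggests a more repetitive (denser) feature space, which is the kind of signal the study investigates as a pre-training proxy for classifier performance.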