The amount of data managed by many academic institutions has increased in recent years, particularly for research work carried out by undergraduate students. Keywords for this work are usually selected by ad hoc, empirical means, and existing technical methods that could assist students in this process are overlooked. Information and communication technologies have gained relevance and importance in managing these projects. One example is the platform for integrated research and academic work with responsibility (PILAR), which records information about research projects in their various modalities, such as titles, summaries, and keywords. In this study, we tested algorithms on these research project records: nine unsupervised machine learning models were implemented, and predictions were made with each model for each of the 7430 records in the dataset. The most efficient keyword extraction method for this dataset was TF-IDF, which obtained 72% accuracy and a mean extraction time of 0.4786 (SD 0.0501) for each thesis file processed by this model.
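As an illustration of the TF-IDF approach named above, the following is a minimal sketch of TF-IDF keyword ranking, not the implementation used in the study. The function names, the tokenization rule, the smoothed IDF formula, and the sample records are all assumptions made for this example; the abstract does not describe the preprocessing or the TF-IDF variant that was applied.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens of three or more letters."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) >= 3]


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        # Assumed variant: relative term frequency times a smoothed IDF.
        scores = {
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        }
        # Highest score first; ties are broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results


# Hypothetical records standing in for project titles or summaries.
records = [
    "Keyword extraction for thesis abstracts using keyword ranking",
    "Water quality monitoring for rural water systems",
    "Thesis management platform for academic research",
]
keywords = tfidf_keywords(records, top_k=2)
```

Terms that are frequent within one record but rare across the collection rank highest, which is why a word shared by every record (such as "for" here) scores low without an explicit stop-word list.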