In this work, we share our experience with tele-knowledge pre-training for fault analysis, a crucial task in telecommunication applications that requires a wide range of knowledge normally found in both machine log data and product documents. To organize this expert knowledge uniformly, we propose to create a Tele-KG (tele-knowledge graph). Using this valuable data, we further propose a tele-domain language pre-training model, TeleBERT, and its knowledge-enhanced version, the tele-knowledge re-training model KTeleBERT, which includes effective prompt hints, adaptive numerical data encoding, and two knowledge injection paradigms. Concretely, our proposal includes two stages: first, pre-training TeleBERT on 20 million tele-related corpus samples, and then re-training it on 1 million causal and machine-related samples to obtain KTeleBERT. Our evaluation on multiple fault-analysis tasks in tele-applications, including root-cause analysis, event association prediction, and fault chain tracing, shows that pre-training a language model with tele-domain data is beneficial for downstream tasks. Moreover, the KTeleBERT re-training further improves the performance of task models, highlighting the effectiveness of incorporating diverse tele-knowledge into the model.
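The two-stage recipe described above can be illustrated with a minimal sketch using the Hugging Face Transformers masked-language-modeling API. This is a hypothetical illustration, not the authors' actual implementation: the base checkpoint (`bert-base-chinese`), the corpus files (`tele_corpus.txt`, `kg_corpus.txt`), and all hyperparameters are placeholder assumptions, and the knowledge-enhanced components of KTeleBERT (prompt hints, adaptive numerical encoding, knowledge injection) are omitted.

```python
# Minimal sketch of the two-stage pipeline (assumptions: placeholder
# checkpoint, corpus files, and hyperparameters; knowledge-enhanced
# objectives of KTeleBERT are not shown here).
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")

def tokenize(batch):
    # Truncate each raw text line to a fixed maximum length.
    return tokenizer(batch["text"], truncation=True, max_length=128)

# Standard BERT-style masked language modeling: 15% of tokens are masked.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

# Stage 1: pre-train TeleBERT on the large tele-domain corpus.
tele_corpus = load_dataset("text", data_files="tele_corpus.txt")["train"]
tele_corpus = tele_corpus.map(tokenize, batched=True)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="telebert", num_train_epochs=1),
    train_dataset=tele_corpus,
    data_collator=collator,
).train()

# Stage 2: re-train the same model on the smaller causal and
# machine-related corpus to obtain KTeleBERT.
kg_corpus = load_dataset("text", data_files="kg_corpus.txt")["train"]
kg_corpus = kg_corpus.map(tokenize, batched=True)
Trainer(
    model=model,
    args=TrainingArguments(output_dir="ktelebert", num_train_epochs=1),
    train_dataset=kg_corpus,
    data_collator=collator,
).train()
```

The key design point the sketch captures is that stage 2 continues training from the stage-1 weights rather than starting fresh, so the smaller knowledge-focused corpus refines, rather than replaces, the general tele-domain representations.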