A Multi-tasking Model of Speaker-Keyword Classification for Keeping Human in the Loop of Drone-assisted Inspection

From arXiv. Submitted to the journal Engineering Applications of Artificial Intelligence at the end of June 2022. The first review was received in September 2022 and the revision was submitted in October 2022. The paper is currently under second review.

Audio commands are a preferred communication medium for keeping inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and adapted easily when the group changes. This paper builds a multi-tasking deep learning model with a Share-Split-Collaborate architecture. The architecture lets the two classification tasks share a feature extractor and then, through feature projection and collaborative training, split apart the subject-specific and keyword-specific features that are intertwined in the extracted features. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a mean accuracy of 95.3% or higher in classifying the keywords of any authorized inspector. Its mean accuracy in speaker classification is 99.2%. Because the model learns richer keyword representations from the pooled training data, adapting the base model to a new inspector requires only a small amount of training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. The paper thus offers a solution to challenges facing AI-assisted human-robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
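
The abstract says that speaker classification scores are used for inspector verification but does not give the decision rule. The following is a minimal sketch of one plausible rule, not the authors' method: the speaker head's raw scores are turned into probabilities with a softmax, and an utterance is accepted only if the top probability clears a threshold. The function names and the threshold value of 0.9 are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw classification scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def verify_speaker(logits, threshold=0.9):
    """Hypothetical verification rule (an assumption, not from the paper).

    Returns (speaker_index, confidence) when the most likely authorized
    speaker has probability >= threshold, otherwise (None, confidence),
    meaning the utterance is treated as coming from an unauthorized speaker.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return best, probs[best]
    return None, probs[best]


if __name__ == "__main__":
    # Five authorized speakers, as in the base model described above.
    confident_scores = [8.0, 0.0, 0.0, 0.0, 0.0]   # clearly speaker 0
    ambiguous_scores = [1.0, 1.0, 1.0, 1.0, 1.0]   # no speaker stands out
    print(verify_speaker(confident_scores))
    print(verify_speaker(ambiguous_scores))
```

Under this rule, raising the threshold rejects more unauthorized speakers at the cost of rejecting more authorized ones, which is the kind of trade-off reflected in the two success rates quoted above.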

