Broca's aphasia is a type of aphasia characterized by non-fluent, effortful, and agrammatic speech production with relatively preserved comprehension. Traditional aphasia treatment methods are often time-consuming and labour-intensive, and they do not reflect real-world conversations, so natural language processing approaches such as Large Language Models (LLMs) could help improve existing treatments. To this end, we explore the use of sequence-to-sequence LLMs for completing Broca's aphasic sentences. We first generate synthetic Broca's aphasic data using a rule-based system designed to mirror the linguistic characteristics of Broca's aphasic speech. Using only this synthetic data, with no authentic aphasic samples, we then fine-tune four pre-trained LLMs on the task of completing agrammatic sentences. We evaluate the fine-tuned models on both synthetic and authentic Broca's aphasic data. The models are able to reconstruct agrammatic sentences, and their performance improves with longer input utterances. Our results highlight the potential of LLMs for advancing communication aids for individuals with Broca's aphasia and possibly other clinical populations.
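To make the data-generation step concrete, the following is a minimal Python sketch of how a rule-based system might turn fluent sentences into (agrammatic input, target) pairs for sequence-to-sequence fine-tuning. It is an illustration under stated assumptions, not the rule set used in this work: the single rule shown (dropping function words with some probability), the word list `FUNCTION_WORDS`, and the names `make_agrammatic`, `build_pairs`, and `drop_prob` are all hypothetical.

```python
import random

# Hypothetical function-word list, chosen for illustration only.
# The actual rules used to build the synthetic data are not given here.
FUNCTION_WORDS = {
    "a", "an", "the", "is", "are", "was", "were", "am", "be", "been",
    "to", "of", "in", "on", "at", "for", "with", "and", "but", "that",
    "has", "have", "had", "will", "would", "do", "does", "did",
}


def make_agrammatic(sentence, drop_prob=0.8, rng=None):
    """Drop each function word with probability drop_prob (assumed rule)."""
    if rng is None:
        rng = random.Random()
    kept = []
    for token in sentence.split():
        # Compare on a lowercased copy with edge punctuation stripped.
        core = token.strip(".,!?;:").lower()
        if core in FUNCTION_WORDS and rng.random() < drop_prob:
            continue
        kept.append(token)
    # Fall back to the original sentence rather than return an empty input.
    return " ".join(kept) if kept else sentence


def build_pairs(sentences, drop_prob=0.8, seed=0):
    """Return (agrammatic_input, fluent_target) pairs for fine-tuning."""
    rng = random.Random(seed)
    return [(make_agrammatic(s, drop_prob, rng), s) for s in sentences]
```

With `drop_prob=1.0`, a sentence such as "The boy is walking to the school." is reduced to its content words, while `drop_prob=0.0` leaves it unchanged. A fuller system would presumably also model other features of agrammatic speech, such as simplified verb morphology or shorter utterances, which this sketch omits.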