We developed a task-oriented dialogue framework structured as a Directed Acyclic Graph (DAG) of medical questions. The system integrates: (1) a systematic pipeline for transforming medical algorithms and guidelines into a clinical question corpus; (2) a cold-start mechanism based on hierarchical clustering that generates efficient initial questions without prior patient information; (3) an expand-and-prune mechanism that enables adaptive branching and backtracking based on patient responses; (4) termination logic that ends the interview once sufficient information has been gathered; and (5) automated synthesis of doctor-friendly structured reports aligned with clinical workflows. Human-computer interaction principles guided the design of both the patient and physician applications. In a preliminary evaluation, five physicians assessed both applications using standardized instruments: NASA-TLX (cognitive workload), the System Usability Scale (SUS), and the Questionnaire for User Interface Satisfaction (QUIS). The patient application achieved a low workload score (NASA-TLX = 15.6), high usability (SUS = 86), and strong satisfaction (QUIS = 8.1/9), with particularly high ratings for ease of learning and interface design. The physician application yielded moderate workload (NASA-TLX = 26), excellent usability (SUS = 88.5), and a satisfaction score of 8.3/9 (QUIS). Both applications demonstrated effective integration into clinical workflows, reducing cognitive demand and supporting efficient report generation. Limitations included occasional system latency and a small, non-diverse evaluation sample.
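To make the expand-and-prune idea concrete, the following is a minimal Python sketch of an interview loop over a graph of questions. It is an illustration under stated assumptions, not the system's implementation: the `Question` class, the `run_interview` function, the `max_questions` budget, and the example questions are all hypothetical. Backtracking and the hierarchical-clustering cold start are not modeled, and "sufficient information" is approximated here by an empty queue or an exhausted question budget.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical node in the question graph (not the paper's data model)."""
    qid: str
    text: str
    # answer -> ids of follow-up questions that the answer unlocks
    expands: dict[str, list[str]] = field(default_factory=dict)
    # answer -> ids of questions that the answer makes irrelevant
    prunes: dict[str, list[str]] = field(default_factory=dict)


def run_interview(questions, roots, answer_fn, max_questions=20):
    """Ask questions breadth-first, expanding and pruning after each answer.

    Assumed termination rule: stop when no questions remain or when the
    budget `max_questions` is reached.
    """
    pending = list(roots)   # questions waiting to be asked, in order
    asked = {}              # qid -> recorded answer
    pruned = set()          # qids ruled out by earlier answers
    while pending and len(asked) < max_questions:
        qid = pending.pop(0)
        if qid in asked or qid in pruned:
            continue
        question = questions[qid]
        answer = answer_fn(question)
        asked[qid] = answer
        # Prune: drop questions this answer makes irrelevant.
        pruned.update(question.prunes.get(answer, []))
        # Expand: enqueue follow-ups unlocked by this answer.
        for nxt in question.expands.get(answer, []):
            if nxt not in asked and nxt not in pruned and nxt not in pending:
                pending.append(nxt)
    return asked


# Illustrative question graph; the clinical content is made up for the demo.
questions = {
    "pain": Question("pain", "Do you have chest pain?",
                     expands={"yes": ["radiates", "duration"]}),
    "fever": Question("fever", "Do you have a fever?",
                      expands={"yes": ["cough"]}),
    "radiates": Question("radiates", "Does the pain spread to your arm?"),
    "duration": Question("duration", "How long has the pain lasted?"),
    "cough": Question("cough", "Do you have a cough?"),
}
scripted = {"pain": "yes", "fever": "no", "radiates": "no", "duration": "2 days"}
asked = run_interview(questions, ["pain", "fever"], lambda q: scripted[q.qid])
print(list(asked))
```

With the scripted answers above, the follow-ups on chest pain are asked while the cough question is never reached, because the patient reported no fever. A real system would replace the fixed budget with a clinically motivated stopping criterion.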