Implementation and Evaluation of a Multivariate Abstraction-Based, Interval-Based Dynamic Time-Warping Method as a Similarity Measure for Longitudinal Medical Records

August 26, 2021

Yuval Shahar, Matan Lion

From arXiv: 38 pages; 5 figures; 8 tables, including three in the Appendix. Compared to the previous version, we have recomputed the number of experimental instances more accurately and added two tables of summary and examples to the Results.

We extended dynamic time warping (DTW) into interval-based dynamic time warping (iDTW), which includes (A) interval-based representation (iRep): [1] abstracting raw, time-stamped data into interval-based abstractions, [2] comparison-period scoping, and [3] partitioning abstract intervals into a given temporal granularity; and (B) interval-based matching (iMatch): matching partitioned, abstract-concept records using a modified DTW. Using domain knowledge, we abstracted the raw data of medical records, for up to three concepts out of four or five relevant concepts, into two interval types: State abstractions (e.g., LOW, HIGH) and Gradient abstractions (e.g., INCREASING, DECREASING). We created all uni-dimensional (State or Gradient) or multi-dimensional (State and Gradient) abstraction combinations. Tasks: classifying 161 oncology patients' records as autologous or allogenic bone-marrow transplantation; classifying 125 hepatitis patients' records as hepatitis B or C; and predicting micro- or macro-albuminuria in the next year for 151 Type 2 diabetes patients. We used a k-Nearest-Neighbors majority vote, with k an odd number from 1 to SQRT(N), where N is the set size. We performed 75,936 10-fold cross-validation experiments: 33,600 (Oncology), 28,800 (Hepatitis), and 13,536 (Diabetes). Measures: Area Under the Curve (AUC) and optimal Youden's Index. Paired t-tests compared result vectors for configurations that were equivalent except for the tested variable, to determine whether the mean accuracy difference was significant (P<0.05). Mean classification and prediction performance using abstractions was significantly better than when using only raw time-stamped data. In each domain, at least one abstraction combination led to a significantly better mean performance than raw data. Increasing the number of features and using multi-dimensional abstractions enhanced performance. When using abstractions, unlike when using raw data, optimal mean performance was often reached with k=5.
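The pipeline described in the abstract (abstract, partition, match, then vote) can be illustrated with a minimal sketch. This is not the authors' implementation: it handles a single concept with State abstractions only, uses plain textbook DTW with a 0/1 mismatch cost in place of the paper's modified DTW, and assumes that a state persists until the next change and that SQRT(N) is rounded down. All function names, cut-off values, and sample data are hypothetical.

```python
import math
from collections import Counter

def state_of(value, low, high):
    """Map a raw value to a State label (cut-offs are hypothetical)."""
    if value < low:
        return "LOW"
    if value > high:
        return "HIGH"
    return "NORMAL"

def abstract_states(samples, low, high, scope_end):
    """iRep step 1 (sketch): merge consecutive same-state samples into
    (start, end, label) intervals. Assumption: a state persists until the
    next state change, and the last one until the end of the scope."""
    intervals = []
    for t, v in samples:
        label = state_of(v, low, high)
        if not intervals or intervals[-1][2] != label:
            if intervals:
                start, _, prev = intervals[-1]
                intervals[-1] = (start, t, prev)  # close previous interval
            intervals.append((t, scope_end, label))
    return intervals

def partition(intervals, start, end, step):
    """iRep steps 2-3 (sketch): scope to [start, end) and emit one symbol
    per time bin of width `step`."""
    symbols = []
    t = start
    while t < end:
        label = next((l for s, e, l in intervals if s <= t < e), "UNKNOWN")
        symbols.append(label)
        t += step
    return symbols

def dtw(a, b, cost=lambda x, y: 0.0 if x == y else 1.0):
    """Textbook DTW over two symbol sequences (not the paper's variant)."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(
                d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def candidate_ks(n):
    """Odd k from 1 up to floor(sqrt(n)); the flooring is an assumption."""
    return list(range(1, math.isqrt(n) + 1, 2))

def knn_predict(query, train, k):
    """Majority vote among the k training records nearest under DTW.
    `train` is a list of (symbol_sequence, class_label) pairs."""
    nearest = sorted(train, key=lambda rec: dtw(query, rec[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical usage: one lab concept sampled at irregular times.
samples = [(0, 3.0), (1, 3.5), (2, 6.0), (4, 9.0)]
intervals = abstract_states(samples, low=4.0, high=8.0, scope_end=6)
print(partition(intervals, start=0, end=6, step=1))
```

Because partitioning turns every record into a symbol sequence on a common time grid, records with different sampling times become directly comparable, and an odd k avoids tied votes in the two-class tasks.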
