We extended dynamic time warping (DTW) into interval-based dynamic time warping (iDTW), which includes (A) interval-based representation (iRep): [1] abstracting raw, time-stamped data into interval-based abstractions, [2] comparison-period scoping, [3] partitioning abstract intervals into a given temporal granularity; and (B) interval-based matching (iMatch): matching partitioned, abstract-concept records, using a modified DTW. Using domain knowledge, we abstracted the raw data of medical records, for up to three concepts out of four or five relevant concepts, into two interval types: State abstractions (e.g., LOW, HIGH) and Gradient abstractions (e.g., INCREASING, DECREASING). We created all uni-dimensional (State or Gradient) and multi-dimensional (State and Gradient) abstraction combinations. Tasks: classifying 161 oncology patients' records as autologous or allogeneic bone-marrow transplantation; classifying 125 hepatitis patients' records as hepatitis B or C; predicting micro- or macro-albuminuria in the next year for 151 Type 2 diabetes patients. We used a k-Nearest-Neighbors majority vote, with k ranging over the odd numbers from 1 to SQRT(N), where N is the set size. In total, 75,936 10-fold cross-validation experiments were performed: 33,600 (Oncology), 28,800 (Hepatitis), 13,536 (Diabetes). Measures: Area Under the Curve (AUC) and optimal Youden's Index. Paired t-tests compared result vectors of configurations that were identical except for the tested variable, to determine whether the mean accuracy difference was significant (P<0.05). Mean classification and prediction performance using abstractions was significantly better than using only raw time-stamped data. In each domain, at least one abstraction combination led to a significantly better mean performance than raw data. Increasing the number of features and using multi-dimensional abstractions enhanced performance. With abstractions, unlike with raw data, optimal mean performance was often reached with k=5.
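The pipeline described above (abstraction, partitioning, DTW matching, k-Nearest-Neighbors voting) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the LOW/NORMAL/HIGH thresholds, the "largest overlap wins" rule for assigning a symbol to each time bin, the GAP symbol for empty bins, and the 0/1 mismatch cost inside DTW are all assumptions made for illustration. The abstract does not specify how the modified DTW differs from the standard one, so plain DTW over symbol sequences is used here.

```python
import math
from collections import Counter

def abstract_states(samples, low, high):
    """Merge consecutive (time, value) samples sharing a state into
    (start, end, state) intervals. Thresholds are hypothetical."""
    intervals = []
    for t, v in sorted(samples):
        state = "LOW" if v < low else "HIGH" if v > high else "NORMAL"
        if intervals and intervals[-1][2] == state:
            start, _, _ = intervals[-1]
            intervals[-1] = (start, t, state)   # extend the open interval
        else:
            intervals.append((t, t, state))     # start a new interval
    return intervals

def partition(intervals, period_start, period_end, granularity):
    """Scope to [period_start, period_end) and emit one symbol per bin:
    the state with the largest overlap, or "GAP" if none (assumed rule)."""
    symbols = []
    t = period_start
    while t < period_end:
        bin_end = t + granularity
        best, best_overlap = "GAP", -1
        for s, e, state in intervals:
            if s < bin_end and e >= t:          # interval touches this bin
                overlap = min(e, bin_end) - max(s, t)
                if overlap > best_overlap:
                    best, best_overlap = state, overlap
        symbols.append(best)
        t = bin_end
    return symbols

def dtw_distance(a, b):
    """Standard DTW over symbol sequences with a 0/1 mismatch cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else 1.0
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def candidate_ks(n):
    """Odd values of k from 1 up to sqrt(n), as stated in the abstract."""
    return list(range(1, math.isqrt(n) + 1, 2))

def knn_predict(query, training, k):
    """Majority vote among the k training records nearest under DTW."""
    nearest = sorted(training, key=lambda rec: dtw_distance(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy usage with made-up values (not data from the study).
samples = [(0, 5), (1, 6), (2, 15), (3, 16), (4, 5)]
intervals = abstract_states(samples, low=8, high=12)
symbols = partition(intervals, period_start=0, period_end=6, granularity=2)
training = [
    (["LOW", "HIGH", "LOW"], "A"),
    (["LOW", "LOW", "HIGH", "LOW"], "A"),
    (["HIGH", "HIGH", "HIGH"], "B"),
]
prediction = knn_predict(["LOW", "HIGH", "HIGH", "LOW"], training, k=3)
```

Because symbols are compared only for equality, a multi-dimensional (State and Gradient) record could be represented in this sketch by using tuples such as `("HIGH", "INCREASING")` as the per-bin symbols.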