Distance metrics and their nonlinear variants play a crucial role in solving real-world problems with machine learning. We demonstrated how the Euclidean and cosine distance measures differ not only in theory but also in a real-world medical application, namely outcome prediction for drug prescriptions. Euclidean distance exhibits favorable properties for problems governed by local geometry. In this regard, Euclidean distance is suited to short-term diseases whose observed outcomes show low variation. By contrast, for chronic diseases with highly variable outcomes, cosine distance is preferable. These different geometric properties lead to different submanifolds in the original embedding space and hence to different frameworks for optimizing nonlinear kernel embeddings. We first established the geometric properties needed in these frameworks. From these properties, we then interpreted the differences between the two measures from certain perspectives. Our evaluation on real-world, large-scale electronic health records, together with visualization of the embedding space, empirically validated our approach.
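
As a minimal illustration of the contrast between the two measures (a sketch only, not the method evaluated in this work), the following Python snippet computes both distances for three hypothetical embedding vectors. The vectors and function names are assumptions chosen for illustration. Two vectors that point in the same direction but differ in magnitude are far apart under Euclidean distance yet essentially identical under cosine distance, whereas two vectors of equal magnitude but different direction show the opposite pattern.

```python
import math


def euclidean_distance(u, v):
    """Straight-line distance; sensitive to differences in magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_distance(u, v):
    """One minus cosine similarity; depends only on the angle between vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical embeddings (illustrative values, not taken from any dataset).
a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]  # same direction as a, ten times the magnitude
c = [3.0, 2.0, 1.0]     # same magnitude as a, different direction

print("a vs b:", euclidean_distance(a, b), cosine_distance(a, b))
print("a vs c:", euclidean_distance(a, c), cosine_distance(a, c))
```

Under Euclidean distance `b` is the farther neighbor of `a`, while under cosine distance `c` is. This is the kind of disagreement that can make one measure better suited than the other to a given outcome distribution.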