Machine learning is a general-purpose technology that holds promise for many interdisciplinary research problems. However, crossing disciplinary boundaries remains difficult because most machine learning tools are developed separately in different areas. We present PyKale, a Python library for knowledge-aware machine learning on graphs, images, texts, and videos, to enable and accelerate interdisciplinary research. We formulate new green machine learning guidelines based on standard software engineering practices and propose a novel pipeline-based application programming interface (API). PyKale focuses on leveraging knowledge from multiple sources for accurate and interpretable prediction, thus supporting multimodal learning and transfer learning (particularly domain adaptation) with the latest deep learning and dimensionality reduction models. We build PyKale on PyTorch and leverage the rich PyTorch ecosystem. Our pipeline-based API design enforces standardization and minimalism, embracing green machine learning concepts by reducing repetition and redundancy, reusing existing resources, and recycling learning models across areas. We demonstrate its interdisciplinary nature via examples in bioinformatics, knowledge graphs, image/video recognition, and medical imaging.