Livestream Preview | The Mila Lab Is Here!

March 28, 2022 学术头条 (Academic Headlines)

MILA DeepGraph

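Mila is an artificial intelligence research center led by Yoshua Bengio, a Turing Award laureate and one of the three pioneers of deep learning. It currently has more than 80 faculty members and more than 900 researchers, making it one of the largest academic AI research centers in the world. Jian Tang (唐建) is one of Mila's core faculty members (about twenty in total), and his research group currently includes about 20 students. The group's research covers geometric deep learning, deep generative models, knowledge graphs, and the application of these methods to drug discovery. The group has received the ICML 2014 Best Paper Award and a WWW 2016 Best Paper nomination, and has published several classic papers in graph representation learning, such as LINE and RotatE. Its current core focus is AI for Drug Discovery, where it has produced a series of representative works. It recently open-sourced TorchDrug, a machine learning system built specifically for drug discovery, which has attracted wide attention. Going forward, the group will work to advance AI for Science.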

At 20:00 on the evenings of March 29, 30, and 31, AI TIME has specially invited Jian Tang and seven of his students to give talks.

Featured Speaker

Jian Tang is currently an assistant professor at Mila-Quebec AI Institute and at the Computer Science Department and Business School of the University of Montreal. He is a Canada CIFAR AI Research Chair. His main research interests are graph representation learning, graph neural networks, geometric deep learning, deep generative models, knowledge graphs, and drug discovery. During his PhD, he received the best paper award at ICML 2014; in 2016, he was nominated for the best paper award at the World Wide Web Conference (WWW), a top data mining venue; in 2020, he received Amazon and Tencent Faculty Research Awards. He is one of the most representative researchers in the growing field of graph representation learning and has published a set of representative works in this field, such as LINE and RotatE. His work LINE on node representation learning has been widely recognized and is the most cited paper at the WWW conference between 2015 and 2019. Recently, his group released an open-source machine learning package called TorchDrug, which aims to make AI drug discovery software and libraries freely available to the research community. He is an area chair of ICML and NeurIPS.

March 29, 20:00-21:30

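Zhaocheng Zhu (朱兆成):

A PhD student at the Montreal Institute for Learning Algorithms (Mila), advised by Jian Tang. He received his bachelor's degree from Peking University. His main research interests include graph representation learning, knowledge graph reasoning, drug discovery, and large-scale machine learning systems. For more information, see his homepage: https://kiddozhu.github.io/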

Talk title:

Empowering Drug Discovery with a Machine Learning Platform

Abstract:

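The traditional drug discovery process requires both a long development cycle and heavy financial investment. Using machine learning to make predictions at each stage of drug discovery can effectively reduce its time and financial cost. However, developing machine learning algorithms for drug discovery is not easy. On the one hand, many drug discovery tasks lack unified implementations and standard benchmarks. On the other hand, processing the relevant data requires both biopharmaceutical knowledge and efficient parallel implementations. To address this, we developed TorchDrug, a powerful and flexible machine learning platform for advancing research on drug discovery tasks. TorchDrug provides comprehensive benchmarks for several important tasks in drug discovery, including property prediction, pretrained molecular representations, molecule generation and optimization, retrosynthesis prediction, and biomedical knowledge graph reasoning. The platform offers flexible data structures and GPU-parallel operations for graphs and molecules, and ships with a large collection of commonly used machine learning modules, including but not limited to geometric machine learning (graph machine learning), deep generative models, reinforcement learning, and knowledge graph reasoning algorithms. Whether the goal is to reproduce an existing model or to design a new algorithm, it can be done quickly in TorchDrug. Tutorials, benchmarks, and documentation are available on the official website: https://torchdrug.ai/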

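Chence Shi (史晨策):

Chence Shi is a second-year PhD student at the Montreal Institute for Learning Algorithms (Mila), advised by Jian Tang. He graduated in the first cohort of Peking University's Turing Class. His main research interests include graph representation learning, geometric deep learning, and graph structure prediction, as well as their applications in the fundamental natural sciences. Homepage: https://chenceshi.com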

Talk title:

Symmetry Principles in Complex Graph Structure Prediction: The Case of Molecular and Protein Structure Prediction

Abstract:

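Symmetry is ubiquitous in physical systems. Examples include spatial translation invariance (which corresponds to conservation of momentum) and the rotational symmetry of molecular conformations, proteins, and point clouds. When modeling physical systems, giving a deep learning model this inductive bias is crucial for both training and generalization. Starting from the symmetries of physical systems, this talk briefly reviews how models that predict the structure of complex graphs (such as molecules, proteins, and crystals) account for those symmetries. The techniques covered include roto-translation-equivariant gradient field estimation (ConfGF), roto-translation-equivariant graph neural networks (EGNN), and structure modeling based on (rigid-body) relative coordinate frames (AlphaFold2).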
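As a small illustration of the symmetry described above, the sketch below checks numerically that interatomic distances do not change when a set of atoms is rotated and translated. The coordinates, the rotation angle, and the helper names `pairwise_distances` and `rotate_z_and_translate` are illustrative assumptions; none of this is taken from ConfGF, EGNN, or AlphaFold2.

```python
import math
import random


def pairwise_distances(coords):
    """Return the matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def rotate_z_and_translate(coords, angle, shift):
    """Apply a rigid motion: rotate about the z-axis, then translate."""
    c, s = math.cos(angle), math.sin(angle)
    moved = []
    for x, y, z in coords:
        moved.append((c * x - s * y + shift[0],
                      s * x + c * y + shift[1],
                      z + shift[2]))
    return moved


random.seed(0)
# Toy "conformation": 5 atoms at random 3D positions (illustrative data only).
atoms = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]
moved = rotate_z_and_translate(atoms, angle=0.7, shift=(3.0, -2.0, 0.5))

before = pairwise_distances(atoms)
after = pairwise_distances(moved)
max_diff = max(abs(before[i][j] - after[i][j])
               for i in range(5) for j in range(5))
print(f"max change in any interatomic distance: {max_diff:.2e}")
```

Features built from such distances are unchanged by rigid motions by construction, which is one common way to give a model this inductive bias.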

March 30, 20:00-21:30

Louis-Pascal Xhonneux:

Louis-Pascal is currently a third-year PhD student with Prof. Jian Tang, working on graph neural networks with a focus on drug discovery and algorithmic reasoning. He completed his undergraduate and master's degrees in Computer Science at the University of Cambridge. His master's thesis studied the BGP complexity class in computational complexity. He previously interned with Dr. Eoin McKinney, working on modelling Type 1 diabetes in children.

Talk title:

Algorithmic Reasoning on Graphs

Abstract:

Learning to execute algorithms is a fundamental problem that has been widely studied. Prior work has shown that to enable systematic generalisation on graph algorithms it is critical to have access to the intermediate steps of the program/algorithm. In many reasoning tasks, where algorithmic-style reasoning is important, we only have access to the input and output examples. Thus, inspired by the success of pre-training on similar tasks or data in Natural Language Processing (NLP) and Computer Vision, we set out to study how we can transfer algorithmic reasoning knowledge. Specifically, we investigate how we can use algorithms for which we have access to the execution trace to learn to solve similar tasks for which we do not. We investigate two major classes of graph algorithms, parallel algorithms such as breadth-first search and Bellman-Ford and sequential greedy algorithms such as Prim and Dijkstra. Due to the fundamental differences between algorithmic reasoning knowledge and feature extractors such as used in Computer Vision or NLP, we hypothesise that standard transfer techniques will not be sufficient to achieve systematic generalisation. To investigate this empirically we create a dataset including 9 algorithms and 3 different graph types. We validate this empirically and show how instead multi-task learning can be used to achieve the transfer of algorithmic reasoning knowledge.
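To make "access to the intermediate steps" concrete, here is a minimal sketch of Bellman-Ford written in its synchronous form, recording the distance vector after every round. The graph and the function name `bellman_ford_trace` are hypothetical; the dataset and training setup mentioned in the abstract are not reproduced here.

```python
import math


def bellman_ford_trace(num_nodes, edges, source):
    """Synchronous Bellman-Ford that records the distances after each round.

    edges: list of (u, v, weight) directed edges.
    Returns the list of intermediate distance vectors (the execution trace).
    """
    dist = [math.inf] * num_nodes
    dist[source] = 0.0
    trace = [list(dist)]
    for _ in range(num_nodes - 1):
        new_dist = list(dist)
        # Every edge is relaxed from the previous round's values, so all
        # updates within a round are independent and could run in parallel.
        for u, v, w in edges:
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
        if new_dist == dist:  # converged early
            break
        dist = new_dist
        trace.append(list(dist))
    return trace


# Hypothetical 4-node graph used only for illustration.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 1.0)]
trace = bellman_ford_trace(4, edges, source=0)
for step, dist in enumerate(trace):
    print(step, dist)
```

Each entry of `trace` is one intermediate state of the algorithm. A model supervised on all of these states sees the execution trace, whereas a model supervised only on `trace[0]` and `trace[-1]` sees just the input and output.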

Andreea-Ioana Deac:

PhD student in Machine Learning at Mila, with Prof Jian Tang. I am broadly interested in how learning can be improved through the use of graph representations, having previously worked on neural algorithmic reasoners for implicit planning and applications to biotechnology, focusing on drug discovery.

Talk title:

Graph Neural Networks for Reinforcement Learning

Abstract:

Implicit planning has emerged as an elegant technique for combining learned models of the world with end-to-end model-free reinforcement learning. We study the class of implicit planners inspired by value iteration, an algorithm that is guaranteed to yield perfect policies in fully-specified tabular environments. We find that prior approaches either assume that the environment is provided in such a tabular form---which is highly restrictive---or infer “local neighbourhoods” of states to run value iteration over---for which we discover an algorithmic bottleneck effect. This effect is caused by explicitly running the planning algorithm based on scalar predictions in every state, which can be harmful to data efficiency if such scalars are improperly predicted. We propose eXecuted Latent Value Iteration Networks (XLVINs), which alleviate the above limitations. Our method performs all planning computations in a high-dimensional latent space, breaking the algorithmic bottleneck. It maintains alignment with value iteration by carefully leveraging neural graph-algorithmic reasoning and contrastive self-supervised learning. Across seven low-data settings---including classical control, navigation and Atari---XLVINs provide significant improvements to data efficiency against value iteration-based implicit planners, as well as relevant model-free baselines. Lastly, we empirically verify that XLVINs can closely align with value iteration.
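For readers unfamiliar with value iteration, the algorithm these implicit planners take as their reference point, the following is a minimal tabular sketch on a tiny deterministic environment. The three-state chain, its rewards, and the function name `value_iteration` are assumptions made for illustration; this is not the XLVIN model.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Tabular value iteration for a deterministic MDP.

    transitions[s][a] -> next state, rewards[s][a] -> immediate reward.
    Returns (state values, greedy policy).
    """
    num_states = len(transitions)
    values = [0.0] * num_states
    while True:
        # Bellman optimality backup applied to every state at once.
        new_values = [
            max(rewards[s][a] + gamma * values[transitions[s][a]]
                for a in range(len(transitions[s])))
            for s in range(num_states)
        ]
        delta = max(abs(n - v) for n, v in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    policy = [
        max(range(len(transitions[s])),
            key=lambda a: rewards[s][a] + gamma * values[transitions[s][a]])
        for s in range(num_states)
    ]
    return values, policy


# Hypothetical 3-state chain: action 0 = stay, action 1 = move right.
# State 2 is absorbing; entering it from state 1 pays reward 1.
transitions = [[0, 1], [1, 2], [2, 2]]
rewards = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values, policy = value_iteration(transitions, rewards)
print(values, policy)
```

The backup needs the full transition and reward tables as input. That requirement is the "fully-specified tabular" assumption the abstract calls highly restrictive.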

Minkai Xu (徐民凯):

Minkai is a graduate student at MILA. His research interests primarily lie in developing principled and interpretable probabilistic models, with an emphasis on their intersections with geometric representation learning. Previously, he received his bachelor's degree from Shanghai Jiao Tong University. Personal website: https://minkaixu.com/

Talk title:

GeoDiff: A Geometric Diffusion Model for Molecular Conformation Generation

Abstract:

Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics where heated particles will diffuse from original states to a noise distribution, recently we propose a novel generative model named GeoDiff for molecular conformation prediction. GeoDiff treats each atom as a particle and learns to directly reverse the diffusion process (i.e., transforming from a noise distribution to stable conformations) as a Markov chain. Modeling such a generation process is however very challenging as the likelihood of conformations should be roto-translational invariant. We theoretically show that Markov chains evolving with equivariant Markov kernels can induce an invariant distribution by design, and further propose building blocks for the Markov kernels to preserve the desirable equivariance property. The whole framework can be efficiently trained in an end-to-end fashion by optimizing a weighted variational lower bound to the (conditional) likelihood. Experiments on multiple benchmarks show that GeoDiff is superior or comparable to existing state-of-the-art approaches, especially on large molecules.
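The claim that equivariant Markov kernels induce an invariant distribution can be sketched in a few lines. The notation below is introduced here for illustration and may differ from the paper's; conditioning on the molecular graph is omitted for brevity.

```latex
% Assumptions (illustrative notation):
%   C^t  : atomic coordinates at diffusion step t, with C^T drawn from the noise prior
%   T_g  : a rigid motion applied to every atom
% Invariant prior:      p(T_g C^T) = p(C^T)
% Equivariant kernels:  p(T_g C^{t-1} \mid T_g C^t) = p(C^{t-1} \mid C^t)
\begin{align*}
p(T_g C^0)
  &= \int p(T_g C^T) \prod_{t=1}^{T} p(T_g C^{t-1} \mid T_g C^t)\, \mathrm{d}(T_g C^{1:T}) \\
  &= \int p(C^T) \prod_{t=1}^{T} p(C^{t-1} \mid C^t)\, \mathrm{d}C^{1:T} \\
  &= p(C^0)
\end{align*}
```

The first line writes the marginal with the integration variables renamed as $T_g C^t$. The second line applies the two assumptions and uses the fact that a rigid motion has unit Jacobian. Translation needs extra care, since no normalizable density over all of space is translation invariant; a common remedy is to work with coordinates centered at the origin.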

March 31, 20:00-21:00

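Shengchao Liu (刘圣超):

Shengchao Liu is a PhD student at the Montreal Institute for Learning Algorithms (Mila), advised by Jian Tang. His research interests include graph representation learning on structured data, self-supervised learning, multi-task learning, and generative learning, along with their application to drug discovery tasks. For more information, see his homepage: https://chao1224.github.io/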

Talk title:

Pre-training Molecular Graphs with 3D Geometric Information: Reflections on Self-Supervised Learning for Structured Data

Abstract:

Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL) is performed by leveraging the correspondence and consistency between 2D topological structures and 3D geometric views. GraphMVP effectively learns a 2D molecular graph encoder that is enhanced by richer and more discriminative 3D geometry. We further provide theoretical insights to justify the effectiveness of GraphMVP. Finally, comprehensive experiments show that GraphMVP can consistently outperform existing graph SSL methods.
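One generic way to exploit the correspondence between two views of the same molecule is a contrastive objective. The sketch below uses a plain InfoNCE loss over hand-written toy embeddings; the abstract does not state which loss GraphMVP uses, so the choice of InfoNCE, the temperature, and the function name `info_nce` are all assumptions for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(views_2d, views_3d, temperature=0.1):
    """InfoNCE loss treating (views_2d[i], views_3d[i]) as positive pairs.

    All other 3D embeddings in the batch act as negatives for molecule i.
    """
    losses = []
    for i, anchor in enumerate(views_2d):
        logits = [dot(anchor, other) / temperature for other in views_3d]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_norm - logits[i])
    return sum(losses) / len(losses)


# Hypothetical unit-norm embeddings for a batch of three molecules.
emb_2d = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
aligned_3d = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]   # the two views agree
shuffled_3d = [(0.0, 1.0), (-1.0, 0.0), (1.0, 0.0)]  # the two views are mismatched

loss_aligned = info_nce(emb_2d, aligned_3d)
loss_shuffled = info_nce(emb_2d, shuffled_3d)
print(loss_aligned, loss_shuffled)
```

The loss is small when each 2D embedding is closest to the 3D embedding of the same molecule and large otherwise. Minimizing it therefore pushes a 2D encoder to absorb information carried by the 3D view.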

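Meng Qu (瞿锰):

A PhD student at the Montreal Institute for Learning Algorithms (Mila), advised by Jian Tang. He received his bachelor's degree from Peking University. His research interests include knowledge reasoning that combines perception and cognition, knowledge graphs, and probabilistic models. For more information, see his homepage: https://mnqu.github.io/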

Talk title:

Neural Structured Prediction for Inductive Node Classification

Abstract:

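Inductive node classification is an important problem in machine learning: a classifier is trained on fully labeled graphs and then used to classify the nodes of unlabeled graphs. The problem has been widely studied in both graph machine learning and structured prediction, whose representative methods are graph neural networks (GNNs) and conditional random fields (CRFs), respectively. In this talk, we present a new method called the Structured Proxy Network (SPN), which combines the strengths of both fields. SPN introduces flexible potential functions parameterized by GNNs into the CRF framework. Training such a model is nontrivial, however, because it involves a minimax optimization problem. Inspired by the underlying connection between joint and marginal distributions in Markov networks, we propose a proxy problem that approximates the original one. The proxy problem has a simple form and is easy to optimize. Experiments in two settings show that our method outperforms many existing models.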
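One well-known link between joint and marginal distributions in a Markov network is that, on a tree-structured graph, the joint can be rebuilt exactly from node and edge marginals. The sketch below verifies this by brute force on a three-node chain. Whether this is the precise connection SPN builds on is an assumption here, and the potentials and helper names are made up for illustration.

```python
import itertools
import math

# Hypothetical pairwise Markov network on the chain 0 - 1 - 2, binary labels.
edges = [(0, 1), (1, 2)]
node_potential = [[1.0, 2.0], [1.5, 0.5], [1.0, 3.0]]
edge_potential = {(0, 1): [[2.0, 1.0], [1.0, 2.0]],
                  (1, 2): [[3.0, 1.0], [1.0, 3.0]]}


def unnormalized(y):
    """Product of all node and edge potentials for the labeling y."""
    score = math.prod(node_potential[i][y[i]] for i in range(3))
    for i, j in edges:
        score *= edge_potential[(i, j)][y[i]][y[j]]
    return score


labelings = list(itertools.product([0, 1], repeat=3))
Z = sum(unnormalized(y) for y in labelings)
joint = {y: unnormalized(y) / Z for y in labelings}


def marginal(indices):
    """Brute-force marginal over the given node indices."""
    table = {}
    for y, p in joint.items():
        key = tuple(y[i] for i in indices)
        table[key] = table.get(key, 0.0) + p
    return table


node_marg = [marginal([i]) for i in range(3)]
edge_marg = {e: marginal(list(e)) for e in edges}


def from_marginals(y):
    """Rebuild the joint from node and edge marginals (exact on trees)."""
    p = math.prod(node_marg[i][(y[i],)] for i in range(3))
    for i, j in edges:
        p *= edge_marg[(i, j)][(y[i], y[j])] / (
            node_marg[i][(y[i],)] * node_marg[j][(y[j],)])
    return p


max_err = max(abs(joint[y] - from_marginals(y)) for y in labelings)
print(f"max reconstruction error: {max_err:.2e}")
```

On graphs with cycles the same formula is only an approximation, which is consistent with the abstract describing the proxy problem as an approximation of the original one.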

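After the livestream, you can ask questions in the group chat. Add "AI TIME小助手" (the AI TIME assistant, WeChat ID: AITIME_HY) and reply "PhD-4" to be added to the "AI TIME PhD 交流群-4" group.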