Berkeley Visiting Professor: AlphaGo Zero and Deep Learning | GAIR Lecture Hall

November 5, 2017, AI研习社, 不灵叔

Talk Overview

▼

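In this GAIR Lecture Hall session, the speaker will explain how AlphaGo Zero combines techniques such as tabula rasa learning, ResNet, and MCTS, and how, within a framework that joins the Policy Network and the Value Network, it uses self-play to teach itself from zero prior experience. The talk will also cover how the latest deep learning approaches are moving machine perception toward machine cognition, and will share the latest research directions in which Dr. Wang Qiang's team is applying deep learning. Because the speaker has served for many years on the editorial boards of SCI journals, he will also share his experience in writing academic papers.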

Suggested Pre-reading

《Mastering the game of Go without human knowledge》

Paper link: http://t.cn/RWkV1B6

A long-standing goal of artificial intelligence is an algorithm that learns, tabula rasa, superhuman proficiency in challenging domains. Recently, AlphaGo became the first program to defeat a world champion in the game of Go. The tree search in AlphaGo evaluated positions and selected moves using deep neural networks. These neural networks were trained by supervised learning from human expert moves, and by reinforcement learning from self-play. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. This neural network improves the strength of the tree search, resulting in higher quality move selection and stronger self-play in the next iteration. Starting tabula rasa, our new program AlphaGo Zero achieved superhuman performance, winning 100–0 against the previously published, champion-defeating AlphaGo.
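As a rough illustration of the training target described in the abstract above (predict the search's own move selections and the winner of the game), the sketch below computes a per-position loss of the form given in the paper: a squared error on the value plus a cross-entropy between the search probabilities and the network's policy. The function name, argument names, and the `eps` guard are illustrative assumptions, and the paper's L2 weight-regularization term is left out.

```python
import math

def alphago_zero_loss(p, v, pi, z, eps=1e-12):
    """Per-position loss sketch: (z - v)^2 - sum(pi * log p).

    p   : move probabilities predicted by the network
    v   : value predicted by the network, in [-1, 1]
    pi  : move probabilities produced by the tree search (training target)
    z   : game outcome from the current player's view, +1 or -1
    eps : small constant to avoid log(0) (an assumption of this sketch)
    """
    value_loss = (z - v) ** 2
    policy_loss = -sum(t * math.log(q + eps) for t, q in zip(pi, p))
    return value_loss + policy_loss

# Two legal moves: the search strongly prefers the first one, and the game was won.
loss = alphago_zero_loss(p=[0.5, 0.5], v=0.0, pi=[1.0, 0.0], z=1.0)
```

The loss shrinks as the network's policy moves toward the search probabilities and its value moves toward the actual game result, which is the sense in which the program acts as its own teacher.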

《How to Escape Saddle Points Efficiently》

Paper link: https://arxiv.org/abs/1703.00887

This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors. When all saddle points are non-degenerate, all second-order stationary points are local minima, and our result thus shows that perturbed gradient descent can escape saddle points almost for free. Our results can be directly applied to many machine learning applications, including deep learning. As a particular concrete example of such an application, we show that our results can be used directly to establish sharp global convergence rates for matrix factorization. Our results rely on a novel characterization of the geometry around saddle points, which may be of independent interest to the non-convex optimization community.
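To make the idea in this abstract concrete, here is a simplified sketch of perturbed gradient descent on a toy function. It is not the paper's exact algorithm: the test function f(x, y) = x^2 + y^4/4 - y^2/2 (saddle at the origin, minima at (0, ±1)), the step size, the perturbation radius, the gradient threshold, and the waiting period are all assumptions chosen for illustration, and the noise is drawn from a small box rather than a ball.

```python
import math
import random

def grad(x, y):
    """Gradient of the toy function f(x, y) = x**2 + y**4 / 4 - y**2 / 2."""
    return 2.0 * x, y ** 3 - y

def descend(x, y, lr=0.1, steps=2000, perturb=False,
            radius=1e-2, grad_tol=1e-3, wait=50, seed=0):
    """Gradient descent that optionally adds a small random kick
    whenever the gradient is nearly zero (at most once every `wait` steps)."""
    rng = random.Random(seed)
    last_kick = -wait
    for t in range(steps):
        gx, gy = grad(x, y)
        if perturb and math.hypot(gx, gy) < grad_tol and t - last_kick >= wait:
            # Near a stationary point: jump to a random nearby point.
            x += rng.uniform(-radius, radius)
            y += rng.uniform(-radius, radius)
            last_kick = t
            gx, gy = grad(x, y)
        x -= lr * gx
        y -= lr * gy
    return x, y

# Starting on the line y = 0, plain gradient descent slides into the saddle at (0, 0).
plain = descend(1.0, 0.0, perturb=False)
# With perturbation, the iterate leaves the saddle and settles near a minimum (0, +1) or (0, -1).
kicked = descend(1.0, 0.0, perturb=True)
```

In this sketch the kicks also continue at the minimum, where they are harmless because the iterate contracts back between kicks; the paper instead uses a termination test based on the decrease in function value.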

《Benchmarking State-of-the-Art Deep Learning Software Tools》

Paper link: https://arxiv.org/abs/1608.07249

Deep learning has been shown as a successful machine learning method for a variety of tasks, and its popularity results in numerous open-source deep learning software tools. Training a deep network is usually a very time-consuming process. To address the computational challenge in deep learning, many tools exploit hardware features such as multi-core CPUs and many-core GPUs to shorten the training time. However, different tools exhibit different features and running performance when training different types of deep networks on different hardware platforms, which makes it difficult for end users to select an appropriate pair of software and hardware. In this paper, we aim to make a comparative study of the state-of-the-art GPU-accelerated deep learning software tools, including Caffe, CNTK, MXNet, TensorFlow, and Torch. We first benchmark the running performance of these tools with three popular types of neural networks on two CPU platforms and three GPU platforms. We then benchmark some distributed versions on multiple GPUs. Our contribution is two-fold. First, for end users of deep learning tools, our benchmarking results can serve as a guide to selecting appropriate hardware platforms and software tools. Second, for software developers of deep learning tools, our in-depth analysis points out possible future directions to further optimize the running performance.

Talk Outline

▼

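1. Main topics in AI and deep learning

2. Applications of deep learning in AI

3. AlphaGo and AlphaGo Zero: introduction and comparison

4. From machine perception to machine cognition

5. The team's latest research directions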

Talk Title

▼

AlphaGo Zero and Deep Learning: From Machine Perception to Machine Cognition

About the Speaker

▼

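Dr. Wang Qiang (王强) received his bachelor's degree in Computer Science and Technology from Xi'an Jiaotong University, followed by a master's degree in Software Engineering and a Ph.D. in Robotics from Carnegie Mellon University. He is a member of the audit expert pool of the U.S. Office of the Comptroller of the Currency (OCC), a fellow of the IBM Institute for Business Value, and a principal researcher at the Thomas J. Watson Research Center in New York. He is an IEEE Senior Member, has served as a paper reviewer for CVPR in 2008, 2009, and 2013 and will do so again in 2018, and sits on the editorial boards of two top international journals, PAMI and TIP. Dr. Wang has published more than 90 papers in top international journals and has presented papers at conferences such as ICCV and CVPR on multiple occasions. His main research areas include image understanding, machine learning, intelligent trading, financial anti-fraud, and risk prediction.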

Date and Time

▼

Monday, November 6, 8:00 PM Beijing time

How to Join

▼

Scan the QR code on the poster to add the community organizer on WeChat, and include the note「王强」.

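Highlights from Past Open Classes

Fudan University Ph.D. Shen Zhiqiang (沈志强): The DSOD Model for Object Detection (ICCV 2017)

Liu Bin (刘斌), 极限元: Typical Applications of Deep Learning in Speech Generation

Wen Shixue (文仕学), Sogou: Speech Separation Based on Deep Learning

Sun Zhaomin (孙兆民), Video++: An Industry Analysis of Video Content Recognition

Wang Chaoyue (王超岳), University of Technology Sydney: Image Editing Methods Based on Generative Adversarial Networks

Zhang Jian (张健), 达观数据: Text Classification Methods and Application Cases

Tsinghua University Ph.D. Wang Shuhao (王书浩): A Deep Learning Based Fraud Detection System for E-commerce Transactions

Wang Dong (王东), engineer at Twitter: A Detailed Look at the YOLO2 and YOLO9000 Object Detection Systems

A Kaggle gold medal team: What Are the General Recipes for Image Competitions?

Liu Kai (刘凯), 宜远智能: Active Incremental Learning That Significantly Reduces Model Training Cost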

If this event sounds interesting, you are welcome to click and sign up.

▼▼▼

