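Supervised learning is the process of using a set of samples with known classes to adjust a classifier's parameters until it reaches the required performance; it is also called supervised training or learning with a teacher. Supervised learning is the machine learning task of inferring a function from labeled training data. The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function that can be used to map new examples. In the best case, the algorithm correctly determines the class labels of unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.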
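To make the definition concrete, here is a minimal sketch of a supervised learner in plain Python. It uses a k-nearest-neighbor classifier purely as an illustration; the page names no specific algorithm, and the function names (`fit_knn`, `predict_knn`) and the toy data are assumptions made for this example. Each training example pairs an input vector with a label (the supervisory signal), and the inferred function maps an unseen vector to a class label.

```python
import math
from collections import Counter


def fit_knn(train_inputs, train_labels):
    """'Training' for k-NN simply stores the labeled examples."""
    if len(train_inputs) != len(train_labels):
        raise ValueError("each input vector needs exactly one label")
    return list(zip(train_inputs, train_labels))


def predict_knn(model, x, k=3):
    """Label an unseen vector by majority vote among its k nearest examples."""
    nearest = sorted(model, key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: 2-D input vectors, class label as supervisory signal.
train_inputs = [(1.0, 1.2), (0.8, 0.9), (1.1, 0.7),
                (4.0, 4.2), (3.8, 4.1), (4.3, 3.9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

model = fit_knn(train_inputs, train_labels)

# Neither query point appears in the training data.
prediction_near_a = predict_knn(model, (0.9, 1.0))
prediction_near_b = predict_knn(model, (4.1, 4.0))
```

Whether such a predictor generalizes in a "reasonable" way depends on its inductive bias; here the assumption is that nearby vectors share a label.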

VIP content

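This graduate textbook on machine learning tells the story of how patterns in data support predictions and consequential actions. Starting from the foundations of decision making, it covers representation, optimization, and generalization as the constituent components of supervised learning. A chapter on datasets as benchmarks examines their history and scientific basis. Introductions to causality, the practice of causal inference, sequential decision making, and reinforcement learning give the reader the relevant concepts and tools. Throughout, the book discusses historical context and societal impact. Some experience with probability, calculus, and linear algebra is sufficient background.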

https://mlstory.org/

Table of contents:

Introduction

Decision making

Supervised learning

Representations and features

Optimization

Generalization

Deep learning

Datasets

Causality

Causal inference in practice

Sequential decision making and dynamic programming

Reinforcement learning

Epilogue

Mathematical background


Latest content

We tackle a common scenario in imitation learning (IL), where agents try to recover the optimal policy from expert demonstrations without further access to the expert or to environment reward signals. Apart from simple Behavior Cloning (BC), which uses supervised learning and suffers from compounding error, previous solutions such as inverse reinforcement learning (IRL) and recent generative adversarial methods involve a bi-level or alternating optimization to update the reward function and the policy, and so suffer from high computational cost and training instability. Inspired by recent progress in energy-based models (EBMs), in this paper we propose a simplified IL framework named Energy-Based Imitation Learning (EBIL). Instead of updating the reward and policy iteratively, EBIL breaks out of the traditional IRL paradigm with a simple and flexible two-stage solution: first estimating the expert energy as a surrogate reward function through score matching, then using that reward to learn the policy with reinforcement learning algorithms. EBIL combines ideas from both EBMs and occupancy measure matching, and through theoretical analysis we show that EBIL and Max-Entropy IRL (MaxEnt IRL) approaches are two sides of the same coin, so EBIL could be an alternative to adversarial IRL methods. Extensive qualitative and quantitative experiments indicate that EBIL is able to recover meaningful and interpretable reward signals while achieving effective performance comparable to existing algorithms on IL benchmarks.
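The abstract notes that Behavior Cloning reduces imitation to supervised learning. The sketch below illustrates only that baseline, not EBIL: it fits a linear policy to expert (state, action) pairs by ordinary least squares. The scalar states and actions, the linear policy class, the expert rule `a = -0.5 * s`, and the names `fit_linear_policy` and `cloned_policy` are all assumptions made for this illustration.

```python
def fit_linear_policy(states, actions):
    """Least-squares fit of action = w * state + b to expert pairs."""
    n = len(states)
    mean_s = sum(states) / n
    mean_a = sum(actions) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(states, actions))
    var = sum((s - mean_s) ** 2 for s in states)
    w = cov / var
    b = mean_a - w * mean_s
    return w, b


# Hypothetical expert demonstrations: the expert acts with a = -0.5 * s.
expert_states = [-2.0, -1.0, 0.0, 1.0, 2.0]
expert_actions = [-0.5 * s for s in expert_states]

# Supervised step: states are the inputs, expert actions are the labels.
w, b = fit_linear_policy(expert_states, expert_actions)


def cloned_policy(state):
    """Policy recovered from the demonstrations alone (no reward signal)."""
    return w * state + b
```

The fit is constrained only on states the expert visited. With a richer policy class or noisy data, small errors can carry the agent into states outside the demonstrations, which is one way to see where compounding error comes from.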

