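This course develops core areas of data science (e.g., models for regression and classification) from several angles: the formulation and properties of the concepts, solution algorithms and their implementation, data visualization for exploratory data analysis, and the effective presentation of modelling outputs. The course is supplemented by practical sessions using Python, scikit-learn and TensorFlow.
Introduction. Motivation, applications, examples, common data formats (CSV, JSON), loading data with Python, calculating statistics over a dataset with numpy, logistics and overview of the course.
Linear Regression. Defining a model, fitting a model, least squares regression, linear regression, gradient descent, scikit-learn.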
Practical: Linear Regression
Classification, part I. Classification, logistic regression, perceptron, multi-class classification, classification performance measures.
Practical: Classification I
Classification, part II. An overview of other classification techniques (e.g., decision trees, SVMs) and more advanced techniques including ensemble-based models (boosting, bagging, exemplified with AdaBoost and Random Forests).
Practical: Classification II
Deep learning basics. Neural networks, real-world applications, optimization, stochastic gradient descent, backpropagation, learning rates.
Deep learning with TensorFlow. Introduction to TensorFlow, minimal TensorFlow example, symbolic graphs, training a network, practical tips for deep learning.
Practical: Deep learning with TensorFlow
Deep learning architectures. Convolutional networks, RNNs, LSTMs, autoencoders, regularization.
Practical: Deep learning architectures
Visualization, part I. Scales and coordinates, depicting comparisons.
Visualization, part II. Common plotting patterns, including dimension reduction.
Practical: Visualization
Challenges in Data Science. Summary of the course, ethics and privacy in data science, p-hacking, look-elsewhere effect, bias in the training data, interpretability, information about the hand-out test.
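
To give a flavour of the topics listed under Linear Regression, the following is a minimal sketch (not course material) that fits a one-variable least-squares line in two ways: with the closed-form solution and with batch gradient descent. It deliberately uses plain Python rather than the numpy or scikit-learn tooling named for the practicals. The toy data, the function names `fit_closed_form` and `fit_gradient_descent`, the learning rate and the step count are all assumptions chosen for illustration.

```python
# Toy data generated from y = 2x + 1 (hypothetical values, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_closed_form(xs, ys):
    """Return the least-squares (slope, intercept) from the closed-form solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Return (slope, intercept) found by batch gradient descent on the mean squared error."""
    n = len(xs)
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current line on every data point.
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = 2.0 / n * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


if __name__ == "__main__":
    print("closed form:     ", fit_closed_form(xs, ys))
    print("gradient descent:", fit_gradient_descent(xs, ys))
```

On this toy data both approaches should agree closely on a slope near 2 and an intercept near 1. The learning rate was picked by hand for these particular values; a rate that is too large would make the gradient descent iteration diverge.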