Probabilistic models such as logistic regression, Bayesian classification, neural networks, and models for natural language processing are increasingly present in the undergraduate statistics and data science curriculum due to their wide range of applications. In this paper, we present a one-week undergraduate course module on variational inference, a popular optimization-based approach for approximate inference with probabilistic models. Our proposed module is guided by active learning principles: in addition to lecture materials on variational inference, we provide an accompanying class activity, an \texttt{R shiny} app, and a guided lab, with \texttt{R} code, based on a real data application of clustering documents using Latent Dirichlet Allocation. The main goal of our module is to expose undergraduate students to a method that facilitates statistical modeling and inference with large datasets. Instructors can adopt our proposed module as a foundation and adapt it to introduce more realistic use cases and applications in data science, Bayesian statistics, multivariate analysis, and statistical machine learning courses.