The use of machine learning in meteorology has increased greatly in recent years. Although many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. This lack of formal instruction has contributed to the perception that machine learning methods are 'black boxes', and end-users are therefore hesitant to apply them in their everyday workflow. To reduce the opacity of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the methods, and machine learning topics are discussed in plain language. The following machine learning methods are demonstrated: linear regression; logistic regression; decision trees; random forests; gradient boosted decision trees; naive Bayes; and support vector machines. Beyond the individual methods, the paper also discusses the general machine learning process and best practices, so that readers can apply machine learning to their own datasets. Furthermore, all code used to make the examples in the paper is provided (in the form of Jupyter notebooks and Google Colaboratory notebooks) in an effort to catalyse the use of machine learning in meteorology.