Data scientists and statisticians are often at odds over which approach, machine learning or statistical modeling, is best suited to an analytics challenge. However, the two are more cousins than adversaries on opposite sides of an analysis battleground. The choice between them, or in some cases the decision to use both, depends on the problem to be solved and the outcomes required, as well as the data available and the circumstances of the analysis. Machine learning and statistical modeling are complementary: they rest on similar mathematical principles and simply draw on different tools within an overall analytics knowledge base. The predominant approach should be determined by the problem to be solved and by empirical evidence, such as the size and completeness of the data, the number of variables, the assumptions made or the lack thereof, and the expected outcomes, such as predictions or causality. Good analysts and data scientists should be well versed in both techniques and their proper application, so that they use the right tool for the right project to achieve the desired results.