We argue that immature data pipelines are preventing a large portion of industry practitioners from leveraging the latest research on recommender systems. We propose our template data stack for machine learning at "reasonable scale", and show how many challenges are solved by embracing a serverless paradigm. Leveraging our experience, we detail how modern open source can provide a pipeline processing terabytes of data with limited infrastructure work.
翻译:我们争论说,不成熟的数据管道正在阻止很大一部分行业从业人员利用最新的推荐人系统研究。 我们提出模板数据堆,用于“合理规模”的机器学习,并展示通过采用无服务器模式解决了多少挑战。 我们利用我们的经验,详细介绍了现代开放源能如何提供有限基础设施工程的管道处理百万兆字节数据。