With the progressive commoditization of modeling capabilities, data-centric AI recognizes that what happens before and after training becomes crucial for real-world deployments. Following the intuition behind Model Cards, we propose DAG Cards as a form of documentation encompassing the tenets of a data-centric point of view. We argue that Machine Learning pipelines (rather than models) are the most appropriate level of documentation for many practical use cases, and we share with the community an open implementation to generate cards from code.
翻译:随着建模能力的逐渐商品化,以数据为中心的AI认识到培训前后发生的事情对于现实世界的部署至关重要。根据模型卡背后的直觉,我们建议DAG卡作为包含以数据为中心的观点原则的文件形式。 我们认为机器学习管道(而不是模型)是许多实际使用案例最适当的文件水平,我们与社区分享从代码中生成卡的公开实施。