In this paper, we present mPyPl, a new Python library intended to simplify complex data processing tasks using a functional approach. The library defines operations on lazy data streams of named dictionaries represented as generators (so-called multi-field datastreams), and allows those streams to be enriched with additional 'fields' during data preparation and feature extraction. As a result, most data preparation tasks can be expressed as a neat linear 'pipeline', similar in syntax to UNIX pipes or the |> forward pipe operator in F#. We define basic operations on multi-field datastreams that resemble classical monadic operations, and show the similarity of the proposed approach to monads in functional programming. We also show how the library was used in complex deep learning tasks of event detection in video, and discuss different evaluation strategies that offer different trade-offs between memory use and performance.
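To make the idea of a multi-field datastream concrete, the following is a minimal sketch in plain Python. It does not use mPyPl itself: the names `Stage`, `apply`, `where`, `as_list`, and `numbers` are hypothetical helpers written only for this illustration, and the actual library's API may differ. The sketch shows a lazy generator of dictionaries, a stage that enriches each record with a new field, and pipe-style chaining via the `|` operator.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


class Stage:
    """Wraps a stream-to-something function so it can be chained with `|`."""

    def __init__(self, fn: Callable[[Iterable[Record]], Any]) -> None:
        self.fn = fn

    def __ror__(self, stream: Iterable[Record]) -> Any:
        # Python falls back to this for `stream | stage`, because generators
        # do not implement `|` themselves.
        return self.fn(stream)


def apply(src: str, dst: str, fn: Callable[[Any], Any]) -> Stage:
    """Hypothetical stage: add field `dst`, computed from field `src`."""
    def run(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            yield {**rec, dst: fn(rec[src])}
    return Stage(run)


def where(pred: Callable[[Record], bool]) -> Stage:
    """Hypothetical stage: keep only the records that satisfy `pred`."""
    def run(stream: Iterable[Record]) -> Iterator[Record]:
        for rec in stream:
            if pred(rec):
                yield rec
    return Stage(run)


def as_list() -> Stage:
    """Hypothetical terminal stage: force evaluation of the whole stream."""
    return Stage(list)


def numbers(n: int) -> Iterator[Record]:
    """Source stream: records with a single field `x`."""
    for i in range(n):
        yield {"x": i}


seen: List[int] = []  # records which inputs have actually been processed


def square(v: int) -> int:
    seen.append(v)
    return v * v


# Building the pipeline does no work yet: every stage is a lazy generator.
pipeline = (
    numbers(5)
    | apply("x", "square", square)
    | where(lambda r: r["square"] % 2 == 0)
)
processed_before = len(seen)

# Only the terminal stage pulls records through the pipeline.
result = pipeline | as_list()
```

In this sketch each stage receives the whole stream and returns a new one, so the original fields stay available to every later stage. Swapping the terminal `as_list()` for plain iteration would process one record at a time instead of materializing the stream, which is one example of the memory versus performance trade-off between evaluation strategies.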