Submodular functions are a special class of set functions that naturally model notions such as representativeness, diversity, and coverage, and have been shown to be computationally very efficient to optimize. Much past work has applied submodular optimization to find optimal subsets in various contexts. Examples include summarizing data for efficient human consumption, finding effective smaller subsets of training data to reduce model development time (training, hyperparameter tuning), and finding effective subsets of unlabeled data to reduce labeling costs. Recent work has also leveraged submodular functions to propose submodular information measures, which have been found to be very useful for guided subset selection and guided summarization. In this work, we present Submodlib, an open-source, easy-to-use, efficient, and scalable Python library for submodular optimization with a C++ optimization engine. Submodlib has applications in summarization, data subset selection, hyperparameter tuning, efficient training, and more. Through a rich API, it offers a great deal of flexibility in how it can be used.
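To make the idea of submodular subset selection concrete, the following is a minimal sketch in plain Python. It does not use Submodlib's API; the function names (`similarity`, `facility_location`, `greedy_maximize`), the similarity kernel, and the toy data are all assumptions chosen for illustration. It uses the facility location function, a standard submodular function that rewards representativeness, and the standard greedy algorithm for maximizing it under a cardinality budget.

```python
import math


def similarity(a, b):
    """Illustrative kernel: 1 for identical points, decaying with Euclidean distance."""
    return 1.0 / (1.0 + math.dist(a, b))


def facility_location(points, subset):
    """f(S): for every point, similarity to its closest representative in S, summed."""
    if not subset:
        return 0.0
    return sum(max(similarity(p, points[j]) for j in subset) for p in points)


def greedy_maximize(points, budget):
    """Pick `budget` indices, each time adding the one with the largest marginal gain."""
    selected = []
    value = 0.0
    for _ in range(min(budget, len(points))):
        best_gain, best_idx = -1.0, None
        for j in range(len(points)):
            if j in selected:
                continue
            # Marginal gain of adding j to the current selection.
            gain = facility_location(points, selected + [j]) - value
            if gain > best_gain:
                best_gain, best_idx = gain, j
        selected.append(best_idx)
        value += best_gain
    return selected, value


# Toy data (hypothetical): two tight clusters and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
selected, value = greedy_maximize(points, budget=3)
print(selected, round(value, 3))
```

On this toy data the greedy selection takes one representative from each group rather than several points from the same cluster, because the marginal gain of a point shrinks once a similar point has already been selected. This naive version re-evaluates the function for every candidate and is meant only to show the principle, not to be efficient.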