Driven by the need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models, the number of datasets is growing rapidly. audb is an open-source Python library that supports the versioning and documentation of audio datasets. It aims to provide a simple, standardized user interface for publishing, maintaining, and accessing the annotations and audio files of a dataset. To store the data efficiently on a server, audb automatically resolves dependencies between versions of a dataset and uploads only newly added or altered files when a new version is published. The library supports partial loading of a dataset and local caching for fast access to already downloaded data. audb is lightweight and can be interfaced with any machine learning library. It supports the management of datasets on a single PC, within a university or company, or across a whole research community.
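The idea of uploading only newly added or altered files can be illustrated with a minimal sketch. This is not audb's implementation or API: the function names `build_manifest` and `files_to_upload`, the use of SHA-256 checksums, and the manifest layout (a mapping from relative path to checksum) are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content (hypothetical helper)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file below `root` (relative POSIX path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def files_to_upload(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return the files that are new, or whose content changed, since `previous`.

    Files with an unchanged checksum are skipped, so a new version
    only has to transfer the difference to the server.
    """
    return sorted(
        path
        for path, checksum in current.items()
        if previous.get(path) != checksum
    )
```

Under these assumptions, a new version would compare the manifest of the previously published version with the manifest of the current dataset folder and transfer only the paths returned by `files_to_upload`; unchanged files would continue to be referenced from the earlier version.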