The need for larger and more diverse datasets to pre-train and fine-tune increasingly complex machine learning models is driving rapid growth in the number of datasets. audb is an open-source Python library that supports versioning and documentation of audio datasets. It aims to provide a simple, standardized user interface to publish, maintain, and access the annotations and audio files of a dataset. To store the data efficiently on a server, audb automatically resolves dependencies between versions of a dataset and uploads only newly added or altered files when a new version is published. The library supports partial loading of a dataset and local caching for fast access. audb is lightweight and can be used from any machine learning library. It supports the management of datasets on a single PC, within a university or company, or across a whole research community. audb is available at https://github.com/audeering/audb.
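The idea of uploading only newly added or altered files can be illustrated with a small sketch. It is not audb's implementation or API: the function names `checksum` and `plan_publish`, the use of MD5, and the in-memory dictionaries standing in for files and for a per-version dependency table are all assumptions made for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the MD5 hex digest of a file's content (hash choice is an assumption)."""
    return hashlib.md5(data).hexdigest()


def plan_publish(previous, files, version):
    """Decide which files a new dataset version has to upload (hypothetical helper).

    previous: dict mapping path -> (checksum, version that stores the file)
    files:    dict mapping path -> file content (bytes) in the new version
    Returns (dependencies, to_upload).
    """
    dependencies = {}
    to_upload = []
    for path, data in sorted(files.items()):
        digest = checksum(data)
        if path in previous and previous[path][0] == digest:
            # Unchanged file: keep pointing at the version that already stores it.
            dependencies[path] = previous[path]
        else:
            # New or altered file: upload it and record it under the new version.
            dependencies[path] = (digest, version)
            to_upload.append(path)
    return dependencies, to_upload


# Publish a first version, then a second one that alters b.wav and adds c.wav.
v1, uploaded_v1 = plan_publish({}, {"a.wav": b"aaa", "b.wav": b"bbb"}, "1.0.0")
v2, uploaded_v2 = plan_publish(
    v1, {"a.wav": b"aaa", "b.wav": b"BBB", "c.wav": b"ccc"}, "1.1.0"
)
print(uploaded_v1)  # ['a.wav', 'b.wav']
print(uploaded_v2)  # ['b.wav', 'c.wav']
```

In this sketch the dependency table of version 1.1.0 still points at version 1.0.0 for `a.wav`, so a loader would fetch that file from the older version instead of storing it twice.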