This document describes version 0.10 of TorchAudio, a library of building blocks for machine learning applications in the audio and speech processing domain. TorchAudio aims to accelerate the development and deployment of such applications by providing researchers and engineers with off-the-shelf building blocks. The building blocks are designed to be GPU-compatible, automatically differentiable, and production-ready. TorchAudio can be easily installed from the Python Package Index (PyPI), and its source code is publicly available under a BSD-2-Clause License (as of September 2021) at https://github.com/pytorch/audio. In this document, we give an overview of TorchAudio's design principles, functionality, and benchmarks. We benchmark our implementations of several audio and speech operations and models, and these benchmarks verify that our implementations are valid and perform comparably to other publicly available implementations.