Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Chien-yu Huang, Wei-Chih Chen, Shu-wen Yang, Andy T. Liu, Chen-An Li, Yu-Xiang Lin, Wei-Cheng Tseng, Anuj Diwan, Yi-Jen Shih, Jiatong Shi, William Chen, Xuanjun Chen, Chi-Yuan Hsiao, Puyuan Peng, Shih-Heng Wang, Chun-Yi Kuan, Ke-Han Lu, Kai-Wei Chang, Chih-Kai Yang, Fabian Ritter-Gutierrez, Ming To Chuang, Kuan-Po Huang, Siddhant Arora, You-Kuan Lin, Eunjung Yeo, Kalvin Chang, Chung-Ming Chien, Kwanghee Choi, Cheng-Hsiu Hsieh, Yi-Cheng Lin, Chee-En Yu, I-Hsiang Chiu, Heitor R. Guimarães, Jionghao Han, Tzu-Quan Lin, Tzu-Yuan Lin, Homu Chang, Ting-Wu Chang, Chun Wei Chen, Shou-Jen Chen, Yu-Hua Chen, Hsi-Chun Cheng, Kunal Dhawan, Jia-Lin Fang, Shi-Xin Fang, Kuan-Yu Fang Chiang, Chi An Fu, Hsien-Fu Hsiao, Ching Yu Hsu, Shao-Syuan Huang, Lee Chen Wei, Hsi-Che Lin, Hsuan-Hao Lin, Hsuan-Ting Lin, Jian-Ren Lin, Ting-Chun Liu, Li-Chun Lu, Tsung-Min Pai, Ankita Pasad, Shih-Yun Shan Kuan, Suwon Shon, Yuxun Tang, Yun-Shao Tsai, Jui-Chiang Wei, Tzu-Chieh Wei, Chengxi Wu, Dien-Ruei Wu, Chao-Han Huck Yang, Chieh-Chi Yang, Jia Qi Yip, Shao-Xiang Yuan, Vahid Noroozi, Zhehuai Chen, Haibin Wu, Karen Livescu, David Harwath, Shinji Watanabe, Hung-yi Lee

Multimodal foundation models, such as Gemini and ChatGPT, have revolutionized human-machine interactions by seamlessly integrating various forms of data. Developing a universal spoken language model that comprehends a wide range of natural language instructions is critical for bridging communication gaps and facilitating more intuitive interactions. However, the absence of a comprehensive evaluation benchmark poses a significant challenge. We present Dynamic-SUPERB Phase-2, an open and evolving benchmark for the comprehensive evaluation of instruction-based universal speech models. Building upon the first generation, this second version incorporates 125 new tasks contributed collaboratively by the global research community, expanding the benchmark to a total of 180 tasks, making it the largest benchmark for speech and audio evaluation. While the first generation of Dynamic-SUPERB was limited to classification tasks, Dynamic-SUPERB Phase-2 broadens its evaluation capabilities by introducing a wide array of novel and diverse tasks, including regression and sequence generation, across speech, music, and environmental audio. Evaluation results indicate that none of the models performed well universally. SALMONN-13B excelled in English ASR, while WavLLM demonstrated high accuracy in emotion recognition, but current models still require further innovations to handle a broader range of tasks. We will soon open-source all task data and the evaluation pipeline.
