Phoneme Papers - 专知

Building Robust and Scalable Multilingual ASR for Indian Languages

Arxiv

0+ reads · November 19

Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition

Arxiv

0+ reads · November 21

VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task

Arxiv

0+ reads · November 27

Why Isn't Relational Learning Taking Over the World?

Arxiv

0+ reads · November 5

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

Arxiv

0+ reads · December 1

Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism

Arxiv

0+ reads · November 16

Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism

Arxiv

0+ reads · December 9

Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism

Arxiv

0+ reads · November 13

Seeing isn't Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms

Arxiv

0+ reads · November 17

Why Isn't Relational Learning Taking Over the World?

Arxiv

0+ reads · October 30

M-CIF: Multi-Scale Alignment For CIF-Based Non-Autoregressive ASR

Arxiv

0+ reads · October 25

Are These Even Words? Quantifying the Gibberishness of Generative Speech Models

Arxiv

0+ reads · October 24

PASE: Phoneme-Aware Speech Encoder to Improve Lip Sync Accuracy for Talking Head Synthesis

Arxiv

0+ reads · October 15

I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2

Arxiv

0+ reads · October 15

FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec

Arxiv

0+ reads · October 12
