Interspeech 2018
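Interspeech is the flagship academic conference in the speech field, organized by the International Speech Communication Association (ISCA), and the world's largest comprehensive technical event on speech information processing. The conference encourages interdisciplinary research in speech, in particular research on and applications of rapidly developing artificial intelligence and machine learning techniques. Interspeech 2018 will be held in early September this year in Hyderabad, India.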
Paper ID | Title | Authors |
27 | Binaural Speech Intelligibility Estimation Using Deep Neural Networks | Kazuhiro Kondo, Kazuya Taira and Yosuke Kobayashi |
34 | Real-Time Scoring of an Oral Reading Assessment on Mobile Devices | Jian Cheng |
38 | Conditional End-to-End Audio Transformations | Albert Haque, Michelle Guo and Prateek Verma |
40 | Speech recognition for medical conversations | Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang |
41 | Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification | Lanhua You, Wu Guo, Yan Song and Sheng Zhang |
42 | Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Iroro Orife |
43 | Frequency domain variants of velvet noise and their application to speech processing and synthesis | Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino |
45 | A novel normalization method for autocorrelation function for pitch detection and for speech activity detection | Qiguang Lin and Yiwen Shao |
46 | Dithered Quantization for Frequency-Domain Speech and Audio Coding | Tom Bäckström, Johannes Fischer and Sneha Das |
47 | Categorical vs Dimensional Perception of Italian Emotional Speech | Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller |
48 | Cross-language perception of Mandarin lexical tones by Mongolian-speaking bilinguals in the Inner Mongolia Autonomous Region, China | Kimiko Tsukada and Yu Rong |
51 | The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats | Björn Schuller, Stefan Steidl, Anton Batliner, Peter Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian Pokorny, Eva-Maria Rathner, Karin Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou |
52 | Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech | Emre Yilmaz, Henk van den Heuvel and David van Leeuwen |
57 | Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features | Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz |
60 | The Trajectory of Voice Onset Time with Vocal Aging | Chen Xuanda, Xiong Ziyu and Hu Jian |
61 | Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons | Moez Ajili, Jean-Francois Bonastre and Solange Rossato |
62 | Entity-Aware Language Model as an Unsupervised Reranker | Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy |
63 | Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments | John Novak and Robert Kenyon |
65 | The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework | Erica Gold, Sula Ross and Kate Earnshaw |
67 | Articulatory Features for ASR of Pathological Speech | Emre Yilmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco |
68 | Vowel space as a tool to evaluate articulation problems | Rob van Son, Catherine Middag and Kris Demuynck |
69 | Performance Analysis of the 2017 NIST Language Recognition Evaluation | Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason and Jaime Hernandez-Cordero |
70 | Gated Convolutional Neural Network for Sentence Matching | Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You |
73 | COSMO SylPhon: a model to assess phonological learning | Jean-Luc Schwartz |
78 | Active Memory Networks for Language Modeling | Oscar Chen, Anton Ragni, Mark Gales and Xie Chen |
79 | Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models | Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu |
83 | Deep Speech Denoising with Vector Space Projections | Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni |
84 | What to Expect from Expected Kneser-Ney Smoothing | Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang |
91 | Emotional Prosody Perception in Mandarin-speaking Congenital Amusics | Yixin Zhang, Tianzhu Geng and Jinsong Zhang |
92 | Analysis of Length Normalization in End-to-End Speaker Verification System | Weicheng Cai, Jinkun Chen and Ming Li |
97 | Overview of the 2018 Spoken CALL Shared Task | Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei |
990 | Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks | Yun Wang, Juncheng Li and Florian Metze |
991 | Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach | Ragesh Rajan M, Ashwin Vijayakumar and Deepu Vijayasenan |
993 | Attentive Statistics Pooling for Deep Speaker Embedding | Koji Okabe, Takafumi Koshinaka and Koichi Shinoda |
995 | UltraFit: A speaker-friendly headset for ultrasound recordings in speech sciences | Lorenzo Spreafico, Michael Pucher and Anna Matosova |
996 | Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech | Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller |
999 | Articulatory-to-speech conversion using bi-directional long short-term memory | Fumiaki Taguchi and Tokihiko Kaburagi |
1000 | The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task | Kay Berkling, Cem Philipp Freimoser, Mario Kunstek and Jülg Dominik |
1007 | Follow-up Question Generation using Pattern-based Seq2seq with a Small Corpus for Interview Coaching | Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong and Huai-Hung Huang |
1010 | Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search | Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li |
1013 | Capsule Networks for Low Resource Spoken Language Understanding | Vincent Renkens and Hugo Van hamme |
1015 | Learning Discriminative Features for Speaker Identification and Verification | Sarthak Yadav and Atul Rai |
1016 | LSTM based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language | Laxmi Pandey and Karan Nathwani |
1018 | Detection of glottal closure instants in degraded speech using single frequency filtering analysis | Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana |
1019 | Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis | Simone Hantke, Christoph Stemp and Björn Schuller |
1020 | Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement | Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao |
1021 | Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR | Yerbolat Khassanov and Eng Siong Chng |
1023 | MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks | Wenhao Ding and Liang He |
1024 | Effective acoustic cue learning is not just statistical, it is discriminative | Jessie S. Nixon |
1025 | Compression of End-to-End Models | Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu |
1026 | Postfiltering with Complex Spectral Correlations for Speech and Audio Coding | Sneha Das and Tom Bäckström |
1027 | Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding | Sneha Das and Tom Bäckström |
1030 | Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition | Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu |
1032 | Discriminating between nasals and approximants in English language using zero time windowing | RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana |
1034 | Scalable Factorized Hierarchical Variational Autoencoder Training | Wei-Ning Hsu and James Glass |
1035 | Contextual Slot Carryover for Disparate Schemas | Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert and Ruhi Sarikaya |
1037 | Stream Attention for Distributed Multi-Microphone Speech Recognition | Xiaofei Wang, Ruizhi Li and Hynek Hermansky |
1038 | Articulatory consequences of vocal effort elicitation method | Elisabet Eir Cortes, Marcin Wlodarczak and Juraj Šimko |
1039 | Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding | Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan |
1042 | Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis | Yuanjun Zhao, Roberto Togneri and Victor Sreeram |
1043 | Designing a Pneumatic Bionic Voice Prosthesis - Statistical Approach for Source Excitation Generation | Farzaneh Ahmadi and Tomoki Toda |
1044 | Training Utterance-level Embedding Networks for Speaker Identification and Verification | Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim and Jonghun Park |
1046 | Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement | Ching-Hua Lee, Bhaskar D. Rao and Harinath Garudadri |
1047 | Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions | Okko Räsänen, Seshadri Shreyas and Marisa Casillas |
1049 | Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning | ShiLiang Zhang and Ming Lei |
1054 | Towards a better characterization of Parkinsonian speech: a multidimensional acoustic study | Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie Van Malderen and Bernard Harmegnies |
1055 | Low-Latency Neural Speech Translation | Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel |
1057 | Structured Word Embedding for Low Memory Neural Network Language Model | Kaiyu Shi and Kai Yu |
1058 | An End-to-End Text-Independent Speaker Identification System on Short Utterances | Ruifang Ji, Xinyuan Cai and Xu Bo |
1059 | Dysarthric speech classification using glottal features computed from non-words, words and sentences | Narendra N P and Paavo Alku |
1060 | Length contrast and covarying features: Whistled speech as a case study | Rachid Ridouane, Giuseppina Turco and Julien Meyer |
1062 | On the Usefulness of the Speech Phase Spectrum for Pitch Extraction | Erfan Loweimi, Jon Barker and Thomas Hain |
1063 | Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription | Rongfeng Su, Xunying Liu and Lan Wang |
1065 | Regional variation of /r/ in Swiss German dialects | Adrian Leemann, Stephan Schmid, Dieter Studer-Joho and Marie-José Kolly |
1070 | i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models | Karel Beneš, Santosh Kesiraju and Lukáš Burget |
1074 | Structural effects on properties of consonantal gestures in Tashlhiyt | Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane |
1076 | General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats | Gábor Gosztolya, Tamás Grósz and László Tóth |
1078 | Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces | László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó |
1079 | Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech | Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann |
1080 | Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model | Keisuke Tanihara, Shogo Yonekura and Yasuo Kuniyoshi |
1081 | Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling | Siyuan Feng and Tan Lee |
1085 | Automatic Speech Recognition System Development in the "Wild" | Anton Ragni and Mark Gales |
1086 | Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin | Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu |
1087 | A deep learning approach to assessing non-native pronunciation of English using phone distances | Konstantinos Kyriakopoulos, Kate Knill and Mark Gales |
1088 | The Conversation Continues: The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition | Odette Scharenborg and Martha Larson |
1089 | Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition | Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin |
1093 | The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech | Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller |
1096 | Punctuation Prediction Model for Conversational Speech | Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak |
1097 | Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition | Wei-Ning Hsu, Hao Tang and James Glass |
1098 | Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning | Gabriel Mittag and Sebastian Möller |
1099 | The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching | Victor Soto, Nishi Cestero and Julia Hirschberg |
1100 | Play Duration based User-Entity Affinity Modeling in Spoken Dialog System | Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan and Abishek Ravi |
1102 | Analysis of Complementary Information Sources in the Speaker Embeddings Framework | Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson |
1103 | Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings | Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu |
1105 | Estimation of the Vocal Tract Length of Vowel Sounds based on the Frequency of the Significant Spectral Valley | TV Ananthapadmanabha and Ramakrishnan AngaraiGanesan |
1107 | Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese | Shiyu Zhou, Dong Linhao, Shuang Xu and Bo Xu |
1108 | Tongue Segmentation with Geometrically Constrained Snake Model | Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang and Kiyoshi Honda |
1110 | L2-ARCTIC: a non-native English speech corpus | Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna |
1111 | Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition | Yike Zhang, Pengyuan Zhang and Yonghong Yan |
1113 | Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder | Kei Akuzawa, Yusuke Iwasawa and Yutaka Matsuo |
1114 | A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement | Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu and Benoit Champagne |
1115 | A comparison of input types to a deep neural network-based forced aligner | Matthew C. Kelley and Benjamin V. Tucker |
1120 | Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection | Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das |
1121 | Voice Conversion with Conditional SampleRNN | Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco and Dan Darcy |
1122 | Contextual Language Model Adaptation for Conversational Agents | Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow |
1124 | Improved ASR for under-resourced languages through Multi-task Learning with Acoustic Landmarks | Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen |
1125 | Self-similarity matrix based intelligibility assessment of cleft lip and palate speech | Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat |
1126 | Formant measures of vowels adjacent to alveolar and retroflex consonants in Arrernte: stressed and unstressed position | Marija Tabain, Richard Beare and Andrew Butcher |
1128 | Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection | Madhusudan Singh and Debadatta Pati |
1130 | Dialect-geographical Acoustic-Tonetics: five disyllabic tone sandhi patterns in cognate words from the Wu dialects of Zhèjiāng province | Phil Rose |
1131 | A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder | Berrak Sisman, Mingyang Zhang and Haizhou Li |
1132 | Emotion Recognition from Human Speech Using Temporal Information and Deep Learning | John Kim and Rif A. Saurous |
1134 | Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations | Aaron Nicolson and Kuldip K. Paliwal |
1135 | Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor | Disong Wang and Yuexian Zou |
1138 | Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu |
1139 | Rapid Collection of Spontaneous Speech Corpora using Telephonic Community Forums | Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif and Roni Rosenfeld |
1140 | Monoaural Audio Source Separation using Variational Autoencoders | Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri |
1143 | Deep learning techniques for koala activity detection | Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe |
1147 | Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination | Jindrich Matousek and Daniel Tihelka |
1149 | User Information Augmented Semantic Frame Parsing using Progressive Neural Networks | Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin |
1150 | A Shifted Delta Coefficient Objective for Monaural Speech Separation using Multi-task Learning | Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li |
1151 | Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection | Youngmoon Jung, Younggwan Kim, Yeunju Choi and Hoirin Kim |
1152 | Temporal transformer networks for acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu |
1153 | State Gradients for RNN Memory Analysis | Lyan Verwimp, Hugo Van hamme, Vincent Renkens and Patrick Wambacq |
1154 | Waveform-Based Speaker Representations for Speech Synthesis | Moquan Wan, Gilles Degottex and Mark Gales |
1156 | Leveraging Second-Order Log-Linear model for improved deep learning based ASR performance | Ankit Raj, Shakti Rath and Jithendra Vepa |
1158 | Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification | Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Dan Povey |
1159 | Word Emphasis Prediction for Expressive Text to Speech | Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki |
1160 | Forward-Backward Attention Decoder | Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara |
1162 | Active Learning for LF-MMI Trained Neural Networks in ASR | Yanhua Long, Hong Ye, Yijie Li and Jiaen Liang |
1165 | Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal | Lukas Mateju, Petr Cerva, Jindrich Zdansky and Radek Safarik |
1171 | Homophone Identification and Merging for Code-switched Speech Recognition | Brij Mohan Lal Srivastava and Sunayana Sitaram |
1173 | Improved Epoch Extraction from Telephonic Speech using Chebfun and Zero Frequency Filtering | Ganga Gowri B, Soman K.P and Govind D |
1174 | Using pupillometry to measure the cognitive load of synthetic speech | Avashna Govender and Simon King |
1176 | Resyllabification in Indian Languages and its Implications in Text-to-speech Systems | Mahesh M, Jeena J Prakash and Hema Murthy |
1178 | Code-switching in Indic Speech Synthesisers | Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy |
1182 | Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer | Siyuan Feng and Tan Lee |
1185 | GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages | Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan |
1188 | Transcription correction for Indian languages using acoustic signatures | Jeena J Prakash, Golda Brunet Rajan and Hema Murthy |
1190 | WaveNet Vocoder with Limited Training Data for Voice Conversion | Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai |
1198 | Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis | Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai |
1199 | Measuring the cognitive load of synthetic speech using a dual task paradigm | Avashna Govender and Simon King |
1202 | Phoneme-to-Articulatory mapping using bidirectional gated RNN | Théo Biasutto--Lervat and Slim Ouni |
1203 | Information Bottleneck based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts | Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy |
1204 | Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting | Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao |
1205 | Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures | Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu |
1209 | Triplet loss based cosine similarity metric learning for text-independent speaker recognition | Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov and Ivan Kremnev |
1210 | Collapsed speech segment detection and suppression for WaveNet vocoder | Yichiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda |
1211 | Data augmentation improves recognition of foreign accented speech | Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata |
1212 | Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition | Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney |
1214 | Exploration of Local Speaking Rate Variations in Mandarin Read Speech | Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen |
1222 | An Active Feature Transformation Method For Attitude Recognition of Video Bloggers | Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz |
1223 | A New Framework for Supervised Speech Enhancement in the Time Domain | Ashutosh Pandey and Deliang Wang |
1224 | Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions | Rong Gong and Xavier Serra |
1225 | Vowels and Diphthongs in Hangzhou Wu Chinese Dialect | Yang Yue and Fang Hu |
1226 | Speaker Embedding Extraction with Phonetic Information | Yi Liu, Liang He, Jia Liu and Michael T. Johnson |
1227 | Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects | Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa |
1230 | Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech | Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku |
1232 | S4D: Speaker Diarization Toolkit in Python | Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive and Sylvain Meignier |
1233 | Age-related effects on sensorimotor control of speech production | Anne Hermes, Jane Mertens and Doris Mücke |
1234 | Single-channel Speech Dereverberation via Generative Adversarial Training | Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu |
1237 | Biophysically-inspired features improve the generalizability of neural network-based speech enhancement systems | Deepak Baby and Sarah Verhulst |
1238 | Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant? | Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André |
1239 | Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion | Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato and Shogo Minagi |
1240 | On Learning to Identify Genders from Raw Speech Signal using CNNs | Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai Doss |
1241 | Neural Language Codes for Multilingual Acoustic Models | Markus Müller, Sebastian Stüker and Alex Waibel |
1242 | An Attention Pooling based Representation Learning Method for Speech Emotion Recognition | Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai |
1243 | Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition | Liwen Zhang |
1244 | Learning to adapt: a meta-learning approach for speaker adaptation | Ondrej Klejch, Joachim Fainberg, Peter Bell and Steve Renals |
1245 | Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners | Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo and Jan W. H. Schnupp |
1246 | Co-whitening of i-vectors for short and long duration speaker verification | Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang |
1247 | Training Augmentation using Adversarial Examples for Robust Speech Recognition | Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie |
1248 | Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter | Hong Liu, Haipeng Lan, Bing Yang and Cheng Pang |
1250 | Data independent sequence augmentation method for acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu |
1251 | Pitch-Adaptive Front-end Feature for Hypernasality Detection | Akhilesh Dubey, S R Mahadeva Prasanna and Samarendra Dandapat |
1252 | ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge | Zbynek Zajic, Marie Kunesova, Jan Zelinka and Marek Hrúz |
1254 | A first investigation of the timing of turn-taking in Ruuli | Tuarik Buanzur, Margaret Zellers, Saudah Namyalo and Alena Witzlack-Makarevich |
1256 | Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s | Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel |
1258 | Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors | Kanru Hua |
1259 | A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model | Sreeram Ganji and Rohit Sinha |
1262 | Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline | Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe |
1264 | Perceptual and automatic evaluations of the intelligibility of speech degraded by noise induced hearing loss simulation | Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier |
1265 | Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis | Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen |
1266 | Automatic Evaluation of Speech Intelligibility based on i-vectors in the context of Head and Neck Cancers | Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard |
1267 | Automatic Pronunciation Evaluation of Singing | Chitralekha Gupta, Haizhou Li and Ye Wang |
1269 | Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network | Weipeng He, Petr Motlicek and Jean-Marc Odobez |
1270 | Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment | Yujia Xiao, Frank Soong and Wenping Hu |
1271 | Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia | Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier |
1272 | Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function | Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna |
1280 | A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions | Luciana Ferrer and Mitchell McLaren |
1281 | Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method | Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng |
1283 | Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning | Wenda Chen, Mark Hasegawa-Johnson and Nancy Chen |
1284 | Should code-switching models be asymmetric? | Barbara Bullock, Wally Guzman, Jacqueline Serigos and Almeida Jacqueline Toribio |
1285 | Visual timing information in audiovisual speech perception: evidence from lexical tone contour | Hui Xie, Biao Zeng and Rui Wang |
1286 | A Weighted Superposition of Functional Contours model for modelling contextual prominence of elementary prosodic contours | Branislav Gerazov, Gerard Bailly and Yi Xu |
1288 | An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder | Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee |
1291 | Multi-resolution gammachirp envelope distortion index for intelligibility prediction of noisy speech | Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani |
1293 | A Case Study on the Importance of Belief State Representation for Dialogue Policy Management | Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou |
1294 | Speech Enhancement Using the Minimum-Probability-of-Error Criterion | Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula |
1295 | Learning Structured Dictionaries for Exemplar-based Voice Conversion | Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna |
1296 | Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks | Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler and Emanuël Habets |
1297 | Exploration of Compressed ILPR Features for Replay Attack Detection | Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha |
1298 | Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition | Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee |
1299 | A Compact and Discriminative Feature based on Auditory Summary Statistics for Acoustic Scene Classification | Hongwei Song, Jiqing Han and Shiwen Deng |
1301 | Multi-channel Attention for End-to-End Speech Recognition | Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu |
1302 | BUT system for low resource Indian language ASR | Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiat, Lukas Burget and Jan Černocký |
1305 | Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer | Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen |
1306 | Acoustic-dependent phonemic transcription for text-to-speech synthesis | Kévin Vythelingum, Yannick Estève and Olivier Rosec |
1308 | Unsupervised Word Segmentation from Speech with Attention | Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier |
1309 | Liulishuo's System for the Spoken CALL Shared Task 2018 | Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu |
1310 | Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events | Gurunath Reddy M, K Sreenivasa Rao and Partha Pratim Das |
1312 | Impact of ASR Performance on Free Speaking Language Assessment | Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines |
1313 | A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis | Kai-Zhan Lee, Erica Cooper and Julia Hirschberg |
1316 | Data requirements, selection and augmentation for DNN-based speech synthesis from crowdsourced data | Markus Toman, Geoffrey Meltzner and Rupal Patel |
1318 | Semi-supervised learning for information extraction from dialogue | Anjuli Kannan, Kai Chen, Alvin Rajkomar and Diana Jaunzeikare |
1319 | Anomaly Detection Approach for Pronunciation Verification of Disordered Speech using Speech Attribute Features | Mostafa Shahin, Beena Ahmed, Jim Ji and Kirrie Ballard |
1320 | Prosodic Focus Acquisition in French Early Cochlear Implanted Children | Chadi Farah, Stephane Roman and Mariapaola D'Imperio |
1326 | Low-Resource Speech-to-Text Translation | Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater |
1327 | Stochastic Shake-Shake Regularization for Affective Learning from Speech | Che-Wei Huang and Shrikanth Narayanan |
1328 | An Optimization Based Approach for Solving Spoken CALL Shared Task | Mohammad Ateeq, Abualsoud Hanani and Aziz Qaroush |
1331 | Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge | Claude Montacié and Marie-José Caraty |
1333 | Statistical Model Compression for Small-Footprint Natural Language Understanding | Grant Strimel, Kanthashree Mysore Sathyendra and Stanislav Peshterliev |
1336 | Automatically measuring L2 speech fluency without the need of ASR: a proof-of-concept study with Japanese learners of French | Lionel Fontan, Maxime Le Coz and Sylvain Detey |
1339 | A GPU-based WFST Decoder with Exact Lattice Generation | Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur |
1342 | Adding New Classes Without Access to the Original Training Data with Applications to Language Identification | Hagai Taitelbaum, Ehud Ben-Reuven and Jacob Goldberger |
Full list of accepted papers:
http://interspeech2018.org/accepted-papers.html