List of Accepted Papers at Interspeech 2018, a Top Speech Conference!


Interspeech 2018

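     Interspeech, organized by the International Speech Communication Association (ISCA), is the top academic conference in the speech field and the world's largest comprehensive scientific and technical event on speech information processing. The conference encourages interdisciplinary research in speech, particularly research on and applications of rapidly developing artificial intelligence and machine learning techniques. Interspeech 2018 will be held in Hyderabad, India, in early September this year.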

Paper ID | Title | Authors

27 | Binaural Speech Intelligibility Estimation Using Deep Neural Networks | Kazuhiro Kondo, Kazuya Taira and Yosuke Kobayashi
34 | Real-Time Scoring of an Oral Reading Assessment on Mobile Devices | Jian Cheng
38 | Conditional End-to-End Audio Transformations | Albert Haque, Michelle Guo and Prateek Verma
40 | Speech recognition for medical conversations | Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
41 | Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification | Lanhua You, Wu Guo, Yan Song and Sheng Zhang
42 | Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text | Iroro Orife
43 | Frequency domain variants of velvet noise and their application to speech processing and synthesis | Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino
45 | A novel normalization method for autocorrelation function for pitch detection and for speech activity detection | Qiguang Lin and Yiwen Shao
46 | Dithered Quantization for Frequency-Domain Speech and Audio Coding | Tom Bäckström, Johannes Fischer and Sneha Das
47 | Categorical vs Dimensional Perception of Italian Emotional Speech | Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
48 | Cross-language perception of Mandarin lexical tones by Mongolian-speaking bilinguals in the Inner Mongolia Autonomous Region, China | Kimiko Tsukada and Yu Rong
51 | The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats | Björn Schuller, Stefan Steidl, Anton Batliner, Peter Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian Pokorny, Eva-Maria Rathner, Karin Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
52 | Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech | Emre Yilmaz, Henk van den Heuvel and David van Leeuwen
57 | Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features | Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz
60 | The Trajectory of Voice Onset Time with Vocal Aging | Chen Xuanda, Xiong Ziyu and Hu Jian
61 | Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons | Moez Ajili, Jean-Francois Bonastre and Solange Rossato
62 | Entity-Aware Language Model as an Unsupervised Reranker | Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy
63 | Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments | John Novak and Robert Kenyon
65 | The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework | Erica Gold, Sula Ross and Kate Earnshaw
67 | Articulatory Features for ASR of Pathological Speech | Emre Yilmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
68 | Vowel space as a tool to evaluate articulation problems | Rob van Son, Catherine Middag and Kris Demuynck
69 | Performance Analysis of the 2017 NIST Language Recognition Evaluation | Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason and Jaime Hernandez-Cordero
70 | Gated Convolutional Neural Network for Sentence Matching | Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You
73 | COSMO SylPhon: a model to assess phonological learning | Jean-Luc Schwartz
78 | Active Memory Networks for Language Modeling | Oscar Chen, Anton Ragni, Mark Gales and Xie Chen
79 | Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models | Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu
83 | Deep Speech Denoising with Vector Space Projections | Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
84 | What to Expect from Expected Kneser-Ney Smoothing | Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang
91 | Emotional Prosody Perception in Mandarin-speaking Congenital Amusics | Yixin Zhang, Tianzhu Geng and Jinsong Zhang
92 | Analysis of Length Normalization in End-to-End Speaker Verification System | Weicheng Cai, Jinkun Chen and Ming Li
97 | Overview of the 2018 Spoken CALL Shared Task | Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
990 | Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks | Yun Wang, Juncheng Li and Florian Metze
991 | Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach | Ragesh Rajan M, Ashwin Vijayakumar and Deepu Vijayasenan
993 | Attentive Statistics Pooling for Deep Speaker Embedding | Koji Okabe, Takafumi Koshinaka and Koichi Shinoda
995 | UltraFit: A speaker-friendly headset for ultrasound recordings in speech sciences | Lorenzo Spreafico, Michael Pucher and Anna Matosova
996 | Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech | Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
999 | Articulatory-to-speech conversion using bi-directional long short-term memory | Fumiaki Taguchi and Tokihiko Kaburagi
1000 | The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task | Kay Berkling, Cem Philipp Freimoser, Mario Kunstek and Jülg Dominik
1007 | Follow-up Question Generation using Pattern-based Seq2seq with a Small Corpus for Interview Coaching | Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong and Huai-Hung Huang
1010 | Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search | Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li
1013 | Capsule Networks for Low Resource Spoken Language Understanding | Vincent Renkens and Hugo Van hamme
1015 | Learning Discriminative Features for Speaker Identification and Verification | Sarthak Yadav and Atul Rai
1016 | LSTM based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language | Laxmi Pandey and Karan Nathwani
1018 | Detection of glottal closure instants in degraded speech using single frequency filtering analysis | Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana
1019 | Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis | Simone Hantke, Christoph Stemp and Björn Schuller
1020 | Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement | Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao
1021 | Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR | Yerbolat Khassanov and Eng Siong Chng
1023 | MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks | Wenhao Ding and Liang He
1024 | Effective acoustic cue learning is not just statistical, it is discriminative | Jessie S. Nixon
1025 | Compression of End-to-End Models | Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
1026 | Postfiltering with Complex Spectral Correlations for Speech and Audio Coding | Sneha Das and Tom Bäckström
1027 | Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding | Sneha Das and Tom Bäckström
1030 | Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition | Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
1032 | Discriminating between nasals and approximants in English language using zero time windowing | RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
1034 | Scalable Factorized Hierarchical Variational Autoencoder Training | Wei-Ning Hsu and James Glass
1035 | Contextual Slot Carryover for Disparate Schemas | Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert and Ruhi Sarikaya
1037 | Stream Attention for Distributed Multi-Microphone Speech Recognition | Xiaofei Wang, Ruizhi Li and Hynek Hermansky
1038 | Articulatory consequences of vocal effort elicitation method | Elisabet Eir Cortes, Marcin Wlodarczak and Juraj Šimko
1039 | Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding | Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan
1042 | Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis | Yuanjun Zhao, Roberto Togneri and Victor Sreeram
1043 | Designing a Pneumatic Bionic Voice Prosthesis - Statistical Approach for Source Excitation Generation | Farzaneh Ahmadi and Tomoki Toda
1044 | Training Utterance-level Embedding Networks for Speaker Identification and Verification | Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim and Jonghun Park
1046 | Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement | Ching-Hua Lee, Bhaskar D. Rao and Harinath Garudadri
1047 | Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions | Okko Räsänen, Seshadri Shreyas and Marisa Casillas
1049 | Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning | ShiLiang Zhang and Ming Lei
1054 | Towards a better characterization of Parkinsonian speech: a multidimensional acoustic study | Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie Van Malderen and Bernard Harmegnies
1055 | Low-Latency Neural Speech Translation | Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
1057 | Structured Word Embedding for Low Memory Neural Network Language Model | Kaiyu Shi and Kai Yu
1058 | An End-to-End Text-Independent Speaker Identification System on Short Utterances | Ruifang Ji, Xinyuan Cai and Xu Bo
1059 | Dysarthric speech classification using glottal features computed from non-words, words and sentences | Narendra N P and Paavo Alku
1060 | Length contrast and covarying features: Whistled speech as a case study | Rachid Ridouane, Giuseppina Turco and Julien Meyer
1062 | On the Usefulness of the Speech Phase Spectrum for Pitch Extraction | Erfan Loweimi, Jon Barker and Thomas Hain
1063 | Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription | Rongfeng Su, Xunying Liu and Lan Wang
1065 | Regional variation of /r/ in Swiss German dialects | Adrian Leemann, Stephan Schmid, Dieter Studer-Joho and Marie-José Kolly
1070 | i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models | Karel Beneš, Santosh Kesiraju and Lukáš Burget
1074 | Structural effects on properties of consonantal gestures in Tashlhiyt | Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
1076 | General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats | Gábor Gosztolya, Tamás Grósz and László Tóth
1078 | Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces | László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
1079 | Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech | Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann
1080 | Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model | Keisuke Tanihara, Shogo Yonekura and Yasuo Kuniyoshi
1081 | Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling | Siyuan Feng and Tan Lee
1085 | Automatic Speech Recognition System Development in the "Wild" | Anton Ragni and Mark Gales
1086 | Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin | Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
1087 | A deep learning approach to assessing non-native pronunciation of English using phone distances | Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
1088 | The Conversation Continues: The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition | Odette Scharenborg and Martha Larson
1089 | Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition | Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
1093 | The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech | Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
1096 | Punctuation Prediction Model for Conversational Speech | Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak
1097 | Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition | Wei-Ning Hsu, Hao Tang and James Glass
1098 | Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning | Gabriel Mittag and Sebastian Möller
1099 | The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching | Victor Soto, Nishi Cestero and Julia Hirschberg
1100 | Play Duration based User-Entity Affinity Modeling in Spoken Dialog System | Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan and Abishek Ravi
1102 | Analysis of Complementary Information Sources in the Speaker Embeddings Framework | Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
1103 | Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings | Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
1105 | Estimation of the Vocal Tract Length of Vowel Sounds based on the Frequency of the Significant Spectral Valley | TV Ananthapadmanabha and Ramakrishnan AngaraiGanesan
1107 | Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese | Shiyu Zhou, Dong Linhao, Shuang Xu and Bo Xu
1108 | Tongue Segmentation with Geometrically Constrained Snake Model | Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang and Kiyoshi Honda
1110 | L2-ARCTIC: a non-native English speech corpus | Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna
1111 | Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition | Yike Zhang, Pengyuan Zhang and Yonghong Yan
1113 | Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder | Kei Akuzawa, Yusuke Iwasawa and Yutaka Matsuo
1114 | A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement | Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu and Benoit Champagne
1115 | A comparison of input types to a deep neural network-based forced aligner | Matthew C. Kelley and Benjamin V. Tucker
1120 | Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection | Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
1121 | Voice Conversion with Conditional SampleRNN | Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco and Dan Darcy
1122 | Contextual Language Model Adaptation for Conversational Agents | Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow
1124 | Improved ASR for under-resourced languages through Multi-task Learning with Acoustic Landmarks | Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen
1125 | Self-similarity matrix based intelligibility assessment of cleft lip and palate speech | Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat
1126 | Formant measures of vowels adjacent to alveolar and retroflex consonants in Arrernte: stressed and unstressed position | Marija Tabain, Richard Beare and Andrew Butcher
1128 | Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection | Madhusudan Singh and Debadatta Pati
1130 | Dialect-geographical Acoustic-Tonetics: five disyllabic tone sandhi patterns in cognate words from the Wu dialects of Zhèjiāng province | Phil Rose
1131 | A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder | Berrak Sisman, Mingyang Zhang and Haizhou Li
1132 | Emotion Recognition from Human Speech Using Temporal Information and Deep Learning | John Kim and Rif A. Saurous
1134 | Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations | Aaron Nicolson and Kuldip K. Paliwal
1135 | Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor | Disong Wang and Yuexian Zou
1138 | Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu
1139 | Rapid Collection of Spontaneous Speech Corpora using Telephonic Community Forums | Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif and Roni Rosenfeld
1140 | Monoaural Audio Source Separation using Variational Autoencoders | Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri
1143 | Deep learning techniques for koala activity detection | Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe
1147 | Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination | Jindrich Matousek and Daniel Tihelka
1149 | User Information Augmented Semantic Frame Parsing using Progressive Neural Networks | Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
1150 | A Shifted Delta Coefficient Objective for Monaural Speech Separation using Multi-task Learning | Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li
1151 | Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection | Youngmoon Jung, Younggwan Kim, Yeunju Choi and Hoirin Kim
1152 | Temporal transformer networks for acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu
1153 | State Gradients for RNN Memory Analysis | Lyan Verwimp, Hugo Van hamme, Vincent Renkens and Patrick Wambacq
1154 | Waveform-Based Speaker Representations for Speech Synthesis | Moquan Wan, Gilles Degottex and Mark Gales
1156 | Leveraging Second-Order Log-Linear model for improved deep learning based ASR performance | Ankit Raj, Shakti Rath and Jithendra Vepa
1158 | Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification | Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Dan Povey
1159 | Word Emphasis Prediction for Expressive Text to Speech | Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
1160 | Forward-Backward Attention Decoder | Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
1162 | Active Learning for LF-MMI Trained Neural Networks in ASR | Yanhua Long, Hong Ye, Yijie Li and Jiaen Liang
1165 | Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal | Lukas Mateju, Petr Cerva, Jindrich Zdansky and Radek Safarik
1171 | Homophone Identification and Merging for Code-switched Speech Recognition | Brij Mohan Lal Srivastava and Sunayana Sitaram
1173 | Improved Epoch Extraction from Telephonic Speech using Chebfun and Zero Frequency Filtering | Ganga Gowri B, Soman K.P and Govind D
1174 | Using pupillometry to measure the cognitive load of synthetic speech | Avashna Govender and Simon King
1176 | Resyllabification in Indian Languages and its Implications in Text-to-speech Systems | Mahesh M, Jeena J. Prakash and Hema Murthy
1178 | Code-switching in Indic Speech Synthesisers | Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy
1182 | Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer | Siyuan Feng and Tan Lee
1185 | GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages | Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan
1188 | Transcription correction for Indian languages using acoustic signatures | Jeena J. Prakash, Golda Brunet Rajan and Hema Murthy
1190 | WaveNet Vocoder with Limited Training Data for Voice Conversion | Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
1198 | Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis | Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai
1199 | Measuring the cognitive load of synthetic speech using a dual task paradigm | Avashna Govender and Simon King
1202 | Phoneme-to-Articulatory mapping using bidirectional gated RNN | Théo Biasutto-Lervat and Slim Ouni
1203 | Information Bottleneck based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts | Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy
1204 | Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting | Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao
1205 | Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures | Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
1209 | Triplet loss based cosine similarity metric learning for text-independent speaker recognition | Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov and Ivan Kremnev
1210 | Collapsed speech segment detection and suppression for WaveNet vocoder | Yi-Chiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda
1211 | Data augmentation improves recognition of foreign accented speech | Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
1212 | Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition | Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney
1214 | Exploration of Local Speaking Rate Variations in Mandarin Read Speech | Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen
1222 | An Active Feature Transformation Method For Attitude Recognition of Video Bloggers | Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz
1223 | A New Framework for Supervised Speech Enhancement in the Time Domain | Ashutosh Pandey and Deliang Wang
1224 | Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions | Rong Gong and Xavier Serra
1225 | Vowels and Diphthongs in Hangzhou Wu Chinese Dialect | Yang Yue and Fang Hu
1226 | Speaker Embedding Extraction with Phonetic Information | Yi Liu, Liang He, Jia Liu and Michael T. Johnson
1227 | Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects | Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa
1230 | Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech | Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
1232 | S4D: Speaker Diarization Toolkit in Python | Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive and Sylvain Meignier
1233 | Age-related effects on sensorimotor control of speech production | Anne Hermes, Jane Mertens and Doris Mücke
1234 | Single-channel Speech Dereverberation via Generative Adversarial Training | Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu
1237 | Biophysically-inspired features improve the generalizability of neural network-based speech enhancement systems | Deepak Baby and Sarah Verhulst
1238 | Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant? | Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
1239 | Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion | Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato and Shogo Minagi
1240 | On Learning to Identify Genders from Raw Speech Signal using CNNs | Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai Doss
1241 | Neural Language Codes for Multilingual Acoustic Models | Markus Müller, Sebastian Stüker and Alex Waibel
1242 | An Attention Pooling based Representation Learning Method for Speech Emotion Recognition | Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
1243 | Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition | Liwen Zhang
1244 | Learning to adapt: a meta-learning approach for speaker adaptation | Ondrej Klejch, Joachim Fainberg, Peter Bell and Steve Renals
1245 | Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners | Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo and Jan W. H. Schnupp
1246 | Co-whitening of i-vectors for short and long duration speaker verification | Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang
1247 | Training Augmentation using Adversarial Examples for Robust Speech Recognition | Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie
1248 | Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter | Hong Liu, Haipeng Lan, Bing Yang and Cheng Pang
1250 | Data independent sequence augmentation method for acoustic scene classification | Zhang Teng, Kailai Zhang and Ji Wu
1251 | Pitch-Adaptive Front-end Feature for Hypernasality Detection | Akhilesh Dubey, S R Mahadeva Prasanna and Samarendra Dandapat
1252 | ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge | Zbynek Zajic, Marie Kunesova, Jan Zelinka and Marek Hrúz
1254 | A first investigation of the timing of turn-taking in Ruuli | Tuarik Buanzur, Margaret Zellers, Saudah Namyalo and Alena Witzlack-Makarevich
1256 | Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s | Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel
1258 | Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors | Kanru Hua
1259 | A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model | Sreeram Ganji and Rohit Sinha
1262 | Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline | Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
1264 | Perceptual and automatic evaluations of the intelligibility of speech degraded by noise induced hearing loss simulation | Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier
1265 | Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis | Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1266 | Automatic Evaluation of Speech Intelligibility based on i-vectors in the context of Head and Neck Cancers | Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard
1267 | Automatic Pronunciation Evaluation of Singing | Chitralekha Gupta, Haizhou Li and Ye Wang
1269 | Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network | Weipeng He, Petr Motlicek and Jean-Marc Odobez
1270 | Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment | Yujia Xiao, Frank Soong and Wenping Hu
1271 | Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia | Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
1272 | Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function | Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
1280 | A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions | Luciana Ferrer and Mitchell McLaren
1281 | Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method | Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng
1283 | Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning | Wenda Chen, Mark Hasegawa-Johnson and Nancy Chen
1284 | Should code-switching models be asymmetric? | Barbara Bullock, Wally Guzman, Jacqueline Serigos and Almeida Jacqueline Toribio
1285 | Visual timing information in audiovisual speech perception: evidence from lexical tone contour | Hui Xie, Biao Zeng and Rui Wang
1286 | A Weighted Superposition of Functional Contours model for modelling contextual prominence of elementary prosodic contours | Branislav Gerazov, Gerard Bailly and Yi Xu
1288 | An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder | Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee
1291 | Multi-resolution gammachirp envelope distortion index for intelligibility prediction of noisy speech | Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
1293 | A Case Study on the Importance of Belief State Representation for Dialogue Policy Management | Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou
1294 | Speech Enhancement Using the Minimum-Probability-of-Error Criterion | Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula
1295 | Learning Structured Dictionaries for Exemplar-based Voice Conversion | Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
1296 | Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks | Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler and Emanuël Habets
1297 | Exploration of Compressed ILPR Features for Replay Attack Detection | Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
1298 | Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition | Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee
1299 | A Compact and Discriminative Feature based on Auditory Summary Statistics for Acoustic Scene Classification | Hongwei Song, Jiqing Han and Shiwen Deng
1301 | Multi-channel Attention for End-to-End Speech Recognition | Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
1302 | BUT system for low resource Indian language ASR | Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiat, Lukas Burget and Jan Černocký
1305 | Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer | Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1306 | Acoustic-dependent phonemic transcription for text-to-speech synthesis | Kévin Vythelingum, Yannick Estève and Olivier Rosec
1308 | Unsupervised Word Segmentation from Speech with Attention | Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier
1309 | Liulishuo's System for the Spoken CALL Shared Task 2018 | Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu
1310 | Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events | Gurunath Reddy M, K Sreenivasa Rao and Partha Pratim Das
1312 | Impact of ASR Performance on Free Speaking Language Assessment | Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
1313 | A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis | Kai-Zhan Lee, Erica Cooper and Julia Hirschberg
1316 | Data requirements, selection and augmentation for DNN-based speech synthesis from crowdsourced data | Markus Toman, Geoffrey Meltzner and Rupal Patel
1318 | Semi-supervised learning for information extraction from dialogue | Anjuli Kannan, Kai Chen, Alvin Rajkomar and Diana Jaunzeikare
1319 | Anomaly Detection Approach for Pronunciation Verification of Disordered Speech using Speech Attribute Features | Mostafa Shahin, Beena Ahmed, Jim Ji and Kirrie Ballard
1320 | Prosodic Focus Acquisition in French Early Cochlear Implanted Children | Chadi Farah, Stephane Roman and Mariapaola D'Imperio
1326 | Low-Resource Speech-to-Text Translation | Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater
1327 | Stochastic Shake-Shake Regularization for Affective Learning from Speech | Che-Wei Huang and Shrikanth Narayanan
1328 | An Optimization Based Approach for Solving Spoken CALL Shared Task | Mohammad Ateeq, Abualsoud Hanani and Aziz Qaroush
1331 | Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge | Claude Montacié and Marie-José Caraty
1333 | Statistical Model Compression for Small-Footprint Natural Language Understanding | Grant Strimel, Kanthashree Mysore Sathyendra and Stanislav Peshterliev
1336 | Automatically measuring L2 speech fluency without the need of ASR: a proof-of-concept study with Japanese learners of French | Lionel Fontan, Maxime Le Coz and Sylvain Detey
1339 | A GPU-based WFST Decoder with Exact Lattice Generation | Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur
1342 | Adding New Classes Without Access to the Original Training Data with Applications to Language Identification | Hagai Taitelbaum, Ehud Ben-Reuven and Jacob Goldberger


Full list of accepted papers:

http://interspeech2018.org/accepted-papers.html
