List of Accepted Papers at Interspeech 2018, a Top Speech Conference

June 10, 2018, Zhuanzhi (专知)

Interspeech 2018

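Interspeech is the top academic conference in the speech field, organized by the International Speech Communication Association (ISCA), and the world's largest comprehensive technical event on speech information processing. The conference encourages interdisciplinary research in speech, particularly the study and application of fast-developing artificial intelligence and machine learning techniques in the speech domain. Interspeech 2018 will be held in Hyderabad, India, in early September this year.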

Paper ID	Title	Authors

27	Binaural Speech Intelligibility Estimation Using Deep Neural Networks	Kazuhiro Kondo, Kazuya Taira and Yosuke Kobayashi
34	Real-Time Scoring of an Oral Reading Assessment on Mobile Devices	Jian Cheng
38	Conditional End-to-End Audio Transformations	Albert Haque, Michelle Guo and Prateek Verma
40	Speech recognition for medical conversations	Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu and Xuedong Zhang
41	Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification	Lanhua You, Wu Guo, Yan Song and Sheng Zhang
42	Attentive Sequence-to-Sequence Learning for Diacritic Restoration of Yorùbá Language Text	Iroro Orife
43	Frequency domain variants of velvet noise and their application to speech processing and synthesis	Hideki Kawahara, Ken-Ichi Sakakibara, Masanori Morise, Hideki Banno, Tomoki Toda and Toshio Irino
45	A novel normalization method for autocorrelation function for pitch detection and for speech activity detection	Qiguang Lin and Yiwen Shao
46	Dithered Quantization for Frequency-Domain Speech and Audio Coding	Tom Bäckström, Johannes Fischer and Sneha Das
47	Categorical vs Dimensional Perception of Italian Emotional Speech	Emilia Parada-Cabaleiro, Giovanni Costantini, Anton Batliner, Alice Baird and Björn Schuller
48	Cross-language perception of Mandarin lexical tones by Mongolian-speaking bilinguals in the Inner Mongolia Autonomous Region, China	Kimiko Tsukada and Yu Rong
51	The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats	Björn Schuller, Stefan Steidl, Anton Batliner, Peter Marschik, Harald Baumeister, Fengquan Dong, Simone Hantke, Florian Pokorny, Eva-Maria Rathner, Karin Bartl-Pokorny, Christa Einspieler, Dajie Zhang, Alice Baird, Shahin Amiriparian, Kun Qian, Zhao Ren, Maximilian Schmitt, Panagiotis Tzirakis and Stefanos Zafeiriou
52	Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech	Emre Yilmaz, Henk van den Heuvel and David van Leeuwen
57	Investigating the Effect of Audio Duration on Dementia Detection using Acoustic Features	Jochen Weiner, Miguel Angrick, Srinivasan Umesh and Tanja Schultz
60	The Trajectory of Voice Onset Time with Vocal Aging	Chen Xuanda, Xiong Ziyu and Hu Jian
61	Voice Comparison and Rhythm: Behavioral Differences between Target and Non-target Comparisons	Moez Ajili, Jean-Francois Bonastre and Solange Rossato
62	Entity-Aware Language Model as an Unsupervised Reranker	Mohammad Sadegh Rasooli and Sarangarajan Parthasarathy
63	Effects of User Controlled Speech Rate on Intelligibility in Noisy Environments	John Novak and Robert Kenyon
65	The ‘West Yorkshire Regional English Database’: Investigations into the generalizability of reference populations for forensic speaker comparison casework	Erica Gold, Sula Ross and Kate Earnshaw
67	Articulatory Features for ASR of Pathological Speech	Emre Yilmaz, Vikramjit Mitra, Chris Bartels and Horacio Franco
68	Vowel space as a tool to evaluate articulation problems	Rob van Son, Catherine Middag and Kris Demuynck
69	Performance Analysis of the 2017 NIST Language Recognition Evaluation	Seyed Omid Sadjadi, Timothee Kheyrkhah, Craig Greenberg, Douglas Reynolds, Elliot Singer, Lisa Mason and Jaime Hernandez-Cordero
70	Gated Convolutional Neural Network for Sentence Matching	Peixin Chen, Wu Guo, Zhi Chen, Jian Sun and Lanhua You
73	COSMO SylPhon: a model to assess phonological learning	Jean-Luc Schwartz
78	Active Memory Networks for Language Modeling	Oscar Chen, Anton Ragni, Mark Gales and Xie Chen
79	Lattice-free State-level Minimum Bayes Risk Training of Acoustic Models	Naoyuki Kanda, Yusuke Fujita and Kenji Nagamatsu
83	Deep Speech Denoising with Vector Space Projections	Jeffrey Hetherly, Paul Gamble, Maria Alejandra Barrios, Cory Stephenson and Karl Ni
84	What to Expect from Expected Kneser-Ney Smoothing	Michael Levit, Sarangarajan Parthasarathy and Shuangyu Chang
91	Emotional Prosody Perception in Mandarin-speaking Congenital Amusics	Yixin Zhang, Tianzhu Geng and Jinsong Zhang
92	Analysis of Length Normalization in End-to-End Speaker Verification System	Weicheng Cai, Jinkun Chen and Ming Li
97	Overview of the 2018 Spoken CALL Shared Task	Claudia Baur, Andrew Caines, Cathy Chua, Johanna Gerlach, Mengjie Qian, Manny Rayner, Martin Russell, Helmer Strik and Xizi Wei
990	Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks	Yun Wang, Juncheng Li and Florian Metze
991	Prediction of Aesthetic Elements in Karnatic Music: A Machine Learning Approach	Ragesh Rajan M, Ashwin Vijayakumar and Deepu Vijayasenan
993	Attentive Statistics Pooling for Deep Speaker Embedding	Koji Okabe, Takafumi Koshinaka and Koichi Shinoda
995	UltraFit: A speaker-friendly headset for ultrasound recordings in speech sciences	Lorenzo Spreafico, Michael Pucher and Anna Matosova
996	Bags in Bag: Generating Context-Aware Bags for Tracking Emotions from Speech	Jing Han, Zixing Zhang, Maximilian Schmitt, Zhao Ren, Fabien Ringeval and Björn Schuller
999	Articulatory-to-speech conversion using bi-directional long short-term memory	Fumiaki Taguchi and Tokihiko Kaburagi
1000	The CSU-K Rule-Based System for the 2nd Edition Spoken CALL Shared Task	Kay Berkling, Cem Philipp Freimoser, Mario Kunstek and Jülg Dominik
1007	Follow-up Question Generation using Pattern-based Seq2seq with a Small Corpus for Interview Coaching	Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Qian-Bei Hong and Huai-Hung Huang
1010	Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search	Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma and Haizhou Li
1013	Capsule Networks for Low Resource Spoken Language Understanding	Vincent Renkens and Hugo Van hamme
1015	Learning Discriminative Features for Speaker Identification and Verification	Sarthak Yadav and Atul Rai
1016	LSTM based Attentive Fusion of Spectral and Prosodic Information for Keyword Spotting in Hindi Language	Laxmi Pandey and Karan Nathwani
1018	Detection of glottal closure instants in degraded speech using single frequency filtering analysis	Gunnam Aneeja, Sudarsana Reddy Kadiri and Bayya Yegnanarayana
1019	Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis	Simone Hantke, Christoph Stemp and Björn Schuller
1020	Deep Noise Tracking Network: A Hybrid Signal Processing/Deep Learning Approach to Speech Enhancement	Shuai Nie, Shan Liang, Bin Liu, Yaping Zhang, Wenju Liu and Jianhua Tao
1021	Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR	Yerbolat Khassanov and Eng Siong Chng
1023	MTGAN: Speaker Verification through Multitasking Triplet Generative Adversarial Networks	Wenhao Ding and Liang He
1024	Effective acoustic cue learning is not just statistical, it is discriminative	Jessie S. Nixon
1025	Compression of End-to-End Models	Ruoming Pang, Tara Sainath, Rohit Prabhavalkar, Suyog Gupta, Yonghui Wu, Shuyuan Zhang and Chung-Cheng Chiu
1026	Postfiltering with Complex Spectral Correlations for Speech and Audio Coding	Sneha Das and Tom Bäckström
1027	Postfiltering Using Log-Magnitude Spectrum for Speech and Audio Coding	Sneha Das and Tom Bäckström
1030	Improving Attention Based Sequence-to-Sequence Models for End-to-End English Conversational Speech Recognition	Chao Weng, Jia Cui, Guangsen Wang, Jun Wang, Chengzhu Yu, Dan Su and Dong Yu
1032	Discriminating between nasals and approximants in English language using zero time windowing	RaviShankar Prasad, Sudarsana Reddy Kadiri, Suryakanth V Gangashetty and Bayya Yegnanarayana
1034	Scalable Factorized Hierarchical Variational Autoencoder Training	Wei-Ning Hsu and James Glass
1035	Contextual Slot Carryover for Disparate Schemas	Chetan Naik, Arpit Gupta, Hancheng Ge, Mathias Lambert and Ruhi Sarikaya
1037	Stream Attention for Distributed Multi-Microphone Speech Recognition	Xiaofei Wang, Ruizhi Li and Hynek Hermansky
1038	Articulatory consequences of vocal effort elicitation method	Elisabet Eir Cortes, Marcin Wlodarczak and Juraj Šimko
1039	Cross-Lingual Multi-Task Neural Architecture for Spoken Language Understanding	Yujiang Li, Xuemin Zhao, Weiqun Xu and Yonghong Yan
1042	Spoofing Detection Using Adaptive Weighting Framework and Clustering Analysis	Yuanjun Zhao, Roberto Togneri and Victor Sreeram
1043	Designing a Pneumatic Bionic Voice Prosthesis - Statistical Approach for Source Excitation Generation	Farzaneh Ahmadi and Tomoki Toda
1044	Training Utterance-level Embedding Networks for Speaker Identification and Verification	Heewoong Park, Sukhyun Cho, Kyubyong Park, Namju Kim and Jonghun Park
1046	Bone-Conduction Sensor Assisted Noise Estimation for Improved Speech Enhancement	Ching-Hua Lee, Bhaskar D. Rao and Harinath Garudadri
1047	Comparison of Syllabification Algorithms and Training Strategies for Robust Word Count Estimation across Different Languages and Recording Conditions	Okko Räsänen, Seshadri Shreyas and Marisa Casillas
1049	Acoustic Modeling with DFSMN-CTC and Joint CTC-CE Learning	ShiLiang Zhang and Ming Lei
1054	Towards a better characterization of Parkinsonian speech: a multidimensional acoustic study	Veronique Delvaux, Kathy Huet, Myriam Piccaluga, Sophie Van Malderen and Bernard Harmegnies
1055	Low-Latency Neural Speech Translation	Jan Niehues, Ngoc-Quan Pham, Thanh-Le Ha, Matthias Sperber and Alex Waibel
1057	Structured Word Embedding for Low Memory Neural Network Language Model	Kaiyu Shi and Kai Yu
1058	An End-to-End Text-Independent Speaker Identification System on Short Utterances	Ruifang Ji, Xinyuan Cai and Xu Bo
1059	Dysarthric speech classification using glottal features computed from non-words, words and sentences	Narendra N P and Paavo Alku
1060	Length contrast and covarying features: Whistled speech as a case study	Rachid Ridouane, Giuseppina Turco and Julien Meyer
1062	On the Usefulness of the Speech Phase Spectrum for Pitch Extraction	Erfan Loweimi, Jon Barker and Thomas Hain
1063	Semi-supervised Cross-domain Visual Feature Learning for Audio-Visual Broadcast Speech Transcription	Rongfeng Su, Xunying Liu and Lan Wang
1065	Regional variation of /r/ in Swiss German dialects	Adrian Leemann, Stephan Schmid, Dieter Studer-Joho and Marie-José Kolly
1070	i-Vectors in Language Modeling: An Efficient Way of Domain Adaptation for Feed-Forward Models	Karel Beneš, Santosh Kesiraju and Lukáš Burget
1074	Structural effects on properties of consonantal gestures in Tashlhiyt	Anne Hermes, Doris Mücke, Bastian Auris and Rachid Ridouane
1076	General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats	Gábor Gosztolya, Tamás Grósz and László Tóth
1078	Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces	László Tóth, Gábor Gosztolya, Tamás Grósz, Alexandra Markó and Tamás Gábor Csapó
1079	Identifying Schizophrenia Based on Temporal Parameters in Spontaneous Speech	Gábor Gosztolya, Anita Bagi, Szilvia Szalóki, István Szendi and Ildikó Hoffmann
1080	Implementation of Respiration in Articulatory Synthesis Using a Pressure-Volume Lung Model	Keisuke Tanihara, Shogo Yonekura and Yasuo Kuniyoshi
1081	Exploiting Speaker and Phonetic Diversity of Mismatched Language Resources for Unsupervised Subword Modeling	Siyuan Feng and Tan Lee
1085	Automatic Speech Recognition System Development in the "Wild"	Anton Ragni and Mark Gales
1086	Extending Recurrent Neural Aligner for Streaming End-to-End Speech Recognition in Mandarin	Linhao Dong, Shiyu Zhou, Wei Chen and Bo Xu
1087	A deep learning approach to assessing non-native pronunciation of English using phone distances	Konstantinos Kyriakopoulos, Kate Knill and Mark Gales
1088	The Conversation Continues: The Effect of Lyrics and Music Complexity of Background Music on Spoken-Word Recognition	Odette Scharenborg and Martha Larson
1089	Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition	Jian Tang, Yan Song, Lirong Dai and Ian McLoughlin
1093	The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech	Alice Baird, Emilia Parada-Cabaleiro, Simone Hantke, Felix Burkhardt, Nicholas Cummins and Björn Schuller
1096	Punctuation Prediction Model for Conversational Speech	Piotr Żelasko, Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel and Najim Dehak
1097	Unsupervised Adaptation with Interpretable Disentangled Representations for Distant Conversational Speech Recognition	Wei-Ning Hsu, Hao Tang and James Glass
1098	Detecting Packet-Loss Concealment Using Formant Features and Decision Tree Learning	Gabriel Mittag and Sebastian Möller
1099	The Role of Cognate Words, POS Tags, and Entrainment in Code-Switching	Victor Soto, Nishi Cestero and Julia Hirschberg
1100	Play Duration based User-Entity Affinity Modeling in Spoken Dialog System	Bo Xiao, Nicholas Monath, Shankar Ananthakrishnan and Abishek Ravi
1102	Analysis of Complementary Information Sources in the Speaker Embeddings Framework	Mahesh Kumar Nandwana, Mitchell McLaren, Diego Castan, Julien van Hout and Aaron Lawson
1103	Double Joint Bayesian Modeling of DNN Local I-Vector for Text Dependent Speaker Verification with Random Digit Strings	Ziqiang Shi, Huibin Lin, Liu Liu and Rujie Liu
1105	Estimation of the Vocal Tract Length of Vowel Sounds based on the Frequency of the Significant Spectral Valley	TV Ananthapadmanabha and Ramakrishnan AngaraiGanesan
1107	Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese	Shiyu Zhou, Dong Linhao, Shuang Xu and Bo Xu
1108	Tongue Segmentation with Geometrically Constrained Snake Model	Zhihua Su, Jianguo Wei, Qiang Fang, Jianrong Wang and Kiyoshi Honda
1110	L2-ARCTIC: a non-native English speech corpus	Guanlong Zhao, Sinem Sonsaat, Alif Silpachai, Ivana Lucic, Evgeny Chukharev-Hudilainen, John Levis and Ricardo Gutierrez-Osuna
1111	Improving Language Modeling with an Adversarial Critic for Automatic Speech Recognition	Yike Zhang, Pengyuan Zhang and Yonghong Yan
1113	Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder	Kei Akuzawa, Yusuke Iwasawa and Yutaka Matsuo
1114	A Deep Neural Network Based Harmonic Noise Model for Speech Enhancement	Zhiheng Ouyang, Hongjiang Yu, Wei-Ping Zhu and Benoit Champagne
1115	A comparison of input types to a deep neural network-based forced aligner	Matthew C. Kelley and Benjamin V. Tucker
1120	Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection	Shao-Yen Tseng, Juncheng Li, Yun Wang, Florian Metze, Joseph Szurley and Samarjit Das
1121	Voice Conversion with Conditional SampleRNN	Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco and Dan Darcy
1122	Contextual Language Model Adaptation for Conversational Agents	Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh and Ariya Rastrow
1124	Improved ASR for under-resourced languages through Multi-task Learning with Acoustic Landmarks	Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson and Deming Chen
1125	Self-similarity matrix based intelligibility assessment of cleft lip and palate speech	Sishir Kalita, S R Mahadeva Prasanna and Samarendra Dandapat
1126	Formant measures of vowels adjacent to alveolar and retroflex consonants in Arrernte: stressed and unstressed position	Marija Tabain, Richard Beare and Andrew Butcher
1128	Linear Prediction Residual based Short-term Cepstral Features for Replay Attacks Detection	Madhusudan Singh and Debadatta Pati
1130	Dialect-geographical Acoustic-Tonetics: five disyllabic tone sandhi patterns in cognate words from the Wu dialects of Zhèjiāng province	Phil Rose
1131	A Voice Conversion Framework with Tandem Feature Sparse Representation and Speaker-Adapted WaveNet Vocoder	Berrak Sisman, Mingyang Zhang and Haizhou Li
1132	Emotion Recognition from Human Speech Using Temporal Information and Deep Learning	John Kim and Rif A. Saurous
1134	Bidirectional Long-Short Term Memory Network-based Estimation of Reliable Spectral Component Locations	Aaron Nicolson and Kuldip K. Paliwal
1135	Joint Noise and Reverberation Adaptive Learning for Robust Speaker DOA Estimation with An Acoustic Vector Sensor	Disong Wang and Yuexian Zou
1138	Multi-modal attention mechanisms in LSTM and its application to acoustic scene classification	Zhang Teng, Kailai Zhang and Ji Wu
1139	Rapid Collection of Spontaneous Speech Corpora using Telephonic Community Forums	Agha Ali Raza, Awais Athar, Shan Randhawa, Zain Tariq, Muhammad Bilal Saleem, Haris Bin Zia, Umar Saif and Roni Rosenfeld
1140	Monoaural Audio Source Separation using Variational Autoencoders	Laxmi Pandey, Anurendra Kumar and Vinay Namboodiri
1143	Deep learning techniques for koala activity detection	Ivan Himawan, Michael Towsey, Bradley Law and Paul Roe
1147	Glottal Closure Instant Detection from Speech Signal Using Voting Classifier and Recursive Feature Elimination	Jindrich Matousek and Daniel Tihelka
1149	User Information Augmented Semantic Frame Parsing using Progressive Neural Networks	Yilin Shen, Xiangyu Zeng, Yu Wang and Hongxia Jin
1150	A Shifted Delta Coefficient Objective for Monaural Speech Separation using Multi-task Learning	Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li
1151	Joint Learning using Denoising Variational Autoencoders for Voice Activity Detection	Youngmoon Jung, Younggwan Kim, Yeunju Choi and Hoirin Kim
1152	Temporal transformer networks for acoustic scene classification	Zhang Teng, Kailai Zhang and Ji Wu
1153	State Gradients for RNN Memory Analysis	Lyan Verwimp, Hugo Van hamme, Vincent Renkens and Patrick Wambacq
1154	Waveform-Based Speaker Representations for Speech Synthesis	Moquan Wan, Gilles Degottex and Mark Gales
1156	Leveraging Second-Order Log-Linear model for improved deep learning based ASR performance	Ankit Raj, Shakti Rath and Jithendra Vepa
1158	Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification	Yingke Zhu, Tom Ko, David Snyder, Brian Mak and Dan Povey
1159	Word Emphasis Prediction for Expressive Text to Speech	Yosi Mass, Slava Shechtman, Moran Mordechay, Ron Hoory, Oren Sar Shalom, Guy Lev and David Konopnicki
1160	Forward-Backward Attention Decoder	Masato Mimura, Shinsuke Sakai and Tatsuya Kawahara
1162	Active Learning for LF-MMI Trained Neural Networks in ASR	Yanhua Long, Hong Ye, Yijie Li and Jiaen Liang
1165	Using Deep Neural Networks for Identification of Slavic Languages from Acoustic Signal	Lukas Mateju, Petr Cerva, Jindrich Zdansky and Radek Safarik
1171	Homophone Identification and Merging for Code-switched Speech Recognition	Brij Mohan Lal Srivastava and Sunayana Sitaram
1173	Improved Epoch Extraction from Telephonic Speech using Chebfun and Zero Frequency Filtering	Ganga Gowri B, Soman K.P and Govind D
1174	Using pupillometry to measure the cognitive load of synthetic speech	Avashna Govender and Simon King
1176	Resyllabification in Indian Languages and its Implications in Text-to-speech Systems	Mahesh M, Jeena J Prakash and Hema Murthy
1178	Code-switching in Indic Speech Synthesisers	Anju Leela Thomas, Anusha Prakash, Arun Baby and Hema Murthy
1182	Improving Cross-Lingual Knowledge Transferability Using Multilingual TDNN-BLSTM with Language-Dependent Pre-Final Layer	Siyuan Feng and Tan Lee
1185	GlobalTIMIT: Acoustic-Phonetic Datasets for the World’s Languages	Nattanun Chanchaochai, Christopher Cieri, Japhet Debrah, Hongwei Ding, Yue Jiang, Sishi Liao, Mark Liberman, Jonathan Wright, Jiahong Yuan, Juhong Zhan and Yuqing Zhan
1188	Transcription correction for Indian languages using acoustic signatures	Jeena J Prakash, Golda Brunet Rajan and Hema Murthy
1190	WaveNet Vocoder with Limited Training Data for Voice Conversion	Li-Juan Liu, Zhen-Hua Ling, Yuan Jiang, Ming Zhou and Li-Rong Dai
1198	Learning and Modeling Unit Embeddings for Improving HMM-based Unit Selection Speech Synthesis	Xiao Zhou, Zhen-Hua Ling, Zhi-Ping Zhou and Li-Rong Dai
1199	Measuring the cognitive load of synthetic speech using a dual task paradigm	Avashna Govender and Simon King
1202	Phoneme-to-Articulatory mapping using bidirectional gated RNN	Théo Biasutto--Lervat and Slim Ouni
1203	Information Bottleneck based Percussion Instrument Diarization System for Taniavartanam Segments of Carnatic Music Concerts	Nauman Dawalatabad, Jom Kuriakose, Chandra Sekhar Chellu and Hema Murthy
1204	Compact Feedforward Sequential Memory Networks for Small-footprint Keyword Spotting	Mengzhe Chen, ShiLiang Zhang, Ming Lei, Yong Liu, Haitao Yao and Jie Gao
1205	Deep Extractor Network for Target Speaker Recovery From Single Channel Speech Mixtures	Jun Wang, Jie Chen, Dan Su, Lianwu Chen, Meng Yu, Yanmin Qian and Dong Yu
1209	Triplet loss based cosine similarity metric learning for text-independent speaker recognition	Sergey Novoselov, Vadim Shchemelinin, Andrey Shulipa, Alexandr Kozlov and Ivan Kremnev
1210	Collapsed speech segment detection and suppression for WaveNet vocoder	Yichiao Wu, Kazuhiro Kobayashi, Tomoki Hayashi, Patrick Lumban Tobing and Tomoki Toda
1211	Data augmentation improves recognition of foreign accented speech	Takashi Fukuda, Raul Fernandez, Andrew Rosenberg, Samuel Thomas, Bhuvana Ramabhadran, Alexander Sorin and Gakuto Kurata
1212	Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition	Eugen Beck, Mirko Hannemann, Patrick Dötsch, Ralf Schlüter and Hermann Ney
1214	Exploration of Local Speaking Rate Variations in Mandarin Read Speech	Guan-Ting Liou, Chen-Yu Chiang, Yih-Ru Wang and Sin-Horng Chen
1222	An Active Feature Transformation Method For Attitude Recognition of Video Bloggers	Fasih Haider, Fahim A. Salim, Owen Conlan and Saturnino Luz
1223	A New Framework for Supervised Speech Enhancement in the Time Domain	Ashutosh Pandey and Deliang Wang
1224	Singing voice phoneme segmentation by hierarchically inferring syllable and phoneme onset positions	Rong Gong and Xavier Serra
1225	Vowels and Diphthongs in Hangzhou Wu Chinese Dialect	Yang Yue and Fang Hu
1226	Speaker Embedding Extraction with Phonetic Information	Yi Liu, Liang He, Jia Liu and Michael T. Johnson
1227	Investigating accuracy of pitch-accent annotations in neural network-based speech synthesis and denoising effects	Hieu-Thi Luong, Xin Wang, Junichi Yamagishi and Nobuyuki Nishizawa
1230	Time-regularized linear prediction for noise-robust extraction of the spectral envelope of speech	Manu Airaksinen, Lauri Juvela, Okko Räsänen and Paavo Alku
1232	S4D: Speaker Diarization Toolkit in Python	Pierre-Alexandre Broux, Florent Desnous, Anthony Larcher, Simon Petitrenaud, Jean Carrive and Sylvain Meignier
1233	Age-related effects on sensorimotor control of speech production	Anne Hermes, Jane Mertens and Doris Mücke
1234	Single-channel Speech Dereverberation via Generative Adversarial Training	Chenxing Li, Tieqiang Wang, Shuang Xu and Bo Xu
1237	Biophysically-inspired features improve the generalizability of neural network-based speech enhancement systems	Deepak Baby and Sarah Verhulst
1238	Deep Learning in Paralinguistic Recognition Tasks: Are Hand-crafted Features Still Relevant?	Johannes Wagner, Dominik Schiller, Andreas Seiderer and Elisabeth André
1239	Naturalness Improvement Algorithm for Reconstructed Glossectomy Patient's Speech Using Spectral Differential Modification in Voice Conversion	Hiroki Murakami, Sunao Hara, Masanobu Abe, Masaaki Sato and Shogo Minagi
1240	On Learning to Identify Genders from Raw Speech Signal using CNNs	Selen Hande Kabil, Hannah Muckenhirn and Mathew Magimai Doss
1241	Neural Language Codes for Multilingual Acoustic Models	Markus Müller, Sebastian Stüker and Alex Waibel
1242	An Attention Pooling based Representation Learning Method for Speech Emotion Recognition	Pengcheng Li, Yan Song, Ian McLoughlin, Wu Guo and Lirong Dai
1243	Unsupervised Temporal Feature Learning Based on Sparse Coding Embedded BoAW for Acoustic Event Recognition	Liwen Zhang
1244	Learning to adapt: a meta-learning approach for speaker adaptation	Ondrej Klejch, Joachim Fainberg, Peter Bell and Steve Renals
1245	Weighting Pitch Contour and Loudness Contour in Mandarin Tone Perception in Cochlear Implant Listeners	Qinglin Meng, Nengheng Zheng, Ambika Prasad Mishra, Jacinta Dan Luo and Jan W. H. Schnupp
1246	Co-whitening of i-vectors for short and long duration speaker verification	Longting Xu, Kong Aik Lee, Haizhou Li and Zhen Yang
1247	Training Augmentation using Adversarial Examples for Robust Speech Recognition	Sining Sun, Ching-Feng Yeh, Mari Ostendorf, Mei-Yuh Hwang and Lei Xie
1248	Multiple Concurrent Sound Source Tracking Based on Observation-Guided Adaptive Particle Filter	Hong Liu, Haipeng Lan, Bing Yang and Cheng Pang
1250	Data independent sequence augmentation method for acoustic scene classification	Zhang Teng, Kailai Zhang and Ji Wu
1251	Pitch-Adaptive Front-end Feature for Hypernasality Detection	Akhilesh Dubey, S R Mahadeva Prasanna and Samarendra Dandapat
1252	ZCU-NTIS Speaker Diarization System for the DIHARD 2018 Challenge	Zbynek Zajic, Marie Kunesova, Jan Zelinka and Marek Hrúz
1254	A first investigation of the timing of turn-taking in Ruuli	Tuarik Buanzur, Margaret Zellers, Saudah Namyalo and Alena Witzlack-Makarevich
1256	Exploring temporal reduction in dialectal Spanish: a large-scale study of lenition of voiced stops and coda-s	Ioana Vasilescu, Nidia Hernandez, Bianca Vieru and Lori Lamel
1258	Nebula: F0 Estimation and Voicing Detection by Modeling the Statistical Properties of Feature Extractors	Kanru Hua
1259	A Novel Approach for Effective Recognition of the Code-Switched Data on Monolingual Language Model	Sreeram Ganji and Rohit Sinha
1262	Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline	Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu and Shinji Watanabe
1264	Perceptual and automatic evaluations of the intelligibility of speech degraded by noise induced hearing loss simulation	Imed Laaridh, Julien Tardieu, Cynthia Magnen, Pascal Gaillard, Jérôme Farinas and Julien Pinquier
1265	Transfer Learning based Progressive Neural Networks for Acoustic Modeling in Statistical Parametric Speech Synthesis	Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1266	Automatic Evaluation of Speech Intelligibility based on i-vectors in the context of Head and Neck Cancers	Imed Laaridh, Corinne Fredouille, Alain Ghio, Muriel Lalain and Virginie Woisard
1267	Automatic Pronunciation Evaluation of Singing	Chitralekha Gupta, Haizhou Li and Ye Wang
1269	Joint Localization and Classification of Multiple Sound Sources Using a Multi-task Neural Network	Weipeng He, Petr Motlicek and Jean-Marc Odobez
1270	Paired Phone-Posteriors Approach to ESL Pronunciation Quality Assessment	Yujia Xiao, Frank Soong and Wenping Hu
1271	Phoneme Resistance and Phoneme Confusion in Noise: Impact of Dyslexia	Noelia Do Carmo Blanco, Julien Meyer, Michel Hoen and Fanny Meunier
1272	Improving Sparse Representations in Exemplar-Based Voice Conversion with a Phoneme-Selective Objective Function	Shaojin Ding, Guanlong Zhao, Christopher Liberatore and Ricardo Gutierrez-Osuna
1280	A Generalization of PLDA for Joint Modeling of Speaker Identity and Multiple Nuisance Conditions	Luciana Ferrer and Mitchell McLaren
1281	Detection of Glottal Closure Instants from Speech Signals: A Convolutional Neural Network Based Method	Shuai Yang, Zhiyong Wu, Binbin Shen and Helen Meng
1283	Topic and Keyword Identification for Low-resourced Speech Using Cross-Language Transfer Learning	Wenda Chen, Mark Hasegawa-Johnson and Nancy Chen
1284	Should code-switching models be asymmetric?	Barbara Bullock, Wally Guzman, Jacqueline Serigos and Almeida Jacqueline Toribio
1285	Visual timing information in audiovisual speech perception: evidence from lexical tone contour	Hui Xie, Biao Zeng and Rui Wang
1286	A Weighted Superposition of Functional Contours model for modelling contextual prominence of elementary prosodic contours	Branislav Gerazov, Gerard Bailly and Yi Xu
1288	An Interlocutor-Modulated Attentional LSTM for Differentiating between Subgroups of Autism Spectrum Disorder	Yun-Shao Lin, Susan Shur-Fen Gau and Chi-Chun Lee
1291	Multi-resolution gammachirp envelope distortion index for intelligibility prediction of noisy speech	Katsuhiko Yamamoto, Toshio Irino, Narumi Ohashi, Shoko Araki, Keisuke Kinoshita and Tomohiro Nakatani
1293	A Case Study on the Importance of Belief State Representation for Dialogue Policy Management	Margarita Kotti, Vassilios Diakoloukas, Alexandros Papangelis, Michail Lagoudakis and Yannis Stylianou
1294	Speech Enhancement Using the Minimum-Probability-of-Error Criterion	Jishnu Sadasivan, Subhadip Mukherjee and Chandra Sekhar Seelamantula
1295	Learning Structured Dictionaries for Exemplar-based Voice Conversion	Shaojin Ding, Christopher Liberatore and Ricardo Gutierrez-Osuna
1296	Single-Channel Dereverberation Using Direct MMSE Optimization and Bidirectional LSTM Networks	Wolfgang Mack, Soumitro Chakrabarty, Fabian-Robert Stöter, Sebastian Braun, Bernd Edler and Emanuël Habets
1297	Exploration of Compressed ILPR Features for Replay Attack Detection	Sarfaraz Jelil, Sishir Kalita, S R Mahadeva Prasanna and Rohit Sinha
1298	Learning Conditional Acoustic Latent Representation with Gender and Age Attributes for Automatic Pain Level Recognition	Jeng-Lin Li, Yi-Ming Weng, Chip-Jin Ng and Chi-Chun Lee
1299	A Compact and Discriminative Feature based on Auditory Summary Statistics for Acoustic Scene Classification	Hongwei Song, Jiqing Han and Shiwen Deng
1301	Multi-channel Attention for End-to-End Speech Recognition	Stefan Braun, Daniel Neil, Jithendar Anumula, Enea Ceolini and Shih-Chii Liu
1302	BUT system for low resource Indian language ASR	Bhargav Pulugundla, Murali Karthick Baskar, Santosh Kesiraju, Ekaterina Egorova, Martin Karafiat, Lukas Burget and Jan Černocký
1305	Deep Metric Learning for the Target Cost in Unit-Selection Speech Synthesizer	Ruibo Fu, Jianhua Tao, Yibin Zheng and Zhengqi Wen
1306	Acoustic-dependent phonemic transcription for text-to-speech synthesis	Kévin Vythelingum, Yannick Estève and Olivier Rosec
1308	Unsupervised Word Segmentation from Speech with Attention	Pierre Godard, Marcely Zanon Boito, Lucas Ondel, Alexandre Berard, François Yvon, Aline Villavicencio and Laurent Besacier
1309	Liulishuo's System for the Spoken CALL Shared Task 2018	Huy Nguyen, Lei Chen, Ramon Prieto, Chuan Wang and Yang Liu
1310	Harmonic-Percussive Source Separation of Polyphonic Music by Suppressing Impulsive Noise Events	Gurunath Reddy M, K Sreenivasa Rao and Partha Pratim Das
1312	Impact of ASR Performance on Free Speaking Language Assessment	Kate Knill, Mark Gales, Konstantinos Kyriakopoulos, Andrey Malinin, Anton Ragni, Yu Wang and Andrew Caines
1313	A Comparison of Speaker-based and Utterance-based Data Selection for Text-to-Speech Synthesis	Kai-Zhan Lee, Erica Cooper and Julia Hirschberg
1316	Data requirements, selection and augmentation for DNN-based speech synthesis from crowdsourced data	Markus Toman, Geoffrey Meltzner and Rupal Patel
1318	Semi-supervised learning for information extraction from dialogue	Anjuli Kannan, Kai Chen, Alvin Rajkomar and Diana Jaunzeikare
1319	Anomaly Detection Approach for Pronunciation Verification of Disordered Speech using Speech Attribute Features	Mostafa Shahin, Beena Ahmed, Jim Ji and Kirrie Ballard
1320	Prosodic Focus Acquisition in French Early Cochlear Implanted Children	Chadi Farah, Stephane Roman and Mariapaola D'Imperio
1326	Low-Resource Speech-to-Text Translation	Sameer Bansal, Herman Kamper, Karen Livescu, Adam Lopez and Sharon Goldwater
1327	Stochastic Shake-Shake Regularization for Affective Learning from Speech	Che-Wei Huang and Shrikanth Narayanan
1328	An Optimization Based Approach for Solving Spoken CALL Shared Task	Mohammad Ateeq, Abualsoud Hanani and Aziz Qaroush
1331	Vocalic, Lexical and Prosodic Cues for the INTERSPEECH 2018 Self-Assessed Affect Challenge	Claude Montacié and Marie-José Caraty
1333	Statistical Model Compression for Small-Footprint Natural Language Understanding	Grant Strimel, Kanthashree Mysore Sathyendra and Stanislav Peshterliev
1336	Automatically measuring L2 speech fluency without the need of ASR: a proof-of-concept study with Japanese learners of French	Lionel Fontan, Maxime Le Coz and Sylvain Detey
1339	A GPU-based WFST Decoder with Exact Lattice Generation	Zhehuai Chen, Justin Luitjens, Hainan Xu, Yiming Wang, Dan Povey and Sanjeev Khudanpur
1342	Adding New Classes Without Access to the Original Training Data with Applications to Language Identification	Hagai Taitelbaum, Ehud Ben-Reuven and Jacob Goldberger