Voice is one of the most casual modalities for natural and intuitive interaction between humans, as well as between humans and machines. Voice is also a central part of our identity. Voice-based solutions are currently deployed in a growing variety of applications, including person authentication: voice offers a low-cost biometric solution through automatic speaker verification (ASV). A related technology concerns digital cloning of personal voice characteristics for text-to-speech (TTS) and voice conversion (VC). In recent years, impressive advances in the VC/TTS field have opened the way for numerous new consumer applications. In particular, VC offers new solutions for privacy protection. However, VC/TTS also brings the possibility of misusing the technology to spoof ASV systems (for example, presentation attacks implemented using voice conversion). As a direct consequence, spoofing countermeasures have attracted growing interest in recent years. Moreover, voice also carries characteristics of a person other than identity, which could be extracted with or without the speaker's consent. This brings up the need to tackle in ASV and VC/TTS not only the technical challenges but also specific ethical considerations, as shown, for example, by the recent General Data Protection Regulation (GDPR).
The Speaker Odyssey 2018 workshop took place in Les Sables d’Olonne, France, in June 2018 and brought together about 130 participants. The 55 accepted articles and the three keynotes showed the recent progress made in speaker modelling, a central theme in all of the above topics. After two decades driven by Gaussian mixture modelling (associated more recently with subspace models), deep learning has clearly opened up new horizons. The Voice Conversion Challenge special session and several other sessions on spoofing, spoofing countermeasures and VC/TTS demonstrated the interest in studying the links between ASV, VC and TTS. Finally, one of the keynote talks and several presentations illustrated the growing interest in security, privacy and ethics questions.
Building on the success of the Speaker Odyssey 2018 workshop, we invite novel research for this special issue on topics from the following non-exclusive list:
- Speaker modelling and characterization (deep approaches and alternatives)
- Voice conversion and speaker-specific TTS
- Robustness to degraded channels, noise and low-bandwidth speech
- Vulnerability to spoofing attacks and advanced spoofing countermeasures
- Speaker de-identification, disguise, evasion, obfuscation and impersonation
- Beneficial links between ASV, VC, TTS and spoofing/anti-spoofing
- Objective and subjective measures of voice similarity and speech quality
- Speaker template protection and encrypted-domain ASV
- Limits and possibilities of VC and ASV in terms of security and privacy
- Ethics of ASV, VC and TTS and interrelation of technology with GDPR
Probabilistic Graphical Models (PGM) have become the de facto framework for representing and manipulating probabilistic knowledge in the Machine Learning and Artificial Intelligence communities. The specification of the numerical parameters of these models relies either on estimates obtained from data or on subjective knowledge elicited from experts. In either case, such parameters are prone to imprecisions and inaccuracies resulting from noisy, incomplete or scarce data, poor human judgment, unaccounted factors, or a combination of these situations. Such issues can be linked to a lack of model robustness, and addressing them is crucial if such models are to be used to support scientific theories or to automate tasks in a reliable manner.
This special issue aims at collecting papers concerned with achieving robustness in probabilistic graphical models. Topics of interest include, but are not limited to:
- Methods for sensitivity analysis in PGMs
- Empirical analysis of PGM robustness
- Non-trivial robustness issues in PGMs (e.g. adversarial examples)
- Design of inference, learning, or decision making techniques that are robust to perturbations and/or outliers in data, in elicited knowledge, or in model parameters
- Robustness analysis and design of robustness measurements
- Extensions of PGMs that account for imprecision and incompleteness in data and knowledge
- Reliable qualitative learning and reasoning
- Robust treatment of missing data
- Computational complexity of learning, inference and/or decision making with robust models
In the future, autonomous robotic systems are expected to be common not only in factories and on our roads, but also in domestic and health-care settings. This new generation of intelligent machines will be required to act autonomously, yet function as part of our society. Societally integrated machines will encounter not just safety issues, but ethical issues as well. There has been a large amount of work in philosophy on a range of ethical theories. There is also an enthusiastic media debate on the relevance of ethical machines and of building autonomous systems ethically. This special issue focuses on the challenges of building ethical behaviour into autonomous systems. Key aspects of addressing these challenges are explainability and verifiability of the implemented approach, precise and unambiguous formalisation of requirements for ethical behaviour, and the special challenges arising from implementing ethical behaviour in systems that have adaptive components, especially learning.
This Special Issue aims to collect high-quality research in this area, combining robot/machine ethics, verification/logic, ethical challenges in machine learning, and AI and law.
Topics of interest include, but are not limited to:
- New formalisms (logics, algebras, argumentation, case-based reasoning, etc.) capturing individual and collective ethics, accountability, etc.
- Formal modelling techniques for ethical/moral principles
- Engineering autonomous systems to incorporate ethical principles
- Verification and validation of ethical behaviour
- Mechanisms for ethical choice
- Explainable ethical behaviour solutions
- Human-computer interaction solutions related to machine ethics
- Normative multi-agent systems, including organisations, norms, institutions, and socio-cognitive technical systems
- Engineering ethics and explainability in machine and deep learning systems
- AI and Law
Artificial Intelligence
Computer Speech and Language
Special issue on Two decades into Speaker Recognition Evaluation - are we there yet?
Automatic speaker recognition is the task of identifying or verifying an individual’s identity from voice samples using machine learning algorithms, without human intervention. It has seen significant advances over the past few decades, leading to the successful introduction of commercial products. The earliest paper reporting an investigation into the reliability of sound spectrograms, dubbed “voiceprints” by analogy with fingerprints, was published in 1970, following a number of over-optimistic claims in the 1960s. It was not until 1996 that the U.S. National Institute of Standards and Technology (NIST) began holding regular formal speaker recognition evaluations (SRE). These competitive evaluations provide a common platform and testbed for exploring promising new ideas in speaker recognition, as well as for measuring the performance of state-of-the-art speaker recognition technology. Two decades of systematic and open competitive evaluations have undoubtedly helped provide credible evidence that speaker recognition is a reliable and testable technology for person authentication.
With the advent of Big Data and the resurgence of data-hungry modeling techniques such as artificial neural networks, the research focus has more recently shifted from controlled scenarios towards larger and more realistic speaker-in-the-wild scenarios. The latest cycle of NIST evaluations (SRE’18), which in addition to traditional conversational telephony speech (CTS) involves voice over IP (VoIP) data as well as audio extracted from online videos, serves as a good checkpoint. This special issue aims to compile the latest technical advances and other similar efforts contributing in this direction.
The goal of this special issue is to bring together researchers in speaker recognition and related fields, with the aim of providing the readership of Elsevier's Computer Speech and Language with up-to-date papers on recent advances in evaluations, databases, implementation, algorithms, and theoretical perspectives on the state of the art in speaker recognition. Submissions offering comprehensive descriptions and analyses of large-scale implementations for benchmarking and commercial applications, with a focus on perspectives of interest to the speaker recognition community, are encouraged. Please contact the Guest Editors if you have any questions about whether your proposed article would fit the scope of this special issue.
Topics of interest include (but are not limited to):
- Performance evaluation metrics
- Large-scale datasets for speaker recognition
- Large-scale implementation of speaker recognition systems
- Speaker embedding, theory and practice
- Domain adaptation in speaker recognition
- Unsupervised calibration
- Speaker recognition for multi-party conversation