Finding the WHO, WHAT & WHEN in audio
Oxford Wave Research (OWR) is a leading R&D company based in Oxford, UK, specialising in audio and speech processing, voice biometrics and deep learning-related product development. Our team has many years of experience developing solutions for law enforcement, military and other agencies, both in the UK and around the world.
Oxford Wave Research publications at ODYSSEY 2020
Two of our publications at the ODYSSEY 2020 Speaker and Language Recognition Workshop
Two of our collaborative papers, one on voice spoofing detection, and the other on the effects of device variability on forensic speaker comparison, are appearing at this week’s virtual ODYSSEY 2020 Speaker and Language Recognition Workshop. Video presentations for both papers are now available on the workshop website: http://www.odyssey2020.org/
The full papers, along with the rest of the conference proceedings, can be found at: https://www.isca-speech.org/archive/Odyssey_2020/
In our paper with Bence Halpern (PhD student, University of Amsterdam), “Residual networks for resisting noise: analysis of an embeddings-based spoofing countermeasure,” we propose a new embeddings-based method of spoofed speech detection using Constant Q-Transform (CQT) features and a Dilated ResNet Deep Neural Network (DNN) architecture. The novel CQT-GMM-DNN approach, which uses the DNN embeddings with a Gaussian Mixture Model (GMM) classifier, performs favourably compared to the baseline system in both clean and noisy conditions. We also present some ‘explainable audio’ results, which provide insight into the information the DNN exploits for decision-making. This study shows that reliable detection of spoofed speech is increasingly possible, even in the presence of noise.
See a blog post from Bence (including some explainable audio examples) here: https://karkirowle.github.io/publication/odyssey-2020
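As a rough illustration of the "embeddings plus GMM classifier" idea described above, here is a minimal sketch. It is not the system from the paper. It assumes that fixed-length embeddings have already been extracted, replaces them with synthetic random vectors, and models each class with a single diagonal-covariance Gaussian (the simplest special case of a GMM). All function names and numbers are hypothetical.

```python
import math
import random


def fit_diag_gaussian(vectors):
    """Estimate a per-dimension mean and variance from a list of vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dim)]
    variances = [
        # Floor the variance so the log-likelihood stays finite.
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(vector, model):
    """Log-likelihood of one vector under a diagonal-covariance Gaussian."""
    means, variances = model
    return sum(
        -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
        for x, mu, var in zip(vector, means, variances)
    )


def spoof_score(vector, bona_fide_model, spoof_model):
    """Log-likelihood ratio: positive favours bona fide, negative favours spoofed."""
    return log_likelihood(vector, bona_fide_model) - log_likelihood(vector, spoof_model)


# Synthetic stand-ins for DNN embeddings (assumption: real embeddings would
# come from the trained network, which is not reproduced here).
rng = random.Random(0)


def fake_embeddings(centre, count, dim=4):
    return [[rng.gauss(centre, 1.0) for _ in range(dim)] for _ in range(count)]


bona_fide_model = fit_diag_gaussian(fake_embeddings(+1.0, 200))
spoof_model = fit_diag_gaussian(fake_embeddings(-1.0, 200))

genuine_trial = [1.0, 1.0, 1.0, 1.0]
spoofed_trial = [-1.0, -1.0, -1.0, -1.0]

print("genuine trial score:", spoof_score(genuine_trial, bona_fide_model, spoof_model))
print("spoofed trial score:", spoof_score(spoofed_trial, bona_fide_model, spoof_model))
```

A real system would typically use several mixture components per class and embeddings from the trained network. The log-likelihood-ratio decision shown here is one common way of using two class models, and is an assumption rather than a detail taken from the paper.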
In our paper with David van der Vloed (from the Netherlands Forensic Institute), “Exploring the effects of device variability on forensic speaker comparison using VOCALISE and NFI-FRIDA, a forensically realistic database,” we investigate the effect of recording device mismatch on forensic speaker comparison with VOCALISE. Using the forensically realistic NFI-FRIDA database, consisting of speech simultaneously recorded on multiple devices (e.g. close-mic, far-mic, and telephone intercept, as seen in the data collection image), we demonstrate that while optimal performance is achieved by matching the relevant population recording device to the case data recording device, it is not necessary to match the precise device; broadly matching the device type is sufficient. This study presents a research methodology for how a forensic practitioner can corroborate their subjective judgment of the ‘representativeness’ of the relevant population in forensic speaker comparison casework.
Wish we were in Japan instead of home for this one, but an exciting conference no less (ODYSSEY 2020)! Delighted that we have a couple of collaborative papers on voice spoofing detection & the effects of device variability on forensic speaker comparisons. https://t.co/GiBRJzUUY7 pic.twitter.com/NQ6QoV03fJ (Oxford Wave Research, @OxfordWave, November 4, 2020)
Do face coverings affect identifying voices?
Vlog: Do face coverings affect identifying voices?
A small experiment using VOCALISE and PHONATE
In these recent months of 2020, like many others around the world, we have found ourselves adjusting to the new normal of wearing masks in various places like supermarkets and other public spaces. We found ourselves (mildly) annoyed that some forms of biometric identification, like face recognition, don't quite work when wearing masks. This made us wonder how well voice biometric solutions could work when speakers are wearing masks, and we decided to perform a small experiment to analyse this.
Over the last few weeks, we have been performing some small-scale tests of our VOCALISE and PHONATE software against speech spoken from behind a mask. We have found our systems to be quite robust to masked speech - they are able to recognise speakers across different mask-wearing conditions well.
The video below explains our experiment and discusses our findings. We hope that you find it interesting!
https://www.youtube.com/watch?v=NUSD-TWTCQY&feature=youtu.be
Speech Communication journal publication on voice similarity – joint work by Cambridge University and Oxford Wave Research
Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features
We are happy to announce that our latest paper has been accepted for publication in the prestigious 'Speech Communication' journal. This represents joint work between Cambridge University's 'Faculty of Modern and Medieval Languages and Linguistics' and Oxford Wave Research (OWR).
This paper is titled 'Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features' and is authored by Linda Gerlach (OWR, Cambridge), Dr Kirsty McDougall (Cambridge), Dr Finnian Kelly (OWR), Dr Anil Alexander (OWR) and Prof. Francis Nolan (Cambridge).
Finding similar-sounding voices is of interest in many areas, be it for voice parades in a forensic setting, voice casting for film dubbing, or voice banking to save one's voice for future synthesis in case of a degenerative disease. However, finding them is a very time-consuming and expensive task. With the aim of finding an objective method that could speed up the process, we considered an automatic approach to rating voice similarity and explored the relationship between voice similarity ratings made by a total of 106 human listeners (some of whom may have been you) and comparison scores produced by an i-vector-based automatic speaker recognition system that extracts perceptually relevant phonetic features. Our results showed a significant positive correlation between human and machine, motivating us to continue our developments in this space.
The main highlight of this work is that human judgements of voice similarity are seen to correlate with automatic speaker recognition assessments based on auto-phonetic features. This trend was seen in both English and German speakers’ judgements of English voices. These automatic speaker recognition assessments therefore show potential for automatically selecting foil voices for voice parades.
This paper is based on Linda Gerlach's master's thesis work (University of Marburg, Germany) at Oxford Wave Research last year and uses the phonetic mode of the VOCALISE speaker recognition software.
The full abstract and paper are available on the journal's webpage, and can be downloaded for free before 19th November 2020 using the following link:
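To make the idea of correlating human and machine judgements concrete, here is a minimal sketch that computes a Pearson correlation coefficient between listener ratings and system scores. The numbers are made up for illustration and are not data from the paper. Pearson correlation is chosen here only for simplicity and is not necessarily the statistic used in the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return covariance / (spread_x * spread_y)


# Hypothetical example: one value per voice pair.
# Mean listener similarity rating (higher = judged more similar).
listener_ratings = [1.5, 2.0, 3.5, 4.0, 6.5, 8.0]
# Made-up automatic comparison scores for the same voice pairs.
system_scores = [-4.2, -3.1, -1.0, 0.3, 2.8, 4.1]

r = pearson_r(listener_ratings, system_scores)
print(f"Pearson r = {r:.3f}")
```

A value of r close to +1 would indicate that voice pairs rated as more similar by listeners also tend to receive higher scores from the automatic system. In practice a significance test would accompany the coefficient.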
Adjusting to the new normal of wearing masks in various places like supermarkets and found ourselves (minorly) annoyed that biometrics like faceid don't quite work with masks. We wondered how well voice biometrics could cope with this. Our experiment: https://t.co/zeX23huWon (Oxford Wave Research, @OxfordWave, October 27, 2020)
Our Spectrumview app features in BBC 4 documentary - Ocean Autopsy: The Secret Story of Our Seas
Cambridge and Oxford (Wave Research) working together on voice similarity estimation using automatic systems and human perception! Paper available for free download on the Speech Communication journal website till 19 Nov with this link. https://t.co/iuVBBb6R6F pic.twitter.com/nKQ2RZGgJx (Oxford Wave Research, @OxfordWave, October 2, 2020)
Oxford Wave Research Appointed as Salient Sciences’ Exclusive Distributor in UK and Ireland
Oxford Wave Research Ltd. are pleased to announce our appointment as the exclusive distributor in the United Kingdom and the Republic of Ireland for Salient Sciences (legal name Digital Audio Corporation, known to many as “DAC”).
Donald Tunstall, General Manager, Salient Sciences: "We are excited to have our colleagues at Oxford Wave Research now officially offering Salient Sciences’ products and services in the UK and Ireland. We have previously worked closely with them on several interesting projects; going forward, we anticipate an even closer collaboration to provide unique, innovative solutions to our shared base of audio and video forensics clients worldwide."
We also have many years of experience working with the DAC hardware-based audio processing solutions, such as the MicroDAC, PCAP, and CARDINAL AudioLab systems.
OWR will now be taking over all sales and support in the UK and Ireland, with immediate effect, for the VideoFOCUS and CARDINAL MiniLab Suite products, including all maintenance contracts and support.
Watch this space for training course announcements from DAC in the UK in 2020.