Speech Communication journal publication on voice similarity – joint work by Cambridge University and Oxford Wave Research

1st October 2020 Sam Kent

Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features

We are happy to announce that our latest paper has been accepted for publication in the prestigious ‘Speech Communication’ journal. This represents joint work between Cambridge University’s ‘Faculty of Modern and Medieval Languages and Linguistics’ and Oxford Wave Research (OWR).

This paper is titled ‘Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features’ and is authored by Linda Gerlach (OWR, Cambridge), Dr Kirsty McDougall (Cambridge), Dr Finnian Kelly (OWR), Dr Anil Alexander (OWR) and Prof. Francis Nolan (Cambridge).

Finding similar-sounding voices is of interest in many areas, be it for voice parades in a forensic setting, voice casting for film dubbing, or voice banking to save one’s voice for future synthesis in case of a degenerative disease. However, it is a very time-consuming and expensive task. With the aim of finding an objective method that could speed up the process, we considered an automatic approach to rating voice similarity. We explored the relationship between voice similarity ratings made by a total of 106 human listeners (some of whom may have been you) and comparison scores produced by an i-vector-based automatic speaker recognition system that extracts perceptually relevant phonetic features. Our results showed a significant positive correlation between the human ratings and the machine scores, motivating us to continue our developments in this space.

The main highlight of this work is that human judgements of voice similarity correlate with automatic speaker recognition assessments based on auto-phonetic features. This trend was seen in both English and German speakers’ judgements of English voices. These automatic speaker recognition assessments therefore show potential for automatically selecting foil voices for voice parades.

This paper is based on Linda Gerlach’s master’s thesis work (University of Marburg, Germany), carried out at Oxford Wave Research last year, and uses the phonetic mode of the VOCALISE speaker recognition software.

The full abstract and paper are available for free download from the journal’s webpage, using the following link before 19th November 2020:

https://authors.elsevier.com/a/1bqZu_3pyeDhKh
