Gerlach et al. (2020), published in Speech Communication

Exploring the relationship between voice similarity estimates by listeners and by an automatic speaker recognition system incorporating phonetic features



We are happy to announce that our latest paper has been accepted for publication in Speech Communication. Selecting similar-sounding voices is of interest in many areas, be it voice parades in a forensic setting, voice casting for film dubbing, or voice banking to save one’s voice for future synthesis in case of a degenerative disease. However, it is a very time-consuming and expensive task. With the aim of finding an objective method that could speed up the process, we considered an automatic approach to rating voice similarity. We explored the relationship between voice similarity ratings made by a total of 106 human listeners (some of whom may have been you) and comparison scores produced by an i-vector-based automatic speaker recognition system that extracts perceptually relevant phonetic features. Our results showed a significant positive correlation between the human ratings and the machine scores, motivating us to continue our developments in this space.

Please check the following link for the full abstract and paper: