It looks like it’s audio’s turn in the AI spotlight this week. ElevenLabs nabbed another $19 million in a Series A round and launched its “audiobook in seconds” feature, Gladia unveiled how it’s building upon OpenAI’s Whisper and compiling audio intelligence, and now Voicemod is dropping some of the most accurate voice-changing services this former professional musician has ever heard.
Offering some 20 human-realistic options, the AI was trained, according to the company, on recordings from a variety of professional voice actors, giving users the ability to sound like anyone from “Jennifer”, a bright and optimistic-sounding 20-something, right on through to “Joe”, an 80-year-old male voice that, if I didn’t know any better, I’d say had spent more than half his life next to a packet of Marlboro Reds.
While everything about Voicemod’s branding says gamer and/or metaverse and social media applications, the tech doesn’t stop there and can easily be plugged into FaceTime, Zoom, Google Meet, etc. Basically, if you’re using an audio interface anywhere online, Voicemod, with its next-to-nothing lag, can sit between your natural voice and those on the other end of the headset.
Voicemod has been working on voice synthesis and interactive audio features since 2014 and has seen some 40+ million users take advantage of its offering. The company raised €7.1 million in a Bitkraft Ventures-led round back in 2020, and earlier this year welcomed $14.5 million more via Leadwind, The Mini Fund, K Fund, and Bitkraft. In late 2022, the company acquired Barcelona-based audio tech peer Voctro Labs.
“As a company dedicated to pushing the boundaries of voice transformation, we’re proud to launch this collection, which showcases our cutting-edge technology, prioritising two things we understand are essential for our users: human realism and incredibly low latency,” explained Voicemod CEO and co-founder Jaime Bosch. “Users have enjoyed our realistic human voice filters in the past and we’re pleased to step up this offering with a collection of characters that are by far our most natural-sounding to date, enabling users to express themselves in new ways and with greater confidence, however they want to be heard, all in real time.”