The days of severe degenerative disease sufferers such as the late Stephen Hawking using a monotone, machine-sounding voice may be over.
In a breakthrough involving an Australian company, deep learning technology is being used to clone the original voice of those with motor neurone disease, allowing them to be heard in their own voice once the disease has progressed and they can no longer speak.
While voice cloning is not new, researchers say the breakthrough is achieving this more readily than before. The technology involves two to three hours of taking voice samples in the early days of the disease and, using deep learning algorithms, identifying the individual attributes and tones that make that voice unique.
Later stage MND sufferers who no longer speak can still have some limited muscle movement or can move their eyes. By synchronising small muscle movements with a light shining across a virtual keyboard, they can type. It may sound slow, but sufferers can learn to do this at a very fast rate.
Text-to-speech technology then reproduces what they say, and this is spoken in their original voice.
While the innovation is aimed at MND sufferers, it could be used to reproduce any voice saying anything, which obviously is a serious concern. You could draft a speech and create a recording of the speaker delivering it without them having done so.
A demonstration website of the technology includes tweets of US President Donald Trump and former president Barack Obama spoken with a synthesised re-creation of their voices.
These voices still need work to sound authentic, and there is an absence of phrasing and pausing by the re-created voice. No doubt that will come.
Project Revoice was developed by creative agency BWM Dentsu Group in Australia alongside Canadian software partner Lyrebird, which developed the deep learning algorithms.
Lyrebird has been working on creating not only audio but also video of a speaker delivering a synthesised speech, in one example Barack Obama.
It’s not the only one doing this. Last year, the University of Washington produced a similar fake Obama video clip.
A lot is going on in this space. In December last year, Google’s parent Alphabet published a research paper on a text-to-speech system called Tacotron 2, which it says can create synthesised speech that is almost indistinguishable from the human speaker.
“Generating very natural-sounding speech from text has been a research goal for decades,” the Google research blog post says. “Our approach does not use complex linguistic and acoustic features as input. Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts.”
The Google blog post says the feature uses an “80-dimensional audio spectrogram” with frames computed every 12.5 milliseconds that captures not only pronunciation of words but also various subtleties of human speech. This includes volume, speed and intonation. The result is converted to a 24 kHz waveform.
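To put those figures in perspective, the arithmetic can be worked through in a few lines of Python. This is only a rough sketch based on the numbers quoted in the blog post (80 channels, one frame every 12.5 milliseconds, 24 kHz output). The function names are made up for illustration, and rounding the frame count up for a partial final frame is an assumption, not something the post specifies.

```python
import math

# Figures quoted in the Google blog post
SAMPLE_RATE_HZ = 24_000   # rate of the final waveform
FRAME_SHIFT_MS = 12.5     # a new spectrogram frame every 12.5 ms
N_CHANNELS = 80           # the "80-dimensional" spectrogram


def samples_per_frame(sample_rate_hz=SAMPLE_RATE_HZ, frame_shift_ms=FRAME_SHIFT_MS):
    """Number of waveform samples covered by one frame step (hypothetical helper)."""
    return round(sample_rate_hz * frame_shift_ms / 1000)


def spectrogram_shape(duration_s, sample_rate_hz=SAMPLE_RATE_HZ,
                      frame_shift_ms=FRAME_SHIFT_MS, n_channels=N_CHANNELS):
    """Approximate (frames, channels) size of the spectrogram for a clip.

    Assumption: a partial final frame is counted as a full one (ceil).
    """
    n_samples = round(duration_s * sample_rate_hz)
    step = samples_per_frame(sample_rate_hz, frame_shift_ms)
    n_frames = math.ceil(n_samples / step)
    return (n_frames, n_channels)


if __name__ == "__main__":
    print(samples_per_frame())     # samples per 12.5 ms step at 24 kHz
    print(spectrogram_shape(2.0))  # spectrogram size for a two-second clip
```

On these assumptions, each frame step spans 300 samples, so one second of speech is described by 80 frames of 80 values each before being turned into 24,000 waveform samples.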
In Project Revoice’s case, the emphasis is on making it easy for disability sufferers to bank their voice. Project Revoice already has re-created the voice of MND sufferer Pat Quinn, who is credited as one of the initiators of the Ice Bucket Challenge.
There was one big problem — there were no voice samples of Quinn taken before he lost the ability to speak. So researchers synthesised his voice using video clips of him talking as an advocate for MND sufferers in earlier days.
Quinn says he is “blown away” by the new development.
“This technology gives me back a vital piece of myself that was missing,” he says. “For patients to know that they can still speak in their own voice after MND takes it away will transform the way people live with this disease.”
BWM Dentsu Group managing director Alex Carr says the company began working on the idea of using voice cloning to aid MND sufferers more than a year ago. “Voice tech is most commonly used for commercial purposes, so we really wanted to find a way to use this emerging tech in a more rewarding way,” Carr says.
“Since MND is a progressive and sometimes unpredictable disease, we believe it’s crucial to get the message out now and encourage more people to start thinking about voice banking while they still can, so they have the voice material necessary to create their ‘Revoice’ when the full application launches.”
Brian Frederick, executive vice-president for communications at the ALS Association in the US, says re-creating Quinn’s voice and hearing him use it for the first time with his friends and family is “truly inspirational”.
“The man who helped give ALS (amyotrophic lateral sclerosis, a motor neurone disease) a voice now has his own voice back,” Frederick says.
The Canadian firm that created the profile describes it as “the DNA of your voice”. In a blog post, it says it is aware of the ethical responsibility involved. “We want to ensure that your digital voice is yours,” Lyrebird says. “We are stewards of your voice, but you control its usage: no one can use it without your explicit consent.”
It says another firm would have developed the technology if it hadn’t. “Who knows if their intentions would be as sincere as ours: they could, for example, only sell the technology to a specific company or an ill-intentioned organisation. By contrast, we are making the technology available to anyone and we are introducing it incrementally so that society can adapt to it, leverage its positive aspects for good, while preventing potentially negative applications.”
Helping disability sufferers is not the only upbeat use of this technology. It also should make audiobooks incredibly easy to produce. Instead of authors or narrators having to read every chapter of the book, they could create a voice profile. Computers could create audiobooks by accessing the e-book version of the text and applying the author’s voice profile. The downside is that professional narrators might be out of a job.
Users wanting to find out how to record and preserve their voices can join Project Revoice at www.projectrevoice.org. You can create and try out a lower-quality demonstration version of the technology at www.lyrebird.ai.
Published in The Australian newspaper.