AI bid to future-proof our ancient languages

Researchers in Queensland are using artificial intelligence to document Australian indigenous languages, aiming to have computers transcribe 40,000 hours of recordings using machine learning.

Speaking at a Google forum on artificial intelligence in Sydney, University of Queensland professor Janet Wiles said her group was working with communities in more than 100 places documenting oral languages.

The group had developed speech-to-text technology for 12 languages in the Asia-Pacific region, including seven in Australia.

“It generates a lot of audio recording and over a century, 40,000 hours of recordings have been collected,” said Professor Wiles, of the university’s School of Information Technology and Electrical Engineering.

However, it was difficult to transcribe an oral language into a written form, she said.

This involved creating lists of words to make the language available to future generations. Such lists and samples of languages expressed things important to those communities.

She said human transcription was a laborious task, particularly for linguists transcribing from a language they were also learning.

“Australia has over 300 indigenous languages. More than 145 are still spoken. But actually, only about 18 are spoken by all generations in the community,” she told the forum. “It’s a race against time to record this precious heritage, which can support transmission across generations.

“The elephant in the room is a really dull, boring problem called transcription.”

Professor Wiles said it would take almost two million hours or 10,000 PhD students to transcribe the 40,000 hours recorded.
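As a rough check on the scale described, the quoted figures can be divided through. The sketch below assumes the round numbers reported above; because the total is given as "almost two million" hours, the results are upper bounds. The function and variable names are illustrative only and do not come from the research group.

```python
def transcription_ratios(recorded_hours, transcription_hours, phd_students):
    """Return the ratios implied by the figures quoted in the article."""
    return {
        # Hours of human effort needed per hour of recorded audio
        "effort_per_recorded_hour": transcription_hours / recorded_hours,
        # Hours of transcription work each student would have to do
        "effort_per_student": transcription_hours / phd_students,
        # Hours of recorded audio each student would cover
        "audio_per_student": recorded_hours / phd_students,
    }


# Figures as reported: 40,000 recorded hours, almost 2,000,000 hours of
# transcription effort, 10,000 PhD students.
ratios = transcription_ratios(40_000, 2_000_000, 10_000)
for name, value in ratios.items():
    print(f"{name}: {value}")
```

On these figures, each hour of audio takes up to about 50 hours to transcribe by hand, and each student would spend about 200 hours transcribing roughly four hours of recordings.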

“In the first year of a PhD, a linguist might record hundreds of thousands of hours of audio but typically they would transcribe one hour,” she said. “I thought there’s got to be a better way.”

She said she received support from the Googleplex centre in California to develop the transcription system. “Our centre works with some of the oldest languages in the world, and that’s really what’s bringing our partnership together,” she said.

As part of the project, she was building a system for linguists to use with their own data, as well as social robots for two- to five-year-olds that could understand the children’s indigenous language.

“This can get kids excited about language, and give (them) lots of practice, lots of stories and lots of memory games without exhausting their elders at the same time,” she said.

“Wouldn’t it be awesome if every child could speak … in their own language?”

Such a system could be a launching pad for students who then go on to learn other languages. “Australian indigenous languages have something that is very special about them,” she said. “Every language is tied to a place … the language is a unique survival guide to that particular place.”

Australian researchers are also developing machine learning systems that can aid the diagnosis of prostate, breast and lung cancers.

Dr Elliot Smith from Australian start-up Maxwell Plus said the company was using thousands of pieces of anonymous medical data to build a machine learning model to predict the cancers. “This will lead to better patient outcomes for everybody,” he said.

The approach involved training a machine learning model on anonymised clinical data, such as images and blood test results, so that the model could apply what it learned from that data to assessing new patients’ tests.

He believed that patients would more willingly undergo lifesaving tests if they were less invasive than, say, current testing for prostate cancer. In that case, results from prostate-specific antigen (PSA) blood tests, which are done to assess the risk of prostate cancer, would be one input to the model.

He said the AI system would launch later this year.

Published in The Australian newspaper.
