The world is only just getting used to the power and sophistication of virtual assistants made by companies like Amazon and Google, which can decode our speech with eerie precision compared to what the technology was capable of only a few years ago.
In truth, however, a far more impressive and mind-boggling milestone may be just around the corner, making speech recognition seem almost like child's play: artificial intelligence (AI) systems that can translate our brain activity into fully formed text, without hearing a single word uttered.
It's not entirely science fiction. Brain-machine interfaces have evolved by leaps and bounds over recent decades, proceeding from animal models to human participants, and are, in fact, already attempting this very kind of thing.
Just not with much accuracy yet, researchers from the University of California San Francisco explain in a new study.
To see if they could improve upon that, a team led by neurosurgeon Edward Chang of UCSF's Chang Lab used a new method to decode the electrocorticogram: the record of electrical impulses that occur during cortical activity, picked up by electrodes implanted in the brain.
In the study, in which four patients with epilepsy wore the implants to monitor seizures caused by their medical condition, the UCSF team ran a side experiment: having the participants read and repeat a number of set sentences aloud while the electrodes recorded their brain activity.
This data was then fed into a neural network that analysed patterns in the brain activity corresponding to certain speech signatures, such as vowels, consonants, or mouth movements, based on audio recordings of the experiment.
After this, another neural network decoded these representations – gleaned from repetitions of 30–50 spoken sentences – and used them to try to predict what was being said, based purely on the cortical signatures of the words.
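For readers curious how such a two-stage pipeline fits together, the sketch below shows one way an encoder-decoder setup like this could be wired up. It is only illustrative: the layer sizes, the choice of GRU cells, the channel count and the toy vocabulary are assumptions for the sake of the example, not the architecture reported in the paper.

```python
# Minimal sketch of an encoder-decoder pipeline mapping ECoG recordings to text.
# Assumes PyTorch; layer sizes, GRU cells and the vocabulary size are illustrative
# assumptions, not the architecture used in the study.
import torch
import torch.nn as nn

class ECoGEncoder(nn.Module):
    """Compresses a sequence of multi-channel ECoG samples into
    intermediate 'speech signature' representations."""
    def __init__(self, n_channels=256, hidden=128):
        super().__init__()
        self.rnn = nn.GRU(n_channels, hidden, batch_first=True, bidirectional=True)

    def forward(self, ecog):              # ecog: (batch, time, channels)
        features, _ = self.rnn(ecog)      # (batch, time, 2 * hidden)
        return features

class TextDecoder(nn.Module):
    """Predicts a word sequence from the encoder's representations."""
    def __init__(self, vocab_size=250, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, features):
        decoded, _ = self.rnn(features)   # (batch, time, hidden)
        return self.out(decoded)          # logits over a small word vocabulary

encoder, decoder = ECoGEncoder(), TextDecoder()
ecog_clip = torch.randn(1, 400, 256)      # one fake 400-sample, 256-channel recording
word_logits = decoder(encoder(ecog_clip))
```

In practice the two networks would be trained end to end on paired recordings and transcripts, which is what lets the decoder learn which cortical patterns line up with which words.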
At its best, the system produced a word error rate (WER) with one participant of just 3 percent in translating the brain signals into text – which may be about as close to reading somebody's mind as AI has ever gotten, at least in these strictly defined experimental conditions.
In their paper, the team details numerous examples of the reference sentences the participants said, along with the 'predictions' the network generated, which were sometimes erroneous, but not always. Where mistakes did occur, though, they seem very different from what happens when speech is misheard by human ears (which could be a side effect of the limited dataset introduced to the AI).
Examples of mistakes include: 'the museum hires musicians every evening', which was predicted as 'the museum hires musicians every expensive morning'; 'part of the cake was eaten by the dog' was predicted to be 'part of the cake was the cookie'; and 'tina turner is a pop singer', which became 'did turner is a pop singer'.
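Word error rate, the metric quoted above, is simply the word-level edit distance between what was said and what was predicted, divided by the length of the reference sentence. The short sketch below shows how it could be computed for one of the sentence pairs above; it follows the standard definition and is not code from the study itself.

```python
# Standard word error rate: word-level edit distance (substitutions, insertions,
# deletions) divided by the number of words in the reference sentence.
# Conventional definition, not code from the study.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

# One of the reference/prediction pairs reported above:
print(wer("tina turner is a pop singer",
          "did turner is a pop singer"))   # 1 substitution in 6 words -> ~0.17
```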
In the least accurate cases, the errors bear virtually no relation, semantically or phonetically, to what was being said: 'she wore warm fleecy woollen overalls' was interpreted to be 'the oasis was a mirage'.
Despite the weirdness of the obvious errors, the system may nonetheless constitute a new benchmark for AI-based decoding of brain activity, and one that, at its best, is on par with professional human speech transcription, which has a WER of 5 percent, the team says.
Of course, professional transcribers dealing with ordinary human speakers have to contend with vocabularies that extend into the tens of thousands of words. By contrast, this system only had to learn the cortical signatures of about 250 unique words used in a limited set of short sentences, so it's not really a fair comparison.
While there are numerous hurdles to overcome, the team suggests the system might one day act as the basis of a speech prosthesis for patients who have lost the power to talk. If such a thing is possible, it could be a big deal, providing some people with a way to communicate with the world, perhaps in ways that go far beyond what the experiments have shown so far.
"In a chronically implanted participant, the amount of available training data will be orders of magnitude greater than the half hour or so of speech used in this study," the authors explain, "which suggests that the vocabulary and flexibility of the language might be greatly expandable."
The findings are reported in Nature Neuroscience.