Now, in an article published today in the Proceedings of the National Academy of Sciences, a team of researchers has shown how a machine can learn to recognize human speech in a challenging—but not impossible—way: by interacting with other machine systems. Their paper is titled “Speech Recognition Using Interaction Patterns.” 

The team’s approach is not unlike that of many AI systems built to solve challenging tasks, including the voice assistant Siri. The team shows that by designing an AI “agent” that can learn and modify its behavior based on interactions with other AI systems, the system can approach human-level performance, a difficult goal given that humans still recognize speech better than computers do.

In the paper, the authors show how their new algorithm does this “by interacting with other AI agents in a different kind of context,” a key concept the team introduces.

The system learns to recognize the basic patterns of human speech, from the beginning of a sentence to its end. “We see that the agent learns to perform this kind of context-specific interaction,” said Dr. Kevin Simms, the paper’s lead author, “and once we show how it can do that in practice, we can then train this agent so that it knows the context of these other agents in the training data.”

A key challenge, though, is convincing the system that this is worth doing, rather than merely a background distraction.

The researchers present their system as consisting of two parts. The first, called the teacher, is the system described in the paper’s abstract: a computer trained to recognize the basic patterns of human speech.

The second, called the student, is taught by the teacher. The student learns to recognize the same speech patterns as the teacher, but it arrives at them in a different way.
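The article does not describe the actual architecture, so the following is only a minimal sketch of the general teacher-student idea it gestures at: a fixed "teacher" model labels toy feature vectors (standing in for speech patterns), and a "student" starting from scratch learns to match the teacher's soft outputs rather than ground-truth labels. All names, dimensions, and the linear-model choice here are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "teacher": a fixed linear classifier assumed to be already
# trained to label 4-dimensional feature vectors into 2 classes.
W_teacher = rng.normal(size=(4, 2))

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def teacher_predict(X):
    return softmax(X @ W_teacher)

# "Student": starts from zero weights and learns by minimizing cross-entropy
# against the teacher's soft predictions (not against true labels).
W_student = np.zeros((4, 2))
X = rng.normal(size=(256, 4))
lr = 0.5
for _ in range(200):
    P_t = teacher_predict(X)
    P_s = softmax(X @ W_student)
    grad = X.T @ (P_s - P_t) / len(X)  # gradient of cross-entropy w.r.t. W_student
    W_student -= lr * grad

# On fresh data the student reproduces the teacher's decisions, even though
# its internal weights (its "different way") need not match the teacher's.
X_new = rng.normal(size=(100, 4))
agree = float(np.mean(
    teacher_predict(X_new).argmax(axis=1)
    == softmax(X_new @ W_student).argmax(axis=1)
))
print(agree)
```

In this toy setting the student's decision boundary converges toward the teacher's, so agreement on held-out data is high; the point is that the student learns the same behavior through a different parameterization and training signal.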