"We see three different biases that are developing in the voice ecosystem — they are underrepresented demographics. So these are people in areas where the market is not powerful enough for bigger companies to reach. There are also gender biases and accent biases. So for instance, if you are a fluent, native English speaker in, say, Ireland, it's going to be a lot harder to use Siri than it is if you are a North American male," said Michael Henretty, Mozilla's Common Voice project lead.
Henretty is part of a team at Mozilla that's working to eliminate the biases that arise when only a narrow range of voices is used to develop the technology. Mozilla's plan includes crowdsourcing voices from everyday people.
"I think really, right now, [companies are] sort of largely targeting North Americans or middle to upper class people. ... In a way, I mean, the bigger players ... right now, say like Google, IBM, what have you, in speech, they're all really focused on essentially the market power of the languages, even accents that are being targeted, and that sort of motivation. I mean, understandably so. I mean that they are businesses, however, it leaves a lot of people out in the cold," said Kelly Davis, Mozilla's head of machine learning.
That's where Mozilla's project, called Common Voice, is different — it's open source. Here's how it works: Mozilla put out a call for voices and has collected thousands in a variety of languages, English being the first. Anyone in the world can record themselves reading a prepared sentence in any of the supported languages. These voices are used to train Mozilla's own speech recognition engine, called Deep Speech, and are also published online as free, downloadable data sets.
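Because the data sets are free to download, anyone can audit how balanced the recordings are across demographics. Here is a minimal sketch of what that looks like, assuming the tab-separated metadata layout used by recent Common Voice releases (column names such as `client_id`, `path`, `sentence`, `gender` and `accent` are an assumption about the release format; the rows below are invented stand-ins, not real data):

```python
import csv
import io

# A miniature stand-in for a Common Voice metadata file (e.g. validated.tsv).
# The column names follow the layout of recent Common Voice releases;
# the rows themselves are invented examples for illustration only.
SAMPLE_TSV = """client_id\tpath\tsentence\tage\tgender\taccent
a1\tclip_0001.mp3\tThe quick brown fox jumps over the lazy dog.\ttwenties\tfemale\tireland
b2\tclip_0002.mp3\tShe sells seashells by the seashore.\tforties\tmale\tus
c3\tclip_0003.mp3\tPack my box with five dozen liquor jugs.\tthirties\tfemale\tindia
"""

def load_clips(tsv_text):
    """Parse Common Voice-style TSV metadata into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

def count_by(clips, field):
    """Tally clips by a demographic field, e.g. 'gender' or 'accent'."""
    counts = {}
    for clip in clips:
        counts[clip[field]] = counts.get(clip[field], 0) + 1
    return counts

clips = load_clips(SAMPLE_TSV)
print(count_by(clips, "gender"))   # e.g. {'female': 2, 'male': 1}
print(count_by(clips, "accent"))   # e.g. {'ireland': 1, 'us': 1, 'india': 1}
```

In practice, the same tallies run over a full release would show exactly the representation gaps the Mozilla team describes — which speakers, genders and accents the training data over- or under-samples.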
Davis said: "One of the things we're hoping to kind of address with Common Voice and Deep Speech is to allow communities and allow us to actually provide speech recognition. And so for these underrepresented languages, and also for underrepresented communities that wouldn't necessarily, for financial reasons, be able to actually enjoy speech recognition, or actually use speech recognition in whatever products or services they want to release."
There aren't many women in this field of research, so one solution some have proposed to end bias against female voices is to get more women into the field. And while that approach is well-intentioned, it may be an oversimplification. We met up with a leading female researcher, Karen Livescu.
"Depending on what data set you study, you might find a bias against female voices. And you might find a bias in favor of female voices for various reasons. And there's a lot more variation among different speakers than there is between the genders. So on average, you'll typically see a gender difference in any given speech recognizer. But within each gender, you'll see a lot more variation between speakers," Livescu said.
"We see commercial products. So dictation products work really well, the kind where you put on a mic, and you dictate something. That works really well. Google Home or Siri or Alexa, they work somewhat, I would say. ... They work really impressively for what they are, but they're nowhere near replacing a human at the same tasks. And they're not nearly as good as we'd like them to be at recognizing different kinds of speakers, different dialects, different accents, different ages, different genders. So that's kind of where we are at. We're just barely able to … have some commercial success. But we're not really able to serve the whole population with all of their needs."
Experts see the benefits of speech recognition technology in a variety of situations. It can be used to help aid workers communicate during humanitarian disasters, help people search the web more easily and offer people access to information in various languages. Researchers hope that, with more time and data, more communities will be empowered by this kind of tech.