Wednesday, November 29, 2017

Mozilla releases dataset to lower voice-recognition barriers

Mozilla recently released a dataset called Common Voice, which contains almost 40,000 voice recordings from 20,000 people, making it the second-largest public voice dataset. It is the result of Mozilla's Common Voice project, which let iOS and Android users donate voice samples through an app. The goal is to give voice-enabled systems better data to learn from so they can provide improved service across devices. Recently, Microsoft claimed to have reached a voice-recognition error rate of 5.1% on the Switchboard corpus, roughly the same error rate as a professional human transcriber. However, these machines still have a lot to learn, and a dataset like this should improve such applications considerably. More information about this dataset can be found at http://www.zdnet.com/article/mozilla-releases-dataset-and-model-to-lower-voice-recognition-barriers/.
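As a rough illustration of what working with such a release might look like, here is a minimal Python sketch that loads a transcript listing and prints a few entries. The file name and the "filename"/"text" column names are assumptions made for the example, not the dataset's documented schema.

import csv

def load_transcripts(csv_path: str) -> list[dict]:
    """Read one transcript row per recorded clip from a CSV listing."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))

# Hypothetical file name; substitute whatever transcript file the release ships with.
rows = load_transcripts("cv_valid_train.csv")
print(f"{len(rows)} clips")
for row in rows[:3]:
    print(row["filename"], "->", row["text"])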

On a similar topic, I was talking to a professional from Humana this past weekend and learned some interesting things about voice-enabled technologies (although the company doesn't really specialize in that kind of technology). According to him, voice-enabled technologies such as Alexa, Siri, and Cortana are constantly learning and trying to provide a better user experience by the minute. They try to learn from every question asked and give improved answers over time by learning from users' feedback. For example, if I ask Alexa a question and she gives me an inaccurate answer, she makes note of that inaccuracy if I let her know. After this happens a few times, she knows what not to answer for that question. Additionally, these technologies try to gather questions that are phrased differently but mean the same thing into the same bucket, so they can give the same answer to those different phrasings in the future. For example, if Alexa is asked "How old are you?", "What is your age?", and "How long ago were you born?", it should process these as the same question, recognizing that a single question can be asked in different ways in spoken language. This falls under natural language processing, and voice-enabled technologies are working every day to get better at it. Mozilla's new dataset is, in some fashion, trying to help do the same.
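To make the "bucketing" idea concrete, here is a toy Python sketch that maps differently worded questions to a shared intent using simple word overlap. Real assistants rely on trained language models rather than anything this crude; the intent names, example phrasings, and similarity threshold below are all hypothetical.

def tokens(text: str) -> set[str]:
    """Lowercase a question and split it into a set of words."""
    return set(text.lower().strip("?!. ").split())

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Known phrasings, grouped by the intent they share (hypothetical data).
INTENT_EXAMPLES = {
    "ask_age": ["How old are you?", "What is your age?", "How long ago were you born?"],
    "ask_weather": ["What is the weather like?", "Will it rain today?"],
}

def classify(question: str, threshold: float = 0.3) -> str | None:
    """Return the intent whose example phrasings best match the question, if any."""
    best_intent, best_score = None, 0.0
    q = tokens(question)
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = jaccard(q, tokens(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None

print(classify("What's your age?"))           # -> "ask_age"
print(classify("Is it going to rain today?")) # -> "ask_weather"

The point of the sketch is only the grouping step: two questions with enough words in common land in the same bucket and therefore get the same answer.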
