
Speech recognition algorithms are now part of everyday life, built into devices and applications such as smart speakers and smartphones. However, an experiment on the speech recognition systems offered by Apple, Amazon, Google, IBM, and Microsoft found that they recognize the voices of black speakers far less accurately than those of white speakers.
Speech recognition is used in many applications, such as controlling smart assistants, voice input, and dictation services. These systems rely on machine learning models that developers train on paired speech and text data.
To investigate the accuracy of these algorithms, a Stanford University research team ran an experiment in which recordings of black and white speakers were transcribed by the speech recognition systems of Apple, Amazon, Google, IBM, and Microsoft. The audio totaled 19.8 hours and consisted of 2,141 recordings from 42 white and 73 black speakers; 44% of the speakers were male, and the average age was 45.
On average, the companies' systems mistranscribed 19% of the words spoken by white speakers, but 35% of the words spoken by black speakers. Broken down further, the error rate was 41% for black men and 30% for black women.
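The figures above are word error rates: the fraction of reference words a system gets wrong, counting substitutions, deletions, and insertions. A minimal sketch of how such a rate is computed, using made-up transcripts purely for illustration:

```python
# Minimal word error rate (WER) sketch: Levenshtein distance over word tokens.
# The transcripts below are hypothetical examples, not data from the study.

def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming table for word-level edit distance.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("turn on the living room lights", "turn on the living room lights"))  # 0.0
print(wer("turn on the living room lights", "turn on the living lights"))       # ~0.167 (one word dropped)
```

A 35% error rate thus means roughly one word in three of what a speaker said was transcribed incorrectly.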

For every system tested, the error rate for black speakers exceeded that for white speakers. Apple's system performed worst, with an error rate of 45% for black speakers versus 23% for white speakers. Microsoft, the best performer, still showed 27% for black speakers against 15% for white speakers.
The research team found that the problem was not specific to any one company; all five systems showed a similar pattern. Racial bias in algorithms and software has been reported before: Google Photos once tagged photos of black people as gorillas, and medical systems have treated black patients unequally even when no race-related data was available to them.
This problem likely stems from bias in the datasets used to train the machine learning models. If the training data contains abundant recordings of white speakers but few of black speakers, the system cannot learn the accents and speech patterns of black speakers well, and its error rate rises. The research team points out that developers need to train speech recognition systems on a wider variety of data.
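One practical consequence of this point is auditing a training corpus for demographic balance before training. A simple sketch of such a check, using entirely made-up hours-per-group figures (not data from the study):

```python
# Illustrative audit of demographic balance in a training corpus.
# Group labels and hour counts are hypothetical placeholder values.

from collections import Counter

def group_share(samples):
    """Return each group's share of total training hours."""
    hours = Counter()
    for group, duration_h in samples:
        hours[group] += duration_h
    total = sum(hours.values())
    return {g: h / total for g, h in hours.items()}

# A corpus heavily skewed toward one group of speakers.
corpus = [("group_a", 900.0), ("group_b", 100.0)]
print(group_share(corpus))  # group_a dominates with 90% of the hours
```

A model trained on such a skewed corpus would see nine times more audio from one group, which is the kind of imbalance the researchers suggest developers should detect and correct.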
Regarding the result, Google said that fairness is one of the fundamental principles of Google AI, that it has been working to improve the accuracy of its speech recognition for several years, and that it will continue to do so. IBM said it would keep developing and improving its natural language and speech processing capabilities and strive to expand what users can do through IBM Watson. Amazon states on its webpage that it is constantly improving its speech recognition.