In a new paper presented at the European Conference on Computer Vision (ECCV), a Google research team revealed a method for detecting sign language in video calls with low latency. Full sign language recognition is not yet practical in video calls, in part because of video delay, so the team's goal was a detector that is lightweight and stable.
The sign language detection system first runs the video through a model called PoseNet, which estimates the positions of the body and limbs in each frame. With the visual information reduced to a simple stick figure, a model trained on pose data compares the movement in the live image against motion patterns that look like sign language.
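The pipeline above can be sketched in a few lines. This is a hypothetical illustration, not the team's actual model: the keypoint layout, the shoulder-width normalization, and the threshold classifier are all assumptions standing in for PoseNet's real output format and the trained detector.

```python
import numpy as np

def motion_energy(keypoints):
    """keypoints: (frames, joints, 2) array of (x, y) positions, as a pose
    estimator like PoseNet might produce. Returns per-frame movement,
    normalized by shoulder width so the signal does not depend on how
    close the person sits to the camera. Joints 0 and 1 are assumed to
    be the shoulders (an illustrative convention, not PoseNet's)."""
    # Mean distance each joint travels between consecutive frames.
    deltas = np.linalg.norm(np.diff(keypoints, axis=0), axis=2)
    shoulder_width = np.linalg.norm(keypoints[:, 0] - keypoints[:, 1], axis=1)
    return deltas.mean(axis=1) / shoulder_width[1:]

def is_signing(keypoints, threshold=0.05):
    """Flag each frame transition as signing when normalized motion
    exceeds a threshold (a stand-in for the trained classifier)."""
    return motion_energy(keypoints) > threshold

# Toy demo: joints 0-1 are shoulders, joints 2-3 are wrists.
rng = np.random.default_rng(0)
still = np.tile([[0.0, 0.0], [1.0, 0.0], [0.2, 1.0], [0.8, 1.0]], (30, 1, 1))
signing = still.copy()
signing[:, 2:] += rng.normal(0, 0.08, (30, 2, 2))  # jitter only the wrists

print(is_signing(still).mean(), is_signing(signing).mean())
```

A stationary pose produces zero motion energy and is never flagged, while the jittering-wrists sequence mostly is; the real system replaces the fixed threshold with a model trained on pose data.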
With this simple process alone, the system is already about 80% accurate at determining whether the other party is signing, and with further optimization the accuracy reaches 91.5%. Considering how poorly most call software distinguishes whether a participant is talking or just coughing, this is a considerable level.
Video call systems have no built-in signal that a person is communicating in sign language, so the system produces a 20 kHz tone whenever the person signs: outside the range of human hearing, but detectable by a computer's audio system. The call's speech detection algorithm then treats the signer as if they were speaking aloud. There seems to be no reason this could not be integrated into existing video call systems and the applications that use them. Related information can be found here.
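The trick relies only on the tone sitting above the speech band yet inside the bandwidth an audio pipeline carries. A minimal sketch, assuming a standard 44.1 kHz sample rate (the source does not specify one):

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz; Nyquist limit is 22_050 Hz, so a 20 kHz tone survives
TONE_HZ = 20_000      # above human hearing, below the Nyquist limit

def ultrasonic_tone(duration_s, amplitude=0.1):
    """Generate a 20 kHz sine burst to play while the user is signing,
    so a voice activity detector registers the signer as 'speaking'."""
    t = np.arange(int(duration_s * SAMPLE_RATE)) / SAMPLE_RATE
    return amplitude * np.sin(2 * np.pi * TONE_HZ * t)

burst = ultrasonic_tone(0.05)

# Check via FFT that the energy really sits at 20 kHz, well outside
# the roughly 0.3-3.4 kHz band where speech energy concentrates.
spectrum = np.abs(np.fft.rfft(burst))
peak_hz = np.fft.rfftfreq(len(burst), 1 / SAMPLE_RATE)[spectrum.argmax()]
print(peak_hz)
```

Whether any given call application's speech detector accepts a pure 20 kHz tone as voice activity depends on its filtering, which is exactly why the researchers target the computer's audio system rather than the human ear.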