Noise cancellation, which removes ambient noise during calls, is essential for a comfortable conversation. Daily life is full of ambient noise: when you make a call from a city street or an airport, that noise is carried along with your voice. The problem is that it is not easy to find a product that blocks the noise completely. The 2Hz project is taking on the challenge of building practical noise cancellation using deep learning.
The noise cancellation system is built with deep learning. In the demo video, speech is recorded in an environment where the ambient noise is harsh on the ear, so the recorded voice is plainly noisy. Once 2Hz noise cancellation is applied, the voice becomes clear, as though there were almost no noise in the background at all.
In another example, a voice recorded alongside the siren of a police patrol car is played back through the 2Hz system, and the result is so clear that a listener cannot tell a siren was sounding in the background. NVIDIA has achieved such a high level of noise cancellation, but building such a noise suppression system has been extremely difficult.
The noise cancellation implemented by 2Hz suppresses the noise that travels from the caller's side to the person at the other end of the call. Active noise cancellation (ANC), by contrast, is an earphone feature. ANC is designed to keep the ambient noise around earphones or headphones from reaching the wearer's ears. 2Hz, on the other hand, focuses on suppressing the noise heard by the other party.
Noise suppression during calls has advanced considerably in recent years. Calls on the latest smartphones are far less noisy than they were 10 years ago, and this is achieved with two or more microphones. One is placed where it picks up the user's voice well during a call, and the other is placed where it picks up the environment better, far from the mouth microphone. While the front microphone mainly picks up the voice, the rear microphone collects ambient noise, and software uses it to filter the noise out.
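The article does not say which algorithm phones use to combine the two microphones. As a rough illustration only, the sketch below uses a textbook LMS adaptive filter, which is a swapped-in technique and not necessarily what any handset ships. The signal shapes, the filter length `taps`, and the step size `mu` are all assumptions chosen for the demo.

```python
import math
import random

def lms_cancel(primary, reference, taps=4, mu=0.05):
    """Subtract an adaptively filtered copy of `reference` from `primary`."""
    weights = [0.0] * taps
    history = [0.0] * taps  # recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - noise_estimate  # what is left after removing estimated noise
        weights = [w + mu * e * h for w, h in zip(weights, history)]
        output.append(e)
    return output

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

random.seed(0)
n = 4000
# Hypothetical signals: a slow sine stands in for the voice,
# uniform random samples stand in for ambient noise at the rear microphone.
voice = [0.5 * math.sin(2 * math.pi * 0.01 * i) for i in range(n)]
ambient = [random.uniform(-1.0, 1.0) for _ in range(n)]
# The front microphone hears the voice plus an attenuated, slightly delayed
# copy of the ambient noise.
primary = [
    voice[i] + 0.6 * ambient[i] + 0.3 * (ambient[i - 1] if i else 0.0)
    for i in range(n)
]

cleaned = lms_cancel(primary, ambient)
half = n // 2  # skip the start, where the filter is still adapting
error_before = mse(primary[half:], voice[half:])
error_after = mse(cleaned[half:], voice[half:])
print(f"error before: {error_before:.4f}, after: {error_after:.4f}")
```

The sketch only works because the rear microphone supplies a noise reference that is largely free of the voice, which is exactly the condition that breaks down when the microphones sit close together.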
However, this approach does not work well when two or more microphones cannot be placed far enough apart, as on a small device like a smartwatch, or when the user or device is badly positioned or moving. Multiple microphones also raise production costs. For this reason, 2Hz designed its noise cancellation to work with a single microphone rather than several.
Conventional noise cancellation often relies on digital signal processing (DSP) algorithms that remove audible background noise. These DSP algorithms work well against continuous, steady noise. However, they cannot cope with short, rapidly changing noises such as a baby's cry or a siren.
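The article does not name a specific DSP algorithm. Spectral subtraction is one classic example, shown here as an assumption to illustrate why steady noise is the easy case: the noise spectrum measured during a pause still matches the noise present while the user speaks. The frame length, tone frequencies, and the naive DFT (used only to avoid dependencies; a real system would use an FFT library) are choices made for this sketch.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spec):
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def spectral_subtract(noisy_frame, noise_frame):
    """Subtract the noise magnitude spectrum, keeping the noisy phase."""
    noisy_spec = dft(noisy_frame)
    noise_mag = [abs(v) for v in dft(noise_frame)]
    cleaned_spec = []
    for value, floor in zip(noisy_spec, noise_mag):
        magnitude = max(abs(value) - floor, 0.0)  # never go below zero
        cleaned_spec.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(cleaned_spec)

N = 64
# Hypothetical signals: one tone stands in for the voice, another for a steady hum.
voice = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
hum = [0.5 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]

noise_only_frame = hum                         # captured while nobody speaks
noisy_frame = [v + h for v, h in zip(voice, hum)]

cleaned = spectral_subtract(noisy_frame, noise_only_frame)
max_error = max(abs(c - v) for c, v in zip(cleaned, voice))
print(f"largest deviation from the clean voice: {max_error:.2e}")
```

A siren or a baby's cry changes its spectrum from one moment to the next, so an estimate taken from an earlier frame no longer matches the noise being subtracted.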
2Hz used deep learning to handle the noise that conventional noise cancellation could not. A deep learning based noise cancellation system is built as follows. Two kinds of audio data are prepared, noise and clean speech, and they are mixed to produce noisy speech. The artificially noisy speech is then fed to a deep neural network (DNN) as input, with the clean speech as the target.
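The mixing step described above can be sketched as follows. The article does not say how loud the noise is made relative to the speech, so mixing at a chosen signal-to-noise ratio (SNR) is an assumption, as are the function names and the synthetic signals standing in for real recordings.

```python
import math
import random

def power(signal):
    return sum(v * v for v in signal) / len(signal)

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that clean/noise power equals `snr_db`, then add it."""
    scale = math.sqrt(power(clean) / (power(noise) * 10 ** (snr_db / 10)))
    mixed = [c + scale * n for c, n in zip(clean, noise)]
    return mixed, scale

random.seed(1)
# Hypothetical stand-ins for a clean speech clip and a noise clip.
clean = [math.sin(2 * math.pi * 0.02 * i) for i in range(1000)]
noise = [random.gauss(0.0, 1.0) for _ in range(1000)]

target_snr_db = 5.0
noisy, scale = mix_at_snr(clean, noise, target_snr_db)

# Each (noisy, clean) pair is one training example: input and target.
added_noise = [scale * n for n in noise]
measured_snr_db = 10 * math.log10(power(clean) / power(added_noise))
print(f"measured SNR: {measured_snr_db:.2f} dB")
```

Repeating this with many noise clips and several SNR values is one common way to expose a network to a wide range of conditions.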
The network is then trained to remove the noise from its input and output clean speech. In effect it learns to produce a mask that extracts the clean speech from the noisy signal, and this mask is the core of the deep learning based noise cancellation system. In the 2Hz project, the team develops its own DNN architecture to produce masks that can cope with many kinds of noise.
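The article does not describe the form of the mask. One common choice in the speech enhancement literature is the ideal ratio mask, used here purely as an assumption to show what a mask does. In this sketch the mask is computed directly from the known clean and noise signals; a trained DNN would have to predict it from the noisy input alone. The single-frame setup and naive DFT are simplifications.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spec):
    n = len(spec)
    return [
        (sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

def ideal_ratio_mask(clean_frame, noise_frame, eps=1e-12):
    """Per-frequency weight in [0, 1]: near 1 where speech dominates."""
    clean_spec = dft(clean_frame)
    noise_spec = dft(noise_frame)
    return [
        abs(s) / (abs(s) + abs(n) + eps)
        for s, n in zip(clean_spec, noise_spec)
    ]

def apply_mask(mask, noisy_frame):
    noisy_spec = dft(noisy_frame)
    return idft([m * v for m, v in zip(mask, noisy_spec)])

N = 64
# Hypothetical signals: one tone for the voice, another for the noise.
voice = [math.sin(2 * math.pi * 4 * t / N) for t in range(N)]
noise = [0.8 * math.sin(2 * math.pi * 20 * t / N) for t in range(N)]
noisy = [v + n for v, n in zip(voice, noise)]

mask = ideal_ratio_mask(voice, noise)  # the training target for this frame
recovered = apply_mask(mask, noisy)
max_error = max(abs(r - v) for r, v in zip(recovered, voice))
print(f"largest deviation from the clean voice: {max_error:.2e}")
```

Because the mask is recomputed for every frame, it can follow noise that changes quickly, which is where fixed noise estimates struggle.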
One problem with using noise cancellation in voice calls is audio delay. People can tolerate a delay of up to 0.2 seconds in a real-time conversation, but beyond that a smooth conversation becomes impossible. Three factors affect call delay: the network line, computation, and the codec. Of these, the state of the line usually has the largest effect. Even so, DNN-based noise cancellation may well add to the delay of an actual call.
For this reason, supporting high-quality noise-cancelled calls requires more computing performance. However, it is not realistic to put high-end computing hardware for noise cancellation into call devices such as smartphones. NVIDIA has therefore come up with the idea of moving noise cancellation to the cloud. The noise cancellation system is software based, so it does not have to run on the local device itself.
A large VoIP provider has to handle a huge number of calls simultaneously. A media server is said to handle about 3,000 simultaneous calls using the G.711 voice codec. If noise cancellation is integrated into a VoIP calling system, server-side processing may slow down and hurt service quality. The 2Hz project says it could not get cost-effective results when it tried running on CPUs. It therefore tested VoIP processing with noise cancellation on an NVIDIA GeForce GTX 1080 Ti GPU. The result was 1,000 simultaneous calls without server optimization, and 3,000 simultaneous calls with optimization.
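As a back-of-the-envelope check on what these figures mean for capacity planning, the sketch below uses the per-GPU call counts quoted above and the standard G.711 bitrate of 64 kbit/s (8,000 samples per second at 8 bits each). The load of 12,000 concurrent calls is a made-up example, and the bandwidth figure counts one direction of audio payload only, without packet overhead.

```python
import math

G711_BITRATE_BPS = 64_000  # 8,000 samples/s x 8 bits per sample

def gpus_needed(concurrent_calls, calls_per_gpu):
    """Smallest number of GPUs that covers the given call load."""
    return math.ceil(concurrent_calls / calls_per_gpu)

def payload_mbps(concurrent_calls):
    """One-way G.711 audio payload in Mbit/s, ignoring packet headers."""
    return concurrent_calls * G711_BITRATE_BPS / 1_000_000

load = 12_000  # hypothetical number of concurrent calls

unoptimized = gpus_needed(load, calls_per_gpu=1_000)
optimized = gpus_needed(load, calls_per_gpu=3_000)
per_gpu_audio = payload_mbps(3_000)

print(unoptimized)    # 12
print(optimized)      # 4
print(per_gpu_audio)  # 192.0
```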
Basic processing such as voice transmission and coding is done by the CPU, while the GPU handles noise cancellation as batch processing. This way, noise is suppressed as far as possible without affecting the existing VoIP processing. GPUs specialize in the large-scale parallel processing needed for 3D graphics, and NVIDIA has seen them rapidly adopted over the past few years for batch workloads such as deep learning. NVIDIA says the GPU is well suited to batch processing noise cancellation with deep learning. In that sense, the 2Hz work is also research and development aimed at expanding the market for GPUs, but the introduction of AI technology such as deep learning into noise cancellation is certainly eye-catching.
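The division of labour described above, with the CPU handling each call and the GPU cleaning many calls at once, can be sketched as a gather, process, scatter loop. This is an illustrative outline only: `denoise_in_batch` and `placeholder_model` are hypothetical names, and the placeholder simply halves every sample where a real server would run DNN inference on the GPU.

```python
from typing import Callable, Dict, List

Frame = List[float]

def denoise_in_batch(
    frames_by_call: Dict[str, Frame],
    model: Callable[[List[Frame]], List[Frame]],
) -> Dict[str, Frame]:
    """Gather one frame per active call, process them together, hand them back."""
    call_ids = list(frames_by_call)
    batch = [frames_by_call[call_id] for call_id in call_ids]
    cleaned = model(batch)  # a single invocation covers every call in the batch
    return dict(zip(call_ids, cleaned))

def placeholder_model(batch: List[Frame]) -> List[Frame]:
    # Stand-in for GPU inference: scales each sample by 0.5.
    return [[0.5 * sample for sample in frame] for frame in batch]

# Hypothetical audio frames from three calls arriving in the same time slot.
incoming = {
    "call-a": [0.5, 0.25],
    "call-b": [1.0, -1.0],
    "call-c": [0.0, 2.0],
}
result = denoise_in_batch(incoming, placeholder_model)
print(result["call-b"])  # [0.5, -0.5]
```

Keeping call identifiers alongside the batch is what lets the cleaned frames be returned to the right call after a single pass over all of them.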