Low-Complexity Acoustic Echo Cancellation with Neural Kalman Filtering

Dong Yang^*, Fei Jiang^*, Wei Wu, Xuefei Fang, and Muyong Cao
The GVoice Team @ Tencent Technology
^*Equal contribution

1. Abstract

The Kalman filter has been adopted in acoustic echo cancellation due to its robustness to double-talk, fast convergence, and good steady-state performance. The performance of the Kalman filter is closely tied to how accurately the state noise covariance and the observation noise covariance are estimated. Estimation errors may lead to unacceptable results; in particular, when the echo path changes abruptly, the tracking performance of the Kalman filter can degrade significantly. In this paper, we propose neural Kalman filtering (NKF), which uses neural networks to implicitly model the covariances of the state noise and the observation noise and to output the Kalman gain in real time. Experimental results on both synthetic and real-recorded test sets show that the proposed NKF achieves superior convergence and re-convergence performance while keeping near-end speech degradation low, compared with state-of-the-art model-based methods. Moreover, the model size of the proposed NKF is merely 5.3 K and the real-time factor (RTF) is as low as 0.09, which indicates that it can be deployed on low-resource platforms.
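To make the role of the two noise covariances concrete, the following is a minimal sketch of one textbook (non-neural) Kalman update for an echo-path estimate. It is not the NKF model or the authors' code: it is a real-valued, time-domain illustration, and the function name `kalman_step` and the scalar variances `q` and `r` are assumptions introduced here. In a conventional filter `q` and `r` must be estimated explicitly; NKF, as described above, instead has a neural network output the Kalman gain directly.

```python
import numpy as np

def kalman_step(w, P, x, y, q, r):
    """One textbook Kalman update for a length-L echo-path estimate.

    w: (L,) current echo-path (filter tap) estimate
    P: (L, L) state-error covariance
    x: (L,) most recent far-end samples, newest first (regressor)
    y: scalar microphone sample
    q: assumed state-noise variance (random-walk echo-path model)
    r: assumed observation-noise variance (near-end speech plus noise)
    """
    # Predict: the echo path is modeled as a random walk, so only P grows.
    P_pred = P + q * np.eye(len(w))
    # Prior error: microphone sample minus the predicted echo.
    e = y - x @ w
    # Innovation variance and Kalman gain; both depend directly on q and r.
    s = x @ P_pred @ x + r
    k = P_pred @ x / s
    # Correct the echo-path estimate and its covariance.
    w_new = w + k * e
    P_new = P_pred - np.outer(k, x) @ P_pred
    return w_new, P_new, e
```

If `q` is set too small, the gain shrinks once the filter has converged and the estimate follows an abrupt echo path change only slowly; if `r` is set too small, near-end speech during double-talk perturbs the estimate. This is the sensitivity that motivates learning the gain rather than hand-tuning the covariances.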

2. Results on the synthetic test set

(Averaged ERLE curves on the synthetic double-talk test set. The abrupt echo path change occurs in the shaded region.)
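For readers unfamiliar with the metric, echo return loss enhancement (ERLE) is commonly defined as the ratio, in dB, of microphone-signal power to residual power after cancellation. The sketch below computes a frame-wise ERLE curve under that common definition; this page does not state exactly how the plotted curves were computed, so the function name `erle_curve`, the frame length, and the `eps` regularizer are assumptions for illustration only.

```python
import numpy as np

def erle_curve(mic, residual, frame_len=512, eps=1e-10):
    """Frame-wise ERLE in dB: 10 * log10(mic power / residual power)."""
    n_frames = min(len(mic), len(residual)) // frame_len
    n = n_frames * frame_len
    # Cut both signals into non-overlapping frames of equal length.
    mic = np.asarray(mic[:n], dtype=float).reshape(n_frames, frame_len)
    res = np.asarray(residual[:n], dtype=float).reshape(n_frames, frame_len)
    p_mic = np.mean(mic ** 2, axis=1)
    p_res = np.mean(res ** 2, axis=1)
    # eps guards against division by zero and log of zero in silent frames.
    return 10.0 * np.log10((p_mic + eps) / (p_res + eps))
```

Higher values mean more echo was removed in that frame. A drop followed by a recovery in the curve corresponds to the filter losing and then regaining track of the echo path.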

(Mel spectrograms of the first test sample below. The abrupt echo path change occurs at 4.2 s.) (RNN-AEC refers to the baseline model of the Interspeech 2021 AEC challenge, which is a fully data-driven model with 1.3 M parameters.)