Low-Complexity Acoustic Echo Cancellation with Neural Kalman Filtering

Dong Yang*, Fei Jiang*, Wei Wu, Xuefei Fang, and Muyong Cao
The GVoice Team @ Tencent Technology
*Equal contribution

Contents

  1. Abstract
  2. Results on the synthetic test set
  3. Results on the ICASSP 2021 blind test set
  4. Bonus test results

1. Abstract

The Kalman filter has been adopted in acoustic echo cancellation because of its robustness to double-talk, fast convergence, and good steady-state performance. The performance of the Kalman filter depends closely on how accurately the state noise covariance and the observation noise covariance are estimated. Estimation errors may lead to unacceptable results; in particular, when the echo path changes abruptly, the tracking performance of the Kalman filter can degrade significantly. In this paper, we propose neural Kalman filtering (NKF), which uses neural networks to implicitly model the covariances of the state noise and observation noise and to output the Kalman gain in real time. Experimental results on both synthetic and real-recorded test sets show that the proposed NKF achieves superior convergence and re-convergence performance while maintaining low near-end speech degradation compared with state-of-the-art model-based methods. Moreover, the model size of the proposed NKF is merely 5.3 K and the real-time factor (RTF) is as low as 0.09, which indicates that it can be deployed on low-resource platforms.
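
For readers unfamiliar with the model-based baseline, the following is a minimal sketch of a conventional time-domain Kalman filter that identifies an echo path. It is not the NKF implementation, which is not described on this page. The function name `kalman_echo_canceller`, the random-walk state model, and the hand-set constants `q` (state noise variance) and `r` (observation noise variance) are assumptions made for illustration. These two constants are the quantities whose estimation accuracy the abstract refers to; NKF instead lets a neural network output the Kalman gain directly.

```python
import numpy as np

def kalman_echo_canceller(far_end, mic, taps=16, q=1e-6, r=1e-3):
    """Illustrative time-domain Kalman filter for echo path identification.

    Assumed model (not taken from the paper):
      state:       h[n] = h[n-1] + w[n],       w ~ N(0, q * I)
      observation: mic[n] = x[n]^T h[n] + v[n], v ~ N(0, r)
    """
    h = np.zeros(taps)        # echo path estimate
    P = np.eye(taps)          # state error covariance
    x_buf = np.zeros(taps)    # most recent far-end samples, newest first
    err = np.zeros(len(mic))  # echo-cancelled output (prior error)

    for n in range(len(mic)):
        # Shift the new far-end sample into the buffer.
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = far_end[n]

        # Prediction step: the random-walk model inflates the covariance.
        P_pred = P + q * np.eye(taps)

        # Prior error: microphone sample minus the estimated echo.
        e = mic[n] - x_buf @ h

        # Kalman gain computed from the hand-set q and r.
        Px = P_pred @ x_buf
        s = x_buf @ Px + r
        K = Px / s

        # Update step.
        h = h + K * e
        P = P_pred - np.outer(K, Px)
        err[n] = e

    return err, h

# Toy single-talk example with a synthetic, exponentially decaying echo path.
rng = np.random.default_rng(0)
true_path = rng.standard_normal(16) * np.exp(-0.3 * np.arange(16))
far_end = rng.standard_normal(4000)
echo = np.convolve(far_end, true_path)[:4000]
mic = echo + 1e-3 * rng.standard_normal(4000)  # small observation noise

err, h_est = kalman_echo_canceller(far_end, mic)
misalignment = np.linalg.norm(h_est - true_path) / np.linalg.norm(true_path)
```

In this sketch, a poorly chosen `q` or `r` slows re-convergence after the echo path changes, which is the weakness of hand-tuned covariances that motivates learning the gain.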

2. Results on the synthetic test set

(Averaged ERLE curves on the synthetic double-talk test set. An abrupt echo path change occurs in the shaded region.)
(Mel spectrograms of the first test sample below. An abrupt echo path change occurs at 4.2 s.)
(RNN-AEC refers to the baseline model of the Interspeech 2021 AEC Challenge, which is a fully data-driven model with 1.3 M parameters.)
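
The ERLE (echo return loss enhancement) curves mentioned above can be approximated frame by frame as the ratio of input power to residual power in dB. The sketch below is a generic definition, not the evaluation script used for these results: the function name `erle_db`, the frame length, and the choice of input signals are assumptions. In double-talk conditions, ERLE is usually computed on the echo component and the residual echo rather than on the full microphone signal, since near-end speech would otherwise dominate the ratio.

```python
import numpy as np

def erle_db(reference, residual, frame=512, eps=1e-12):
    """Frame-wise ERLE in dB: 10 * log10(power(reference) / power(residual)).

    `reference` is the signal before cancellation (for example the echo
    component) and `residual` is what remains after cancellation.
    """
    n_frames = min(len(reference), len(residual)) // frame
    ref = np.asarray(reference[: n_frames * frame], dtype=float)
    res = np.asarray(residual[: n_frames * frame], dtype=float)

    # Mean power per non-overlapping frame.
    ref_pow = np.mean(ref.reshape(n_frames, frame) ** 2, axis=1)
    res_pow = np.mean(res.reshape(n_frames, frame) ** 2, axis=1)

    # eps guards against division by zero and log of zero in silent frames.
    return 10.0 * np.log10((ref_pow + eps) / (res_pow + eps))

# Toy check: a residual with one tenth of the amplitude gives about 20 dB.
reference = np.ones(1024)
residual = 0.1 * np.ones(1024)
values = erle_db(reference, residual)
```

Averaging such curves over all test samples, aligned in time, would give a plot of the kind described in the caption.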

Double-talk with abrupt echo path change

Mic GT PNLMS PFDKF TFDKF RNN-AEC Meta-AF NKF (proposed)

3. Results on the ICASSP 2021 blind test set

Double-talk without echo path change

Mic Ref PNLMS PFDKF TFDKF RNN-AEC Meta-AF NKF (proposed)

Double-talk with echo path change

Mic Ref PNLMS PFDKF TFDKF RNN-AEC Meta-AF NKF (proposed)

4. Bonus test results

The test samples in this section are synthesized using real-recorded room impulse responses (RIRs). They show that, although only clean speech data is used during training, NKF generalizes well to music and noise data.

Microphone signal, Far-end signal, Ground-truth near-end signal, DTLN AEC result, NKF AEC result
Far-end: music, near-end: speech
Far-end: music, near-end: speech + noise