Robust multi-channel speech recognition with neural network supported statistical beamforming / by M.Sc. Jahn Heymann; first reviewer: Prof. Dr.-Ing. Haeb-Umbach, second reviewer: Priv.-Doz. Dr. rer. nat. Ralf Schlüter. Paderborn, 2020
Contents
- Acknowledgement
- Abstract
- Zusammenfassung (German abstract)
- Introduction
- Neural networks
- Automatic speech recognition
- Speech signal processing
- Signal model
- Acoustic beamforming
- Spatial covariance matrix estimation
- Statistical mask estimation
- Dereverberation
- Summary
- Datasets, setup and baselines
- Contributions
- Robust multi-channel ASR with neural network supported beamforming
- Neural network mask estimation
- Wide Residual BLSTM Network acoustic model
- Training
- Evaluation
- Acoustic model
- cACGMM vs. neural network based mask estimator
- Performance over SNR
- Comparison of different beamformers
- Combination with WPE
- Comparison between BLSTM and U-Net
- Array independence
- Related work
- Summary
- Reducing latencies
- Unsupervised neural mask estimator training
- cACGMM likelihood loss
- cACGMM teacher
- Evaluation
- Optimization criterion
- Comparison with oracle target training
- Softmax vs. Sigmoid
- Comparison with teacher-student training
- Summary
- Joint optimization
- Backpropagating gradients
- Evaluation
- Initial experiment
- Research questions
- Impact of the training data
- Model analysis
- Performance on REVERB
- Answers
- Summary
- Summary
- Appendix
- Gradient for the Cholesky factorization
- Gradient for the complex valued eigenvalue decomposition
- Reproducibility
- Symbols and notation
- List of Figures
- List of Tables
- Acronyms
- Bibliography
- Own publications
