Discovering Structure in Speech Recordings: Unsupervised Learning of Word- and Phoneme-like Units for Automatic Speech Recognition / by Dipl.-Ing. Oliver Walter; first examiner: Prof. Dr.-Ing. Reinhold Häb-Umbach, second examiner: apl. Prof. Dr. Frank Kurth. Paderborn, 2021
Contents
- 1 Introduction
- 2 State of the Art
- 2.1 Automatic speech recognition
- 2.2 Unsupervised learning for speech
- 2.3 Heuristic approaches to unsupervised learning
- 2.4 Model-based approaches to unsupervised learning
- 2.5 Further approaches
- 3 Contribution and Organization
- 4 Unsupervised Speech and Language Acquisition: A Heuristic Approach
- 4.1 An Example of and Approaches to Unsupervised Learning
- 4.2 Heuristic Dynamic Time Warping-based Approach
- 5 Model-based Approach to Unsupervised Learning for Automatic Speech Recognition
- 5.1 An Example of and Approaches to Unsupervised Learning
- 5.2 Hierarchical Structure of Speech and Language
- 5.3 Automatic Speech Recognition
- 5.4 Unsupervised Learning for ASR
- 6 Unsupervised Word Segmentation with n-gram Language Models
- 6.1 Unsupervised Word Segmentation from Symbol Sequences
- 6.1.1 NHPYLM
- 6.1.2 Gibbs Sampling based Optimization Algorithm
- 6.1.3 Forward-Filtering Backward-Sampling Algorithm
- 6.2 Unsupervised Word Segmentation of Phoneme Sequences
- 6.3 Word Segmentation from Speech Recordings
- 6.3.1 Contribution
- 6.3.2 Unsupervised Word Segmentation on HMM States
- 6.3.3 Unsupervised Word Segmentation on Phoneme Lattices
- 6.4 WFST-based Implementation of Word Segmentation
- 6.5 Full System with Acoustic Unit Learning
- 6.5.1 Setup
- 6.5.2 Contribution
- 6.5.3 Step 1: Acoustic unit and feature transformation learning on MFCCs
- 6.5.4 Step 2: Acoustic unit learning on transformed MFCCs
- 6.5.5 Step 3: Unsupervised word segmentation on unsupervised acoustic units
- 6.6 Experimental Results
- 7 Unsupervised Acoustic Model Training for Semantic Inference
- 7.1 Introduction
- 7.2 Vocal User Interface
- 7.3 Acoustic Representations
- 7.4 Domotica-3 Database
- 7.5 Experiments
- 8 Conclusion
- 9 Further Approaches and New Techniques
- A Appendix
- A.1 NHPYLM Model
- A.1.1 Sampling of the discount and strength parameters for the HPYLM
- A.1.2 Sampling the word length
- A.1.3 Using the NHPYLM as a generative model to draw a language model and generate a sample sequence of words
- A.2 Example: Gibbs sampling on a correlated bivariate Gaussian distribution
- A.3 List of the author's publications related to this work
- Acronyms
- Symbols
- List of Figures
- List of Tables
- Bibliography
