Soft clustering algorithms: theoretical and practical improvements / Kathrin Bujna; [accepted on the recommendation of Prof. Dr. Johannes Blömer (Paderborn University) and Prof. Dr. Eyke Hüllermeier (Paderborn University)]. Paderborn, 2017
Contents
- Abstract
- Zusammenfassung (German summary)
- Contents
- Cheat Sheet
- 1 Preface
- I Soft Clusterings
- 2 Basics
- 3 From Soft Clusters to Hard Clusters
- II Fuzzy K-Means Problems
- 4 Introduction
- 4.1 The Fuzzy K-Means Problem
- 4.2 A Comparison with the K-Means Problem
- 4.3 Related Work
- 4.4 More Related Work (The K-Means Problem)
- 4.4.1 The Bad News First
- 4.4.2 (Few Practical) Approximation Algorithms
- 4.4.3 Clustering Is Difficult – Except When It Is Not
- 4.4.4 Constraints and Side Information
- 4.5 Overview
- 5 Basics
- 5.1 Problem Definition
- 5.2 Fuzzifier Functions
- 5.2.1 Definition
- 5.2.2 Basic Properties
- 5.2.3 Bounded Contribution
- 5.2.4 Bounded Increase
- 5.2.5 Reducing Probabilities
- 5.2.6 Induced r-Fuzzy Clusterings
- 5.3 Special Cases
- 6 Two Key Properties
- 7 Baselines
- 7.1 Contribution
- 7.2 2-Approximation Algorithm
- 7.3 (1+ε)-Approximation Algorithm
- 7.4 (const/minimum contribution)-Approximation Algorithm
- 8 Superset Sampling for Fuzzy Clusters
- 8.1 Related Work
- 8.2 Contribution
- 8.3 From Fuzzy Clusters to Hard Clusters
- 8.4 Applying Superset Sampling
- 8.5 Combining the Results
- 8.5.1 Approximation Factor
- 8.5.2 Removing the Restriction to Rational Weights
- 8.5.3 Removing the Restriction to Clusters with a Minimum Weight
- 8.6 Algorithms
- 9 A Discretization
- 9.1 Contribution
- 9.2 Preliminaries
- 9.3 Basic Construction
- 9.4 Distances and Costs
- 9.4.1 Outside the Search Space
- 9.4.2 Rings
- 9.4.3 A Point and Its Representative
- 9.4.4 Replacing Means by Their Representatives (K-Means)
- 9.4.5 Replacing Means by Their Representatives (r-Fuzzy K-Means)
- 9.5 A Discrete Search Space
- 10 An ε-Approximate Mean Set
- 11 Dimension Reduction
- 12 Coresets
- 13 Summary & Conclusion
- III Clustering with Gaussian Mixture Models
- 14 Introduction
- 14.1 Gaussian Mixture Models (GMMs)
- 14.2 Likelihood Approach
- 14.2.1 Likelihood
- 14.2.2 Likelihood Ratio
- 14.2.3 Scale Invariance of the Likelihood Ratio
- 14.2.4 Maximum Likelihood Estimator for K>1
- 14.2.5 Maximum Likelihood Estimator for K=1
- 14.2.6 Constrained Maximum Likelihood Estimation
- 14.2.7 Remarks
- 14.3 Expectation-Maximization (EM)
- 14.4 Overview
- 15 A Non-Asymptotic Comparison of EM and SEM Algorithms
- 15.1 Introduction
- 15.2 Scope of Our Comparison
- 15.3 Related Work
- 15.4 Contribution
- 15.5 Theoretical Comparison
- 15.6 Some Concrete Examples
- 15.7 Discussion
- 16 Adaptive Seeding for Gaussian Mixture Models
- 16.1 Related Work
- 16.2 Our Contribution
- 16.3 Baseline Algorithms
- 16.4 Adaptive Seeding for GMMs
- 16.4.1 Choosing the Next Point
- 16.4.2 Construction of a K-GMM
- 16.4.3 Post-Processing of the K-GMM
- 16.4.4 Summary and Comparison
- 16.5 Evaluation
- 16.6 Conclusion and Future Work
- 17 On the Soft K-Means Problem
- 17.1 Related Work
- 17.2 Contribution
- 17.3 The Weighted Soft K-Means Problem
- 17.4 A Clustering-Centric Variant
- 17.4.1 Motivation
- 17.4.2 A First Clustering-Centric Variant
- 17.4.3 A Relaxation
- 17.4.4 A Relaxed Clustering-Centric Approximation Problem
- 17.5 Towards an Analysis
- 17.5.1 Applying Our Soft-to-Hard-Cluster Technique
- 17.5.2 Applying an Algorithm for the Constrained K-Means Problem
- 17.5.3 Determining the Soft Clustering
- 17.6 Conclusions
- IV Appendix
