Soft clustering algorithms : theoretical and practical improvements / Kathrin Bujna ; [accepted at the recommendation of Prof. Dr. Johannes Blömer (Paderborn University) and Prof. Dr. Eyke Hüllermeier (Paderborn University)]. Paderborn, 2017
Abstract
Zusammenfassung
Contents
Cheat Sheet
1 Preface
1.1 Outline
1.2 Publications & Credits
I Soft Clusterings
2 Basics
2.1 Notation: Indices, Vectors, Data Sets
2.2 Clusterings
2.2.1 Soft Clustering
2.2.2 Hard Clustering
2.2.3 Clustering Problems
2.3 Descriptive Statistics
2.3.1 Cluster Statistics
2.3.2 Data Set Statistics
2.3.3 Lemmata
2.3.4 Scaling Weights and Copying Data Points
3 From Soft Clusters to Hard Clusters
3.1 Related Work
3.2 Contribution
3.3 Imitating Softness by Randomness
3.3.1 Probabilistic Memberships
3.3.2 Algorithm
3.4 Concentration Bounds
3.4.1 Elementary Inequalities
3.4.2 Chernoff Inequalities
3.5 Analysis
3.5.1 Preliminaries
3.5.2 Weight
3.5.3 Mean Vector
3.5.4 Covariance Matrix
3.5.5 Cost and Variance
3.6 Conclusions
3.6.1 Existence of Similar Hard Clusters
3.6.2 Quality of an Imitation
3.6.3 Remarks
II Fuzzy K-Means Problems
4 Introduction
4.1 The Fuzzy K-Means Problem
4.1.1 Problem Definition
4.1.2 Fuzzy K-Means Algorithm
4.1.3 No Guarantees
4.2 A Comparison with the K-Means Problem
4.2.1 Similarities
4.2.2 Differences
4.2.3 Statistical Assumptions
4.3 Related Work
4.3.1 The Fuzzy K-Means Algorithm
4.3.2 Fuzzifier
4.3.3 Extensions
4.4 More Related Work (The K-Means Problem)
4.4.1 The Bad News First
4.4.2 (Few Practical) Approximation Algorithms
4.4.3 Clustering is Difficult – Except when It Is Not
4.4.4 Constraints and Side Information
4.5 Overview
5 Basics
5.1 Problem Definition
5.1.1 Cost and Clusters
5.1.2 Induced Solutions
5.1.3 Approximation
5.2 Fuzzifier Functions
5.2.1 Definition
5.2.2 Basic Properties
5.2.3 Bounded Contribution
5.2.4 Bounded Increase
5.2.5 Reducing Probabilities
5.2.6 Induced r-Fuzzy Clusterings
5.3 Special Cases
5.3.1 Identity – K-Means
5.3.2 Power Function – Classical Fuzzy K-Means
5.3.3 Quadratic-Linear – Between K-Means and Fuzzy K-Means
5.3.4 Exponential Fuzzifier
6 Two Key Properties
6.1 Relation to the K-Means Cost Function
6.2 Negligible Clusters
7 Baselines
7.1 Contribution
7.2 2-Approximation Algorithm
7.3 (1+ε)-Approximation Algorithm
7.4 (const/minimumContribution)-Approximation Algorithm
8 Superset Sampling for Fuzzy Clusters
8.1 Related Work
8.2 Contribution
8.3 From Fuzzy Clusters to Hard Clusters
8.4 Applying Superset Sampling
8.5 Combining the Results
8.5.1 Approximation Factor
8.5.2 Removing the Restriction to Rational Weights
8.5.3 Removing the Restriction to Clusters with a Minimum Weight
8.6 Algorithms
8.6.1 A Deterministic Approximation Algorithm (Algorithm 8)
8.6.2 A Randomized Algorithm (Algorithm 9)
9 A Discretization
9.1 Contribution
9.2 Preliminaries
9.3 Basic Construction
9.4 Distances and Costs
9.4.1 Outside the Search Space
9.4.2 Rings
9.4.3 A Point and Its Representative
9.4.4 Replace Means by Their Representatives (K-Means)
9.4.5 Replace Means by Their Representatives (r-Fuzzy K-Means)
9.5 A Discrete Search Space
10 An ε-Approximate Mean Set
10.1 Related Work
10.2 Contribution
10.3 Main Result
10.4 Application
10.5 Analysis
11 Dimension Reduction
11.1 The Johnson-Lindenstrauss Lemma
11.1.1 Related Work
11.1.2 Main Result
11.1.3 Application
11.2 Principal Component Analysis
12 Coresets
12.1 Related Work
12.2 Contribution
12.3 Main Result
12.4 Application
12.5 Analysis
12.5.1 The Key Ideas
12.5.2 Outline of the Analysis
12.5.3 Preliminaries
12.5.4 Weaker Coreset for a Fixed Number of Arbitrary Solutions
12.5.5 Weak Coreset
12.5.6 Size of S and Runtime
12.5.7 These Weak Coresets Are Not Weak
13 Summary & Conclusion
13.1 Review
13.2 Overview of Our Algorithms
13.3 Discussion
13.4 Future Work
III Clustering with Gaussian Mixture Models
14 Introduction
14.1 Gaussian Mixture Models (GMMs)
14.1.1 Density Function
14.1.2 Generating Observations
14.1.3 Remarks
14.2 Likelihood Approach
14.2.1 Likelihood
14.2.2 Likelihood Ratio
14.2.3 Scale Invariance of the Likelihood Ratio
14.2.4 Maximum Likelihood Estimator for K>1
14.2.5 Maximum Likelihood Estimator for K=1
14.2.6 Constrained Maximum Likelihood Estimation
14.2.7 Remarks
14.3 Expectation-Maximization (EM)
14.3.1 General Framework
14.3.2 EM Algorithm for GMMs
14.4 Overview
15 A Non-Asymptotic Comparison of EM and SEM Algorithms
15.1 Introduction
15.2 Scope of Our Comparison
15.3 Related Work
15.4 Contribution
15.5 Theoretical Comparison
15.5.1 A Non-Asymptotic Bound
15.5.2 Special Case: Gaussian Mixture Models (GMMs)
15.6 Some Concrete Examples
15.7 Discussion
16 Adaptive Seeding for Gaussian Mixture Models
16.1 Related Work
16.2 Our Contribution
16.3 Baseline Algorithms
16.4 Adaptive Seeding for GMMs
16.4.1 Choosing the Next Point
16.4.2 Construction of a K-GMM
16.4.3 Post-Processing of the K-GMM
16.4.4 Summary and Comparison
16.5 Evaluation
16.5.1 Preliminaries
16.5.2 Artificial Data Sets
16.5.3 Results: Real-World Data Sets
16.6 Conclusion and Future Work
17 On the Soft K-Means Problem
17.1 Related Work
17.2 Contribution
17.3 The Weighted Soft K-Means Problem
17.3.1 Preliminaries
17.3.2 Problem Statement
17.3.3 Approximation
17.4 A Clustering-Centric Variant
17.4.1 Motivation
17.4.2 A First Clustering-Centric Variant
17.4.3 A Relaxation
17.4.4 A Relaxed Clustering-Centric Approximation Problem
17.5 Towards an Analysis
17.5.1 Applying Our Soft-to-Hard-Cluster Technique
17.5.2 Applying an Algorithm for the Constrained K-Means Problem
17.5.3 Determining the Soft Clustering
17.6 Conclusions
IV Appendix
A Three Handy Lemmata