Algorithms for the Bregman k-Median problem / Marcel R. Ackermann. 2009
Content
1 Introduction
2 Bregman divergences
3 The k-median problem
3.1 The generalized k-median problem
3.2 Bregman k-median clustering
3.3 A simple optimal algorithm for d=1
4 (1+ε)-approximate clustering by uniform sampling
4.1 Algorithm for [γ,δ]-sampleable dissimilarity measures
4.1.1 Superset sampling
4.1.2 The algorithm
4.1.3 Analysis for k=2
4.1.4 Analysis for k≥2
4.1.5 Adaptation for weighted input sets
4.2 Sampling for large classes of dissimilarity measures
4.2.1 Sampling for Mahalanobis distances
4.2.2 Sampling for μ-similar Bregman divergences
4.2.3 Sampling for the Hellinger distance
4.2.4 Sampling for arbitrary metrics with bounded doubling dimension
4.2.5 Sampling for the Hamming distance
4.3 Generalization of the sampling property
4.4 Discussion
5 A practical O(log k)-approximate algorithm
6 Strong coresets for Mahalanobis distances
6.1 Har-Peled-Mazumdar coresets
6.1.1 Euclidean k-means clustering
6.1.2 Mahalanobis k-median clustering
6.1.3 Properties of Har-Peled-Mazumdar coresets
6.2 A new coreset construction based on non-uniform sampling
6.3 Discussion
7 Weak coresets for μ-similar Bregman divergences
8 On the limits of using uniform sampling
8.1 Sampleable Bregman divergences avoid singularities
8.2 Explicit domain bounds for some Bregman divergences
8.3 Discussion
A Applications
A.1 Estimating mixtures of identical Gaussian sources
A.2 Model reduction for data compression
A.3 Codebook generation for vector quantization in speech processing
B Mathematical fundamentals
B.1 The vectorspace ℝd
B.1.1 Vector arithmetic and inner product
B.1.2 Distances and metrics
B.1.3 Convex sets
B.1.4 Relative interior
B.2 Inequalities
B.2.1 Triangle inequality of the reals
B.2.2 Bounds to the binomial coefficient
B.2.3 Bounds on the harmonic number
B.2.4 Chebyshev's sum inequality
B.3 Calculus
B.3.1 Partial derivatives
B.3.2 Mean value theorem
B.3.3 Taylor expansion
B.3.4 Convex functions
B.3.5 Positive definiteness
B.4 Probability theory