
Proximity algorithms for nearly-doubling spaces

Lee-Ad Gottlieb and Robert Krauthgamer, Weizmann Institute. In an arbitrary metric space, some proximity problems are hard.


Presentation Transcript


  1. Proximity algorithms for nearly-doubling spaces • Lee-Ad Gottlieb, Robert Krauthgamer • Weizmann Institute

  2. Proximity problems • In an arbitrary metric space, some proximity problems are hard • For example, the nearest neighbor search problem requires Θ(n) time • The doubling dimension parameterizes the “bad” case… • (Figure: a query point q with five distances, each labeled ~1.)

  3. Doubling Dimension • Definition: Ball B(x,r) = all points within distance r from x. • The doubling constant (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius • First used by [Ass-83], algorithmically by [Cla-97]. • The doubling dimension is dim(M) = log λ(M) [GKL-03] • A metric is doubling if its doubling dimension is constant • Packing property of doubling spaces • A set with diameter D and minimum inter-point distance a contains at most (D/a)^{O(log λ)} points • (Figure: here λ ≤ 7.)
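The definition on this slide can be explored numerically on a small finite metric. The following is a minimal sketch, not the authors' method: the function names `ball`, `greedy_net`, and `doubling_constant_upper_bound` are hypothetical, and the greedy cover only gives an upper bound on the doubling constant (computing it exactly is hard, as the deck argues later). It assumes cover centers are taken from the point set itself.

```python
def ball(points, center, r, dist):
    """All points within distance r of center (the ball B(center, r))."""
    return [p for p in points if dist(center, p) <= r]


def greedy_net(points, sep, dist):
    """Greedy maximal subset with pairwise distances > sep.

    By maximality, every input point lies within sep of some chosen point,
    so balls of radius sep around the chosen points cover the input.
    """
    net = []
    for p in points:
        if all(dist(p, q) > sep for q in net):
            net.append(p)
    return net


def doubling_constant_upper_bound(points, dist):
    """Largest greedy half-radius cover found over all balls B(x, r).

    Only radii equal to pairwise distances are tried: between two such
    values the ball's contents do not change while r/2 grows, so the
    cover can only get smaller.
    """
    radii = {dist(p, q) for p in points for q in points if p != q}
    best = 1
    for x in points:
        for r in radii:
            cover = greedy_net(ball(points, x, r, dist), r / 2, dist)
            best = max(best, len(cover))
    return best


# Example: eight evenly spaced points on a line.
line = list(range(8))
print(doubling_constant_upper_bound(line, lambda a, b: abs(a - b)))
```

The sketch runs in roughly O(n^4) time, so it is only meant for toy inputs.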

  4. Applications • In the past few years, many algorithmic tasks have been analyzed via the doubling dimension • For example, approximate nearest neighbor search can be executed in time λ^{O(1)} log n • Some other algorithms analyzed via the doubling dimension • Nearest neighbor search [KL-04, BKL-06, CG-06] • Clustering [Tal-04, ABS-08, FM-10] • Spanner construction [GGN-06, CG-06, DPP-06, GR-08] • Routing [KSW-04, Sil-05, AGGM-06, KRXY-07, KRX-08] • Travelling Salesperson [Tal-04] • Machine learning [BLL-09, GKK-10] • Message: This is an active line of research…

  5. Problem • Most algorithms developed for doubling spaces are not robust • Algorithmic guarantees don’t hold for nearly-doubling spaces • If even a small fraction of the working set has high doubling dimension, algorithmic performance degrades. • This problem motivates the following key task • Given an n-point set S and target dimension d* • Remove from S the fewest points so that the remaining set has doubling dimension at most d*

  6. Two paradigms • How can removing a few “bad” points help? Two models: • 1. Ignore the bad points • Outlier detection. • [GHPT-05] cluster based on similarity, seeking a large subset with low intrinsic dimension. • Algorithms with slack: throw the bad points into the slack • [KRXY-07] gave a routing algorithm with guarantees for most of the input points. • [FM-10] gave a kinetic clustering algorithm for most of the input points. • [GKK-10] gave a machine learning algorithm in which a small subset doesn’t interfere with learning

  7. Two paradigms • How can removing a few “bad” points help? Two models: • 2. Tailor a different algorithm for the bad points • Example: Spanner construction. A spanner is an edge subset of the full graph • Good points: low doubling dimension gives a sparse spanner with nice properties (low stretch and degree) • Bad points: Take the full graph • If the number of bad points is O(n^{0.5}), the full graph on them has O(n) edges, so we still have a spanner with O(n) edges

  8. Results • Recall our key problem • Given an n-point set S and target dimension d* • Remove from S the fewest points so that the remaining set has doubling dimension at most d* • This problem is NP-hard • Even determining the doubling dimension of a point set exactly is NP-hard! • Proof on the next slide • But the doubling dimension can be approximated within a constant factor… • Our contribution: bicriteria approximation algorithm • In time 2^{O(d*)} n^3, we remove a number of points arbitrarily close to optimal, while achieving doubling dimension 4d* + O(1) • We can also achieve near-linear runtime, at the cost of slightly higher dimension

  9. Warm up • Lemma: It is NP-hard to determine the doubling dimension of a set S • Reduction: from vertex cover with bounded degree Δ = n^{1/2}. • The size of any vertex cover is at least n^{1/2}. • Construction: A set S of n points corresponding to the vertex set V. • Let d(u,v) = ½ if the corresponding vertices are connected by an edge • Let d(u,v) = 1 if the corresponding vertices aren’t connected • Analysis: • Any subset of S found in a ball of radius ½ has at most n^{1/2} points (the degree of the original graph) • S is a ball of radius 1. The minimum covering of all of S with balls of radius ½ is equal to the minimum vertex cover of V. • Note: the reduction preserves hardness of approximation • Corollary: It is NP-hard to determine if removing k points from S can leave a set with doubling dimension d*. • So our problem is hard as well. • (Figure: example distances ½, ½, 1.)
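The construction on this slide is easy to write down for a concrete graph. The sketch below is only an illustration under stated assumptions: vertices are numbered 0..n-1, and the helper names `reduction_metric`, `is_metric`, and `half_ball` are hypothetical. It builds the {½, 1}-valued distance function, checks that it is a metric, and shows that a radius-½ ball around a vertex consists of that vertex and its neighbors.

```python
from itertools import combinations


def reduction_metric(edges):
    """Distance from the slide: 1/2 across an edge, 1 otherwise, 0 on the diagonal."""
    edge_set = {frozenset(e) for e in edges}

    def d(u, v):
        if u == v:
            return 0.0
        return 0.5 if frozenset((u, v)) in edge_set else 1.0

    return d


def is_metric(n, d):
    """Brute-force check of positivity, symmetry and the triangle inequality."""
    for u, v in combinations(range(n), 2):
        if d(u, v) <= 0 or d(u, v) != d(v, u):
            return False
    return all(
        d(u, w) <= d(u, v) + d(v, w)
        for u in range(n) for v in range(n) for w in range(n)
    )


def half_ball(n, d, center):
    """Points within distance 1/2 of center: the vertex and its neighbors."""
    return {v for v in range(n) if d(center, v) <= 0.5}


# Example: the path graph 0 - 1 - 2 - 3.
path_edges = [(0, 1), (1, 2), (2, 3)]
d = reduction_metric(path_edges)
print(is_metric(4, d), half_ball(4, d, 1))
```

Because every nonzero distance is either ½ or 1, the triangle inequality always holds (½ + ½ ≥ 1), which is why the construction yields a valid metric for any graph.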

  10. Bicriteria algorithm • Recall that the doubling constant (of a metric M) is • the minimum value λ > 0 such that every r-radius ball can be covered by λ balls of half the radius • Define the related notion of density constant as • the minimum value m > 0 such that every r-radius ball contains at most m points at mutual interpoint distance at least r/2 • Nice property: The density constant can only decrease under the removal of points, unlike the doubling constant. • We can show that • √m(S) ≤ λ(S) ≤ m(S) • it’s NP-hard to compute the density constant (ratio-preserving reduction from independent set) • (Figure: λ = 2, m = 3.)

  11. Bicriteria algorithm • We will give a bicriteria algorithm for the density constant. Problem statement: • Given an n-point set S and target density constant m* • Remove from S the fewest points so that the remaining set has density constant at most m* • A bicriteria algorithm for the density constant is itself a bicriteria algorithm for the doubling constant • within a quadratic factor

  12. Witness set • Given a set S, a subset S′ is a witness set for the density constant if, for some radius r, • all points of S′ lie in a single r-radius ball and are at interpoint distance at least r/2 • Note that S′ is a concise proof that the density constant of S is at least |S′| • Theorem: Fix a value m′ < m(S). A witness set of S of size at least √m′ can be found in time 2^{O(m*)} n^3 • Proof outline: • For each point p and radius r define the r-ball of p. • Greedily cover all points in the r-ball with disjoint balls of radius r/2. • Then cover all points in each r/2-ball with disjoint balls of radius r/4. • Since there exists in S a witness set of size m(S), there exist a p and r so that • either there are √m(S) balls of radius r/2, and their centers form a witness set, or • one r/2-ball covers √m(S) balls of radius r/4, and their centers form a witness set.
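The two-level greedy search in the proof outline can be sketched as follows. This is an illustrative brute-force version under several assumptions, not the authors' subroutine: it does not attempt the running time stated in the theorem, the names `greedy_centers` and `find_witness_set` are hypothetical, candidate radii are restricted to pairwise distances, and each r/2-ball is taken to be all points of the r-ball within r/2 of its center rather than a disjoint piece.

```python
def greedy_centers(points, sep, dist):
    """Greedily pick points whose pairwise distances all exceed sep."""
    centers = []
    for p in points:
        if all(dist(p, c) > sep for c in centers):
            centers.append(p)
    return centers


def find_witness_set(points, dist):
    """Return the largest witness set found by the two-level greedy search.

    Level 1: centers of r/2-balls inside B(p, r) are > r/2 apart,
             so they witness the density constant at radius r.
    Level 2: centers of r/4-balls inside one r/2-ball are > r/4 apart,
             so they witness the density constant at radius r/2.
    """
    radii = sorted({dist(p, q) for p in points for q in points if p != q})
    best = list(points[:1])
    for p in points:
        for r in radii:
            r_ball = [x for x in points if dist(p, x) <= r]
            half_centers = greedy_centers(r_ball, r / 2, dist)
            if len(half_centers) > len(best):
                best = half_centers
            for c in half_centers:
                half_ball = [x for x in r_ball if dist(c, x) <= r / 2]
                quarter_centers = greedy_centers(half_ball, r / 4, dist)
                if len(quarter_centers) > len(best):
                    best = quarter_centers
    return best


# Example: six points at mutual distance 1 (a "bad", high-dimensional cluster).
uniform = lambda a, b: 0 if a == b else 1
print(len(find_witness_set(list(range(6)), uniform)))
```

On the uniform example every point lies in one radius-1 ball and all pairs are more than ½ apart, so the whole set is returned as a witness.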

  13. Bicriteria algorithm • Recall our problem • Given an n-point set S and target density constant m* • Remove from S the fewest points so that the remaining set has density constant at most m* • Our bicriteria solution: • Let k be the true answer (the minimum number of points that must be removed). • We remove at most kc/(c-1) points and the remaining set has density constant at most c^2 m*^2

  14. Bicriteria algorithm • Algorithm • Run the subroutine to identify a witness set of size at least cm* • Remove it • Repeat until no such witness set is found • Analysis • The density constant of the resulting set is not greater than c^2 m*^2 • since we terminated without finding a witness set of size at least cm* • Every time a witness set of size w ≥ cm* is removed by our algorithm, the optimal algorithm must remove at least w-m* points • or else the true solution would have density constant greater than m* • It follows that our algorithm removes at most k w/(w-m*) ≤ kc/(c-1) points
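The remove-and-repeat loop above can be sketched in a few lines. This is a toy illustration under assumptions, not the authors' implementation: `find_witness` is a hypothetical single-level stand-in for the witness subroutine (so the density guarantee of the analysis is not claimed for it), `remove_dense_witnesses` is a hypothetical name, and points are assumed to be distinct and hashable.

```python
def find_witness(points, dist):
    """Stand-in subroutine: largest greedy set of points that lie in some
    ball B(p, r) and are pairwise more than r/2 apart."""
    radii = {dist(p, q) for p in points for q in points if p != q}
    best = list(points[:1])
    for p in points:
        for r in radii:
            chosen = []
            for x in points:
                if dist(p, x) <= r and all(dist(x, y) > r / 2 for y in chosen):
                    chosen.append(x)
            if len(chosen) > len(best):
                best = chosen
    return best


def remove_dense_witnesses(points, dist, m_star, c):
    """Repeatedly remove any witness set of size at least c * m_star."""
    threshold = c * m_star
    remaining = list(points)
    removed = []
    while True:
        witness = find_witness(remaining, dist)
        if len(witness) < threshold:
            break  # no large witness set left: stop
        removed.extend(witness)
        gone = set(witness)
        remaining = [p for p in remaining if p not in gone]
    return remaining, removed


# Example: ten points at mutual distance 1, target m* = 2, c = 2.
uniform = lambda a, b: 0 if a == b else 1
kept, dropped = remove_dense_witnesses(list(range(10)), uniform, m_star=2, c=2)
print(len(kept), len(dropped))
```

In this example an optimal solution must delete 8 of the 10 points, while the sketch deletes all 10, which is within the kc/(c-1) = 16 bound from the analysis.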

  15. Conclusion • We conclude that there exists a bicriteria algorithm for the density constant • We remove at most kc/(c-1) points and the remaining set has density constant at most c^2 m*^2 • It follows that there exists a bicriteria algorithm for the doubling constant • We remove at most kc/(c-1) points and the remaining set has doubling constant at most c^4 λ*^4
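The removal bound restated here follows from the per-round accounting in the analysis. A short derivation, assuming c > 1, that round i removes a witness set of size w_i ≥ cm*, and (as the analysis argues) that any optimal solution must delete at least w_i - m* points of that set; the index i and the symbol w_i are notation introduced for this sketch:

```latex
% w_i : size of the witness set removed in round i, with w_i \ge c\,m^*
% k   : number of points removed by an optimal solution
\begin{align*}
k &\ge \sum_i \left( w_i - m^* \right), \\
\frac{w_i}{w_i - m^*} &= \frac{1}{1 - m^*/w_i}
   \le \frac{1}{1 - 1/c} = \frac{c}{c-1}, \\
\sum_i w_i &\le \frac{c}{c-1} \sum_i \left( w_i - m^* \right)
   \le \frac{c}{c-1}\, k .
\end{align*}
```

The first line uses the fact that the removed witness sets are disjoint, so the optimal deletions counted for different rounds do not overlap. Taking c large pushes the number of removed points toward k, at the price of a larger density constant c^2 m*^2.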
