
Network Applications of Bloom Filters: A Survey

Network Applications of Bloom Filters: A Survey. Andrei Broder and Michael Mitzenmacher Presenter: Chen Qian Original presenter: Hongkun Yang. Outline. Bloom Filter Overview Standard Bloom Filters Counting Bloom Filters Historical Applications Network Applications


Presentation Transcript


  1. Network Applications of Bloom Filters: A Survey Andrei Broder and Michael Mitzenmacher Presenter: Chen Qian Original presenter: Hongkun Yang

  2. Outline • Bloom Filter Overview • Standard Bloom Filters • Counting Bloom Filters • Historical Applications • Network Applications • Distributed Caching • P2P/Overlay Networks • Resource Routing • Conclusion

  3. Standard Bloom Filters: Notations • S: the set of n elements {x1, x2, …, xn} • k independent hash functions h1, …, hk with range {1, …, m} • Assume: hash functions map each item in the universe to a random number uniformly over the range {1, …, m} • e.g., MD5 • An array B of m bits, initially filled with 0s

  4. Standard Bloom Filters: How It Works • Hash each xi in S k times. If hj(xi) = a, set B[a] = 1. • To check whether y is in S, check B at hj(y), j = 1, 2, …, k • If all k values are set to 1, y is assumed to be in S • If not, y is clearly not in S • No false negatives • Possible false positives
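The insert and check steps above can be sketched in a few lines of Python. This is a minimal illustration, not a tuned implementation: the class name `BloomFilter`, the salted-MD5 trick used to simulate k independent hash functions, and the 0-based positions are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Standard Bloom filter: an m-bit array B and k hash functions."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.bits = [0] * m  # array B, initially filled with 0s

    def _positions(self, item: str):
        # Assumption: the k hash functions are simulated by salting MD5
        # with the index j. Positions are 0-based, in {0, ..., m-1}.
        for j in range(self.k):
            digest = hashlib.md5(f"{j}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        # Insert: set B[h_j(x)] = 1 for j = 1..k.
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item: str) -> bool:
        # Check: y is assumed to be in S only if all k bits are 1.
        return all(self.bits[a] for a in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(m=64, k=3)
    for x in ("x1", "x2"):
        bf.add(x)
    print("x1" in bf, "x2" in bf)  # inserted items are always reported present
```

An inserted element can never be reported absent, because its k bits are never cleared; an element that was not inserted may still be reported present if other insertions happened to set all of its bits.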

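  5. Standard Bloom Filters: An Example [Figure: initial state; every bit of B is 0]

  6. Standard Bloom Filters: An Example [Figure: insertion; x1 and x2 are each hashed k times and the corresponding bits of B are set to 1]

  7. Standard Bloom Filters: An Example [Figure: check; y1 and y2 are each hashed k times and the corresponding bits of B are examined]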

  8. Overview • Burton Bloom introduced it in 1970 • Randomized data structure • Represents a set to support membership queries • Dramatic space savings • Allows false positives

  9. Bloom Filter Principle “Wherever a list or set is used, and space is at a premium, consider using a Bloom filter if the effect of false positives can be mitigated.” “Network Applications of Bloom Filters: A Survey”, A. Broder and M. Mitzenmacher

  10. Standard Bloom Filters: False Positive Rate (1) • Pr[a given bit in B is 0] = $(1 - 1/m)^{kn} \approx e^{-kn/m}$ • The probability of a false positive is $(1 - (1 - 1/m)^{kn})^k \approx (1 - e^{-kn/m})^k$ • Let r be the proportion of 0 bits after all elements are inserted in the Bloom filter • Conditioned on r, the probability of a false positive is $(1 - r)^k$

  11. Standard Bloom Filters: False Positive Rate (2) • The fraction of 0 bits is extremely concentrated around its expectation • Therefore, with high probability, $(1 - r)^k \approx (1 - e^{-kn/m})^k$
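To get a feel for these quantities, the sketch below evaluates the false positive probability for a filter with m bits, n elements, and k hash functions, both in the finite-m form $(1 - (1 - 1/m)^{kn})^k$ and in the approximation $(1 - e^{-kn/m})^k$. The function names and the sample values of m, n, and k are illustrative assumptions.

```python
import math


def fp_finite_m(m: int, n: int, k: int) -> float:
    """(1 - (1 - 1/m)^(kn))^k, assuming ideal uniform hashing."""
    return (1 - (1 - 1 / m) ** (k * n)) ** k


def fp_approx(m: int, n: int, k: int) -> float:
    """(1 - e^(-kn/m))^k, the usual large-m approximation."""
    return (1 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    m, n, k = 1000, 100, 7  # illustrative values only
    print(f"finite-m form = {fp_finite_m(m, n, k):.5f}")
    print(f"approximation = {fp_approx(m, n, k):.5f}")
```

For these sample values the two expressions agree to within about 1e-4, which is why the simpler exponential form is normally used in analysis.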

  12. Standard Bloom Filters: Optimal Number of Hash Functions (1) • Two competing forces: • More hash functions give more chances to find a 0 bit for an element that is not a member of S • Fewer hash functions increase the fraction of 0 bits in the array

  13. Standard Bloom Filters: Optimal Number of Hash Functions (2)
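A standard result, consistent with the approximation $(1 - e^{-kn/m})^k$ for the false positive probability, is that the rate is minimized at $k = (m/n) \ln 2$, where it equals $(1/2)^k \approx 0.6185^{m/n}$. The sketch below checks this numerically; the function names and the sample values of m and n are assumptions made for illustration.

```python
import math


def fp_rate(m: int, n: int, k: float) -> float:
    """Approximate false positive rate (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_k(m: int, n: int) -> float:
    """Real-valued minimizer k = (m/n) ln 2."""
    return (m / n) * math.log(2)


def best_integer_k(m: int, n: int, k_max: int = 64) -> int:
    """Integer k in 1..k_max with the lowest approximate rate."""
    return min(range(1, k_max + 1), key=lambda k: fp_rate(m, n, k))


if __name__ == "__main__":
    m, n = 1000, 100  # 10 bits per element, illustrative only
    print(f"real-valued optimum k = {optimal_k(m, n):.3f}")
    print(f"best integer k        = {best_integer_k(m, n)}")
    for k in (5, 6, 7, 8, 9):
        print(k, f"{fp_rate(m, n, k):.5f}")
```

Because k must be an integer in practice, the usable optimum is the integer nearest the real-valued minimizer, and the rate is fairly flat around it.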

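  14. Standard Bloom Filters: Space Efficiency • A lower bound • Let ε be the false positive rate; then $m \ge n \log_2(1/\epsilon)$ • The optimal case • The false positive rate for the optimal Bloom filter is $f = (1/2)^k = (1/2)^{(m/n) \ln 2}$ • Let $f \le \epsilon$; then $m \ge n \log_2 e \cdot \log_2(1/\epsilon)$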
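As a worked comparison, the sketch below computes the bits per element required by the information-theoretic lower bound, $\log_2(1/\epsilon)$, and by an optimally configured Bloom filter, $\log_2 e \cdot \log_2(1/\epsilon)$, which is about 1.44 times larger. The function names and the sample target rates are assumptions chosen for illustration.

```python
import math


def lower_bound_bits_per_element(eps: float) -> float:
    """Lower bound m/n >= log2(1/eps) for any such data structure."""
    return math.log2(1 / eps)


def bloom_bits_per_element(eps: float) -> float:
    """Optimal Bloom filter: m/n >= log2(e) * log2(1/eps)."""
    return math.log2(math.e) * math.log2(1 / eps)


if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):  # illustrative target rates
        lb = lower_bound_bits_per_element(eps)
        bf = bloom_bits_per_element(eps)
        print(f"eps={eps}: lower bound {lb:.2f}, Bloom filter {bf:.2f} bits/element")
```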

  15. Standard Bloom Filters: Operations (1) • Union • Build a Bloom filter representing the union of A and B by taking the OR of BF(A) and BF(B) • Shrinking a Bloom filter • Halve the size by taking the OR of the first and second halves of the Bloom filter • Increases the false positive rate • The intersection of two sets
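The union and halving operations can be sketched as follows. The sketch assumes that both filters use the same size m and the same k hash functions, that m is even, and that lookups in the halved filter reduce each position modulo m/2; the helper names and the salted-MD5 hashing are illustrative choices.

```python
import hashlib


def positions(item: str, m: int, k: int):
    # Assumption: k hash functions simulated by salting MD5 with j.
    return [
        int.from_bytes(hashlib.md5(f"{j}:{item}".encode()).digest(), "big") % m
        for j in range(k)
    ]


def build(items, m: int, k: int):
    """Build an m-bit Bloom filter (as a list of 0/1) for the given items."""
    bits = [0] * m
    for x in items:
        for a in positions(x, m, k):
            bits[a] = 1
    return bits


def union(b1, b2):
    """Bitwise OR of two filters with identical m and hash functions."""
    return [x | y for x, y in zip(b1, b2)]


def halve(bits):
    """OR the first half with the second half; len(bits) must be even."""
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


if __name__ == "__main__":
    a = build(["x1", "x2"], m=64, k=3)
    b = build(["y1"], m=64, k=3)
    print(union(a, b) == build(["x1", "x2", "y1"], m=64, k=3))
    print(halve(a) == build(["x1", "x2"], m=32, k=3))
```

With this hashing scheme the halved filter is bit-for-bit the filter that would have been built with m/2 bits, so it holds the same elements in half the space and therefore has a higher false positive rate.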

  16. Counting Bloom Filters: Motivation • Standard Bloom filters • Easy to insert elements • Cannot perform deletion operations • Counting Bloom filters • Each entry is not a single bit but a small counter • Insert an element: increment the corresponding counters • Delete an element: decrement the corresponding counters

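  17. Counting Bloom Filters: An Example [Figure: initial state; every counter of B is 0]

  18. Counting Bloom Filters: An Example [Figure: insertion; x1 and x2 are hashed and the corresponding counters of B are incremented, so a position hit twice holds 2]

  19. Counting Bloom Filters: An Example [Figure: deletion; the counters at the deleted element's hashed positions are decremented]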
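A counting Bloom filter along these lines can be sketched as below. The class name, the salted-MD5 hashing, and the use of unbounded Python integers as counters are assumptions for illustration; practical implementations use small fixed-width counters.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter whose entries are small counters instead of bits."""

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: k hash functions simulated by salting MD5 with j.
        for j in range(self.k):
            digest = hashlib.md5(f"{j}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for a in self._positions(item):
            self.counters[a] += 1

    def remove(self, item: str) -> None:
        # Only valid for items that were actually inserted; removing a
        # non-member can corrupt the filter and cause false negatives.
        for a in self._positions(item):
            self.counters[a] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[a] > 0 for a in self._positions(item))

    def to_bits(self):
        """The corresponding standard Bloom filter (counter > 0 means bit 1)."""
        return [1 if c > 0 else 0 for c in self.counters]


if __name__ == "__main__":
    cbf = CountingBloomFilter(m=64, k=3)
    cbf.add("x1")
    cbf.add("x2")
    cbf.remove("x2")
    print("x1" in cbf)  # x1 is still reported present after deleting x2
```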

  20. Historical Applications • Dictionaries • Hyphenation programs • UNIX spell-checkers • Dictionary of unsuitable passwords • Databases • Semi-join operations • Differential files

  21. Distributed Caching: Scenario

  22. Distributed Caching: Summary Cache • Motivation • Sharing of caches among Web proxies to reduce Web traffic and alleviate network bottlenecks • Directly sharing lists of URLs has too much overhead • Solution • Use Bloom filters to reduce network traffic • Use a counting Bloom filter to track cache contents • Broadcast the corresponding standard Bloom filter to other proxies

  23. P2P/Overlay Networks: Content Delivery • Problem • Peer A has a set of items SA, peer B has SB, and B wants the useful items from A (SA − SB) • Solution • B sends A its Bloom filter BF(B) • A sends B the items that are not in SB according to BF(B) • Implications of false positives • Not all elements in SA − SB will be sent • A large fraction of SA − SB is sufficient (not necessarily the entire set)
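The exchange above can be sketched as follows, with a Python integer serving as the bit array. The filter parameters `M` and `K`, the helper names, and the item names are hypothetical values chosen for the example.

```python
import hashlib

M, K = 512, 4  # hypothetical filter size and number of hash functions


def positions(item):
    # Assumption: k hash functions simulated by salting MD5 with j.
    for j in range(K):
        digest = hashlib.md5(f"{j}:{item}".encode()).digest()
        yield int.from_bytes(digest, "big") % M


def make_filter(items) -> int:
    """Bloom filter stored as an integer bitmask."""
    bf = 0
    for x in items:
        for a in positions(x):
            bf |= 1 << a
    return bf


def maybe_contains(bf: int, item) -> bool:
    return all((bf >> a) & 1 for a in positions(item))


def items_to_send(sa: set, bf_b: int) -> set:
    """Peer A's side: send every item that BF(B) says B does not have."""
    return {x for x in sa if not maybe_contains(bf_b, x)}


if __name__ == "__main__":
    sa = {f"item{i}" for i in range(50)}
    sb = {f"item{i}" for i in range(25, 75)}
    sent = items_to_send(sa, make_filter(sb))  # B -> A: BF(B); A -> B: sent
    print(len(sent), "of", len(sa - sb), "useful items sent")
```

Everything A sends is genuinely missing at B, since an item in SB always passes BF(B). A false positive only causes a useful item to be withheld, which matches the slide's point that a large fraction of SA − SB is sufficient.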

  24. P2P/Overlay Networks: Efficient P2P Keyword Searching (1) • Problem • Peer A has a set of items SA, peer B has SB, and A wants to determine SA ∩ SB • Solution • A sends B its Bloom filter BF(A) • B sends A the items that appear to be in SA according to BF(A) • A eliminates false positives and determines SA ∩ SB exactly • Fewer bits transmitted than A sending the entire set SA

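  25. P2P/Overlay Networks: Efficient P2P Keyword Searching (2) [Figure: (1) a client sends a request to Server A; (2) Server A sends BF(A) to Server B. SA = {1, 2, 3, 4}, SB = {3, 4, 5, 6}; the items of SB that pass BF(A) are {3, 4, 6}; the exact intersection is {3, 4}]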
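A minimal sketch of this intersection protocol is shown below, assuming the final exact check happens on the peer that holds SA, since that is the only side able to compare candidates against the real set. The filter parameters `M` and `K` and all function names are hypothetical.

```python
import hashlib

M, K = 128, 3  # hypothetical filter size and number of hash functions


def positions(item):
    # Assumption: k hash functions simulated by salting MD5 with j.
    for j in range(K):
        digest = hashlib.md5(f"{j}:{item}".encode()).digest()
        yield int.from_bytes(digest, "big") % M


def make_filter(items) -> int:
    """Bloom filter stored as an integer bitmask."""
    bf = 0
    for x in items:
        for a in positions(x):
            bf |= 1 << a
    return bf


def maybe_contains(bf: int, item) -> bool:
    return all((bf >> a) & 1 for a in positions(item))


def candidates_from_b(sb: set, bf_a: int) -> set:
    """Peer B's side: items of SB that appear to be in SA."""
    return {y for y in sb if maybe_contains(bf_a, y)}


def exact_intersection(sa: set, sb: set) -> set:
    bf_a = make_filter(sa)                    # A -> B: BF(A)
    candidates = candidates_from_b(sb, bf_a)  # B -> A: candidate items
    return sa & candidates                    # false positives removed


if __name__ == "__main__":
    print(sorted(exact_intersection({1, 2, 3, 4}, {3, 4, 5, 6})))
```

The candidate set always contains the true intersection because a Bloom filter has no false negatives, so the final exact filtering step yields SA ∩ SB regardless of how many false positives B returned.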

  26. Resource Routing (1) • Network is in the form of a rooted tree • Nodes hold resources • Each node keeps Bloom filters representing • A unified list of the resources that it holds or that are reachable through one of its children • Individual lists of resources for itself and for each child • When receiving a request for a resource • Check the unified list to see whether the node or its descendants hold the resource • Yes: check the individual lists • No: forward the request up the tree toward the root
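The routing rule above can be sketched as follows. This is one possible reading, not a definitive design: each child's own unified filter doubles as the parent's individual list for that child, a false positive is handled by backtracking after an exact check at the node, and the names `Node`, `route`, `M`, and `K` are hypothetical.

```python
import hashlib

M, K = 256, 3  # hypothetical filter size and number of hash functions


def positions(item):
    # Assumption: k hash functions simulated by salting MD5 with j.
    for j in range(K):
        digest = hashlib.md5(f"{j}:{item}".encode()).digest()
        yield int.from_bytes(digest, "big") % M


def make_filter(items) -> int:
    bf = 0
    for x in items:
        for a in positions(x):
            bf |= 1 << a
    return bf


def maybe_contains(bf: int, item) -> bool:
    return all((bf >> a) & 1 for a in positions(item))


class Node:
    """Tree node; children must be constructed before their parent."""

    def __init__(self, name, resources, children=()):
        self.name = name
        self.resources = set(resources)
        self.children = list(children)
        self.parent = None
        self.own = make_filter(self.resources)  # individual list for this node
        self.unified = self.own                 # this node plus all descendants
        for child in self.children:
            child.parent = self
            self.unified |= child.unified       # OR gives the union

    def search_down(self, resource):
        """Name of a node in this subtree holding the resource, else None."""
        if not maybe_contains(self.unified, resource):
            return None                         # definitely not in this subtree
        if maybe_contains(self.own, resource) and resource in self.resources:
            return self.name
        for child in self.children:             # each child checks its own filter
            found = child.search_down(resource)
            if found is not None:
                return found
        return None                             # the unified hit was a false positive


def route(start, resource):
    """Search the local subtree, then forward the request toward the root."""
    node = start
    while node is not None:
        found = node.search_down(resource)
        if found is not None:
            return found
        node = node.parent
    return None


if __name__ == "__main__":
    n3, n4 = Node("n3", {"d"}), Node("n4", {"e"})
    n1, n2 = Node("n1", {"b"}, [n3, n4]), Node("n2", {"c"})
    root = Node("root", {"a"}, [n1, n2])
    print(route(n3, "e"), route(n3, "c"), route(n3, "missing"))
```

For simplicity the sketch re-searches subtrees it has already visited when a request moves up a level; a real design would avoid that repeated work.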

  27. Resource Routing (2)

  28. Conclusion • Simple, space-efficient representation of a set or a list that can handle membership queries • Applications in numerous networking problems • Bloom filter principle

  29. THANK YOU!
