
Maximizing Communication for Spam Fighting

CS294, Lecture #19, Fall 2011. Communication-Avoiding Algorithms. www.cs.berkeley.edu/~odedsc/CS294. Maximizing Communication for Spam Fighting. Oded Schwartz. Based on: Cynthia Dwork, Andrew Goldberg, Moni Naor. On Memory-Bound Functions for Fighting Spam.


Presentation Transcript


  1. CS294, Lecture #19, Fall 2011. Communication-Avoiding Algorithms. www.cs.berkeley.edu/~odedsc/CS294. Maximizing Communication for Spam Fighting. Oded Schwartz. Based on: Cynthia Dwork, Andrew Goldberg, Moni Naor. On Memory-Bound Functions for Fighting Spam. Cynthia Dwork, Moni Naor, Hoeteck Wee. Pebbling and Proofs of Work. Many slides borrowed from: http://www.wisdom.weizmann.ac.il/~naor/PAPERS/spam.ppt

  2. Motivation. Spam in: email, IM, forums, SMS, … What is spam? (And who is a spammer?) From the Messaging Anti-Abuse Working Group (MAAWG) 2010 report: “The definition of spam can vary greatly from country to country and as used in local legislation” (e.g., opt-in vs. opt-out). “The percentage of email identified as abusive has oscillated (mid 2009 to end of 2010) between 88% and 91%”. California business and professions code, 2007: US spam cost estimate (direct and indirect): $13 billion (productivity, manpower, software, …)

  3. Techniques for (email) spam-fighting: Email filtering • White lists, black lists, greylisting • Text-based • Trainable filters • Human-assisted filters • … Filtering is done by the receiver or by the server.

  4. Techniques for (email) spam-fighting: Making the sender pay $. The system must make sending spam obtrusively unproductive for the spammer, but should not prevent legitimate users from sending their messages. Micropayments: pay (a small amount) for each email you send.

  5. Techniques for (email) spam-fighting: Pay with human attention (reverse Turing test). Tasks that are: • Easy for humans • Hard for machines. Examples: • Gender recognition • Facial expression • Find body parts • Decide nudity • Naïve drawing understanding • Handwriting understanding • Filling in words • Disambiguation. CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart.

  6. Techniques for (email) spam-fighting: Pay with computer time: Proof of Work (PoW). Computation: • Dwork-Naor 92 • Back 97 • Abadi-Burrows-Manasse-Wobber 03. Communication: • Dwork-Goldberg-Naor 03 • Dwork-Naor-Wee 05.

  7. Proof of Work If I don’t know you, prove you spent significant computational resources, just for me, and just for this message. And make it easy for me to verify.

  8. Proof of Work. Variants: Interactive: challenge-response protocols. “Would you like to talk with me?” “Hmmm. Only if you answer my riddle. Why is a raven like a writing-desk?” “What's the answer?” “I haven't the slightest idea.” Steps: A. Request; B. Challenge; A. Respond; B. Approve/reject. Is an interactive protocol good enough?

  9. Proof of Work. Variants: One-round protocols: solution-verification. The challenge is self-imposed before a solution is sought by the sender, and the receiver checks both the problem choice and the found solution. “Would you like to talk with me? Here is a token of my sincerity.” • automated for the user • non-interactive, single-pass • no need for third party or payment infrastructure. Steps: A. Compute; A. Solve; A. Send; B. Verify.
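
The one-round solution-verification flow can be sketched with a hash-based, CPU-bound puzzle. The slides cite Back 97 for this family but give no details, so everything concrete here is an assumption made for illustration: SHA-256 as the hash, the field encoding, the names `solve` and `verify`, and `e` as the number of leading zero bits required.

```python
import hashlib

def _digest(m: str, S: str, R: str, d: str, k: int) -> int:
    # Bind the work to this message, sender, receiver, and date (assumed encoding).
    data = "\x00".join([m, S, R, d, str(k)]).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def verify(m: str, S: str, R: str, d: str, k: int, e: int) -> bool:
    # Receiver: one hash call; accept if the top e bits of the digest are zero.
    return _digest(m, S, R, d, k) >> (256 - e) == 0

def solve(m: str, S: str, R: str, d: str, e: int) -> int:
    # Sender: try k = 1, 2, ... until the check passes (about 2**e trials expected).
    k = 1
    while not verify(m, S, R, d, k, e):
        k += 1
    return k
```

The sender attaches `k` to the message; the receiver recomputes a single hash. Raising `e` by one doubles the sender's expected work and leaves the receiver's cost unchanged.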

  10. Choosing the function f. Message m, Sender S, Receiver R, and date and time d. • Hard to compute f(m,S,R,d): lots of work for the sender; cannot be amortized • Easy to check “z = f(m,S,R,d)”: little work for the receiver • Parameterized to scale with Moore's Law: easy to exponentially increase computational cost, while barely increasing checking cost. Example: computing a square root mod a prime vs. verifying it; x² ≡ y (mod P).
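
A small sketch of the square-root example, showing the gap between computing and checking. The restriction to a prime P with P mod 4 = 3 (so that one modular exponentiation extracts the root) and the specific prime 2^127 − 1 are assumptions chosen to keep the code short; the slide does not fix them.

```python
P = 2**127 - 1  # a Mersenne prime with P % 4 == 3 (illustrative choice)

def mod_sqrt(y: int, p: int = P) -> int:
    # Sender: for p % 4 == 3 and y a quadratic residue mod p,
    # y**((p + 1) // 4) is a square root. Costs about log2(p) multiplications.
    return pow(y, (p + 1) // 4, p)

def check(x: int, y: int, p: int = P) -> bool:
    # Receiver: a single modular multiplication.
    return (x * x) % p == y % p
```

For example, with `y = pow(123456789, 2, P)`, `mod_sqrt(y)` returns one of the two roots and `check` confirms it with one squaring.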

  11. Worst case vs. average case. Message m, Sender S, Receiver R, and date and time d. • Hard to compute f(m,S,R,d) • We want f to be hard on average: it cannot be amortized (over the inputs) • It is OK if some inputs are easy (a spammer may send a few emails cheaply) • It is not enough for f to be hard in the worst case but not on average: the spammer would not be able to send everything she wants, but could still send many emails at a low average cost.

  12. Basing hardness: resource. Goal: design a proof of work function which requires a large number of flops / memory accesses / network accesses. CPU (flops count) • Back 97, Abadi-Burrows-Manasse-Wobber 03 • High variance, lower cost. Memory access • Dwork-Naor 92, Abadi-Burrows-Manasse-Wobber 03, Dwork-Goldberg-Naor 03, Dwork-Naor-Wee 05 • Lower variance, higher cost. Network access • Abliz, Znati, 2009.

  13. Basing hardness: resource. Goal: design a proof of work function which requires a large number of flops / memory accesses / network accesses. We should assume that a spammer has better resources. The exponential gaps from annual hardware improvements do not matter, as long as the function has a scaling parameter of effort e.

  14. Hardness based on: Goal: design a proof of work function which requires a large number of flops / memory accesses / network accesses. Information-theoretic bound • Example: “Where is Waldo?” atpaajlssgijgnfrzkbvcwvjzbsubwsrderxfdybhrdrmsabmvrsyszcbgkvnhuppdponqawgrouhpycsstuklwfskbmbnvbsfhydoazsvhywsuhzqagwaldoanftqlbdloxhypfmovnbcmannlfytrvjsbwIgjhrdkeigjbmtibingojgnhxiwotphjcjsuqjdnfjtjns where the truly random input is too large to fit in local/fast memory. Proved time/space separation. Complexity assumption • Example: P ≠ NP. Cryptographic assumption • Example: discrete log. A problem for which there is a fast verification scheme, but no sufficiently efficient algorithm is known • Example: matrix multiplication vs. matrix verification.
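
The “Where is Waldo?” example can be sketched as a search-versus-check asymmetry: finding the word may require reading the whole random input, while checking a claimed position reads only a few characters. The helper names, the haystack size, and the seeded generator are illustrative assumptions, not part of the slides.

```python
import random
import string

def make_haystack(n: int, pos: int, word: str = "waldo", seed: int = 0) -> str:
    # Random lowercase text with `word` planted at index `pos` (hypothetical helper).
    chars = random.Random(seed).choices(string.ascii_lowercase, k=n)
    text = "".join(chars)
    return text[:pos] + word + text[pos + len(word):]

def find_waldo(text: str, word: str = "waldo") -> int:
    # Searching: in the worst case every character of the input is read.
    return text.find(word)

def verify_waldo(text: str, pos: int, word: str = "waldo") -> bool:
    # Verifying: read only len(word) characters at the claimed position.
    return pos >= 0 and text[pos:pos + len(word)] == word
```

When the input is too large for fast memory, the search side is forced to stream it from slow memory, which is the cost the memory-bound approach wants to charge.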

  15. Memory-bound model. USER: • CACHE: small but fast • MAIN MEMORY: large but slow. SPAMMER: • CACHE: cache size at most ½ user’s main memory • MAIN MEMORY: may be very very large • may exploit locality.

  16. Memory-bound model. USER: • CACHE: small but fast • MAIN MEMORY: large but slow. SPAMMER: • CACHE: cache size at most ½ user’s main memory • MAIN MEMORY: may be very very large. Cost model: • charge accesses to main memory • must avoid exploitation of locality • computation is free, except for hash function calls • watch out for low-space crypto attacks.

  17. Example of PoW: General scheme. Consider a huge graph G, so large that a vertex name barely fits into the fast memory. The graph is implicit, i.e., its edges are defined by functions on the vertex names. The objective function f is a path p in G with a certain (rare) property that depends on (m,S,R,d). Searching for p should be hard: its running time should depend on the number of paths in G. The cost of verifying that a given path p has the property should depend only on the length of the path |p|.

  18. Successful Path. Collection P of paths, depending on (m,S,R,d). Figure label: L (path length).

  19. Abstracted Algorithm. Sender and Receiver share a large random table T. To send message m from Sender S to Receiver R at date/time d, repeat trials for k = 1,2, … until success. The current state is specified by an auxiliary table A. A thread is defined by (m,S,R,d,k). • Initialization: A = H0(m,S,R,d,k) • Main Loop: walk for L steps (L = path length): c = H1(A); A = H2(A,T[c]) • Success: if the last e bits of H3(A) are 00…0. Attach to (m,S,R,d) the successful trial number k and H3(A). Verification: straightforward given (m,S,R,d,k,H3(A)).
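
The trial loop above can be sketched as follows. This is a toy under stated assumptions, not the construction from the paper: domain-separated SHA-256 stands in for the idealized H0 to H3, the state A is a 32-byte digest instead of a larger auxiliary table, `msg` stands for an encoding of (m,S,R,d), and `make_table` derives a small table from a seed where the real T is large and truly random.

```python
import hashlib

def H(tag: bytes, *parts: bytes) -> bytes:
    # Stand-in for the idealized hash functions (assumption: SHA-256 with a tag).
    h = hashlib.sha256(tag)
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

def make_table(n: int, seed: bytes = b"T") -> list:
    # Hypothetical small shared table; the real T should not fit in cache.
    return [H(b"tbl", seed, i.to_bytes(4, "big")) for i in range(n)]

def trial(T: list, msg: bytes, k: int, L: int) -> bytes:
    A = H(b"H0", msg, k.to_bytes(8, "big"))              # A = H0(m,S,R,d,k)
    for _ in range(L):
        c = int.from_bytes(H(b"H1", A), "big") % len(T)  # c = H1(A)
        A = H(b"H2", A, T[c])                            # A = H2(A, T[c])
    return H(b"H3", A)

def success(tag: bytes, e: int) -> bool:
    # Success when the last e bits of H3(A) are zero.
    return int.from_bytes(tag, "big") & ((1 << e) - 1) == 0

def send(T: list, msg: bytes, L: int, e: int):
    k = 1
    while True:
        tag = trial(T, msg, k, L)
        if success(tag, e):
            return k, tag
        k += 1

def verify(T: list, msg: bytes, k: int, tag: bytes, L: int, e: int) -> bool:
    # The receiver repeats one walk of L steps instead of about 2**e of them.
    return success(tag, e) and trial(T, msg, k, L) == tag
```

Each step's table index depends on the previous state, so the L lookups of a walk cannot be issued in advance; that dependency is what makes the walk latency-bound once T exceeds the cache.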

  20. Animated Algorithm: a Single Step in the Loop. C = H1(A); A = H2(A,T[C]). Figure labels: A, C, H1, H2, T, T[C].

  21. Full Specification. E = (expected) factor by which computation cost exceeds verification = expected number of trials = 2^e, if H3 behaves as a random function. L = length of walk. Want, say, E·L·t = 10 seconds, where t = memory latency = 0.2 μsec. Reasonable choices: E = 24,000, L = 2048. Also need: How large is A? A should not be very small… Abstract algorithm: • Initialize: A = H0(m,S,R,d,k) • Main Loop: walk for L steps: c ← H1(A); A ← H2(A,T[c]) • Success if H3(A) = 0^(log E) (a string of log E zero bits) • Trial repeated for k = 1,2, … • Proof = (m,S,R,d,k,H3(A))
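
A quick check of the parameter choice, reading the memory latency as 0.2 microseconds (the unit symbol is damaged in the transcript; this reading is the one for which E·L·t lands near the stated 10 seconds). The variable names are illustrative.

```python
import math

E = 24_000   # expected number of trials
L = 2048     # walk length (memory accesses per trial)
t = 0.2e-6   # assumed memory latency in seconds (0.2 microseconds)

sender_seconds = E * L * t    # expected cost of producing a proof
verifier_seconds = L * t      # cost of checking one walk
effort_bits = math.log2(E)    # e such that E = 2**e
```

Under this reading the sender spends a little under 10 seconds per message, while the receiver spends well under a millisecond on the memory accesses of one walk.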

  22. Path-following approach [Dwork-Goldberg-Naor Crypto 03]. [Theorem] Fix any spammer: • whose cache size is smaller than |T|/2 • assuming T is truly random • assuming H0,…,H3 are idealized hash functions. Then the amortized number of memory accesses per successful message is Ω(2^e · L). [Remarks] • the lower bound holds for a spammer maximizing throughput across any collection of messages and recipients • idealized hash functions are modeled using random oracles • relies on information-theoretic unpredictability of T.

  23. Using a succinct table [DNW 05]. GOAL: use a table T with a succinct description • easy distribution of software (new users) • fast updates (over slow connections). PROBLEM: lose information-theoretic unpredictability • the spammer can exploit the succinct description to avoid memory accesses. IDEA: generate T using a memory-bound process • use time-space trade-offs for pebbling • studied extensively in the 1970s. The user builds the table T once and for all.

  24. Choosing the H’s. A “theoretical” approach: idealized random functions • Provides a formal analysis showing that the amortized number of memory accesses is high. A concrete approach inspired by the RC4 stream cipher • Very efficient: a few cycles per step • There is no time inside the inner loop to compute a complex function • A is not small and changes gradually.

  25. RC4 (Rivest Cipher 4) • Generates a pseudorandom stream of bits • Used in: Secure Sockets Layer (SSL); Wired Equivalent Privacy (WEP); Wi-Fi Protected Access (WPA); Secure Shell (SSH); many other applications… • Biased outputs allow several types of attacks, e.g., breaking the 104-bit RC4 key used in 128-bit WEP in under a minute.
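
For reference, a sketch of textbook RC4 (key scheduling followed by keystream generation). The slides say only that the concrete hash functions are inspired by RC4, so this is the plain cipher and not the inner loop of the memory-bound function; the function names are illustrative.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    # Key-scheduling algorithm: permute S under control of the key.
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    # Pseudo-random generation: each output byte costs one swap and a few adds.
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)

def rc4(key: bytes, data: bytes) -> bytes:
    # Encryption and decryption are the same XOR with the keystream.
    ks = rc4_keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))
```

The relevant features for the slides are that the 256-entry state changes by one swap per step and that a step takes only a few operations, which is why an RC4-like update fits inside a loop dominated by memory latency.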

  26. Pebbling a graph. GIVEN: a directed acyclic graph. RULES: • inputs: a pebble can be placed on an input node at any time • a pebble can be placed on any non-input vertex if all immediate parent nodes have pebbles • pebbles may be removed at any time. GOAL: find a strategy to pebble all the outputs while using few pebbles and few moves. Figure labels: INPUT, OUTPUT.
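
The rules above can be sketched as a small checker that replays a sequence of moves and reports the two costs the goal mentions: the peak number of pebbles and the number of placements. The graph encoding (a dict from node to its list of parents) and the function name are assumptions for illustration.

```python
def run_pebbling(parents: dict, outputs: set, moves: list) -> tuple:
    """Replay ('place', v) / ('remove', v) moves; return (peak pebbles, placements)."""
    pebbled, done = set(), set()
    peak = placed = 0
    for op, v in moves:
        if op == "place":
            # Input nodes have no parents, so this check passes trivially for them.
            if not all(p in pebbled for p in parents[v]):
                raise ValueError(f"illegal placement on {v}")
            pebbled.add(v)
            placed += 1
            peak = max(peak, len(pebbled))
            if v in outputs:
                done.add(v)
        else:
            pebbled.discard(v)  # pebbles may be removed at any time
    if done != outputs:
        raise ValueError("some output was never pebbled")
    return peak, placed

# A three-input pyramid: d = (a, b), e = (b, c), f = (d, e).
PYRAMID = {"a": [], "b": [], "c": [], "d": ["a", "b"], "e": ["b", "c"], "f": ["d", "e"]}
STRATEGY = [("place", "a"), ("place", "b"), ("place", "d"), ("remove", "a"),
            ("place", "c"), ("place", "e"), ("remove", "b"), ("remove", "c"),
            ("place", "f")]
```

Replaying `STRATEGY` on `PYRAMID` with output `{"f"}` uses at most 4 pebbles at once and 6 placements; trading pebbles against moves on larger graphs is the time-space trade-off the slides refer to.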

  27. Succinctly generating T. GIVEN: a directed acyclic graph • constant in-degree. Input node i is labeled H4(i); non-input node i is labeled H4(i, labels of parent nodes); entries of T = labels of output nodes. OBSERVATION: good pebbling strategy ⇒ good spammer strategy. Figure: Li = H4(i, Lj, Lk), where Lj and Lk are the labels of the parents of node i; labels INPUT, OUTPUT.
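
The labeling rule can be sketched directly: each node's label is a hash of its name and its parents' labels, and the table consists of the output labels. SHA-256 as H4, the dict-of-parents encoding, and the requirement that the caller supply a topological order are assumptions of this sketch.

```python
import hashlib

def H4(*parts: bytes) -> bytes:
    # Stand-in for the idealized hash H4 (assumption: tagged SHA-256).
    h = hashlib.sha256(b"H4")
    for p in parts:
        h.update(len(p).to_bytes(4, "big") + p)
    return h.digest()

def label_dag(parents: dict, order: list) -> dict:
    # `order` must be a topological order, so parents are labeled before children.
    labels = {}
    for i in order:
        name = str(i).encode()
        # Input nodes have no parents: their label is H4(i).
        labels[i] = H4(name, *(labels[j] for j in parents[i]))
    return labels

def build_table(parents: dict, order: list, outputs: list) -> list:
    labels = label_dag(parents, order)
    return [labels[o] for o in outputs]  # entries of T = labels of output nodes
```

Computing a label needs the parents' labels to be available, which is the same condition as placing a pebble; that is why a cheap way to produce table entries would translate into a cheap pebbling.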

  28. Converting spammer strategy to a pebbling. EX POST FACTO PEBBLING: computed by offline inspection of the spammer's strategy. • PLACING A PEBBLE: place a pebble on node i if H4 was used to compute Li = H4(i, Lj, Lk), and Lj, Lk are the correct labels • INITIAL PEBBLES: place an initial pebble on node j if H4 was applied with Lj as an argument, and Lj was not computed via H4 • REMOVING A PEBBLE: remove a pebble as soon as it is no longer needed. Correspondence: • computing a label using the hash function: a lower bound on # moves gives a lower bound on # hash function calls • using cache + memory fetches: a lower bound on # pebbles gives a lower bound on # memory accesses. IDEA: limit the # of pebbles used by the spammer as a function of its cache size and the # of bits it brings from memory.

  29. Succinctly generating T. Need a graph that is hard to pebble on average, i.e., every large set of outputs requires many initial pebbles (corresponding to reading from fast/slow memory). Example: superconcentrators. Figure labels: INPUT, OUTPUT.

  30. Open problems WEAKER ASSUMPTIONS • Unconditional result? • Use the red-blue pebbling model? • Use the edge expansion approach? • Other no-amortization proofs, for solving many instances in parallel/serial?


  32. Expansion (3rd approach) [Ballard, Demmel, Holtz, S. 2011b], in the spirit of [Hong & Kung 81]. Let G = (V,E) be a d-regular graph. A is the normalized adjacency matrix, with eigenvalues 1 = λ_1 ≥ λ_2 ≥ … ≥ λ_n. Define λ ≡ 1 − max{λ_2, |λ_n|}. Thm: [Alon-Milman84, Dodziuk84, Alon86] For small sets:

  33. Expansion (3rd approach). The computation directed acyclic graph, with vertex set V. S: subset of computation; RS: reads; WS: writes. Figure legend: input / output, intermediate value, dependency. Communication cost is graph expansion.

