
Making Diffusion Work for You

Making Diffusion Work for You. B. Aditya Prakash, Computer Science, Virginia Tech. GraphEx Symposium, MIT Endicott House, Aug 21, 2014.


Presentation Transcript


  1. Making Diffusion Work for You. B. Aditya Prakash, Computer Science, Virginia Tech. GraphEx Symposium, MIT Endicott House, Aug 21, 2014

  2. Thanks! • Ali Pinar • Ben Miller Prakash 2014

  3. Networks are everywhere! Facebook Network [2010] Gene Regulatory Network [Decourty 2008] Human Disease Network [Barabasi 2007] The Internet [2005]

  4. Dynamical Processes over networks are also everywhere!

  5. Why do we care? • Social collaboration • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ...

  6. Why do we care? (1: Epidemiology) • Dynamical Processes over networks: diseases over contact networks (SI Model) • CDC data: visualization of the first 35 tuberculosis (TB) patients and their 1039 contacts [AJPH 2007]

  7. Why do we care? (1: Epidemiology) • Dynamical Processes over networks • Each circle is a hospital • ~3000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize?

  8. Why do we care? (1: Epidemiology) [US-MEDICARE NETWORK 2005] Current practice vs. our method: ~6x fewer! Hospital-acquired infections took 99K+ lives and cost $5B+ (all per year)

  9. Why do we care? (2: Online Diffusion) > 800m users, ~$1B revenue [WSJ 2010] ~100m active users > 50m users

  10. Why do we care? (2: Online Diffusion) • Dynamical Processes over networks • Social Media Marketing [Figure: a celebrity tells followers “Buy Versace™!”]

  11. Why do we care? (3: To change the world?) • Dynamical Processes over networks Social networks and Collaborative Action

  12. High Impact – Multiple Settings • epidemic outbreaks • products/viruses • transmit s/w patches Q. How to squash rumors faster? Q. How do opinions spread? Q. How to market better?

  13. Research Theme • DATA: Large real-world networks & processes • ANALYSIS: Understanding • POLICY/ACTION: Managing/Utilizing

  14. Research Theme – Public Health • DATA: Modeling # patient transfers • ANALYSIS: Will an epidemic happen? • POLICY/ACTION: How to control outbreaks?

  15. Research Theme – Social Media • DATA: Modeling Tweets spreading • ANALYSIS: # cascades in future? • POLICY/ACTION: How to market better?

  16. In this talk Q1: How to ‘zoom-out’ of graphs? Q2: How to control outbreaks? (POLICY/ACTION: Utilizing)

  17. In this talk Q3: How does ‘activity’ evolve over time? (DATA: Large real-world networks & processes)

  18. Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion

  19. Part 1: Algorithms • Q1: How to zoom-out of a network? • Q2: How to control outbreaks? (Broad theme: Network Topology Manipulation)

  20. “Zoom-out” of the network • “Zoom-out” of the cascade graph to get a quick picture (= summarization) • Coarsening [Purohit, Prakash, et al. SIGKDD 2014] [Figure: a big graph on nodes A to F is zoomed out into a smaller representation of the network]

  21. Challenges • C1: How do we maintain diffusive characteristics when coarsening networks? • C2: How do we merge nodes to get the coarse network? • C3: How do we find the best nodes to merge, fast?

  22. C1: Modeling diffusion • Information spreads over networks • e.g., a rumor/meme spreads over the Twitter following network • Independent cascade model (IC) [Kempe+, KDD03] • Weights pij: propagation probability from i to j • Each node has only one chance to infect its neighbors [Figure: meme spreading]
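
The independent cascade model described on this slide can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `independent_cascade`, the dict-of-dicts graph layout, and the `rng` parameter are all assumptions made for the example.

```python
import random


def independent_cascade(graph, seeds, rng=None):
    """Simulate one run of the independent cascade (IC) model.

    graph: dict mapping node i -> dict mapping neighbor j -> p_ij
           (assumed layout; p_ij is the propagation probability from i to j).
    seeds: iterable of initially active nodes.
    Returns the set of nodes that are active when the cascade stops.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for i in frontier:
            # Node i gets exactly one chance to activate each inactive neighbor.
            for j, p_ij in graph.get(i, {}).items():
                if j not in active and rng.random() < p_ij:
                    active.add(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return active


# Usage: a small directed chain a -> b -> c with probability 0.5 on each edge.
chain = {"a": {"b": 0.5}, "b": {"c": 0.5}, "c": {}}
reached = independent_cascade(chain, ["a"], random.Random(42))
```

Because each run is random, the expected spread is usually estimated by averaging the size of `reached` over many runs.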

  23. Diffusive characteristics • First eigenvalue λ1 (of the adjacency matrix) is sufficient for most diffusion models. [Prakash et al. ICDM’12 selected for best papers] • λ1 determines the epidemic threshold (will there be an epidemic?) • “Safe” → “Vulnerable” → “Deadly”: increasing λ1, increasing vulnerability
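
To make the role of λ1 concrete, here is a small sketch that computes the first eigenvalue of an adjacency matrix by power iteration and checks a threshold condition. It is an illustration under assumptions: the names `leading_eigenvalue` and `below_epidemic_threshold` are made up for this example, and the check uses the SIS-style condition β·λ1/δ < 1 (infection rate `beta`, recovery rate `delta`) as one instance of the general result; other models use a different constant.

```python
import math


def leading_eigenvalue(adj, iters=1000):
    """Estimate the largest eigenvalue of a non-negative square matrix.

    Power iteration is run on (A + I) rather than A so that it also
    converges on bipartite graphs, where A has eigenvalues +l and -l of
    equal magnitude. The shift does not change the eigenvectors.
    """
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x for the unit-norm vector x.
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))


def below_epidemic_threshold(lambda1, beta, delta):
    """SIS-style check (assumed form): the infection dies out if beta * lambda1 / delta < 1."""
    return beta * lambda1 / delta < 1.0


# Usage: a triangle (complete graph on 3 nodes).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
lam = leading_eigenvalue(triangle)
safe = below_epidemic_threshold(lam, beta=0.1, delta=0.5)
```

For large sparse graphs a sparse eigensolver would replace the dense loops used here.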

  24. C1: maintain diffusive characteristics • Goal: maintain the diffusive characteristics of the original network in the coarsened network • Make the coarsened network have the least change in the first eigenvalue [Figure: the original network on nodes A to F is coarsened into a smaller network]

  25. C2: How to merge nodes • Goal: Merge nodes of graph G to get the coarsened graph that “approximates” G with respect to diffusion • Merging b and a gives the least change in λ1 • Influence from d to b: 0.5 • Influence from d to a: 0.25 • Average: 0.375 • 0.375! Is this correct? [Figure: original network]

  26. C2: How to merge nodes • In general: Merging a,b

  27. Problem Definition: Graph Coarsening Problem (GCP) • Given: a large graph G and the reduction factor α • Find: the best set of adjacent nodes to merge • To minimize |λG − λH|, where H is the coarsened graph • i.e., H has the least change in the first eigenvalue • We use λG to represent the first eigenvalue of G

  28. C3: which nodes to merge • Goal: • Find the best nodes to merge • Fast, scalable to large networks [Figure: the original network on nodes A to F is coarsened into a smaller network]

  29. Naive Greedy Heuristic • Score every edge by the change in eigenvalue • Greedily choose the edge (a,b) with the least score, and merge (a,b) • Re-evaluate the scores of every edge and repeat • Too slow! O(m²) time to score all edges • Loses the time benefits of analyzing the smaller graph
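
The naive heuristic above can be sketched as follows. This is a rough illustration only: the transcript does not include the talk's merge rule, so `merge` below uses a stand-in (the supernode's weight to every other node is the plain average of the two merged nodes' weights), which is an assumption and not the rule from the paper. The names `lambda1`, `merge`, and `naive_greedy_coarsen` are also made up for this example.

```python
import math


def lambda1(adj, iters=500):
    """Largest eigenvalue of a non-negative matrix via power iteration on A + I."""
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))


def merge(adj, a, b):
    """Merge node b into node a and return the smaller matrix.

    ASSUMPTION: the new weights are simple averages of a's and b's weights.
    This is a placeholder rule chosen for illustration.
    """
    keep = [k for k in range(len(adj)) if k != b]
    new = [[adj[i][j] for j in keep] for i in keep]
    ia = keep.index(a)
    for idx, k in enumerate(keep):
        if k == a:
            new[ia][ia] = 0.0  # the merged edge becomes internal to the supernode
        else:
            new[ia][idx] = (adj[a][k] + adj[b][k]) / 2.0
            new[idx][ia] = (adj[k][a] + adj[k][b]) / 2.0
    return new


def naive_greedy_coarsen(adj, target_nodes):
    """Repeatedly merge the edge whose merge changes lambda1 the least."""
    adj = [row[:] for row in adj]
    while len(adj) > target_nodes:
        base = lambda1(adj)
        best = None
        n = len(adj)
        for a in range(n):
            for b in range(a + 1, n):
                if adj[a][b] > 0 or adj[b][a] > 0:
                    # Each score needs a fresh eigenvalue computation: the slow part.
                    score = abs(base - lambda1(merge(adj, a, b)))
                    if best is None or score < best[0]:
                        best = (score, a, b)
        if best is None:
            break  # no edges left to merge
        adj = merge(adj, best[1], best[2])
    return adj


# Usage: a 4-node path with weight 0.5 on every edge, coarsened to 2 nodes.
path = [[0, 0.5, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 0.5, 0]]
coarse = naive_greedy_coarsen(path, 2)
```

Every candidate edge triggers a full eigenvalue computation in every round, which is why this baseline does not scale.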

  30. CoarseNet: idea • Can we approximate the edge scores faster? • Yes! • Use matrix perturbation arguments to estimate (up to first-order terms) the score of an edge in constant time (skipping details) • Score all edges in O(m) time • Naive Heuristic: O(m²) time
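
As background for the "matrix perturbation arguments" mentioned here, the standard first-order result for a simple eigenvalue is shown below. It is stated as general context and is not the exact derivation used in the talk, which is skipped on the slide. Here v and u denote the right and left eigenvectors of A for λ1, and ΔA is a small change to the matrix; these symbols are introduced for this note.

```latex
\lambda_1(A + \Delta A) - \lambda_1(A) \;\approx\; \frac{u^{\top} \, \Delta A \, v}{u^{\top} v}
```

Once u and v have been computed for the original graph, evaluating the right-hand side only touches the nonzero entries of ΔA. A merge also changes the size of the matrix, so how the estimate is adapted to node merging is a detail this note does not reproduce.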

  31. CoarseNet: Complete algorithm • Step 1: Compute scores for all edge pairs • Step 2: Merge nodes with the smallest score • Step 3: Go to step 1 until αn nodes are left [Figure: assigning scores and merging edges; original network (weight=0.5) and coarsened network]

  32. How do we perform? [Plots: DBLP and Amazon; higher is better] The first eigenvalue gets preserved well up to large coarsening factors! (See more results in the paper)

  33. Application 1: Influence Maximization • Methodology: • Step 1: Coarsen the large social network using CoarseNet • Step 2: Solve influence maximization on the coarsened network • Step 3: Randomly select one node from each selected “supernode” • We call it CSPIN [Figure: the network on nodes A to F is coarsened, influence maximization selects supernode C, and one node is randomly selected from C]
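
The three steps above can be wired together as in the sketch below. It only shows the control flow: the coarsening routine and the influence maximization solver are passed in as callables, and the names `cspin`, `coarsen`, `solve_influence_max`, and the `members` mapping are hypothetical choices for this example, not an interface from the talk.

```python
import random


def cspin(graph, k, coarsen, solve_influence_max, rng=None):
    """Sketch of the three-step pipeline on the slide.

    coarsen(graph) is assumed to return (coarse_graph, members), where
    members maps each supernode to the list of original nodes it contains.
    solve_influence_max(coarse_graph, k) is assumed to return k supernodes.
    """
    rng = rng or random.Random()
    # Step 1: coarsen the large network.
    coarse_graph, members = coarsen(graph)
    # Step 2: solve influence maximization on the smaller graph.
    chosen_supernodes = solve_influence_max(coarse_graph, k)
    # Step 3: pick one original node at random from each chosen supernode.
    return [rng.choice(list(members[s])) for s in chosen_supernodes]


# Usage with stand-in components (both are placeholders, not real solvers).
def toy_coarsen(graph):
    return {"S1": ["S2"], "S2": []}, {"S1": ["a", "b"], "S2": ["c"]}


def toy_solver(coarse_graph, k):
    return list(coarse_graph)[:k]


seeds = cspin({}, 2, toy_coarsen, toy_solver, random.Random(7))
```

Keeping the solver pluggable reflects the idea on the slide that any existing influence maximization method can be run on the coarsened network.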

  34. Quality of CSPIN w.r.t • We can merge up to 95% of the vertices without significantly affecting the influence spread!

  35. Application 2: Diffusion Characterization • Goal: use Graph Coarsening to understand information cascades • Dataset: Flixster • a friendship network with movie ratings • Cascade: the same movie rating from friends • Methodology • coarsen the network using CoarseNet with the reduction factor α=0.5 • study the formed groups (supernodes) • Can get non-network surrogates

  36. Diffusion observation • Stats: • 1891 groups • mean group size: 16.6 • the largest group: 22061 nodes (roughly 40% of nodes) • Observation 1: a very large fraction of movies propagate in a small number of groups • Observation 2: a multi-modal distribution (See more results in the paper)

  37. Future work… • How is it related to community structure? • More applications, like Visualization… • Parallelization

  38. Part 1: Algorithms • Q1: How to zoom-out of a network? • Q2: How to control outbreaks?

  39. Immunization (= Interventions) • Different Flavors: • Pre-emptive • Data-aware

  40. Pre-emptive: Vulnerability (Again!) • First eigenvalue λ1 (of the adjacency matrix) is sufficient for most diffusion models. [Prakash et al. ICDM’12 selected for best papers] • λ1 determines the epidemic threshold • “Safe” → “Vulnerable” → “Deadly”: increasing λ1, increasing vulnerability

  41. Goal • Decrease λ1 as much as possible • Node-based [Tong, P., + ICDM 2010] • Edge-based [Tong, P., Eliassi-Rad+ CIKM 2012, Best Paper Award] • Edge-Manipulation [P., Adamic+ SDM 2013]
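
To illustrate the node-based goal of decreasing λ1, here is a brute-force greedy baseline: at each step it removes the node whose removal lowers λ1 the most. This is not the algorithm from any of the cited papers; it is a simple, slow stand-in for intuition, and the names `lambda1` and `greedy_immunize` are assumptions for this example.

```python
import math


def lambda1(adj, iters=500):
    """Largest eigenvalue of a non-negative matrix via power iteration on A + I."""
    n = len(adj)
    if n == 0:
        return 0.0
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(x[i] * ax[i] for i in range(n))


def greedy_immunize(adj, k):
    """Pick k nodes to remove, one at a time, each minimizing the remaining lambda1."""
    alive = list(range(len(adj)))
    removed = []
    for _ in range(min(k, len(adj))):
        best_node, best_lam = None, None
        for v in alive:
            rest = [u for u in alive if u != v]
            # Adjacency matrix of the graph with v (and earlier picks) removed.
            sub = [[adj[i][j] for j in rest] for i in rest]
            lam = lambda1(sub)
            if best_lam is None or lam < best_lam:
                best_node, best_lam = v, lam
        removed.append(best_node)
        alive.remove(best_node)
    return removed


# Usage: a star with center 0 and four leaves; immunize one node.
star = [[0, 1, 1, 1, 1], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0]]
picked = greedy_immunize(star, 1)
```

On the star the center is the only node whose removal disconnects everything, so it is the natural first pick; the repeated eigenvalue computations are what faster methods aim to avoid.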

  42. Latest results • First (provable) approximation algorithms for the edge-based problem (under submission [Saha, Adiga, P., Vullikanti 2014]) • O(log² n)-factor (can be improved to O(log n)) • Based on the idea of removing closed walks • Semi-Definite Programming rounding-based O(1) factor

  43. Data-aware Immunization [Zhang and Prakash, SDM 2014] • Given: Graph and infected nodes • Find: ‘best’ nodes for immunization • Complexity: NP-hard; hard to approximate within an absolute error • DAVA-tree: optimal solution on the tree • DAVA and DAVA-fast: merge infected nodes, build a “dominator tree”, and run DAVA-tree • Running time: subquadratic • DAVA: O(k(|E| + |V| log |V|)) • DAVA-fast: O(|E| + |V| log |V|) [Figure: graph with infected nodes; dominator tree]

  44. Extensions • Can be extended to uncertain and noisy initial data as well! [Zhang and Prakash, CIKM 2014] [Figure: Twitter Firehose API, 1% sample]

  45. Outline • Motivation • Part 1: Policy and Action (Algorithms) • Part 2: Learning Models (Empirical Studies) • Conclusion

  46. Part 2: Empirical Studies • Q3: How does activity evolve over time?

  47. Google Search Volume • e.g., given (1) the first spike, (2) the release dates of two sequel movies, and (3) the access volume before the release date [Figure: search volume over time, annotated with (1) first spike, (2) release date, (3) two weeks before release; the unknown parts are marked “?”]

  48. Patterns [Figure: plot of Y against X]

  49. Patterns [Figure: plot of Y against X, with more data]

  50. Patterns [Figure: plot of Y against X, with a possible anomaly marked “?”]
