Explore the world of big data analytics with graphs and tensors through the lens of anomaly detection. Dive deep into the motivation behind data mining, graph analysis, and neuro-semantics. Discover how tensor factorization and concept discovery enhance anomaly detection methods. Learn about scalability challenges and fraud detection techniques in online reviews.
Big (graph) data analytics Christos Faloutsos CMU
CONGRATULATIONS! Welcome to CMU! C. Faloutsos
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
Q+A • Are you recruiting? How many? A: 1 or 2 • How many do you have? A: 6 (+5 postdocs) • How frequently do you meet them? A: 1/week • What is your advising style? A: results • How do you feel about summer internships? A: Yes/Maybe (FB, MSR, IBM, ++)
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
Motivation • Data mining: ~ find patterns (rules, outliers) • What do real graphs look like? Anomalies? • Time series / monitoring, e.g., measles in PA, NY, …
Graphs - why should we care? • Food Web [Martinez ’91] • Internet Map [lumeta.com] • ~1B users, $10-$100B revenue
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly detection • Conclusions
NELL & concepts (=groups) • Predicates (subject, verb, object) in knowledge base, e.g., “Eric Clapton plays guitar”, “Barack Obama is the president of U.S.” • NELL (Never Ending Language Learner) data: tensor mode sizes shown as 48M, 26M, 26M; nonzeros = 144M • Vagelis Papalexakis (CMU-CS), Tom Mitchell (CMU/CS-MLD)
Answer: tensor factorization • Recall: (SVD) matrix factorization finds blocks • N users x M products matrix ~ (‘meat-eaters’ x ‘steaks’) + (‘kids’ x ‘cookies’) + (‘vegetarians’ x ‘plants’)
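The block-finding behavior of SVD recalled above can be illustrated with a minimal sketch. The user names, product names, and purchase counts are hypothetical toy data, not from the talk. The matrix is block-diagonal by construction, so each of the three largest singular values pairs one user group with one product group.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are products,
# entries are purchase counts. Three disjoint user/product groups.
users = ["ann", "bob", "cat", "dan", "eve", "fay"]
products = ["steak", "ribs", "cookies", "candy", "tofu", "kale"]
A = np.array([
    [5, 4, 0, 0, 0, 0],   # 'meat-eaters' x 'steaks'
    [4, 5, 0, 0, 0, 0],
    [0, 0, 3, 3, 0, 0],   # 'kids' x 'cookies'
    [0, 0, 3, 3, 0, 0],
    [0, 0, 0, 0, 2, 1],   # 'vegetarians' x 'plants'
    [0, 0, 0, 0, 1, 2],
], dtype=float)

# A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def block(r, tol=1e-6):
    """Users and products with non-negligible weight in component r."""
    rows = {users[i] for i in np.flatnonzero(np.abs(U[:, r]) > tol)}
    cols = {products[j] for j in np.flatnonzero(np.abs(Vt[r]) > tol)}
    return rows, cols

blocks = [block(r) for r in range(3)]
for rows, cols in blocks:
    print(sorted(rows), "x", sorted(cols))
```

On real data the blocks overlap and are noisy, so a fixed threshold like `tol` would need tuning; the sketch only shows why the leading singular vectors line up with groups.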
Answer: tensor factorization • PARAFAC decomposition • (subject x verb x object) tensor = sum of components, e.g., artists + athletes + politicians
Answer: tensor factorization • PARAFAC decomposition • Results for who-calls-whom-when • 4M x 15 days • (caller x callee x time) tensor = ?? + ?? + ??
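A PARAFAC decomposition of a small 3-way array can be sketched with alternating least squares (ALS). This is an illustrative sketch, not the method used for the NELL or phone-call results in the talk. The helper names `unfold`, `khatri_rao`, and `cp_als`, the SVD-based initialization, and the toy caller x callee x day tensor are all assumptions made for the example.

```python
import numpy as np

def unfold(X, mode):
    """Matricize X along `mode`; the remaining axes keep their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(P, Q):
    """Column-wise Kronecker product: row (p * len(Q) + q) equals P[p] * Q[q]."""
    return (P[:, None, :] * Q[None, :, :]).reshape(-1, P.shape[1])

def cp_als(X, rank, n_iter=50):
    """Rank-`rank` PARAFAC of a 3-way array by alternating least squares."""
    # Assumption: start from the leading left singular vectors of each unfolding.
    A, B, C = (np.linalg.svd(unfold(X, m), full_matrices=False)[0][:, :rank]
               for m in range(3))
    for _ in range(n_iter):
        # Each step solves a linear least-squares problem for one factor
        # while the other two are held fixed.
        A = unfold(X, 0) @ khatri_rao(B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = unfold(X, 1) @ khatri_rao(A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = unfold(X, 2) @ khatri_rao(A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C

# Hypothetical toy tensor: 6 callers x 5 callees x 4 days, two calling groups.
a = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]], dtype=float).T
b = np.array([[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]], dtype=float).T
c = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float).T
X = np.einsum("ir,jr,kr->ijk", a, b, c)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Callers with a large loading in each component form one group.
caller_groups = {
    frozenset(np.flatnonzero(np.abs(A[:, r]) > 0.5 * np.abs(A[:, r]).max()).tolist())
    for r in range(2)
}
print(rel_error, caller_groups)
```

The dense unfoldings used here would not fit in memory for a tensor with millions of rows per mode; a sparse representation is needed at that scale.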
Concept Discovery • Concept Discovery in Knowledge Base • NP1: Internet, file, data • NP2: Protocol, software, suite
Neuro-semantics • Brain Scan Data*: 9 persons, 60 nouns • Questions: 218 questions, e.g., ‘is it alive?’, ‘can you eat it?’ • Patterns? • [Figure: data arrays with axes labeled questions, nouns (airplane, dog, …), persons, voxels] • *Mitchell et al., Predicting human brain activity associated with the meanings of nouns, Science, 2008. Data @ www.cs.cmu.edu/afs/cs/project/theo-73/www/science2008/data.html
Neuro-semantics • Small items -> Premotor cortex • Evangelos Papalexakis, Tom Mitchell, Nicholas Sidiropoulos, Christos Faloutsos, Partha Pratim Talukdar, Brian Murphy, Turbo-SMT: Accelerating Coupled Sparse Matrix-Tensor Factorizations by 200x, SDM 2014
Scalability • Google: > 450,000 processors in clusters of ~2000 processors each [Barroso+, “Web Search for a Planet: The Google Cluster Architecture”, IEEE Micro 2003] • Yahoo: 5Pb of data [Fayyad, KDD’07] • Google-NY, Aug’14: ‘graph with 1T edges, 300B nodes’ • Problem: machine failures, on a daily basis • How to parallelize data mining tasks, then? • A: map/reduce; hadoop (open-source clone) http://hadoop.apache.org/
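The map/reduce pattern named above can be sketched in plain Python for one graph-mining task, computing node degrees from an edge list. This is a single-process illustration of the map, shuffle, and reduce phases only; it does not use the Hadoop API, and the function names and the toy edge list are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_edge(edge):
    """Map phase: each edge emits one (node, 1) pair per endpoint."""
    u, v = edge
    return [(u, 1), (v, 1)]

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_degree(node, counts):
    """Reduce phase: sum the counts for one node."""
    return node, sum(counts)

def degree_count(edges):
    mapped = chain.from_iterable(map_edge(e) for e in edges)
    return dict(reduce_degree(k, vs) for k, vs in shuffle(mapped).items())

# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
degrees = degree_count(edges)
print(degrees)
```

Because each call to `map_edge` and `reduce_degree` depends only on its own input, the calls can run on different machines and be re-run after a failure, which is the property that makes the pattern attractive when machines fail daily.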
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Conclusions
App-store fraud • Leman Akoglu, Rishi Chandy, Christos Faloutsos, Opinion Fraud Detection in Online Reviews using Network Effects, ICWSM’13 • (NSF grant, with Alex Beutel)
Problem • Given: user-product review network; review sign (+/-) • Classify objects into type-specific classes: users: ‘honest’ / ‘fraudster’; products: ‘good’ / ‘bad’; reviews: ‘genuine’ / ‘fake’ • No side data! (e.g., timestamp, review text)
Formulation: BP (belief propagation) • [Figure: user and product nodes labeled ‘honest’, ‘good’, ‘bad’, joined by a ‘+’ review edge; panels ‘Before’ and ‘After’]
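A belief propagation formulation on a signed user-product network can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the compatibility values in `PSI` (controlled by `EPS`) are illustrative choices, all priors are uniform, and the function name `run_bp` and the toy reviews are hypothetical. User states are [honest, fraudster] and product states are [good, bad].

```python
import numpy as np

EPS = 0.1  # assumption: illustrative strength of the edge compatibilities
# PSI[sign][user_state, product_state]; rows: honest, fraudster; cols: good, bad.
PSI = {
    "+": np.array([[1 - EPS, EPS], [2 * EPS, 1 - 2 * EPS]]),
    "-": np.array([[EPS, 1 - EPS], [1 - 2 * EPS, 2 * EPS]]),
}

def run_bp(reviews, n_iter=30):
    """Loopy BP over reviews given as (user, product, sign) triples.

    Returns (user_beliefs, product_beliefs): node -> [P(state 0), P(state 1)].
    """
    users = sorted({u for u, _, _ in reviews})
    products = sorted({p for _, p, _ in reviews})
    uniform = np.array([0.5, 0.5])
    to_prod = [uniform.copy() for _ in reviews]   # user -> product, one per review
    to_user = [uniform.copy() for _ in reviews]   # product -> user, one per review

    def others(messages, node, pos, skip):
        """Product of messages arriving at `node`, excluding edge `skip`."""
        out = np.ones(2)
        for e, r in enumerate(reviews):
            if r[pos] == node and e != skip:
                out = out * messages[e]
        return out

    for _ in range(n_iter):
        new_to_prod, new_to_user = [], []
        for e, (u, p, s) in enumerate(reviews):
            m = others(to_user, u, 0, e) @ PSI[s]   # sum over user states
            new_to_prod.append(m / m.sum())
            m = PSI[s] @ others(to_prod, p, 1, e)   # sum over product states
            new_to_user.append(m / m.sum())
        to_prod, to_user = new_to_prod, new_to_user

    def belief(messages, node, pos):
        b = others(messages, node, pos, skip=-1)
        return b / b.sum()

    user_beliefs = {u: belief(to_user, u, 0) for u in users}
    product_beliefs = {p: belief(to_prod, p, 1) for p in products}
    return user_beliefs, product_beliefs

# Hypothetical toy network: three users agree; 'bot' reviews the opposite way.
reviews = [(u, "p_good", "+") for u in ("u0", "u1", "u2")]
reviews += [(u, "p_bad", "-") for u in ("u0", "u1", "u2")]
reviews += [("bot", "p_good", "-"), ("bot", "p_bad", "+")]
user_beliefs, product_beliefs = run_bp(reviews)
print({u: round(float(b[1]), 3) for u, b in user_beliefs.items()})
```

Only the review signs are used, matching the "no side data" setting. The nested scan in `others` keeps the sketch short but costs quadratic time in the number of reviews; an adjacency index would be needed for a real review network.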
Top scorers • [Figure: Users vs. Products; + positive (4-5) rating, o negative (1-2) rating]
‘Fraud-bot’ member reviews • Same day activity! • Same developer! • Duplicated text!
Outline • Q+A • Problem definition / Motivation • Graphs, tensors and brains • Anomaly/fraud detection • Time series, monitoring / forecasting • Conclusions
‘Tycho’ – epidemics analysis • Prof. Yasuko Matsubara • 50 states x 46 diseases • Flu? Measles? August? No periodicity?
‘Tycho’ – epidemics analysis • https://www.tycho.pitt.edu/resources.php from U. Pitt (epidemiology dept.) • Yasuko Matsubara, Yasushi Sakurai, Willem van Panhuis, and Christos Faloutsos, FUNNEL: Automatic Mining of Spatially Coevolving Epidemics, KDD 2014, New York City, NY, USA, Aug. 24-27, 2014.
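The "periodicity or no periodicity?" question for a disease time series can be illustrated with a plain periodogram. This is a swapped-in textbook technique, not the FUNNEL model from the cited paper. The function name `dominant_period` and the synthetic weekly series are assumptions made for the example; no Tycho data is used.

```python
import numpy as np

def dominant_period(series):
    """Return (period in samples, share of periodogram power at that period).

    Returns (None, 0.0) for a constant series.
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()                        # remove the constant level
    power = np.abs(np.fft.rfft(x)) ** 2     # periodogram at frequencies k / n
    power[0] = 0.0                          # ignore the zero-frequency bin
    total = power.sum()
    if total == 0.0:
        return None, 0.0
    k = int(np.argmax(power))
    return len(x) / k, power[k] / total

# Synthetic weekly counts over 10 years (520 weeks); values are made up.
rng = np.random.default_rng(0)
weeks = np.arange(520)
seasonal = 100 + 50 * np.cos(2 * np.pi * weeks / 52) + rng.normal(0, 5, weeks.size)
aperiodic = 100 + rng.normal(0, 5, weeks.size)

seasonal_period, seasonal_share = dominant_period(seasonal)
_, aperiodic_share = dominant_period(aperiodic)
print(seasonal_period, seasonal_share, aperiodic_share)
```

A large power share at a 52-week period suggests yearly seasonality, while a small share for every period is consistent with no periodicity. The sketch does not model shocks, interventions, or spatial effects.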
Open research questions • Patterns/anomalies for time-evolving graphs (call graph, 3M people x 6 months) • Spot fraudsters in social networks (e.g., Twitter ‘$10 -> 1000 followers’) • How is the human brain wired?
Contact info • www.cs.cmu.edu/~christos • GHC 8019 • Ph#: x8.1457 • www.cs.cmu.edu/~christos/TALKS/14-09-ic/ • FYI: Course: 15-826, Tu-Th 3:00-4:20 • and, again, WELCOME!