Retweeting Behavior and Spectral Graph Analysis in Social Media. Xintao Wu Jan 18, 2013 . Social Media Customer Analytics . Entity resolution Patterns Temporal/spatial Scalability Visualization Sentiment Privacy. Network topology. Structured profile.
Retweeting Behavior and Spectral Graph Analysis in Social Media Xintao Wu Jan 18, 2013
Social Media Customer Analytics • Entity resolution • Patterns • Temporal/spatial • Scalability • Visualization • Sentiment • Privacy • Network topology • Structured profile • Customer profile • Customer transaction • Inventory • Product desc and review • … • Retweet sequence • Unstructured text (e.g., blog, tweet)
Outline • Examining retweeting behavior to understand information propagation • Multi-factor interaction analysis • Coverage prediction • Burst detection • Spectral graph analysis • Community partition • Fraud detection
Multi-factor interaction analysis • For each following relationship (user A follows user B), what factors affect A’s decision on whether to forward messages from B to A’s followers? • We examine users’ retweet behaviors using various features • Power ratio (A) • Link structure (B) • Location factor (C) • Gender factor (D) • … • We fit a log-linear model to capture and interpret interaction patterns among features A-D and the retweet outcome (E).
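As a sketch of the model family involved (the slides do not say which interaction terms the fitted model keeps, so the all-two-way form below is an assumption for illustration), a log-linear model for the expected count $m_{abcde}$ in the contingency-table cell with A = a, B = b, C = c, D = d, E = e can be written as:

```latex
\log m_{abcde} = \lambda
  + \lambda^{A}_{a} + \lambda^{B}_{b} + \lambda^{C}_{c} + \lambda^{D}_{d} + \lambda^{E}_{e}
  + \lambda^{AB}_{ab} + \lambda^{AC}_{ac} + \dots + \lambda^{DE}_{de}
```

Each single-superscript term is a main effect and each two-superscript term is a pairwise interaction; a term such as $\lambda^{DE}_{de}$ would capture how gender shifts the retweet rate. Higher-order terms (for example $\lambda^{BDE}_{bde}$ for link structure, gender, and retweet together) can be added when pairwise terms alone do not fit the data.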
Interpretation example • On its own, neither gender nor location has a significant effect on retweeting. • However, once link structure is taken into account: • Females are more conservative: they have a lower tendency to retweet messages from non-friend (especially female) users, but a higher tendency to retweet messages from friends or superstars. • Males are more open-minded: they have a higher tendency to retweet messages from non-friend (especially female) users.
Retweet Sequence • Information dynamically flows through the network. • [Figure: example network of users Alice, Bob, Cathy, David, Ellen, and Fred, with labels D1, D2, D3]
Flow Through Tree Structure • Information dynamically flows through a social network. • [Figure: retweet tree over users Alice, Bob, Cathy, David, Ellen, and Fred, with labels D1, D2, D3]
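A minimal sketch of how a retweet tree can be represented and traversed. The user names come from the slide, but the edges of the figure were not recoverable, so the edge list below is invented purely for illustration, and `retweet_depths` is a hypothetical helper name.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Tuple


def retweet_depths(edges: Iterable[Tuple[str, str]], root: str) -> Dict[str, int]:
    """Breadth-first traversal of a retweet tree.

    Each edge (source, retweeter) means `retweeter` forwarded the message
    after seeing it from `source`. Returns each reached user's hop distance
    from the original author.
    """
    children = defaultdict(list)
    for source, retweeter in edges:
        children[source].append(retweeter)

    depths = {root: 0}
    queue = deque([root])
    while queue:
        user = queue.popleft()
        for child in children[user]:
            if child not in depths:  # ignore repeat retweets by the same user
                depths[child] = depths[user] + 1
                queue.append(child)
    return depths


# Hypothetical edges: Alice posts, Bob and Cathy retweet her, and so on.
edges = [
    ("Alice", "Bob"),
    ("Alice", "Cathy"),
    ("Bob", "David"),
    ("Bob", "Ellen"),
    ("David", "Fred"),
]
depths = retweet_depths(edges, "Alice")
print(depths)
```

The maximum depth and the number of users per depth level are the kind of tree features that later slides refer to as path length.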
WISE12 Challenge • Sina Weibo • # of users: 5,636,858 • # of tweets: 46,584,914 • # of retweets: 190,920,026 • 33 test messages • each with 100 initial retweets • composed by 27 users • from 6 events • For each message, predict • M1: the number of retweets in 30 days • M2: the number of possible-views in 30 days
Idea • We treat the retweeting activity of each original message in the training data as a time series • Each value corresponds to the number of times that the original message is retweeted during time period t • For each message in the test data • Known: the series from the 100 initial retweets • Use ARMA to predict the later values
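The time-series idea can be sketched as follows. This is not the authors' model: for a self-contained example it swaps in a pure AR(1) model (the autoregressive part of ARMA, with no moving-average term) fitted by least squares, where a real pipeline would use a full ARMA implementation from a statistics library. The function names and the per-period counts are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    n = len(x_prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(x_prev) / n
    mean_next = sum(x_next) / n
    cov = sum((p - mean_prev) * (q - mean_next) for p, q in zip(x_prev, x_next))
    var = sum((p - mean_prev) ** 2 for p in x_prev)
    phi = cov / var if var > 0 else 0.0
    c = mean_next - phi * mean_prev
    return c, phi


def forecast(series: List[float], steps: int) -> List[float]:
    """Roll the fitted AR(1) recursion forward, clipping counts at zero."""
    c, phi = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = max(0.0, c + phi * last)
        predictions.append(last)
    return predictions


def predict_total_retweets(series: List[float], horizon: int) -> float:
    """Observed retweets so far plus the forecast for the remaining periods."""
    return sum(series) + sum(forecast(series, horizon))


# Hypothetical per-period counts built from a message's first 100 retweets.
observed = [52.0, 26.0, 13.0, 6.0, 3.0]
print(predict_total_retweets(observed, horizon=10))
```

The split between "known" and "predicted" mirrors the slide: the observed list stands for the part of the series recovered from the initial retweets, and the forecast fills in the periods after it.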
Prediction Result • [Figure panels: Death of Steve Jobs; Yao Jiaxin Murder Case; Xiaomi Release; Xiaomi Release] • Runner-up award (2nd place) in the WISE 2012 Challenge, Mining Track.
Bursts • Peak time • Duration time
Burst Analysis: Users • Top 100 users tend to have shorter path length, shorter peak time, and shorter duration time.
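The slides name peak time and duration time without defining them. A minimal sketch under one plausible reading, which is an assumption: peak time is the period with the most retweets, and duration is the length of the contiguous run around the peak where the count stays at or above a fraction of the peak value. The function name `burst_stats` and the 0.5 default are hypothetical.

```python
from typing import Sequence, Tuple


def burst_stats(counts: Sequence[int], fraction: float = 0.5) -> Tuple[int, int]:
    """Return (peak_time, duration) for a per-period retweet count series.

    peak_time: index of the first period with the maximum count.
    duration:  number of consecutive periods around the peak whose count is
               at least `fraction` of the peak count (assumed definition).
    """
    if not counts:
        raise ValueError("counts must be non-empty")
    peak_value = max(counts)
    peak_time = list(counts).index(peak_value)
    threshold = fraction * peak_value

    left = peak_time
    while left > 0 and counts[left - 1] >= threshold:
        left -= 1
    right = peak_time
    while right < len(counts) - 1 and counts[right + 1] >= threshold:
        right += 1
    return peak_time, right - left + 1


# Hypothetical hourly retweet counts for one message.
peak_time, duration = burst_stats([1, 3, 10, 8, 4, 1])
print(peak_time, duration)
```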
Burst Prediction • Extract features • User-related, including profile and history information • Tweet-related, including time series and retweet tree • Run classifiers • Logistic regression • Random forest • Decision tree • Naïve Bayes • SVM • KNN • Achieves 83.2% accuracy
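To make the feature-then-classify pipeline concrete, here is a minimal sketch of one of the listed classifiers, logistic regression, trained by batch gradient descent in plain Python. The two features and all data values are invented for illustration and are not the features or data behind the 83.2% figure.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear score; the last weight is the bias term."""
    return sum(w * v for w, v in zip(weights, x)) + weights[-1]


def train_logistic(X: List[List[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean logistic loss."""
    n_features = len(X[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(score(weights, xi)) - yi
            for j, v in enumerate(xi):
                grad[j] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sigmoid(score(weights, x)) >= 0.5 else 0


def accuracy(weights: Sequence[float], X: List[List[float]], y: List[int]) -> float:
    hits = sum(predict(weights, xi) == yi for xi, yi in zip(X, y))
    return hits / len(y)


# Hypothetical, pre-scaled features per message:
# [author popularity, early retweet rate]; label 1 means "burst".
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y = [0, 0, 0, 1, 1, 1]
weights = train_logistic(X, y)
print(accuracy(weights, X, y))
```

In a real evaluation, accuracy would be measured on held-out messages rather than on the training set used here.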
Spectral graph analysis • Spectral coordinates: Polbook network • [Figure: nodes of the Polbook network plotted by their spectral coordinates]
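A minimal sketch of what spectral coordinates are, assuming the adjacency-matrix variant: node u is mapped to the u-th entries of the leading k eigenvectors of the adjacency matrix. The eigenvectors are found here with shifted power iteration plus deflation so the example needs no external library; the toy graph and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(matrix: Matrix, vector: List[float]) -> List[float]:
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def normalize(vector: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def top_eigenvectors(adj: Matrix, k: int, iters: int = 500) -> List[List[float]]:
    """Eigenvectors of the k largest eigenvalues of a symmetric adjacency matrix."""
    n = len(adj)
    # Every eigenvalue lies in [-max_degree, max_degree], so adding
    # max_degree * I makes all eigenvalues non-negative without changing
    # the eigenvectors or their order.
    shift = max(sum(row) for row in adj)
    shifted = [[adj[i][j] + (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    found: List[List[float]] = []
    for _ in range(k):
        # Deterministic start; it must not be orthogonal to the target vector.
        v = normalize([float(i + 1) for i in range(n)])
        for _ in range(iters):
            v = matvec(shifted, v)
            for u in found:  # deflation: remove already-found directions
                dot = sum(a * b for a, b in zip(u, v))
                v = [a - dot * b for a, b in zip(v, u)]
            v = normalize(v)
        found.append(v)
    return found


def spectral_coordinates(adj: Matrix, k: int) -> List[Tuple[float, ...]]:
    vectors = top_eigenvectors(adj, k)
    return [tuple(vec[node] for vec in vectors) for node in range(len(adj))]


# Toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0.0] * 6 for _ in range(6)]
for i, j in edges:
    adj[i][j] = adj[j][i] = 1.0

coords = spectral_coordinates(adj, 2)
for node, (x1, x2) in enumerate(coords):
    print(node, round(x1, 3), round(x2, 3))
```

In this toy graph the sign of the second coordinate separates the two triangles, which is the kind of structure a scatter plot of spectral coordinates makes visible.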
Accuracy of AdjCluster • Lap [Miller and Teng, 1998]: Laplacian based • Ncut [Shi and Malik, 2000]: Normalized cut • HE’ [Wakita and Tsurumi, 2007]: Modularity based agglomerative clustering • SpokEn [Prakash et al., 2010]: EigenSpoke • Accuracy: [formula missing], defined in terms of the i-th community produced by each algorithm • Refer to IJCAI’11 for details
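The accuracy formula itself did not survive on this slide. A minimal sketch of one common way to score a partition against ground truth, assuming accuracy means the fraction of nodes placed in the matching community under the best one-to-one alignment of community labels (this definition and the name `partition_accuracy` are assumptions, not taken from the slides):

```python
from itertools import permutations
from typing import List, Set


def partition_accuracy(true_communities: List[Set[int]],
                       predicted_communities: List[Set[int]]) -> float:
    """Best-alignment overlap between two partitions, divided by node count.

    Tries every one-to-one matching of predicted to true communities, which
    is only practical for a small number of communities.
    """
    if len(true_communities) != len(predicted_communities):
        raise ValueError("both partitions must have the same number of communities")
    n = sum(len(c) for c in true_communities)
    best = 0
    for order in permutations(predicted_communities):
        overlap = sum(len(t & p) for t, p in zip(true_communities, order))
        best = max(best, overlap)
    return best / n


# Hypothetical example: node 5 is assigned to the wrong community.
truth = [{0, 1, 2}, {3, 4, 5}]
predicted = [{3, 4}, {0, 1, 2, 5}]
print(partition_accuracy(truth, predicted))
```

The search over permutations is there because algorithms number their communities arbitrarily, so community 1 from one method need not correspond to community 1 in the ground truth.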
SPCTRA fraud detection • Evaluation on Web spam challenge data • 100-1000 times faster • GREEDY: based on outer-triangles [Shrivastava, ICDE, 2008] • Refer to ICDE’11 for details.
Thank You! Questions? • Acknowledgments: This work was supported in part by U.S. National Science Foundation grants CNS-0831204 and CCF-1047621, and by the UNC Charlotte Chancellor’s Special Fund.