
Automated Social Hierarchy Detection through Email Network Analysis

Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07). Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J. Stolfo. Advisor: Dr. Koh Jia-Ling. Reporter: Che-Wei, Liang. Date: 2008/12/11.


Presentation Transcript


  1. Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07) Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J. Stolfo Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/12/11

  2. Outline • Introduction • SNA algorithm • Results and Discussion • Conclusions and Future Work

  3. Introduction • The recent bankruptcy scandals in US companies such as Enron and WorldCom have increased the need to analyze electronic information • The goal is to define risk and identify any conflict of interest among the entities of a corporate household • Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task • The hierarchy can be extracted by analyzing email communication data

  4. SNA Algorithm • For each mail user • Analyze and calculate several statistics for each feature of each user • Construct an email network graph • Vertices represent accounts, edges represent communication between two accounts • Analyze cliques and other graph-theoretic properties • Combine the results into a social score
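
The graph-construction step above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(sender, recipient)` input format, the function name `build_email_graph`, and the rule that a single message is enough to create an edge are all assumptions made for the example.

```python
from collections import defaultdict


def build_email_graph(messages):
    """Build an undirected email graph as a dict of adjacency sets.

    Vertices are accounts; an edge links two accounts that exchanged
    at least one message (assumed threshold, not taken from the slides).
    """
    graph = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore mail sent to oneself
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return dict(graph)


# Hypothetical accounts, used only for illustration.
messages = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
]
graph = build_email_graph(messages)
```

Adjacency sets keep the later neighborhood and clique computations simple, since set intersection gives the common neighbors of two accounts directly.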

  5. SNA Algorithm • Two sets of statistics about a user’s “importance” • Average response time • The average time elapsed between a user sending an email and later receiving an email from the same correspondent • Considered a “response” if a received mail follows a sent mail within three days • Cliques (maximal complete subgraphs) • Find all cliques in the graph • Assumption: users associated with a larger set and higher frequency of cliques are ranked higher
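
A rough sketch of the average response time statistic is given below. The log format `(timestamp, sender, recipient)`, the function name, and the pairing rule (each reply is matched with the user's earliest unanswered message to that correspondent) are assumptions for illustration; the slide only fixes the three-day window.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(days=3)  # three-day limit stated on the slide


def average_response_time(user, log):
    """Mean delay between `user` mailing a correspondent and that
    correspondent's next mail back, counting only delays within the window.

    Returns None when no exchange qualifies as a response.
    """
    pending = {}  # correspondent -> time of user's earliest unanswered mail
    delays = []
    for timestamp, sender, recipient in sorted(log):
        if sender == user:
            pending.setdefault(recipient, timestamp)
        elif recipient == user and sender in pending:
            delay = timestamp - pending.pop(sender)
            if delay <= RESPONSE_WINDOW:
                delays.append(delay)
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)


# Hypothetical message log.
log = [
    (datetime(2001, 5, 1, 9, 0), "alice", "bob"),
    (datetime(2001, 5, 1, 11, 0), "bob", "alice"),    # reply after 2 hours
    (datetime(2001, 5, 2, 9, 0), "alice", "carol"),
    (datetime(2001, 5, 2, 13, 0), "carol", "alice"),  # reply after 4 hours
    (datetime(2001, 5, 3, 9, 0), "alice", "dave"),
    (datetime(2001, 5, 10, 9, 0), "dave", "alice"),   # 7 days later, ignored
]
avg = average_response_time("alice", log)  # 3 hours
```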

  6. Cliques
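
One standard way to enumerate maximal cliques is the Bron-Kerbosch algorithm; the slides do not say which method is used, so the sketch below is an illustrative choice. It uses the basic variant without pivoting and assumes the graph is stored as a dict of adjacency sets.

```python
def find_maximal_cliques(graph):
    """Enumerate all maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        # current: clique built so far
        # candidates: vertices adjacent to every member of current
        # excluded: vertices already explored that would yield duplicates
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return cliques


# Hypothetical four-account graph: a triangle plus one pendant vertex.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
cliques = find_maximal_cliques(graph)
```

Here the maximal cliques are {alice, bob, carol} and {carol, dave}. On large mail graphs a pivoting variant would usually be preferred for speed.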

  7. Communication Networks • Number of cliques • The number of cliques that the account is contained within • Raw clique score • A score computed using the size of the clique set • Weighted clique score • A score computed using the “importance” of the people in each clique
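
The three clique statistics can be illustrated as below. The slides give no formulas, so the scoring rules here are assumptions: the raw score adds 2^(n-1) for each clique of size n that contains the account, and the weighted score multiplies that term by the summed importance of the clique's members. The `importance` values are hypothetical inputs supplied by the caller.

```python
def clique_statistics(account, cliques, importance):
    """Return (number of cliques, raw score, weighted score) for an account.

    Assumed formulas (not given on the slides):
      raw      = sum of 2 ** (n - 1) over the account's cliques of size n
      weighted = sum of t * 2 ** (n - 1), with t the summed importance
                 of the members of each clique
    """
    own = [c for c in cliques if account in c]
    raw = sum(2 ** (len(c) - 1) for c in own)
    weighted = sum(
        sum(importance[m] for m in c) * 2 ** (len(c) - 1) for c in own
    )
    return len(own), raw, weighted


# Hypothetical cliques and importance values.
cliques = [frozenset({"alice", "bob", "carol"}), frozenset({"carol", "dave"})]
importance = {"alice": 0.5, "bob": 0.25, "carol": 1.0, "dave": 0.25}

count, raw, weighted = clique_statistics("carol", cliques, importance)
# count = 2, raw = 4 + 2 = 6, weighted = 1.75 * 4 + 1.25 * 2 = 9.5
```

The exponential term makes large cliques count far more than small ones, which matches the stated assumption that membership in larger clique sets signals higher rank.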

  8. Communication Networks • Degree centrality • $\deg(v_i) = \sum_j a_{ij}$ ($a_{ij}$: entry of the adjacency matrix $A$ of $G$) • Clustering coefficient • How close the vertex and its neighbors are to being a clique
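
Both measures are easy to compute from adjacency sets. The sketch below assumes an undirected, unweighted graph and uses the usual local clustering coefficient (the fraction of neighbor pairs that are themselves connected); the slide does not specify the exact variant.

```python
from itertools import combinations


def degree(graph, v):
    """Degree centrality: row sum of the adjacency matrix for vertex v."""
    return len(graph[v])


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are directly connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs to compare
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Hypothetical graph: a triangle (alice, bob, carol) plus dave tied to carol.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
carol_degree = degree(graph, "carol")                     # 3
carol_clustering = clustering_coefficient(graph, "carol")  # 1 of 3 pairs linked
alice_clustering = clustering_coefficient(graph, "alice")  # both neighbors linked
```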

  9. Communication Networks • Mean of shortest path lengths from a specific vertex to all vertices in the graph G, where $d_{ij} \in D$ and $D$ is the geodesic distance matrix of G • Betweenness centrality • Proportion of all geodesics (shortest paths) between all other vertices that include vertex $v_i$
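
The formulas for these two measures did not survive in the transcript, so the sketch below relies on common definitions as assumptions: the mean shortest path averages the geodesic distance over all other reachable vertices, and betweenness is computed with Brandes' algorithm, then divided by the number of other vertex pairs to give a proportion.

```python
from collections import deque


def mean_shortest_path(graph, source):
    """Average BFS distance from source to every other reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    others = [d for u, d in dist.items() if u != source]
    return sum(others) / len(others) if others else 0.0


def betweenness(graph):
    """Brandes' algorithm for an undirected, unweighted graph,
    normalized by the number of pairs of other vertices."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, dist = [], {s: 0}
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):  # accumulate dependencies back toward s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(graph)
    pairs = (n - 1) * (n - 2) / 2
    # Each undirected pair is visited from both endpoints, hence the / 2.
    return {v: (b / 2) / pairs if pairs else 0.0 for v, b in score.items()}


# Hypothetical graph: a triangle (alice, bob, carol) plus dave tied to carol.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
carol_mean = mean_shortest_path(graph, "carol")
between = betweenness(graph)
```

In this example carol sits on the only shortest paths from dave to alice and from dave to bob, so she scores 2 of the 3 possible pairs while everyone else scores zero.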

  10. Communication Networks • “Hubs-and-authorities” importance • Calculates the “hubs-and-authorities” importance of each vertex • J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 1999.
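
Kleinberg's hubs-and-authorities scores can be approximated by power iteration, as sketched below. Treating each message as a directed edge from sender to recipient, the fixed iteration count, and the function name `hits` are assumptions made for this example; the slides do not describe how the measure is applied to the mail graph.

```python
import math


def hits(edges, iterations=50):
    """Power iteration for hub and authority scores on a directed edge list."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the vertices pointing at it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / norm for n, a in auth.items()}
        # Hub: sum of authority scores of the vertices it points at.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(h * h for h in new_hub.values())) or 1.0
        hub = {n: h / norm for n, h in new_hub.items()}
    return hub, auth


# Hypothetical directed mail edges (sender, recipient).
edges = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)  # "carol" receives the most mail
```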

  11. Social Score • Social score • Rank users from most important to least important • Group users which have similar social scores and clique connectivity • Determine n different levels of social hierarchy within which to place all the users

  12. Compute Social Score • Scale and normalize each statistic • Social score • A score between 0 and 100
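
The formulas on this slide were lost in the transcript. One plausible reading, offered only as an assumption, is min-max scaling of each statistic to [0, 1], a weighted sum of the scaled values, and a final rescaling to the 0 to 100 range. Statistics where a smaller value is better (such as response time) are inverted. The feature names and weights below are hypothetical.

```python
def social_scores(stats, weights, lower_is_better=()):
    """Combine per-user statistics into a score between 0 and 100.

    stats:   {user: {feature: value}}
    weights: {feature: weight}
    Assumed method: min-max scale each feature, weight, sum, rescale.
    """
    totals = {user: 0.0 for user in stats}
    for feature, weight in weights.items():
        values = [stats[user][feature] for user in stats]
        low, span = min(values), max(values) - min(values)
        for user in stats:
            if span == 0:
                scaled = 0.0  # feature does not distinguish the users
            else:
                scaled = (stats[user][feature] - low) / span
                if feature in lower_is_better:
                    scaled = 1.0 - scaled
            totals[user] += weight * scaled
    weight_sum = sum(weights.values())
    return {user: 100.0 * total / weight_sum for user, total in totals.items()}


# Hypothetical statistics for three accounts.
stats = {
    "alice": {"degree": 2, "cliques": 1, "response_hours": 3.0},
    "carol": {"degree": 3, "cliques": 2, "response_hours": 1.0},
    "dave": {"degree": 1, "cliques": 1, "response_hours": 5.0},
}
weights = {"degree": 1.0, "cliques": 1.0, "response_hours": 2.0}
scores = social_scores(stats, weights, lower_is_better={"response_hours"})
```

With these inputs carol scores 100, dave 0, and alice 37.5. Changing the weights shifts the ranking, which is how different features can be emphasized.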

  13. Results and Discussion • Using EMT (Email Mining Toolkit) • Java-based email analysis engine built on a database back-end • The JUNG library is used for the degree and centrality measures • Presents the analysis of the North American West Power Traders division of Enron Corporation

  14. Conclusions and Future Work • The Enron dataset provides an excellent starting point of real-world data • By varying the feature weights, it is possible to • Pick out the most important individuals • Group individuals with similar social qualities • Graphically draw an organization chart that approximates the real social hierarchy

  15. Conclusions and Future Work • The concept of average response time can be reworked by considering the order of responses • Consider each user's common email usage times and adjust the received time of emails accordingly • New grouping and division algorithms are being considered • Graph edges should be taken into account when arranging users into different levels
