Institute of Computer Science and Technology of Peking University · Graph Data Management · Instructor: Lei Zou (zoulei@icst.pku.edu.cn)
Outline • Applications and Challenges of Graph Data • Existing Graph Database Systems • About the course
Graph Data (a) Protein Network (b) Social Network
Some Challenges in Large Graph Data Management • An example: consider a social networking service (SNS) with more than 1 billion active users. Query: is Tom a friend of Jack, or a friend of one of Jack's friends, and so on? Possible solution: (storage) store the connections between individuals in a relational table; (query) perform self-joins recursively, one join per additional hop, which becomes very expensive at this scale.
Some Challenges in Large Graph Data Management: recursive queries
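A minimal sketch of the recursive self-join approach, assuming friendships are stored in a hypothetical Friend(a, b) table with each friendship recorded in both directions. Each round joins only the people found in the previous round with the table (semi-naive evaluation), so "friend of a friend of ..." costs one join per hop. The class, method names and data are illustrative and not part of any system mentioned in these slides.

```java
import java.util.*;

/**
 * Sketch (hypothetical data): answers "is target a friend, or a friend of a friend, ..., of source?"
 * by repeatedly self-joining a Friend(a, b) table, the way a relational engine evaluates a
 * recursive query. Each round joins only the rows found in the previous round (semi-naive evaluation).
 */
public class FriendJoin {

    /** One row of the hypothetical Friend(a, b) relation. */
    static final class Row {
        final String a;
        final String b;
        Row(String a, String b) { this.a = a; this.b = b; }
    }

    /** Stores every friendship in both directions, since friendship is symmetric. */
    static List<Row> symmetricTable(String[][] pairs) {
        List<Row> rows = new ArrayList<>();
        for (String[] p : pairs) {
            rows.add(new Row(p[0], p[1]));
            rows.add(new Row(p[1], p[0]));
        }
        return rows;
    }

    /** Number of self-joins needed to reach target from source, or -1 if not reached within maxHops. */
    static int hopsViaSelfJoin(List<Row> friend, String source, String target, int maxHops) {
        if (source.equals(target)) return 0;
        Set<String> seen = new HashSet<>();
        seen.add(source);
        Set<String> delta = Set.of(source); // people discovered in the previous round
        for (int hop = 1; hop <= maxHops && !delta.isEmpty(); hop++) {
            Set<String> next = new HashSet<>();
            // One join step: delta(x) JOIN Friend(x, y) ON x, as a nested-loop join over the whole table.
            for (Row r : friend) {
                if (delta.contains(r.a) && seen.add(r.b)) {
                    next.add(r.b);
                }
            }
            if (next.contains(target)) return hop;
            delta = next; // only newly found people take part in the next join
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Row> friend = symmetricTable(new String[][] {
            {"Tom", "Alice"}, {"Alice", "Bob"}, {"Bob", "Jack"}, {"Carol", "Dave"}});
        System.out.println(hopsViaSelfJoin(friend, "Tom", "Jack", 5)); // 3
        System.out.println(hopsViaSelfJoin(friend, "Tom", "Dave", 5)); // -1
    }
}
```

Every round scans the whole table, which hints at why recursive self-joins scale poorly on a graph with a billion users; a real engine would at least use an index or hash join on Friend.a.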
Network Motifs: Simple Building Blocks of Complex Networks (R. Milo et al., Science 2002) • Network motifs are patterns (subgraphs) that recur within a network much more often than expected in randomized networks. Network motifs are thought to correspond to functional units in different networks. Questions: • How can such motifs be found efficiently? • Given a motif, how can all embeddings of this motif be found efficiently?
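The second question (finding all embeddings of a given motif) can be illustrated with the simplest non-trivial motif, the triangle. This is a hypothetical sketch, not the method of Milo et al.: it lists every triangle of an undirected graph exactly once. Deciding whether a pattern is a motif would additionally require comparing its count against randomized networks, which is not shown.

```java
import java.util.*;

/** Sketch: enumerate all embeddings of the triangle motif in an undirected graph. */
public class TriangleMotif {

    /** Builds an undirected adjacency map from an edge list of int vertex ids. */
    static Map<Integer, Set<Integer>> adjacency(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    /**
     * Lists each triangle once as {u, v, w} with u < v < w: for every edge (u, v)
     * with u < v, each common neighbour w > v closes a triangle.
     */
    static List<int[]> triangles(Map<Integer, Set<Integer>> adj) {
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> entry : adj.entrySet()) {
            int u = entry.getKey();
            for (int v : entry.getValue()) {
                if (v <= u) continue; // visit each edge from its smaller endpoint only
                for (int w : adj.get(v)) {
                    if (w > v && entry.getValue().contains(w)) {
                        result.add(new int[] {u, v, w});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical toy graph: a square 1-2-3-4 plus the diagonal 1-3, i.e. two triangles.
        int[][] edges = { {1, 2}, {2, 3}, {3, 4}, {4, 1}, {1, 3} };
        for (int[] t : triangles(adjacency(edges))) {
            System.out.println(Arrays.toString(t));
        }
    }
}
```

Ordering the three vertices (u < v < w) is what prevents the same embedding from being reported several times under different vertex permutations.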
Frequent Subgraph Pattern Mining [Figure: a graph dataset of three graphs (A), (B), (C), and the frequent patterns found in it with minimum support 2]
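A minimal sketch of the first step of level-wise (Apriori-style) frequent subgraph mining, assuming vertex-labelled undirected graphs and ignoring edge labels. The support of a pattern is the number of graphs in the dataset that contain it. The class, labels and dataset are hypothetical and are not the graphs (A), (B), (C) from the slide; growing larger patterns needs candidate generation plus subgraph-isomorphism tests (as in algorithms such as FSG or gSpan) and is not shown.

```java
import java.util.*;

/**
 * Sketch: first level of a level-wise frequent subgraph miner. For every
 * single-edge pattern "X-Y" it counts the support (number of graphs containing it)
 * and keeps the patterns whose support is at least minSupport.
 */
public class FrequentEdges {

    /** A labelled undirected graph: labels[i] is vertex i's label; edges are vertex-index pairs. */
    static final class LabeledGraph {
        final String[] labels;
        final int[][] edges;
        LabeledGraph(String[] labels, int[][] edges) { this.labels = labels; this.edges = edges; }
    }

    /** Canonical key of an undirected single-edge pattern: both labels in sorted order. */
    static String edgeKey(String x, String y) {
        return x.compareTo(y) <= 0 ? x + "-" + y : y + "-" + x;
    }

    static Map<String, Integer> frequentEdgePatterns(List<LabeledGraph> dataset, int minSupport) {
        Map<String, Integer> support = new TreeMap<>();
        for (LabeledGraph g : dataset) {
            Set<String> inThisGraph = new HashSet<>(); // a graph adds at most 1 to each pattern's support
            for (int[] e : g.edges) {
                inThisGraph.add(edgeKey(g.labels[e[0]], g.labels[e[1]]));
            }
            for (String key : inThisGraph) {
                support.merge(key, 1, Integer::sum);
            }
        }
        support.values().removeIf(s -> s < minSupport); // keep only frequent patterns
        return support;
    }

    /** Hypothetical three-graph dataset (not the graphs drawn on the slide). */
    static List<LabeledGraph> sampleDataset() {
        return List.of(
            new LabeledGraph(new String[] {"C", "C", "O"}, new int[][] {{0, 1}, {1, 2}}),
            new LabeledGraph(new String[] {"C", "O", "N"}, new int[][] {{0, 1}, {1, 2}}),
            new LabeledGraph(new String[] {"C", "C", "N"}, new int[][] {{0, 1}, {1, 2}}));
    }

    public static void main(String[] args) {
        System.out.println(frequentEdgePatterns(sampleDataset(), 2)); // {C-C=2, C-O=2}
    }
}
```

Counting each pattern at most once per graph is what distinguishes graph-transaction support from counting raw occurrences.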
Subgraph Search [Figure: a query graph and a graph database] Query: Which compounds contain a “benzene ring”?
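A minimal sketch of the containment test behind subgraph search, using naive backtracking (Ullmann's algorithm and later ones such as VF2 add much stronger pruning). It assumes vertex-labelled undirected graphs and, as a simplifying assumption, labels aromatic carbon "c" and aliphatic carbon "C" (the SMILES convention) instead of modelling bond types. The class, helpers and molecules are hypothetical.

```java
/**
 * Sketch: query graph q is contained in data graph g if q's vertices can be mapped
 * injectively onto g's vertices so that labels match and every edge of q maps to an edge of g.
 */
public class SubgraphSearch {

    static final class Graph {
        final String[] labels;
        final boolean[][] adj;
        Graph(String[] labels, int[][] edges) {
            this.labels = labels;
            this.adj = new boolean[labels.length][labels.length];
            for (int[] e : edges) { adj[e[0]][e[1]] = true; adj[e[1]][e[0]] = true; }
        }
    }

    static boolean contains(Graph g, Graph q) {
        return extend(g, q, new int[q.labels.length], new boolean[g.labels.length], 0);
    }

    /** Tries to map query vertex qv; query vertices 0..qv-1 are already mapped in map. */
    private static boolean extend(Graph g, Graph q, int[] map, boolean[] used, int qv) {
        if (qv == q.labels.length) return true; // every query vertex is mapped
        for (int gv = 0; gv < g.labels.length; gv++) {
            if (used[gv] || !g.labels[gv].equals(q.labels[qv])) continue;
            boolean ok = true;
            for (int prev = 0; prev < qv && ok; prev++) {
                // a query edge (prev, qv) needs a data edge (map[prev], gv)
                if (q.adj[prev][qv] && !g.adj[map[prev]][gv]) ok = false;
            }
            if (!ok) continue;
            map[qv] = gv;
            used[gv] = true;
            if (extend(g, q, map, used, qv + 1)) return true;
            used[gv] = false; // backtrack
        }
        return false;
    }

    /** Hypothetical helper: a ring of `size` atoms, with extra atoms bonded to ring atom 0. */
    static Graph ring(String atom, int size, String... attachedToFirst) {
        String[] labels = new String[size + attachedToFirst.length];
        int[][] edges = new int[size + attachedToFirst.length][];
        for (int i = 0; i < size; i++) {
            labels[i] = atom;
            edges[i] = new int[] {i, (i + 1) % size};
        }
        for (int j = 0; j < attachedToFirst.length; j++) {
            labels[size + j] = attachedToFirst[j];
            edges[size + j] = new int[] {0, size + j};
        }
        return new Graph(labels, edges);
    }

    public static void main(String[] args) {
        Graph benzene = ring("c", 6);                // query: six aromatic carbons in a cycle
        Graph phenolLike = ring("c", 6, "O");        // aromatic ring with an O attached
        Graph cyclohexanolLike = ring("C", 6, "O");  // aliphatic ring: labels do not match
        System.out.println(contains(phenolLike, benzene));       // true
        System.out.println(contains(cyclohexanolLike, benzene)); // false
    }
}
```

Answering the slide's query over a whole database means running this test once per compound, which is why subgraph-search systems first filter candidates with an index.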
Reachability Query [Figure: an example graph with vertices 1 to 15] • Query(1, 11)? Yes • Query(3, 9)? No
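A minimal sketch of answering Query(s, t) without any index: a depth-first search from s that stops as soon as t is found. The edges used below are hypothetical, chosen only so that the two queries give the same answers as on the slide; they are not the edges of the figure.

```java
import java.util.*;

/** Sketch: answer a reachability query Query(s, t) on a directed graph by depth-first search. */
public class Reachability {
    private final Map<Integer, List<Integer>> out = new HashMap<>();

    void addEdge(int from, int to) {
        out.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns true if there is a directed path from s to t (every vertex reaches itself). */
    boolean reachable(int s, int t) {
        Deque<Integer> stack = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        stack.push(s);
        visited.add(s);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            if (v == t) return true;
            for (int w : out.getOrDefault(v, List.of())) {
                if (visited.add(w)) stack.push(w); // push each vertex at most once
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical edges; these are not the edges of the graph in the slide figure.
        Reachability g = new Reachability();
        int[][] edges = { {1, 3}, {3, 6}, {6, 11}, {2, 4}, {4, 9} };
        for (int[] e : edges) g.addEdge(e[0], e[1]);
        System.out.println(g.reachable(1, 11)); // true
        System.out.println(g.reachable(3, 9));  // false
    }
}
```

Such a search costs O(|V| + |E|) per query in the worst case, which is what reachability indexes try to avoid by precomputing information.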
Shortest Path Distance Query: What is the distance between two given individuals?
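A minimal sketch, assuming an unweighted, undirected friendship graph where distance means the number of hops: breadth-first search visits vertices in order of distance, so the first time it reaches the target gives the shortest distance. The names are hypothetical; a weighted graph would need Dijkstra's algorithm instead.

```java
import java.util.*;

/** Sketch: shortest-path distance (number of hops) in an unweighted, undirected graph via BFS. */
public class HopDistance {
    private final Map<String, List<String>> adj = new HashMap<>();

    void addFriendship(String a, String b) {
        adj.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adj.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    /** Returns the hop distance from s to t, or -1 if t is unreachable from s. */
    int distance(String s, String t) {
        Map<String, Integer> dist = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        dist.put(s, 0);
        queue.add(s);
        while (!queue.isEmpty()) {
            String v = queue.poll();
            if (v.equals(t)) return dist.get(v);
            for (String w : adj.getOrDefault(v, List.of())) {
                if (!dist.containsKey(w)) { // BFS order: the first visit is at the smallest distance
                    dist.put(w, dist.get(v) + 1);
                    queue.add(w);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        HopDistance g = new HopDistance(); // hypothetical people
        g.addFriendship("Ann", "Ben");
        g.addFriendship("Ben", "Cat");
        g.addFriendship("Ann", "Dan");
        g.addFriendship("Dan", "Cat");
        g.addFriendship("Cat", "Eve");
        g.addFriendship("Fay", "Gus");
        System.out.println(g.distance("Ann", "Eve")); // 3
        System.out.println(g.distance("Ann", "Gus")); // -1
    }
}
```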
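RDF Data Management • The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. RDF describes data as (subject, predicate, object) triples, so an RDF dataset can be viewed as a labeled graph. [Figure: the WWW as a Web of Pages vs. the Semantic Web as a Web of Data]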
An RDF Data Example: the YAGO Project [Figure: structured data from YAGO]
SPARQL Query • Query: Find all individuals who were born on Feb. 12, 1809 and died on April 15, 1865. • SPARQL syntax:

    SELECT ?name WHERE {
      ?m <hasName> ?name .
      ?m <BornOnDate> "1809-02-12" .
      ?m <DiedOnDate> "1865-04-15" .
    }

[Figure: the corresponding query graph]
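A minimal sketch, assuming a tiny in-memory list of triples, of how the query above can be answered: each triple pattern is matched against the data, and a shared variable such as ?m must take the same value in every pattern, which is the join implied by the query graph. The class, helper names and entity ids m1/m2 are hypothetical; the two people are included because both were born on 1809-02-12, so only the DiedOnDate pattern separates them. Real RDF stores rely on indexes and join ordering rather than this nested backtracking.

```java
import java.util.*;

/**
 * Sketch: evaluates a SPARQL basic graph pattern (triple patterns that must all match)
 * over an in-memory list of RDF triples by backtracking, one pattern at a time.
 * Terms starting with '?' are variables; all other terms must match exactly.
 */
public class TinySparql {

    /** Builds a triple (or triple pattern) as {subject, predicate, object}. */
    static String[] t(String s, String p, String o) {
        return new String[] {s, p, o};
    }

    static List<Map<String, String>> evaluate(List<String[]> triples, List<String[]> patterns) {
        List<Map<String, String>> results = new ArrayList<>();
        match(triples, patterns, 0, new HashMap<>(), results);
        return results;
    }

    private static void match(List<String[]> triples, List<String[]> patterns, int i,
                              Map<String, String> binding, List<Map<String, String>> results) {
        if (i == patterns.size()) { // every pattern matched: record one solution
            results.add(binding);
            return;
        }
        String[] pattern = patterns.get(i);
        for (String[] triple : triples) {
            Map<String, String> extended = new HashMap<>(binding);
            boolean ok = true;
            for (int k = 0; k < 3 && ok; k++) {
                if (pattern[k].startsWith("?")) {
                    // bind the variable, or check it against the value it already has
                    String previous = extended.putIfAbsent(pattern[k], triple[k]);
                    ok = previous == null || previous.equals(triple[k]);
                } else {
                    ok = pattern[k].equals(triple[k]); // constants must match exactly
                }
            }
            if (ok) match(triples, patterns, i + 1, extended, results);
        }
    }

    /** Hypothetical toy data; the entity ids m1 and m2 are made up for this sketch. */
    static List<String[]> sampleTriples() {
        return List.of(
            t("m1", "hasName", "Abraham Lincoln"),
            t("m1", "BornOnDate", "1809-02-12"),
            t("m1", "DiedOnDate", "1865-04-15"),
            t("m2", "hasName", "Charles Darwin"),
            t("m2", "BornOnDate", "1809-02-12"),
            t("m2", "DiedOnDate", "1882-04-19"));
    }

    /** The query from the slide, with the IRIs written as plain strings. */
    static List<String[]> sampleQuery() {
        return List.of(
            t("?m", "hasName", "?name"),
            t("?m", "BornOnDate", "1809-02-12"),
            t("?m", "DiedOnDate", "1865-04-15"));
    }

    public static void main(String[] args) {
        for (Map<String, String> row : evaluate(sampleTriples(), sampleQuery())) {
            System.out.println(row.get("?name")); // Abraham Lincoln
        }
    }
}
```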
Outline • Applications and Challenges of Graph Data • Existing Graph Database Systems • About the course
Some Existing Graph Database Systems • The following is a list of several well-known graph database projects: • HyperGraphDB - an open-source (LGPL) graph database supporting generalized hypergraphs, where edges can point to other edges • InfoGrid - an open-source / commercial (AGPLv3, free for small entities) graph database with a web front end and configurable storage engines (MySQL, PostgreSQL, files, Hadoop)
Some Existing Graph Database Systems (continued) • Neo4j - an open-source / commercial (AGPLv3) graph database • DEX - a high-performance graph database • and so on… • International Graph Database Workshops: http://www.icst.pku.edu.cn/IWGD2010/index.html http://www.cse.unsw.edu.au/~gdm2011/
An Example in Neo4j: Finding the friends of “Thomas Anderson”, and the friends of those friends too • Source: Neo4j wiki, http://wiki.neo4j.org/content/The_Matrix
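Neo4j API: An Example

    import org.neo4j.graphdb.Direction;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.graphdb.ReturnableEvaluator;
    import org.neo4j.graphdb.StopEvaluator;
    import org.neo4j.graphdb.Traverser;
    import org.neo4j.graphdb.Traverser.Order;

    public class MatrixExample { // hypothetical wrapper class
        // Assumed definition of the relationship type used below
        enum MyRelationshipTypes implements RelationshipType { KNOWS }

        // Prints the names of all nodes reachable from `person` over outgoing KNOWS
        // relationships, using the legacy Neo4j 1.x Node.traverse(...) API.
        private void printFriends( Node person ) {
            Traverser traverser = person.traverse(
                    Order.BREADTH_FIRST,                    // traversal order
                    StopEvaluator.END_OF_GRAPH,             // stop condition: explore everything reachable
                    ReturnableEvaluator.ALL_BUT_START_NODE, // which nodes are returned: all but the start node
                    MyRelationshipTypes.KNOWS,              // which relationship type to follow
                    Direction.OUTGOING );                   // direction in which to follow it
            for ( Node friend : traverser ) {
                System.out.println( friend.getProperty( "name" ) );
            }
        }
    }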
Outline • Applications and Challenges of Graph Data • Existing Graph Database Systems • About the course
Course Content • Graph Mining - frequent subgraph mining • Indexing & Query Processing - reachability query - shortest path query - subgraph query - keyword search • RDF Data Management - indexing & SPARQL query processing
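Course Website • URL: http://www.icst.pku.edu.cn/course/Graphdb/index.html • Textbooks (author, title, publisher, year): 1. Data Mining: Concepts and Techniques (《数据挖掘概念与技术》, Chinese edition), Jiawei Han & Micheline Kamber, translated by Fan Ming & Meng Xiaofeng, China Machine Press, 2nd edition 2. Managing and Mining Graph Data, edited by Charu C. Aggarwal and Haixun Wang, Kluwer Academic Publishers, 2009 3. A Semantic Web Primer (《语义网基础》, Chinese edition), Grigoris Antoniou & Frank van Harmelen, China Machine Press, 2008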
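Course Assessment • In-class presentation (30%): each student presents one top-tier paper in the database field (including data mining and information retrieval), 20 minutes plus 5 minutes of questions • Assignments (30%): three assignments, each on a designated topic • Class participation (10%)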
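Course Assessment (continued) • Course research report (30%): the report can take one of two forms, chosen by the student: 1) Literature survey: introduce the research background of the topic and the related existing work, and give your own critical assessment of the different existing results. 2) Research paper: students are encouraged to carry out original research on a specific topic and write it up as a paper.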
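Course Objectives • Master several fundamental query and mining algorithms for graph databases • Understand how graph database technology is applied in different fields • Develop students' ability to think independently and to carry out research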
zoulei@icst.pku.edu.cn www.leizou.net Let’s begin!