Dataspace: a new concept of data management. Li Yukun.
Outline • From database to dataspace • PDS/PIM • Related work • Challenge issues • Our work on dataspace
Traditional RDBMS Query 1: Please tell me all the information in my dataspace about a conference. Query 2: Please tell me the emails and persons related to an event.
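The two queries above span several kinds of personal data at once, which a single relational schema does not cover directly. A minimal sketch of that idea follows, assuming a hypothetical `Item` record and a simple keyword match over attribute values; neither the structure nor the function names come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One object in a personal dataspace (hypothetical structure)."""
    kind: str                                  # e.g. "email", "file", "person"
    attrs: dict = field(default_factory=dict)  # free-form attributes


def search(items, keyword, kinds=None):
    """Return items whose attribute values mention keyword.

    If kinds is given, only items of those kinds are considered.
    """
    kw = keyword.lower()
    hits = []
    for item in items:
        if kinds is not None and item.kind not in kinds:
            continue
        if any(kw in str(value).lower() for value in item.attrs.values()):
            hits.append(item)
    return hits


# Illustrative data only; the addresses and names are made up.
dataspace = [
    Item("email", {"subject": "VLDB submission deadline",
                   "sender": "alice@example.org"}),
    Item("file", {"name": "vldb_paper.pdf"}),
    Item("person", {"name": "Alice", "note": "met at VLDB"}),
    Item("file", {"name": "holiday.jpg"}),
]

# Query 1: everything in the dataspace about a conference.
query1 = search(dataspace, "vldb")
# Query 2: only the emails and persons related to it.
query2 = search(dataspace, "vldb", kinds={"email", "person"})
```

A real dataspace system would replace the linear scan with an index and would need object identity across sources; the sketch only shows why the query is posed over heterogeneous items rather than over one table.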
From Database to Dataspace The advantages of the traditional model should be kept. New characteristics of data should be mapped. Focus
Related work on PIM/PDS • Memex, 1945 (Vannevar Bush) • Lifestreams, 1996 • From Databases to Dataspaces, 2005 • SIGIR PIM workshop, 2005/2006 • iDM, 2006 (Jens-Peter Dittrich, Marcos Antonio Vaz Salles) • Indexing dataspaces • Resource space model
Challenge issues on the topic • INPUT: Searching / Encountering / Keeping / Extraction / Object identity / Evaluation • Profile: Model / Index / Store / Query / System • OUTPUT: Finding / Refining / Reminding / HCI / QL
Our work and proposal 1. Read related papers. 2. Surveys: "From Database to Dataspace, from Enterprise to People" (IDKE Report 2006); "PIM: A New Research Focus" (IDKE Report 2006); "Dataspace: A New Data Management Technology" (Computer Communications (计算机通讯), 07.8); graduation thesis of Zhang Xiangyu (张相於). 3. Automatic content extraction from papers in PDF format. 4. Proposal for our group's research.
About the Proposal General topic: Technology for email content management. Subtopics: • Model of email content management (classification / content formalization / query / importance / urgency) • EMIEX: Object extraction based on email content (personal name / location name / event / time / ...) • EMSN: Social network construction and mining on email logs • Intelligent reminding based on email logs (from email to schedule) • From email to blog / chat / phone-note logs • Demo development Tasks: • Read papers on content extraction / personal recommendation / user profile • Read papers on email management • Prepare datasets (English email / Chinese email) and classify them • Algorithms and policies
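The EMSN subtopic (social network construction on email logs) can be illustrated with a small sketch. It assumes the log is available as raw RFC 822 style messages and treats each sender-to-recipient pair as a weighted directed edge; the function name `build_edges` and the sample messages are hypothetical, and the slides do not specify how the network is actually built.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def build_edges(raw_messages):
    """Count directed (sender, recipient) pairs over a list of raw emails."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # get_all returns every header value, or the default list if absent.
        from_headers = msg.get_all("From", [])
        to_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
        senders = [addr.lower() for _, addr in getaddresses(from_headers)]
        recipients = [addr.lower() for _, addr in getaddresses(to_headers)]
        for sender in senders:
            for recipient in recipients:
                if sender != recipient:  # ignore mail sent to oneself
                    edges[(sender, recipient)] += 1
    return edges


# Illustrative messages only; the addresses are made up.
log = [
    "From: Alice <alice@example.org>\n"
    "To: bob@example.org, Carol <carol@example.org>\n"
    "Subject: Meeting\n\nSee you at 10.",
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Cc: carol@example.org\n"
    "Subject: Re: Meeting\n\nRoom changed.",
    "From: bob@example.org\nTo: alice@example.org\n\nOK.",
]

edges = build_edges(log)
```

The edge weights could then feed the mining step, for example ranking contacts by how often the user writes to them; that use is a suggestion, not something the proposal states.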
Motivation & Challenge • Motivation: Email has become more popular and plays an important role in work and daily life. We can obtain data for experiments. It has a more formal style. Its characteristics are similar to those of blog / BBS / chat data. • Challenge: IR is a new area for us. Data collection is a hard process. A more detailed plan will be formed later.
THANK YOU