Information Carnivores vs. Semantic Web
Today’s Outline • Information Aggregation • Jango • Information Integration • Database Review • Query Reformulation 2
[Figure: metasearch pipeline] User enters query -> Formulate queries -> Search engines (Lycos, Excite, . . .) -> Collate results -> Remove duplicates -> Download? -> Post-process + rank -> Present to user 3
Meta-? • Web Search • Shopping • Product Reviews • Chat Finder • Columnists (e.g. jokes, sports, ….) • Email Lookup • Event Finder • People Finder • Restaurant Reviews • Job Listings • Classifieds • Apartment + Real Estate 4
Personal Shopping Assistant • You Name the Product • Agent Visits Stores, Review Sites... • Makes Summary Report… • If Requested, Buys Items for You Inevitable New Category 5
The Netbot Story: Part I Bargain Finder (1995) Proof of concept; CD stores only; hardcoded UW Shopbot Prototype (Jan ‘96) Focus on scalability to many stores, categories Netbot Founded (May ‘96) Five Developers, 4 licenses, Seed funding 6
The Netbot Story: Continued Hired Eric Zocher (Sept ‘96) Weld/Etzioni back to school Second Round Financing (Jan ‘97) Netbot now at 25 people Jango Client Launch: Beta (April ‘97), 1.0 (July ‘97) 7
Jango Client Architecture 1. The Client requests information 2. Jango Server: routes query, checks spelling, sends adapters 3. Multiple sources contacted in parallel 4. The Client filters results, presents reports 8
Core Technology I: Information Adapters • Low cost, flexible, “glue” • Connects Jango to 100s of sites • Development & QA • Proprietary language & compiler • Semiautomatic construction (learning) • Continuous monitoring 16
Core Technology II: Query Routing • Selects WWW sites given query • Limited natural language abilities • Spell checking • Taxonomic knowledge base 17
Core Technology III: Parallel Pull • Conventional Pull • Typing a URL yields a single web page • Parallel Pull • Typing high-level request yields info compiled from dozens of web pages • Parallel Aggregation Engine • Efficient parallel access: aggregation, relevance, duplicate elimination 18
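The parallel-pull idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Jango's implementation: the adapter functions `store_a` and `store_b`, their offers, and the price-based ranking are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-site adapters; a real adapter would fetch and parse a web page.
def store_a(query):
    return [("store_a", query, 19.99)]

def store_b(query):
    # This source also happens to relay store_a's offer, creating a duplicate.
    return [("store_b", query, 17.49), ("store_a", query, 19.99)]

def parallel_pull(query, adapters):
    """Contact every source at once, then merge, de-duplicate, and rank by price."""
    with ThreadPoolExecutor(max_workers=len(adapters)) as pool:
        result_lists = list(pool.map(lambda adapter: adapter(query), adapters))
    merged = {offer for offers in result_lists for offer in offers}  # set drops duplicates
    return sorted(merged, key=lambda offer: offer[2])

offers = parallel_pull("gizmo", [store_a, store_b])
```

Threads fit here because the work is dominated by waiting on remote sites, so total latency is close to that of the slowest source rather than the sum of all sources.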
Consumer Benefits of Jango • Comprehensive. Knows about 100s of stores • Quick. Contacts stores simultaneously • Current. Gathers up-to-the-minute details • Convenient. Creates custom shopping guide: product prices & availability, reviews, links to each online store 19
Jango Benefits for Retailers • More Traffic. Jango pulls from store with every query to a category. • Targeted Shoppers. Jango users are looking for specific products. • One click buy. Jango speeds sales process. • Reduced cost of customer acquisition. Jango will catalyze more shopping • Shopping Reports. What’s selling 20
Jango Business Model Give client away for free Get 1,000,000 users by Xmas ‘97 Sell to merchants: Advertising (targeted users in act of buying!) Real time promotions Eventually aim for transaction cut 21
Reaction to Jango Early adopters loved it Awards Substantial “buzz” in the press Two imitators (Sept ‘97) Contract with ATT Worldcom But only 30,000 downloads • All efforts shifted to server implementation 8/97 22
New Business Model Shopping Infrastructure Company License Server to High-traffic Sites Search Engines: Yahoo, Excite, ... ISPs: ATT Worldcom, AOL, MSN, … Specialty sites: ESPN, ... Split revenue from merchants 23
Enter Excite Very Aggressive Company #7 -> #2 search engine in 18 months 26 million page views / day Believed in Internet Shopping Already had leading shopping channel More users/day than we got in 6 months Sales force of 40 Acquires Netbot for $35 M (NPV of $300M) 24
The Excite Vision Commerce Community Navigation Content Connectivity • First Web-based Online Service Commodities • Become a New Retail Channel • 1-1 Marketing in a Mass Medium 25
Copycats • Junglee • Snagged Yahoo contract for Xmas ’97 • Bought by Amazon for $185M in Aug ‘98 • ZShops? • Eliminate technology? • C2B • Bought by Inktomi for $90M in Sept ‘98 • Infospace • Activeshopper June ‘99 • MySimon • Bought by Cnet for $700M in Jan ‘00 26
The Future • Price Tracking • Auctions • Category Wizards • E.g. cars, long distance carriers • Cross Selling • E.g. batteries, concert tickets • Gift Advisor • Virtual Registry • Cross Vendor Loyalty Program 27
Market Clearing Effect • Price is only one factor • Payment for service will change • Now bundled into cost of product • Unstable • Transition to personal buyer model • Micropayments 28
Motivation: Info Integration • Sources: IMDB, Sidewalk, Ebert, Spot • Info aggregation … on Steroids! • Want agent such that • User says what she wants • Agent determines how & when to achieve it • Example: • Show me all reviews of movies starring Matt Damon that are currently playing in Seattle 29
Info. Aggregation vs. Integration • More complex queries • Dynamic generation/optimization of execution plan • Applicable to wider range of problems • Much harder to implement efficiently [Figure: aggregation example "prices of laptop with …": store1, store2, …, storeN -> sort. Integration example "movies in Seattle starring …": IMDB, Sidewalk -> join; rev1, rev2, …, revN -> join, sort, aggregate] 30
Problems User must know which sites have relevant info User must go to each one in turn Slow: Sequential access takes time Confusing: Each site has a different interface User must manually integrate information Before an agent can solve these problems it must be able to perceive WWW content... 31
Tukwila Architecture [Figure: labeled components: Info Sources, Reformulator, Optimizer / (Re-)Optimizer, Fragmenter, Execution Engine, Event Handler, MemAlloc, Query Operators, Temp Store, Catalog. Labeled data flows: query, source mappings, logical plan, exec plan, exec results, answer] 32
Representation I • World Ontology • Defines predicates of relational schemata • E.g., • actor-in (Movie, Part, Name), • review-of (Movie, Part) • year-of (Movie, Year) • shows-in (Movie, City, Theatre) • User uses this language to specify queries • Implementor uses language to specify content of info sites 33
Representation II: • Queries Find-all (M, Review) Such That actor-in(M, Part, damon) & shows-in(M, seattle, T) & review-of(M, Review) :- vs. vs. • Written in Datalog: query(M, R) :- actor-in(M, Part, damon) & shows-in(M, seattle, T) & review-of(M, R) 34
Interlude: Datalog • Yet another database query language • Subset of First Order Predicate Calculus • More powerful than relational algebra • Enables expressing recursive queries • Hence more powerful than “core SQL” (Select, Project, Join) • But no way to express aggregation, sorting, etc. • More convenient for analysis 35
Terminology • Datalog represents a tuple with a relational atom
Product (arity = 4); the header row gives the attribute names, each following row is a tuple:
Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi
Example atom: Product(gizmo, $19.99, gadgets, GizmoWorks) 36
Datalog Rules and Queries • A pure datalog rule (i.e. a first-order Horn clause with a positive literal) has the following form: • head :- atom1, atom2, …, atomn • where all the atoms are non-negated and relational. • E.g. BritishProduct(X) :- Product(X,Y,P) & Company(P, “UK”, SP) • A datalog program is a set of datalog rules. • A program with a single rule is a conjunctive query. • We distinguish EDB predicates and IDB predicates • EDBs are stored in the database, appear only in the bodies • IDBs are intensionally defined, appear in both bodies and heads. 37
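To make the notion of a conjunctive query concrete, here is a minimal Python sketch that evaluates the body of a single rule against stored EDB facts by extending variable bindings one atom at a time. It is an illustration only: the facts are hypothetical, and the convention that variables start with an uppercase letter is an assumption of this sketch.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter, constants do not.
    return term[:1].isupper()

def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals `fact`; return None if impossible."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    extended = dict(binding)
    for arg, value in zip(args, fargs):
        if is_var(arg):
            if extended.setdefault(arg, value) != value:
                return None  # variable already bound to a different constant
        elif arg != value:
            return None      # constant mismatch
    return extended

def evaluate(head_vars, body, facts):
    """Return the set of head tuples derivable from `facts` by one rule."""
    bindings = [{}]
    for atom in body:
        next_bindings = []
        for binding in bindings:
            for fact in facts:
                extended = match(atom, fact, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return {tuple(b[v] for v in head_vars) for b in bindings}

# Hypothetical EDB facts over the world ontology.
FACTS = [
    ("actor-in", ("movie_a", "part_1", "damon")),
    ("actor-in", ("movie_b", "part_2", "someone_else")),
    ("shows-in", ("movie_a", "seattle", "theatre_1")),
    ("shows-in", ("movie_b", "seattle", "theatre_2")),
    ("review-of", ("movie_a", "review_17")),
    ("review-of", ("movie_b", "review_42")),
]

# query(M, R) :- actor-in(M, Part, damon) & shows-in(M, seattle, T) & review-of(M, R)
QUERY_BODY = [
    ("actor-in", ("M", "Part", "damon")),
    ("shows-in", ("M", "seattle", "T")),
    ("review-of", ("M", "R")),
]
answers = evaluate(("M", "R"), QUERY_BODY, FACTS)
```

This nested-loop evaluation handles one non-recursive rule; a full Datalog engine would iterate rules to a fixpoint to support recursion.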
Correspondence: Datalog ~ Relational Algebra Given EDBs: Employee(Name, SSN), Dependents(SSN, Dname) Define IDB: ED(Name, SSN, Dname) :- Employee(Name, SSN) & Dependents(SSN, Dname) 38
Representation III [Rajaraman95] • Information Source Functionality • Info Required? $ Binding Patterns • Info Returned? • Mapping to World Ontology 39
For Example
IMDBActor($Actor, M) ⇒ actor-in(M, Part, Actor)
Spot($M, Rev, Y) ⇒ review-of(M, Rev) & year-of(M, Y)
Sidewalk($C, M, Th) ⇒ shows-in(M, C, Th)
Source may be incomplete: ⇒ (not ⇔) 40
Unsafe Rules • All variables on left-hand side must appear on right-hand side • Otherwise rule is “unsafe” • Converse not necessary • RHS var (e.g. Part) is existentially quantified
Unsafe (Time does not appear on the right): Sidewalk($C, M, Th, Time) ⇒ shows-in(M, C, Th)
Safe: IMDBActor($Actor, M) ⇒ actor-in(M, Part, Actor) 41
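The safety condition can be checked mechanically. The sketch below is a hypothetical helper (the name `is_safe` and the tuple encoding of atoms are assumptions, not part of the slides); it strips the `$` binding marker and tests that every left-hand-side variable occurs somewhere on the right-hand side.

```python
def is_safe(lhs_args, rhs_atoms):
    """True if every variable on the left-hand side also appears on the right."""
    lhs_vars = {arg.lstrip("$") for arg in lhs_args}  # "$C" and "C" name the same variable
    rhs_vars = {arg for _pred, args in rhs_atoms for arg in args}
    return lhs_vars <= rhs_vars

# Sidewalk($C, M, Th, Time) vs. shows-in(M, C, Th): Time never appears on the right.
sidewalk_ok = is_safe(["$C", "M", "Th", "Time"], [("shows-in", ["M", "C", "Th"])])

# IMDBActor($Actor, M) vs. actor-in(M, Part, Actor): Part is extra on the right, which is allowed.
imdb_ok = is_safe(["$Actor", "M"], [("actor-in", ["M", "Part", "Actor"])])
```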
A Plan to Solve the Query
query(M, R) :- actor-in(M, Part, damon) & shows-in(M, seattle, T) & review-of(M, R)
IMDBActor($Actor, M) ⇒ actor-in(M, Part, Actor)
Spot($M, Rev, Y) ⇒ review-of(M, Rev) & year-of(M, Y)
Sidewalk($C, M, Th) ⇒ shows-in(M, C, Th) 42
plan(M, R) :- IMDBActor(damon, M) & Sidewalk(seattle, M, Th) & Spot(M, R, Y) • How to verify that the plan answers the query? • How to find this solution? 43
Two Questions • How to verify that this plan answers the query? 1. Verify information content of plan • Same as DB problem of “rewriting queries using views” • Show expansion of plan equivalent to query • Technique of query containment 2. Verify binding pattern constraints • How to find valid solution plans? • Search... • Bucket algorithm • Search-free synthesis of maximal recursive plan • MiniCon algorithm 44
Query Containment • Let q1, q2 be datalog rules, e.g. q1(X) :- p(X) & r(X) • Containment: q1 ⊆ q2 iff q1(D) ⊆ q2(D) for every database instance D • Can test whether q1 ⊆ q2 by seeing if there is a containment mapping (from q2 to q1) 45
Expand Plan
plan(M, R) :- IMDBActor(damon, M) & Sidewalk(seattle, M, Th) & Spot(M, R, Y)
IMDBActor($Actor, M) ⇒ actor-in(M, Part, Actor)
Spot($M, Rev, Y) ⇒ review-of(M, Rev) & year-of(M, Y)
Sidewalk($C, M, Th) ⇒ shows-in(M, C, Th)
plan'(M, R) :- actor-in(M, P, A) & review-of(M, R) & year-of(M, Y) & shows-in(M, C, T) 46
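Plan expansion amounts to replacing each source atom by the world-ontology atoms in its description, with the description's variables renamed to the plan's arguments. A minimal sketch follows; the `SOURCES` encoding, the `expand` helper, and the `_V1`-style names for existential variables are assumptions of this sketch, and it assumes description bodies contain only variables.

```python
from itertools import count

# Source descriptions: parameters (binding markers dropped) and world-ontology body.
SOURCES = {
    "IMDBActor": (("Actor", "M"), [("actor-in", ("M", "Part", "Actor"))]),
    "Spot": (("M", "Rev", "Y"), [("review-of", ("M", "Rev")), ("year-of", ("M", "Y"))]),
    "Sidewalk": (("C", "M", "Th"), [("shows-in", ("M", "C", "Th"))]),
}

def expand(plan_body, sources):
    """Replace each source atom by its description, renaming variables as needed."""
    fresh = count(1)
    expansion = []
    for name, args in plan_body:
        params, body = sources[name]
        subst = dict(zip(params, args))          # description parameter -> plan argument
        for pred, terms in body:
            new_terms = []
            for term in terms:
                if term not in subst:
                    # Existential variable (e.g. Part): give it a fresh, unique name.
                    subst[term] = f"_V{next(fresh)}"
                new_terms.append(subst[term])
            expansion.append((pred, tuple(new_terms)))
    return expansion

# plan(M, R) :- IMDBActor(damon, M) & Sidewalk(seattle, M, Th) & Spot(M, R, Y)
PLAN_BODY = [
    ("IMDBActor", ("damon", "M")),
    ("Sidewalk", ("seattle", "M", "Th")),
    ("Spot", ("M", "R", "Y")),
]
expanded = expand(PLAN_BODY, SOURCES)
```

In this sketch the plan's constants (damon, seattle) are carried into the expansion, so the result can be compared directly against the query.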
A Plan to Solve the Query
query(Mov, Rev) :- actor-in(Mov, Part, damon) & shows-in(Mov, seattle, T) & review-of(Mov, Rev)
plan'(M, R) :- actor-in(M, P, damon) & review-of(M, R) & year-of(M, Y) & shows-in(M, seattle, T)
Containment mapping f: Mov -> M, Part -> P, Rev -> R 47
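A containment mapping such as f can be found by a small backtracking search: map the query's head onto the plan's head, then map each query subgoal onto some subgoal of the expanded plan, keeping the variable assignment consistent. The sketch below is illustrative; the function names and the uppercase-variable convention are assumptions.

```python
def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def unify(atom, target, mapping):
    """Extend `mapping` so that `atom` maps onto `target`, or return None."""
    if atom[0] != target[0] or len(atom[1]) != len(target[1]):
        return None
    extended = dict(mapping)
    for term, goal in zip(atom[1], target[1]):
        if is_var(term):
            if extended.setdefault(term, goal) != goal:
                return None
        elif term != goal:
            return None  # constants must map to themselves
    return extended

def containment_mapping(q_from, q_to):
    """Find a mapping from q_from to q_to; if one exists, q_to is contained in q_from."""
    (head_from, body_from), (head_to, body_to) = q_from, q_to

    def search(i, mapping):
        if i == len(body_from):
            return mapping
        for target in body_to:
            extended = unify(body_from[i], target, mapping)
            if extended is not None:
                found = search(i + 1, extended)
                if found is not None:
                    return found
        return None

    start = unify(("head", head_from), ("head", head_to), {})
    return None if start is None else search(0, start)

QUERY = (("Mov", "Rev"), [
    ("actor-in", ("Mov", "Part", "damon")),
    ("shows-in", ("Mov", "seattle", "T")),
    ("review-of", ("Mov", "Rev")),
])
PLAN_EXPANSION = (("M", "R"), [
    ("actor-in", ("M", "P", "damon")),
    ("review-of", ("M", "R")),
    ("year-of", ("M", "Y")),
    ("shows-in", ("M", "seattle", "T")),
])
f = containment_mapping(QUERY, PLAN_EXPANSION)
reverse = containment_mapping(PLAN_EXPANSION, QUERY)
```

Here the mapping exists only in one direction: the expansion has an extra year-of subgoal that the query lacks, so the plan's answers are contained in the query's answers but the search does not establish the converse.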
How to verify that this plan answers the query? 1. Verify information content of plan 2. Verify binding pattern constraints
IMDBActor($Actor, M) ⇒ actor-in(M, Part, Actor)
Spot($M, Rev, Y) ⇒ review-of(M, Rev) & year-of(M, Y)
Sidewalk($C, M, Th) ⇒ shows-in(M, C, Th)
plan(M, R) :- IMDBActor(b, M) & Sidewalk(s, M, Th) & Spot(M, R, Y) 48
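The binding-pattern check can be sketched as a left-to-right pass over the plan: an argument in a `$` position must be a constant or a variable already produced by an earlier source call. The `executable` helper and its data encoding are hypothetical, and left-to-right execution order is an assumption of this sketch.

```python
BINDING_PATTERNS = {
    "IMDBActor": ("$Actor", "M"),
    "Spot": ("$M", "Rev", "Y"),
    "Sidewalk": ("$C", "M", "Th"),
}

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()

def executable(plan_body, patterns):
    """True if every $-marked input is bound when its source is called."""
    bound = set()
    for name, args in plan_body:
        for param, arg in zip(patterns[name], args):
            if param.startswith("$") and is_var(arg) and arg not in bound:
                return False  # required input is still an unbound variable
        bound.update(arg for arg in args if is_var(arg))  # outputs become available
    return True

good_order = [
    ("IMDBActor", ("damon", "M")),
    ("Sidewalk", ("seattle", "M", "Th")),
    ("Spot", ("M", "R", "Y")),
]
bad_order = [good_order[2], good_order[0], good_order[1]]  # Spot needs M before anything binds it

ok = executable(good_order, BINDING_PATTERNS)
not_ok = executable(bad_order, BINDING_PATTERNS)
```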
Finding Solution Plans: Bucket Algorithm (phase 1) • For each subgoal in the query, place relevant views in the subgoal’s bucket
Inputs: Q(x) :- r1(x,y) & r2(y,x); V1(a) :- r1(a,b); V2(d) :- r2(c,d); V3(f) :- r1(f,g) & r2(g,f)
Buckets: r1(x,y): V1(x), V3(x); r2(y,x): V2(x), V3(x) 49
Phase 2: Combining Buckets • For every combination in the Cartesian product of the buckets, check containment in the query • Bucket Algorithm will check all possible combinations
Buckets: r1(x,y): V1(x), V3(x); r2(y,x): V2(x), V3(x)
Candidate rewritings: Q’1(x) :- V1(x) & V2(x); Q’2(x) :- V1(x) & V3(x); Q’3(x) :- V3(x) & V2(x); Q’4(x) :- V3(x) & V3(x) 50
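Both phases can be sketched on the slide's own example. This is a simplified illustration, not the full algorithm: it assumes view atoms contain only distinct variables, treats a view as relevant when it has a matching subgoal that exports the query's distinguished variables through its head, and only enumerates candidates (each candidate would still need the containment check). The helper names are assumptions.

```python
from itertools import product

# Q(x) :- r1(x,y) & r2(y,x)
QUERY = (("x",), [("r1", ("x", "y")), ("r2", ("y", "x"))])
VIEWS = {
    "V1": (("a",), [("r1", ("a", "b"))]),
    "V2": (("d",), [("r2", ("c", "d"))]),
    "V3": (("f",), [("r1", ("f", "g")), ("r2", ("g", "f"))]),
}

def build_buckets(query, views):
    """Phase 1: one bucket per query subgoal, holding the relevant view heads."""
    head, body = query
    buckets = []
    for pred, args in body:
        bucket = []
        for name, (view_head, view_body) in views.items():
            for view_pred, view_args in view_body:
                if view_pred != pred or len(view_args) != len(args):
                    continue
                to_query = dict(zip(view_args, args))  # view variable -> query variable
                # A distinguished query variable must be exported by the view's head.
                if all(v in view_head for v, q in to_query.items() if q in head):
                    entry = (name, tuple(to_query.get(v, v) for v in view_head))
                    if entry not in bucket:
                        bucket.append(entry)
        buckets.append(bucket)
    return buckets

def candidates(buckets):
    """Phase 2 (enumeration only): one view from each bucket, in every combination."""
    return [list(combo) for combo in product(*buckets)]

buckets = build_buckets(QUERY, VIEWS)
rewritings = candidates(buckets)
```

The number of candidates is the product of the bucket sizes, which is why the slide stresses that the algorithm checks all possible combinations.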