
Incremental Aggregation on Multiple Continuous Queries

Chun Jin, Carnegie Mellon University. 09/28/2006, ISMIS, Bari, Italy.

Presentation Transcript


  1. Incremental Aggregation on Multiple Continuous Queries • Chun Jin, Carnegie Mellon University • 09/28/2006, ISMIS, Bari, Italy

  2. Stream Processing • Intelligence monitoring • Fraud detection • Onset epidemic patterns • Network intrusion detection • Geospatial changes • Transactions • Sensor network readings • Network traffic data

  3. Problem • Aggregate queries • Continuous evaluation • Multiple concurrent queries

  4. Solutions • Incremental aggregation • Incremental multiple aggregate query optimization (incremental sharing)

  5. Roadmap • System overview • Query examples • Incremental Aggregation • Incremental sharing • Evaluation

  6. System Architecture • New Query Insertion: index query network; identify common computation; select optimal sharing path; expand query network • Query Network Execution: code assembly; incremental aggregation; periodic execution • Diagram labels: Common Computation Identifier (CCI), Sharing Optimizer (SO), Projection Manager (PM), Network Operation Manager (NOM), Code Assembler, Coordinator, Engine, Generator, Query, Query Network, System Catalog, Oracle

  7. Query Examples • Diagram labels: S, A, B, SH, AH, SN, AN
     (a) Query A: SELECT dis_cat, hospital, vdate, COUNT(*), AVERAGE(fee) FROM Med GROUP BY CAT(disease) AS dis_cat, hospital, DAY(visit_time) AS vdate
     (b) Query B: SELECT hospital, vdate, AVERAGE(fee) FROM Med GROUP BY hospital, DAY(visit_time) AS vdate

  8. Roadmap • System overview • Query examples • Incremental Aggregation • Incremental sharing • Evaluation

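  9. Aggregate Function Types • Distributive: partial results are combined by a distributive function, typically the aggregate function itself (e.g., SUM, COUNT). • Algebraic: computable from a finite set of distributive aggregate functions (e.g., AVERAGE from SUM and COUNT). • Holistic: no such finite set exists (e.g., quantiles).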
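The three classes can be illustrated with a minimal Python sketch. The fee values and the two-chunk split are made up for illustration and are not from the talk.

```python
from statistics import median

# Hypothetical fee values, split into two chunks of the same stream.
chunk1 = [10.0, 20.0, 30.0]
chunk2 = [40.0, 50.0]

# Distributive: per-chunk SUMs and COUNTs combine by simple addition.
total_sum = sum(chunk1) + sum(chunk2)
total_count = len(chunk1) + len(chunk2)

# Algebraic: AVERAGE is derived from a finite set of distributive parts.
average = total_sum / total_count

# Holistic: the median needs all values; combining the per-chunk medians
# (here by averaging them) does not give the median of the whole stream.
true_median = median(chunk1 + chunk2)
combined_medians = (median(chunk1) + median(chunk2)) / 2

print(total_sum, total_count, average)   # 150.0 5 30.0
print(true_median, combined_medians)     # 30.0 32.5
```

Only the distributive and algebraic cases can be maintained from per-chunk summaries, which is why the holistic case falls back to the full history.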

  10. Holistic Aggregation • Re-evaluates the query over the entire history. • Used for: holistic aggregates; aggregates that follow non-incrementally evaluated results; a baseline for incremental aggregation.

  11. Algorithm (t1: tuple in AH; t2: tuple in AN) • 0: PreUpdate State • 1: Aggregate (SN into AN) • 2: Merge Groups: t2.COUNTA = t1.COUNTA + t2.COUNTA; t2.SUMA = t1.SUMA + t2.SUMA • 3: Compute Algebraic Aggregate • 4: Drop Duplicates • 5: Insert New Results
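The steps on this slide can be sketched in Python as follows. This is a minimal in-memory illustration, not the system's implementation: the dictionary layout, the function names `aggregate_chunk` and `incremental_update`, and the sample data are assumptions, and AH is assumed to hold the historical aggregates, SN the new tuples, and AN their aggregates.

```python
def aggregate_chunk(sn):
    """Step 1: aggregate the new tuples SN into AN, keyed by group."""
    an = {}
    for group, fee in sn:
        t2 = an.setdefault(group, {"COUNTA": 0, "SUMA": 0.0})
        t2["COUNTA"] += 1
        t2["SUMA"] += fee
    return an


def incremental_update(ah, sn):
    """Fold a chunk SN into the historical aggregates AH (hypothetical layout)."""
    an = aggregate_chunk(sn)
    for group, t2 in an.items():
        t1 = ah.get(group)
        if t1 is not None:
            # Step 2: merge the matching historical group t1 into t2.
            t2["COUNTA"] = t1["COUNTA"] + t2["COUNTA"]
            t2["SUMA"] = t1["SUMA"] + t2["SUMA"]
        # Step 3: compute the algebraic aggregate from its parts.
        t2["AVGA"] = t2["SUMA"] / t2["COUNTA"]
    for group in an:
        # Step 4: drop the stale copies of the updated groups from AH.
        ah.pop(group, None)
    # Step 5: insert the new results.
    ah.update(an)
    return ah


ah = {("H1", "d1"): {"COUNTA": 2, "SUMA": 30.0, "AVGA": 15.0}}
sn = [(("H1", "d1"), 30.0), (("H2", "d1"), 10.0)]
result = incremental_update(ah, sn)
print(result[("H1", "d1")])  # {'COUNTA': 3, 'SUMA': 60.0, 'AVGA': 20.0}
```

Only the groups touched by the new chunk are read or rewritten, so the work per update depends on the chunk rather than on the full history.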

  12. Complexity • Aggregate SN. T1 = O(|SN|) • Merge groups in AH into AN. Tcurr2 = O(|AH| + |AN|), Thash2 = O(|AH| + |AN|), Tprefetch2 = O(|AN|) • Compute algebraic aggregates in AN. T3 = O(|AN|) • Drop duplicates. Tcurr4 = O(|AN|*|ANH|) = O(|AN|^2), Thash4 = O(|AH| + |AN|), Tprefetch4 = O(|AN|) • Insert new results. T5 = O(|AN|)

  13. Implementation • System catalog: AggreRules, AggreBasics • Incremental aggregation instantiation

  14. System Catalog • AggreRules • AggreBasics

  15. Instantiation • New Query A: AVERAGE(fee) • AggreBasics: AVERAGE: SUM(X): SUMX; AVERAGE: COUNT(W): COUNTW • Name Mapping: SUMX: SUM(fee): SUMA; COUNTW: COUNT(*): COUNTA; AVERAGE(fee): AVGA • AggreRules: retrieve rules, substitute, parse, substitute • GroupColumns (insert columns): SUM(fee): SUMA; COUNT(*): COUNTA; AVERAGE(fee): AVGA
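A minimal Python sketch of the substitution idea follows. The catalog layout, the rule templates, the `AVG` placeholder, and the `instantiate` helper are all hypothetical; only the names SUMX, COUNTW, SUMA, COUNTA, AVGA and the merge formulas come from the slides.

```python
import re

# Hypothetical in-memory stand-ins for the AggreBasics and AggreRules tables.
AGGRE_BASICS = {"AVERAGE": [("SUM(X)", "SUMX"), ("COUNT(W)", "COUNTW")]}
AGGRE_RULES = {
    "AVERAGE": [
        "t2.SUMX = t1.SUMX + t2.SUMX",
        "t2.COUNTW = t1.COUNTW + t2.COUNTW",
        "AVG = SUMX / COUNTW",  # assumed form of the algebraic rule
    ]
}

# Name mapping for query A's AVERAGE(fee).
MAPPING = {"X": "fee", "W": "*", "SUMX": "SUMA", "COUNTW": "COUNTA", "AVG": "AVGA"}


def instantiate(template, mapping):
    """Replace whole identifiers in a template using the name mapping."""
    return re.sub(
        r"[A-Za-z_]\w*",
        lambda m: mapping.get(m.group(), m.group()),
        template,
    )


basics = [(instantiate(expr, MAPPING), instantiate(col, MAPPING))
          for expr, col in AGGRE_BASICS["AVERAGE"]]
rules = [instantiate(rule, MAPPING) for rule in AGGRE_RULES["AVERAGE"]]

print(basics)  # [('SUM(fee)', 'SUMA'), ('COUNT(*)', 'COUNTA')]
print(rules[0])  # t2.SUMA = t1.SUMA + t2.SUMA
```

Matching whole identifiers in a single pass avoids rewriting the `X` inside `SUMX`, which a chain of plain string replacements could do.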

  16. Roadmap • System overview • Query examples • Incremental Aggregation • Incremental sharing • Evaluation

  17. Incremental Multiple Query Optimization (Incremental Sharing) • Index existing query plan information R. • Given a new query Q, identify the sharable computations from R. • Select the optimal sharing path. • Expand R to compute Q.

  18. Expanding Query Network • Limited sharing on holistic aggregates • Sharing on distributive/algebraic aggregates through vertical expansion

  19. Vertical Expansion • Diagram labels: A, B, AH, BH • 1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID

  20. Vertical Expansion • Diagram labels: A, B, AH, BH, AN, BN • 1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID • 2: Merge Groups: t2.COUNTA = t1.COUNTA + t2.COUNTA; t2.SUMA = t1.SUMA + t2.SUMA • 3: Compute Algebraic Aggregate • 4: Drop Duplicates • 5: Insert New Results
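The "Further Aggregate" step can be sketched in Python using the groupings of Query A and Query B from the query examples. The row layout, the `further_aggregate` name, and the sample values are assumptions for illustration; the group key of B, (hospital, vdate), stands in for BID.

```python
def further_aggregate(a_rows):
    """Derive B's aggregates from A's rows instead of rescanning the stream."""
    b_rows = {}
    for (dis_cat, hospital, vdate), t in a_rows.items():
        # B groups by (hospital, vdate), a coarsening of A's group key.
        g = b_rows.setdefault((hospital, vdate), {"COUNTB": 0, "SUMB": 0.0})
        g["COUNTB"] += t["COUNTA"]  # COUNTB = SUM(COUNTA)
        g["SUMB"] += t["SUMA"]      # SUMB = SUM(SUMA)
    for g in b_rows.values():
        # Algebraic aggregate of B, computed from its distributive parts.
        g["AVGB"] = g["SUMB"] / g["COUNTB"]
    return b_rows


# Hypothetical aggregates of Query A, keyed by (dis_cat, hospital, vdate).
a_rows = {
    ("flu", "H1", "d1"): {"COUNTA": 2, "SUMA": 30.0},
    ("cold", "H1", "d1"): {"COUNTA": 2, "SUMA": 10.0},
    ("flu", "H2", "d1"): {"COUNTA": 4, "SUMA": 100.0},
}
b_rows = further_aggregate(a_rows)
print(b_rows[("H1", "d1")])  # {'COUNTB': 4, 'SUMB': 40.0, 'AVGB': 10.0}
```

Applied to AN rather than to all of A, the same function would produce BN, which can then go through the merge, compute, drop, and insert steps against BH.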

  21. Vertical Expansion Complexity • TVcurr = O(|AN|^2 + |BH|) • TVhash = O(|AN| + |BH|) • TVprefetch = O(|AN|)

  22. System Catalog • GroupColumns • GroupTopology • GroupExprSet • GroupExprIndex

  23. Select Optimal Sharing Path • Among the sharable nodes, select the one with the least size for sharing
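The least-size rule can be sketched in a few lines of Python. The node names follow the slides, but the row counts and the `select_sharing_node` helper are hypothetical.

```python
def select_sharing_node(candidate_sizes):
    """Return the sharable node with the fewest rows."""
    return min(candidate_sizes, key=candidate_sizes.get)


# Hypothetical sizes: the raw stream S and the existing aggregate node A.
candidate_sizes = {"S": 600000, "A": 1200}
chosen = select_sharing_node(candidate_sizes)
print(chosen)  # A
```

Choosing the smallest node means the new query's further aggregation reads the fewest rows.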

  24. Rerouting • [Animation: evolution of the query network over nodes S, A, and B]

  25. Roadmap • System overview • Query examples • Incremental Aggregation • Incremental sharing • Evaluation

  26. Evaluation • Databases: synthesized FedWire money transfers (Fed); anonymized medical patient admission records (Med) • Queries: seed queries; sharable queries generated from the seeds; a wide range of queries (aggregates in this paper) • Simulation: historical data (300,000 on Fed and 600,000 on Med); chunks of new data (4,000 per chunk)

  27. Incremental Aggregation • [Results: total execution time in seconds]

  28. [Plot (a) Fed: execution time (s) vs. number of FED queries]

  29. [Plot (a) Med: execution time (s) vs. number of MED queries]

  30. Conclusion • Multiple aggregates over streams • Solutions: • Incremental aggregation • Incremental MQO (incremental sharing) • Built atop DBMSs for direct practical utility • Big performance improvement • Future work: • A broad range of queries • Built atop DSMSs.

  31. Acknowledgement • Work with Professor Jaime Carbonell. • Part of ARGUS by CMU and Dynamix. • Team: Phil Hayes, Santosh Ananthraman, Bob Frederking, Eugene Fink, Dwight Dietrich, Ganesh Mani, Johny Mathew. • Thanks to Professor Chris Olston for helpful discussion.

  32. FED Query Pair 1 • [Plot (a) Pair 1: execution time vs. incremental size |SN|; series: Non-VE IBT, VE IBT, Non-VE ITT, VE ITT] • ITT: Average Individual-Tuple Execution Time (s) • IBT: Incremental-Batch Execution Time (s)
