Hadoop Your ETL: Using Big Data Technologies to Enhance Today’s Data Warehouses

kamran

Presentation Transcript


  1. Hadoop Your ETL: Using Big Data Technologies to Enhance Today’s Data Warehouses

  2. The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.

  3. Thoughts Things Processes Thoughts Things Processes

  4. Today’s Challenges Produce Data More sources of data Use Data

  5. Big Data Usage Pattern ETL and Batch Processing Workloads on Hadoop SQL DW & BI Analytics Web Data Factory • Scalable • Flexible • Cost Effective SQL NoSQL

  6. Data Factory: Basic Use Cases • Offload mainframe batch processing to Hadoop, with lower cost and higher levels of performance • Offload ETL staging processing to Hadoop to decrease ETL costs, and enable more DW bandwidth • Create centralized repository of data to serve multiple applications and data warehouses

  7. Data Warehouse Reference Architecture

  8. Oracle Big Data Solution Decide Oracle Database Cloudera Hadoop Oracle Advanced Analytics Oracle NoSQL Database Oracle BI Foundation Suite Endeca Information Discovery Oracle Big Data Connectors Oracle Spatial & Graph Oracle R Distribution Oracle Event Processing Oracle GoldenGate Apache Flume Oracle Data Integrator Oracle Real-Time Decisions Stream Acquire – Organize – Analyze

  9. Big Data Appliance X3-2. Sun Oracle X3-2L servers, with per server: • 2 × 8-core Intel Xeon E5 processors • 64 GB memory • 36 TB disk space. Integrated software: • Oracle Linux • Oracle Java JDK • Cloudera Distribution of Apache Hadoop (CDH) • Cloudera Manager • Oracle R Distribution • Oracle NoSQL Database. All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for Operating Systems.

  10. Platform Strengths. Big Data Appliance + Hadoop: • Low-cost Scalability • Flexible Schema on Read • Abstract Storage Model • Open • Rapid Evolution. Exadata + Oracle Database: • Extreme Performance • Highly Secure • Analytic SQL • Rich Tool Set • Vast Expertise

  11. ETL That Eats at the Bottom Line • Long-running ETL jobs: lots of resources, less value • Less horsepower for innovative analysis

  12. Data Factory ETL Increases Savings • One factory to be accessed at any time • More resources for more insights

  13. Save the Bottom Line, Serve Innovation Data Factory Big Data Appliance+Hadoop Exadata+Oracle Database

  14. Customer Example: Mobile Telecom Provider • Exponential growth in data, generated by new consumer devices • ETL and storage constraints limited analytics to a 1% sample • Oracle Exadata combined with Cloudera Hadoop now delivers analytics on 100% of the data • Query times reduced dramatically (e.g. from 4 days to 53 minutes) • 90% reduction of ETL code base • From 1% sampling to 100% analysis. Before (diagram labels): Filter & Split, Alerting, Event Monitoring, Telecom Services, Complex Correlation, Streaming ETL, Data Warehouse, Streaming ETL, Archive Storage. After (diagram labels): Alerting, Filter & Split, Event Monitoring, Telecom Services, Hadoop, Archive Storage, ETL, Correlation, Stage 1 DWH, Data Warehouse.
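The two headline figures on this slide can be turned into ratios with simple arithmetic. The sketch below is only a worked calculation on the numbers quoted above (4 days, 53 minutes, 1% and 100%); the function name `ratio` is made up for illustration and is not part of any Oracle or Hadoop tool.

```python
MINUTES_PER_DAY = 24 * 60


def ratio(before: float, after: float) -> float:
    """Return how many times larger `before` is than `after`."""
    if before <= 0 or after <= 0:
        raise ValueError("both values must be positive")
    return before / after


if __name__ == "__main__":
    # Query time: 4 days before, 53 minutes after (figures from the slide).
    before_minutes = 4 * MINUTES_PER_DAY  # 5760 minutes
    print(f"query speedup: {ratio(before_minutes, 53):.1f}x")  # prints 108.7x
    # Data analysed: 100% now versus a 1% sample before.
    print(f"data coverage: {ratio(100, 1):.0f}x")  # prints 100x
```

In other words, the quoted example corresponds to roughly a hundredfold faster query over a hundred times more data.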

  15. Customer Example: Full Service Bank. Challenges: • Reduce IT costs • Comply with regulations requiring more data to support stress testing • Consolidate and streamline data processing. Benefits: • Faster access to 6x more data • Lower cost, simplified architecture • Implemented in a matter of months. Diagram labels: Before, After, Mainframe, Exadata, Mainframe, Big Data Appliance.

  16. Big Data Connectors

  17. Big Data Connectors and Data Integrator 15TB / hour 10x Faster Exadata+Oracle Database Big Data Appliance+Hadoop

  18. Big Data Connectors Optimized integration of Hadoop with Oracle Database and Oracle Exadata • Oracle Loader for Hadoop • Oracle SQL Connector for Hadoop Distributed File System (HDFS) • Oracle Data Integrator Application Adapter for Hadoop • Oracle R Connector for Hadoop • Oracle XQuery Connector for Hadoop • Does not require Big Data Appliance – can be licensed for Hadoop running on non-Oracle hardware

  19. Oracle Loader for Hadoop • Last stage in MapReduce workflow • Offloads data pre-processing from the database server to Hadoop • Works with a range of input data formats. Diagram labels: Map, Shuffle/Sort, Reduce, Oracle Loader for Hadoop, Oracle Database.
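The idea of a load step that sits at the end of a MapReduce workflow, so that parsing and sorting happen before the database is touched, can be sketched in plain Python. This is a single-process simulation under stated assumptions, not Oracle Loader for Hadoop's API: the function names (`map_phase`, `shuffle_sort`, `reduce_and_load`), the `key,value` input format, and the in-memory list standing in for a database table are all hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Record = Tuple[str, int]


def map_phase(lines: Iterable[str]) -> List[Record]:
    """Parse raw 'key,value' text into typed records, dropping malformed lines."""
    records: List[Record] = []
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # wrong number of fields
        key, raw_value = parts
        try:
            records.append((key, int(raw_value)))
        except ValueError:
            continue  # value is not an integer
    return records


def shuffle_sort(records: Iterable[Record], num_partitions: int) -> Dict[int, List[Record]]:
    """Group records into partitions by a stable hash of the key, then sort each one."""
    partitions: Dict[int, List[Record]] = defaultdict(list)
    for key, value in records:
        partition_id = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[partition_id].append((key, value))
    for batch in partitions.values():
        batch.sort()
    return dict(partitions)


def reduce_and_load(partitions: Dict[int, List[Record]],
                    load: Callable[[List[Record]], None]) -> int:
    """Final stage: hand each pre-processed partition to `load` as one batch."""
    loaded = 0
    for partition_id in sorted(partitions):
        batch = partitions[partition_id]
        load(batch)  # stand-in for a bulk load into the target table
        loaded += len(batch)
    return loaded


if __name__ == "__main__":
    raw = ["b,2", "a,1", "not a record", "a,3"]
    target_table: List[Record] = []  # hypothetical stand-in for a database table
    count = reduce_and_load(shuffle_sort(map_phase(raw), 2), target_table.extend)
    print(count, sorted(target_table))
```

The point of the arrangement is that the target only ever receives clean, typed, sorted batches; the CPU-heavy parsing and sorting run on the cluster side.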

  20. Oracle Loader for Hadoop: Connectivity to Hadoop Technologies. Diagram labels: JSON files, JSON SerDe, Hive’s HBase Storage Handler, Hive external tables, Oracle Loader for Hadoop (Map, Shuffle/Sort, Reduce), Oracle Data Warehouse.

  21. Oracle SQL Connector for HDFS (OSCH): Use Oracle SQL to Load or Access Data on HDFS. Features: • Load into the database using SQL • Option to access and analyze data in place on HDFS • Access Hive (internal and external) tables and HDFS files • Automatic load balancing to maximize performance. Diagram labels: SQL Query, Oracle Database, External Table, OSCH, HDFS Client, HDFS.

  22. Oracle XQuery Connector for Hadoop (OXH) • XQuery language executed on the Map/Reduce framework. Example query: for $ln in text:collection() let $f := tokenize($ln) where $f[1] = 'x' return text:put($f[2]). Diagram labels: XQuery, OXH, Execution Plan, M/R Engine, Map/Reduce Worker Nodes, HDFS.
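For readers unfamiliar with XQuery, the example query on this slide reads each input line, splits it into fields, keeps the lines whose first field is 'x', and writes out the second field. A rough Python equivalent is sketched below. It assumes whitespace-separated fields (the slide's `tokenize` call shows no separator) and skips lines with fewer than two fields; the function name `select_second_field` is hypothetical. Note that XQuery sequences are 1-indexed, so `$f[1]` and `$f[2]` correspond to `fields[0]` and `fields[1]`.

```python
from typing import Iterable, Iterator


def select_second_field(lines: Iterable[str], key: str = "x") -> Iterator[str]:
    """Yield the second field of every line whose first field equals `key`."""
    for line in lines:                               # for $ln in text:collection()
        fields = line.split()                        # let $f := tokenize($ln)
        if len(fields) >= 2 and fields[0] == key:    # where $f[1] = 'x'
            yield fields[1]                          # return text:put($f[2])


if __name__ == "__main__":
    sample = ["x alpha 1", "y beta 2", "x gamma 3"]
    print(list(select_second_field(sample)))  # prints ['alpha', 'gamma']
```

Because each line is handled independently, a filter of this shape maps naturally onto map tasks running in parallel across worker nodes.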

  23. Oracle Data Integrator for Big Data: Heterogeneous Integration with Hadoop Environments • Supports Hadoop standards • Reverse Engineer Hadoop metadata • Check, Validate and Ensure Data Integrity with Hadoop • Load Data into HDFS/Hive • Generate HiveQL and execute in Hadoop • Leverage existing Hadoop transformations. Diagram labels: Access, Transform, Oracle Data Integrator, Loads.

  24. Oracle Data Integrator – Hive Control Knowledge Module • Big Data Transformation Services • Metadata Focused Approach • Selectable Hive Transformations • Easy to use & Guided • Hive Function Support

  25. Oracle Data Integrator for Big Data: Heterogeneous Integration with Hadoop Environments

  26. Thoughts Things Processes Thoughts Things Processes

  27. Oracle Big Data Solution Decide Oracle Database Cloudera Hadoop Oracle Advanced Analytics Oracle NoSQL Database Oracle BI Foundation Suite Endeca Information Discovery Oracle Big Data Connectors Oracle Spatial & Graph Oracle R Distribution Oracle Event Processing Oracle GoldenGate Apache Flume Oracle Data Integrator Oracle Real-Time Decisions Stream Acquire – Organize – Analyze
