IBM Big Data Platform Overview Martin Pavlík +420 731 435 691 martin_pavlik@cz.ibm.com
Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data • Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming • Social Media • Website • Billing • Network Switches • ERP • CRM • RFID
BIG DATA is not just HADOOP • Federated Discovery and Navigation: understand and navigate federated big data sources • Hadoop File System, MapReduce: manage & store huge volume of any data • Data Warehousing: structure and control data • Stream Computing: manage streaming data • Text Analytics Engine: analyze unstructured data • Integration, Data Quality, Security, Lifecycle Management, MDM: integrate and govern all data sources
Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements • “Big data” isn’t just a technology—it’s a business strategy for capitalizing on information resources • Getting started is crucial • Success at each entry point is accelerated by products within the Big Data platform • Build the foundation for future requirements by expanding further into the big data platform
Merging the Traditional and Big Data Approaches • Traditional Approach (Structured & Repeatable Analysis): Business users determine what question to ask; IT structures the data to answer that question; examples: monthly sales reports, profitability analysis, customer surveys • Big Data Approach (Iterative & Exploratory Analysis): IT delivers a platform to enable creative discovery; Business explores what questions could be asked; examples: brand sentiment, product strategy, maximum asset utilization
Hadoop • Open-source software framework from Apache • Inspired by Google MapReduce and GFS (Google File System) • HDFS • Map/Reduce
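The Map/Reduce model named on this slide can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop's Java API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a framework would do across a cluster."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, chosen only to exercise the phases.
    sample = ["big data is not just hadoop", "hadoop stores big data"]
    print(word_count(sample))
```

In a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; here all three run in one process so the data flow is easy to follow.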
InfoSphere BigInsights • Can run also on top of • Platform for volume, variety, velocity: enhanced Hadoop foundation • Analytics: text analytics & tooling, application accelerators • Usability: web console, spreadsheet-style tool, ready-made “apps” • Enterprise Class: storage, security, cluster management • Integration: connectivity to Netezza, DB2, JDBC databases, etc. • Enterprise Edition (licensed): application accelerators, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools, security, Eclipse development tools, performance enhancements, . . . . • Basic Edition (free download): Apache Hadoop, integrated install, online InfoCenter, BigData Univ. • Axes: Breadth of capabilities, Enterprise class
Spreadsheet-style Analysis • Web-based analysis and visualization • Spreadsheet-like interface • Define and manage long-running data collection jobs • Analyze the content of the text on the pages that have been retrieved
Build a Big Data Program – MapReduce example • Eclipse tools • For Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
JAQL – IBM’s programming language in the Hadoop world • Jaql is a complete solutions environment supporting all other BigInsights components • Integration point for various analytics: text analytics, statistical analysis, machine learning, ad-hoc analysis • Integration point for various data sources: local and distributed file systems, NoSQL databases, content repositories, relational sources (warehouses, operational databases) • BigInsights: Text Analytics, Statistical Analysis (R module), Machine learning (SystemML), Ad-Hoc analysis (BigSheets), (Integration) DB2, Netezza, Streams, … • Jaql: Jaql Modules, Jaql Core Operators, Jaql I/O • File System, RDBMS, DFS, NoSQL
BigInsights and the data warehouse • Traditional analytic tools • Big Data analytic applications • Data warehouse • BigInsights: Filter, Transform, Aggregate
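The filter, transform and aggregate steps that sit between raw data and the warehouse can be sketched as follows. This is a plain Python illustration, not BigInsights or Jaql code; the `customer,amount` record layout and the function name `filter_transform_aggregate` are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable


def filter_transform_aggregate(raw_lines: Iterable[str]) -> Dict[str, float]:
    """Reduce raw 'customer,amount' lines to one total per customer."""
    totals: Dict[str, float] = defaultdict(float)
    for line in raw_lines:
        parts = line.strip().split(",")
        # Filter: drop records that do not have exactly two fields.
        if len(parts) != 2:
            continue
        customer, amount_text = parts
        try:
            amount = float(amount_text)
        except ValueError:
            # Filter: drop records whose amount is not numeric.
            continue
        # Transform: normalize the customer key.
        key = customer.strip().upper()
        # Aggregate: keep a running total per customer.
        totals[key] += amount
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical raw input, including two malformed records.
    raw = ["alice, 10.5", "ALICE,4.5", "bob,abc", "garbage", "bob,2"]
    print(filter_transform_aggregate(raw))
```

Only the small aggregated result would then be loaded into the warehouse, while the bulky raw records stay on the Hadoop side.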
Analyst: I need to evaluate the possible relationship between client salary and overdrafts. • IT: OK. We have to evaluate a lot of statistics, set the correct db indexes and db partitioning. It will take us 5 days.
After 5 days ... • IT: Done. You can run your analytical query. • Analyst: Great. Thanks a lot. I’m going to check the results.
After 10 minutes ... • IT: Ohhh, welcome dear friend. • Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective. • IT: Understand. So, it’s …. another 5 days of our work. • Noooo!!! It’s not possible to work here!
Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.
After 12 minutes ... • Analyst: Great. I can see here some nice correlations. Now I need to look at it from a different perspective. With Netezza I can run the query immediately. The response time will be the same. • IT can do something else, much more useful
Built-In Expertise Makes This as Simple as an Appliance • Dedicated device • Optimized for purpose • Complete solution • Fast installation • Very easy operation • Standard interfaces • Low cost
In October 2012 IBM Netezza was renamed to IBM PureData System for Analytics
Netezza Genesis in T-Mobile CZ • Proof-of-Concept Project • New Enterprise Data Warehouse platform selection • Comparison of existing and other platforms • Selection Criteria: Performance, Operational Savings • …. and the winner was: Netezza
Netezza Genesis in T-Mobile CZ • Expectations • Significant response improvement: faster platform means better report response • Direct Data Availability: higher trust in data, one version of truth; aggregation reduction; any attribute available • Operational Benefits: storage savings (no data replicas); administration cost reduction (DBA) • Infrastructure Simplification: lower environment complexity
Netezza Genesis in T-Mobile CZ • Project Implementation • EDW platform migration: Netezza platform implementation; ETL graphs/processes redesign • BI Front-End Tool Migration: SAP Business Object implementation; all reports redesign • Main Integration Partner: T-System CZ
Netezza Genesis in T-Mobile CZ • Current Status • All relevant ETL processing redesigned • Parallel run on the original and Netezza platforms finished • Netezza is now the only primary platform
Real Netezza experience from T-Mobile Czech Rep. RESPONSE TIME MASSIVELY IMPROVED
BigInsights and the data warehouse • Traditional analytic tools • Big Data analytic applications • From Cognos BI via Hive JDBC • BigInsights: query-ready archive for “cold” warehouse data • Data Warehouse
Future: The SQL interface • Rich SQL query capabilities • SQL '92 and 2011 features • Correlated subqueries • Windowed aggregates • SQL access to all data stored in InfoSphere BigInsights • Robust JDBC/ODBC support • Take advantage of key features of each data source • Leverage MapReduce parallelism OR achieve low latency • Diagram labels: Application, SQL Language, JDBC / ODBC Driver, JDBC / ODBC Server, SQL interface, Engine, Data Sources (Hive tables, HBase tables, CSV files), InfoSphere BigInsights
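The two SQL features called out on this slide, correlated subqueries and windowed aggregates, can be tried out in a few lines. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in engine (window functions need SQLite 3.25 or newer); it does not connect to BigInsights, and the `sales` table, its columns and its rows are hypothetical.

```python
import sqlite3


def run_demo():
    """Return (rows above their region's average, running totals per region)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, month INTEGER, amount REAL);
        INSERT INTO sales VALUES
            ('north', 1, 100), ('north', 2, 150), ('north', 3, 50),
            ('south', 1, 200), ('south', 2, 100);
        """
    )
    # Correlated subquery: the inner query is re-evaluated for each outer
    # row, using that row's region.
    above_avg = conn.execute(
        """
        SELECT region, month, amount
        FROM sales AS s
        WHERE amount > (SELECT AVG(amount) FROM sales WHERE region = s.region)
        ORDER BY region, month
        """
    ).fetchall()
    # Windowed aggregate: a running total per region, ordered by month,
    # without collapsing the rows as GROUP BY would.
    running = conn.execute(
        """
        SELECT region, month,
               SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
        FROM sales
        ORDER BY region, month
        """
    ).fetchall()
    conn.close()
    return above_avg, running


if __name__ == "__main__":
    above_avg, running = run_demo()
    print(above_avg)
    print(running)
```

An application using a JDBC or ODBC driver would send the same kind of SQL text to the server; only the connection step differs.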
Why and when to use InfoSphere Streams? • Applications needing on-the-fly processing, filtering and analysis of streaming data • At least 2 criteria from the list below should be fulfilled
Streams and BigInsights - Integrated Analytics on Data in Motion & Data at Rest • Visualization of real-time and historical insights • InfoSphere Streams: data ingest, preparation, online analysis, model validation • InfoSphere BigInsights, Database & Warehouse: data integration, data mining, machine learning, statistical modeling • 1. Data Ingest • 2. Bootstrap/Enrich • 3. Adaptive Analytics Model • Data • Control flow
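Processing data in motion means each event is filtered and analyzed as it arrives, without first storing the whole data set. A minimal sketch of that idea with Python generators follows; it is not InfoSphere Streams code, and the function names, the "negative readings are invalid" rule and the window size are assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


def valid_readings(stream: Iterable[float]) -> Iterator[float]:
    """Filter: pass through only non-negative readings, one at a time."""
    for value in stream:
        if value >= 0:
            yield value


def moving_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Analyze: emit the average of the last `window` readings per event."""
    recent: Deque[float] = deque(maxlen=window)  # old readings fall out
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


if __name__ == "__main__":
    # Hypothetical sensor readings; -1 stands for an invalid measurement.
    readings = [10, -1, 20, 30, 40]
    for average in moving_average(valid_readings(readings), window=2):
        print(average)
```

Because both stages are generators, memory use depends on the window size and not on the length of the stream, which is the property that makes this style suitable for unbounded input.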
IBM Big Data Platform: The Platform Advantage • Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics • Visualization & Discovery • Application Development • Systems Management • Accelerators • Hadoop System • Stream Computing • Data Warehouse • Information Integration & Governance