
Revolution Analytics

Revolution Analytics. Overview of Revolution R Enterprise. Joseph B. Rickert, Marketing Manager. For the Dallas R User’s Group. Agenda: Revolution Analytics Today; Revolution R Enterprise; Revolution Analytics in the Enterprise; Big Data with RevoScaleR.


Presentation Transcript


  1. Revolution Analytics • Overview of Revolution R Enterprise • Joseph B. Rickert, Marketing Manager • For the Dallas R User’s Group

  2. Agenda • Revolution Analytics Today • Revolution R Enterprise • Revolution Analytics in the Enterprise • Big Data with RevoScaleR • Deploying R Throughout the Enterprise with RevoDeployR

  3. Corporate Overview & Quick Facts • “Revolution Analytics is the leading commercial provider of software and support for the open-source R statistical computing language.”

  4. Open Source Analytics for the Enterprise • Most advanced statistical analysis software available • Half the cost of commercial alternatives • 2M+ Users • 2,500+ Applications • Quoted headline: “The professor who invented analytic software for the experts now wants to take it to the masses” • Diagram labels: Power, Productivity, Enterprise Readiness • Areas and industries shown: Finance, Statistics, Life Sciences, Predictive Analytics, Manufacturing, Retail, Data Mining, Telecom, Social Media, Visualization, Government

  5. Revolution R Enterprise: Productivity

  6. Revolution R Enterprise has Open-Source R Engine at the core • 2,500 community packages and growing exponentially • Diagram labels: Community Packages, Technical Support, Multi-Threaded Math Libraries, Web Services API, Big Data Analysis, Parallel Tools, Developer IDE, Build Assurance, R Engine Language Libraries

  7. A network of partners for integrated, large-scale data analysis • Deployment / Consumption • Advanced Analytics • Data Infrastructure

  8. Revolution R Enterprise: Performance

  9. Performance: Intel MKL Math Libraries • Chart series: Open Source R, Revolution R Enterprise • Sources: 1. http://www.revolutionanalytics.com/why-revolution-r/benchmarks.php 2. http://r.research.att.com/benchmarks/

  10. Revolution R Enterprise: Big Data Analysis

  11. A common analytic platform across big data architectures • Hadoop • File Based • In-database

  12. Two Big Data problems: capacity and speed • Capacity: problems handling the size of data sets or models • Data too big to fit into memory • Even if it can fit, there are limits on what can be done • Even simple data management can be extremely challenging • Speed: even without a capacity limit, computation may be too slow to be useful

  13. RevoScaleR: Big Data Analysis for Revolution R Enterprise • Addresses performance by distributing computations between cores and computers • Addresses capacity through a collection of functions for chunking through massive data files • A novel high-speed file format designed specifically to support statistical analyses • Familiar, high-productivity programming paradigm for R users • Diagram labels: External Memory Programming Framework, Distributed Statistical Algorithms, R Language Interface, XDF File Format

  14. The basis for a solution for capacity, speed, distributed and streaming data: PEMAs • Parallel external memory algorithms (PEMAs) allow solution of both capacity and speed problems, and can deal with distributed and streaming data • External memory algorithms are those that allow computations to be split into pieces so that not all data has to be in memory at one time • It is possible to “automatically” parallelize and distribute such algorithms
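
The external memory idea above can be illustrated with a minimal sketch, written in Python because the slides contain no code of their own. It assumes a statistic that can be accumulated one chunk at a time (here mean and variance via running sums). The names `Summary`, `read_chunks`, and `chunked_summary` are hypothetical and are not part of RevoScaleR.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class Summary:
    """Intermediate result: count, sum, and sum of squares seen so far."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def update(self, chunk: Iterable[float]) -> None:
        # Fold one chunk into the running sums.
        for x in chunk:
            self.n += 1
            self.total += x
            self.total_sq += x * x

    def mean(self) -> float:
        return self.total / self.n

    def variance(self) -> float:
        # Population variance from the accumulated sums (numerically naive).
        m = self.mean()
        return self.total_sq / self.n - m * m


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks; stands in for reading blocks of rows from disk."""
    chunk: List[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def chunked_summary(values: Iterable[float], chunk_size: int = 3) -> Summary:
    summary = Summary()
    for chunk in read_chunks(values, chunk_size):
        summary.update(chunk)  # only one chunk is held in memory at a time
    return summary
```

Because `values` can be any iterable, including a generator over a file, the full data set never has to be materialized. A production version would likely prefer a numerically stabler variance update than the sum-of-squares formula used here.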

  15. RevoScaleR on a Multicore Server • A RevoScaleR algorithm is given a data source as input • The algorithm loops over the data, reading a block at a time. Blocks of data are read by a separate worker thread (Thread 0). • Other worker threads (Threads 1..n) process the data block from the previous iteration of the data loop and update intermediate results objects in memory • When all of the data is processed, a master results object is created from the intermediate results objects • Diagram labels: Disk, Data, Shared Memory, RevoScaleR, Multicore Processor (4, 8, 16+ cores), Core 0 (Thread 0), Core 1 (Thread 1), Core 2 (Thread 2), Core n (Thread n)
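
A rough Python sketch of the reader/worker arrangement described on this slide follows. It is a simplification under stated assumptions: each worker takes whole blocks from a bounded queue rather than sharing one block among threads, and the statistic is a plain mean. The names `reader`, `worker`, and `parallel_mean` are hypothetical, and nothing here reflects RevoScaleR internals.

```python
import queue
import threading
from typing import Iterable, List, Sequence, Tuple

_DONE = object()  # sentinel telling a worker that no more blocks will arrive


def reader(blocks: Iterable[Sequence[float]], block_queue: queue.Queue, n_workers: int) -> None:
    """Plays the role of Thread 0: reads blocks and hands them to the workers."""
    for block in blocks:
        block_queue.put(block)  # waits while the bounded queue is full
    for _ in range(n_workers):
        block_queue.put(_DONE)


def worker(block_queue: queue.Queue, intermediate: List[Tuple[int, float]], index: int) -> None:
    """Plays the role of Threads 1..n: updates its own intermediate result."""
    count, total = 0, 0.0
    while True:
        block = block_queue.get()
        if block is _DONE:
            break
        count += len(block)
        total += sum(block)
    intermediate[index] = (count, total)


def parallel_mean(blocks: Iterable[Sequence[float]], n_workers: int = 3) -> float:
    block_queue: queue.Queue = queue.Queue(maxsize=2 * n_workers)
    intermediate: List[Tuple[int, float]] = [(0, 0.0)] * n_workers
    threads = [threading.Thread(target=reader, args=(blocks, block_queue, n_workers))]
    threads += [
        threading.Thread(target=worker, args=(block_queue, intermediate, i))
        for i in range(n_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Master result: combine the per-worker intermediate results.
    count = sum(c for c, _ in intermediate)
    total = sum(s for _, s in intermediate)
    return total / count
```

The bounded queue keeps the reader at most a few blocks ahead of the workers, so memory use stays proportional to the block size rather than the data size. In CPython the global interpreter lock limits the speedup for pure-Python arithmetic, so this shows the structure rather than the performance.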

  16. RevoScaleR for Distributed Computing Clusters • Portions of the data source are made available to each compute node • RevoScaleR on the master node assigns a task to each compute node • Each compute node independently processes its data, and returns its intermediate results to the master node • The master node aggregates all of the intermediate results from each compute node and produces the final result • Diagram labels: Master Node (RevoScaleR); four Compute Nodes (RevoScaleR), each with its own Data Partition
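
As an illustration of the master/compute-node pattern, the following Python sketch fits a simple linear regression by having each "node" reduce its partition to five sums that the "master" adds up. This is an assumed example, not the algorithm RevoScaleR uses; a plain loop stands in for dispatching tasks to real nodes, and the names `node_task`, `master_aggregate`, and `fit_line` are hypothetical.

```python
from typing import Sequence, Tuple

Row = Tuple[float, float]  # (x, y)
Stats = Tuple[float, float, float, float, float]  # n, sum x, sum y, sum x*x, sum x*y


def node_task(partition: Sequence[Row]) -> Stats:
    """Runs on a compute node: reduce its partition to five numbers."""
    n = len(partition)
    sx = sum(x for x, _ in partition)
    sy = sum(y for _, y in partition)
    sxx = sum(x * x for x, _ in partition)
    sxy = sum(x * y for x, y in partition)
    return n, sx, sy, sxx, sxy


def master_aggregate(results: Sequence[Stats]) -> Tuple[float, float]:
    """Runs on the master: add up the intermediate results, then solve for the fit."""
    n, sx, sy, sxx, sxy = (sum(col) for col in zip(*results))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def fit_line(partitions: Sequence[Sequence[Row]]) -> Tuple[float, float]:
    # A plain loop stands in for assigning one task to each compute node.
    results = [node_task(p) for p in partitions]
    return master_aggregate(results)
```

Only the five sums travel from each node to the master, so network traffic does not grow with the number of rows in a partition.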

  17. Platform-agnostic Big Data Analytics • Set “compute context” to define hardware (one line of code) • Native job-scheduler handles distribution, monitoring, failover etc. • Same code runs on other supported architectures • Just change compute context • Supported architectures: • Windows: Microsoft HPC Server • Linux: Platform Computing LSF (coming 2012) • Callout: 42 seconds instead of 6 minutes
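
The "compute context" idea can be sketched generically: the analysis code is written once against a small interface, and the object passed in decides where the work runs. The Python below is only an analogy under that assumption. `LocalSequential`, `LocalThreads`, and `total_rows_and_sum` are hypothetical names and are not the RevoScaleR API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


class LocalSequential:
    """Hypothetical context: run every task in the calling thread."""

    def run(self, task: Callable, partitions: Sequence) -> List:
        return [task(p) for p in partitions]


class LocalThreads:
    """Hypothetical context: run tasks on a pool of threads."""

    def __init__(self, workers: int = 4) -> None:
        self.workers = workers

    def run(self, task: Callable, partitions: Sequence) -> List:
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(task, partitions))


def total_rows_and_sum(partitions: Sequence[Sequence[float]], context) -> Tuple[int, float]:
    """Analysis code: identical whichever context is supplied."""
    results = context.run(lambda p: (len(p), sum(p)), partitions)
    return sum(n for n, _ in results), sum(s for _, s in results)
```

Switching from `total_rows_and_sum(parts, LocalSequential())` to `total_rows_and_sum(parts, LocalThreads(8))` changes one argument and nothing else, which is the property the slide attributes to changing the compute context.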

  18. R and Hadoop • Hadoop offers a scalable infrastructure for processing massive amounts of data • Storage – HDFS, HBASE • Distributed Computing - MapReduce • R is a statistical programming language for developing advanced analytic applications • Currently, writing analytics for Hadoop requires a combination of Java, Pig, Python, … • The RHadoop project makes it possible to write PEMAs for Hadoop using the R language alone.

  19. Massively parallel/distributed analytics: RevoConnectR for Hadoop • Write Map-Reduce analytics using only R code with these R packages: • rhdfs - R and HDFS • rhbase - R and HBASE • rmr - R and MapReduce • More information at: bit.ly/r-hadoop • Diagram labels: Revolution R Client, Job Tracker, Task Node, Map or Reduce, R, rmr, rhdfs, rhbase, Thrift, HDFS, HBASE
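
For readers unfamiliar with the Map-Reduce programming model these packages target, here is a small single-process Python simulation of it. It does not use Hadoop or the rmr package, and `map_reduce`, `mapper`, and `reducer` are hypothetical names; the point is only the shape of the computation: map each record to key/value pairs, group by key, then reduce each group.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_reduce(records: Iterable, mapper: Callable, reducer: Callable) -> Dict:
    """Single-process stand-in for the map, shuffle, and reduce phases."""
    groups: Dict = defaultdict(list)
    for record in records:
        for key, value in mapper(record):  # map phase
            groups[key].append(value)      # shuffle: group values by key
    return {key: reducer(key, values) for key, values in groups.items()}  # reduce phase


def mapper(record: Tuple[str, float]) -> Iterator[Tuple[str, float]]:
    # Emit the amount keyed by region.
    region, amount = record
    yield region, amount


def reducer(key: str, values: List[float]) -> float:
    # Mean amount for one region.
    return sum(values) / len(values)
```

On a real cluster the map and reduce calls would run on separate task nodes and the grouping would happen over the network; the user-supplied `mapper` and `reducer` are the only parts an analyst would write.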

  20. In-Database Execution with IBM Netezza

  21. Revolution R Enterprise: Enterprise Deployment

  22. Revolution R Web Services: RevoDeployR • Data Sources & Creation of Analytics • Consumption of Analytics & Results • Diagram labels: Data Analysis, Revolution “RevoDeployR”, R / Statistical Modeling Expert, Deployment Expert, Business Intelligence, Interactive Web Apps, Cloud / SaaS

  23. Thank you. The leading commercial provider of software and support for the popular open source R statistics language. www.revolutionanalytics.com 650.646.9545 Twitter: @RevolutionR
