A Brief Overview of Hadoop Eco-System

Presentation Transcript


  1. A Brief Overview of Hadoop Eco-System

  2. Hive • SQL-like language (HiveQL) to query data stored on HDFS • Example – “SELECT c.ID, c.Name, c.AGE, o.Amount FROM Customers c JOIN Orders o ON (c.ID = o.CUSTOMER)” • Data Model • Tables – column types (int, float, string, date, boolean) • Supports array / map / struct for JSON-like data • Metastore • Stores the namespace: the set of tables, their columns and column types, and SerDe information • CLI • Other languages – Jaql, Pig
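To make the example query concrete, here is a plain-Java sketch (no Hive dependency) of the rows an inner equi-join on `c.ID = o.CUSTOMER` produces. The record types `Customer`, `Order` and `Row` and the sample values are hypothetical, and the sketch assumes `ID` is unique in Customers (a primary key), so a hash table keyed on it suffices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the rows produced by:
//   SELECT c.ID, c.Name, c.AGE, o.Amount
//   FROM Customers c JOIN Orders o ON (c.ID = o.CUSTOMER)
// Customer, Order and Row are hypothetical types, not Hive APIs.
class HiveJoinSketch {
    record Customer(int id, String name, int age) {}
    record Order(int customer, double amount) {}
    record Row(int id, String name, int age, double amount) {}

    // Inner equi-join: build a hash table on Customers.ID, then probe it with each order.
    // Assumes customer IDs are unique.
    static List<Row> join(List<Customer> customers, List<Order> orders) {
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id(), c);
        }
        List<Row> result = new ArrayList<>();
        for (Order o : orders) {
            Customer c = byId.get(o.customer());
            if (c != null) { // orders without a matching customer are dropped (inner join)
                result.add(new Row(c.id(), c.name(), c.age(), o.amount()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer(1, "Alice", 32),
            new Customer(2, "Bob", 25));
        List<Order> orders = List.of(
            new Order(2, 1560.0),
            new Order(3, 2060.0), // customer 3 does not exist, so this order is dropped
            new Order(1, 3000.0));
        join(customers, orders).forEach(System.out::println);
    }
}
```

Hive compiles such a query into distributed jobs over HDFS files rather than an in-memory loop; the sketch only shows which rows the join semantics keep.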

  3. HBase • Hadoop MapReduce performs only batch processing, so data is accessed only sequentially • Even the simplest lookup requires scanning the entire dataset • HBase provides random, real-time read/write access to data in HDFS • Data Model • A table is a collection of rows • A row is a collection of column families • A column family is a collection of columns • A column is a collection of key-value pairs (timestamped versions of a value)
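One way to picture this four-level data model is as nested sorted maps. The sketch below is a hypothetical in-memory model, not the HBase client API; the class name, row keys, family and column names are illustrative. Each column's versions are kept newest-first, following HBase's ordering of cell versions by descending timestamp.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the HBase data model (not the HBase client API):
//   table  : row key       -> row
//   row    : column family -> family
//   family : column        -> cell versions
//   column : timestamp     -> value   (the "key-value pairs")
// TreeMaps keep every level sorted, mirroring HBase's sorted row keys.
class HBaseModelSketch {
    final NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> table =
        new TreeMap<>();

    void put(String row, String family, String column, long timestamp, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             // versions sorted by descending timestamp, so the newest comes first
             .computeIfAbsent(column, c -> new TreeMap<Long, String>(Comparator.reverseOrder()))
             .put(timestamp, value);
    }

    public static void main(String[] args) {
        HBaseModelSketch t = new HBaseModelSketch();
        t.put("user#001", "info", "name", 1L, "Alice");
        t.put("user#001", "info", "city", 1L, "Pune");
        t.put("user#001", "info", "city", 2L, "Delhi"); // a newer version of the same column
        t.put("user#002", "stats", "logins", 1L, "42");
        System.out.println(t.table);
    }
}
```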

  4. HBase • Reading – Get and Scan; a reader always sees the most recently written value • Rows are sorted by row key • HBase is not an SQL database: it is not relational and has no joins or secondary indexes • HBase is horizontally scalable
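The two read paths can be sketched over a sorted map. In this hypothetical model (not the HBase client API), `get` returns the newest version of each column in one row, and `scan` walks a row-key range in sorted order with the start row inclusive and the stop row exclusive, which matches the default boundaries of an HBase `Scan`. Column names use an assumed `family:qualifier` string form.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of HBase-style Get and Scan semantics (not the HBase client API).
class HBaseReadSketch {
    // row key -> column ("family:qualifier") -> timestamp -> value
    private final NavigableMap<String, Map<String, NavigableMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(ts, value);
    }

    // Get: random access to one row; returns the latest (highest-timestamp) value per column.
    Map<String, String> get(String row) {
        Map<String, String> result = new TreeMap<>();
        Map<String, NavigableMap<Long, String>> columns = rows.get(row);
        if (columns != null) {
            columns.forEach((col, versions) -> result.put(col, versions.lastEntry().getValue()));
        }
        return result;
    }

    // Scan: rows in sorted key order from startRow (inclusive) to stopRow (exclusive).
    Map<String, Map<String, String>> scan(String startRow, String stopRow) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String row : rows.subMap(startRow, true, stopRow, false).keySet()) {
            result.put(row, get(row));
        }
        return result;
    }

    public static void main(String[] args) {
        HBaseReadSketch t = new HBaseReadSketch();
        t.put("user#002", "info:city", 1L, "Pune");
        t.put("user#002", "info:city", 2L, "Delhi"); // later write wins on read
        t.put("user#001", "info:city", 1L, "Mumbai");
        t.put("user#003", "info:city", 1L, "Chennai");
        System.out.println(t.get("user#002"));
        System.out.println(t.scan("user#001", "user#003"));
    }
}
```

Because row keys are sorted, a scan over a key range touches only the rows in that range instead of the whole dataset.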

  5. Oozie • Manages workflows of Hadoop jobs and coordinates their execution • A workflow consists of action nodes (MapReduce, Pig, Hive) and control nodes, and is specified in an XML file
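Real Oozie workflows are declared in XML (conventionally `workflow.xml`) and executed by the Oozie server. The sketch below is only a hypothetical in-memory model of the idea: each action node stands in for a job (a `BooleanSupplier` here, in place of a MapReduce, Pig or Hive action) and names the next node on success ("ok") or failure ("error"), while the control nodes `start`, `end` and `kill` only steer execution. It assumes the graph is acyclic, as Oozie requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of an Oozie-style workflow (not the Oozie API).
class WorkflowSketch {
    // An action node: the job to run, plus the next node on success (ok) or failure (error).
    record Action(BooleanSupplier job, String ok, String error) {}

    // Follows transitions from the node named by the start control node until an
    // "end" or "kill" control node is reached; returns the visited node names.
    static List<String> run(String startTo, Map<String, Action> actions) {
        List<String> visited = new ArrayList<>();
        String node = startTo;
        while (!node.equals("end") && !node.equals("kill")) {
            Action a = actions.get(node);
            if (a == null) {
                throw new IllegalArgumentException("unknown node: " + node);
            }
            visited.add(node);
            node = a.job().getAsBoolean() ? a.ok() : a.error();
        }
        visited.add(node);
        return visited;
    }

    public static void main(String[] args) {
        Map<String, Action> actions = Map.of(
            "import-mr", new Action(() -> true, "clean-pig", "kill"),
            "clean-pig", new Action(() -> true, "report-hive", "kill"),
            "report-hive", new Action(() -> false, "end", "kill")); // simulate a failing Hive job
        System.out.println(run("import-mr", actions));
    }
}
```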

  6. Cascading and Scalding

  7. Word-Count in Java
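A Hadoop word count is usually written against the `Mapper` and `Reducer` classes of `org.apache.hadoop.mapreduce`. To stay self-contained, the sketch below instead simulates the same map, shuffle and reduce data flow in plain Java with no Hadoop dependency; the class and method names are hypothetical, and tokenizing on non-word characters is an assumed choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce word-count data flow (no Hadoop dependency).
class WordCountSketch {
    // Map phase: emit (word, 1) for every token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group emitted values by key, with keys sorted as Hadoop does before reduce.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Reduce phase: sum the counts for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> groups) {
        Map<String, Integer> counts = new TreeMap<>();
        groups.forEach((word, ones) -> counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));
        return counts;
    }

    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        // prints {and=1, hadoop=2, hdfs=1, hello=2}
        System.out.println(wordCount(List.of("Hello Hadoop", "hello HDFS and Hadoop")));
    }
}
```

In a real job the map calls run in parallel on HDFS splits and the framework performs the shuffle across the network; here all three phases run in one process.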

  8. Apache Mahout

  9. Cascading • A simple, high-level Java API for MapReduce that is easy to understand and work with

  10. Scalding • Brings the power of Scala to Cascading • No boilerplate code

  11. Sqoop • Apache Sqoop is designed for efficiently transferring bulk data between Apache Hadoop and relational databases (RDBMS) • Imports data from external structured datastores into HDFS or related systems such as HBase
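Sqoop makes bulk imports efficient by running several map tasks in parallel (4 by default, set with `--num-mappers`/`-m`), each importing one slice of the table split on a column (the primary key unless `--split-by` names another). The sketch below approximates that idea by cutting the `[min, max]` range of an integer split column into contiguous ranges and printing one bounded query per mapper; it is a hypothetical illustration, not Sqoop's own splitter code, and the table and column names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a Sqoop-style parallel import split (not Sqoop's implementation):
// the range [min, max] of a numeric split column is cut into numMappers contiguous
// ranges, and each map task would import the rows in one range.
class SqoopSplitSketch {
    static List<String> splitClauses(String column, long min, long max, int numMappers) {
        List<String> clauses = new ArrayList<>();
        long span = max - min + 1;                     // number of key values in the range
        int splits = (int) Math.min(numMappers, span); // never more splits than keys
        long lo = min;
        for (int i = 0; i < splits; i++) {
            // spread the remainder so range sizes differ by at most one
            long size = span / splits + (i < span % splits ? 1 : 0);
            long hi = lo + size - 1;
            clauses.add(column + " >= " + lo + " AND " + column + " <= " + hi);
            lo = hi + 1;
        }
        return clauses;
    }

    public static void main(String[] args) {
        // bounds such as those returned by SELECT MIN(ID), MAX(ID) FROM Customers
        for (String where : splitClauses("ID", 1, 10, 4)) {
            System.out.println("SELECT * FROM Customers WHERE " + where);
        }
    }
}
```

Even ranges only balance the work if key values are spread evenly; a skewed split column leaves some mappers with far more rows than others.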

  12. Mahout
