
Multi-Data-Center Hadoop in a Snap



Presentation Transcript


  1. Multi-Data-Center Hadoop in a Snap
     Dr. Konstantin Boudnik, Vice President, Open Source Development

  2. My background
     • 15-year Sun Microsystems veteran: JVM, distributed systems
     • Vice President, Apache Bigtop
     • Committer, PMC member and contributor to various ASF projects
     • Member of the Apache IPMC
     • Early Hadoop committer

  3. WANdisco Background
     • WANdisco: Wide Area Network Distributed Computing
     • Enterprise-ready, high-availability software solutions that enable globally distributed organizations to meet today’s data challenges of secure storage, scalability and availability
     • Leader in tools for software engineers – Subversion
     • Apache Software Foundation sponsor
     • Highly successful IPO, London Stock Exchange, June 2012 (LSE:WAND)
     • US patent for active-active replication technology granted, November 2012
     • Global locations
        • San Ramon (CA)
        • Chengdu (China)
        • Tokyo (Japan)
        • Boston (MA)
        • Sheffield (UK)
        • Belfast (UK)

  4. Customers

  5. Non-Stop Hadoop
     • Non-intrusive plugin
     • Provides continuous availability in the LAN and across the WAN
     • Active/Active

  6. 3 Key Problems For Multi Cluster Hadoop LAN / WAN

  7. Enterprise Ready Hadoop: Characteristics of Mission Critical Applications
     • Require 100% Uptime of Hadoop
        • SLAs, Regulatory Compliance
     • Require HDFS to be Deployed Globally
        • Share Data Between Data Centers
        • Data is Consistent, Not Eventually Consistent
     • Ease Administrative Burden
        • Reduce Operational Complexity
        • Simplify Disaster Recovery
        • Lower RTO/RPO
     • Allow Maximum Utilization of Resources
        • Within the Data Center
        • Across Data Centers

  8. Breaking Away from Active/Passive: What’s in a NameNode
     Single Standby
     • Inefficient utilization of resources
        • Journal Nodes
        • ZooKeeper Nodes
        • Standby Node
     • Performance bottleneck
     • Still tied to the beeper
     • Limited to LAN scope
     Active / Active
     • All resources utilized
     • Only NameNode configuration
     • Scale as the cluster grows
     • All NameNodes active
     • Load balancing
     • Set resiliency (# of active NN)
     • Global consistency
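The "Set resiliency (# of active NN)" point can be made concrete with a little arithmetic. The slides do not say how the active NameNodes coordinate, so the sketch below assumes a majority-quorum scheme, in which a group keeps making progress as long as more than half of its members are reachable. The function name `tolerated_failures` is hypothetical and used only for illustration.

```python
def tolerated_failures(active_namenodes: int) -> int:
    """Failures a group can absorb while a strict majority stays up.

    Assumption (not stated in the slides): the active NameNodes agree on
    updates by majority quorum.
    """
    if active_namenodes < 1:
        raise ValueError("need at least one NameNode")
    # A strict majority of n nodes is n // 2 + 1, so the group can lose
    # n - (n // 2 + 1) nodes, which equals (n - 1) // 2.
    return (active_namenodes - 1) // 2


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} active NameNodes tolerate {tolerated_failures(n)} failure(s)")
```

Under that assumption, going from 3 to 5 active NameNodes raises the number of tolerated failures from 1 to 2, while an even-sized group tolerates no more failures than the odd-sized group one node smaller.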

  9. Breaking Away from Active/Passive: What’s in a Data Center
     Standby Datacenter
     • Idle resource
     • Single data center ingest
     • Disaster recovery only
     • One-way synchronization
     • DistCp
     • Error prone
     • Clusters can diverge over time
     • Difficult to scale > 2 data centers
     • Complexity of sharing data increases
     Active / Active
     • DR resource available
     • Ingest at all data centers
     • Run jobs in both data centers
     • Replication is multi-directional (active/active)
     • Absolute consistency
     • Single HDFS spans locations
     • ‘N’ data center support
     • Global HDFS allows appropriate data to be shared
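The contrast on this slide between one-way synchronization and multi-directional replication can be illustrated with a toy model. This is a minimal sketch, not a description of how DistCp or Non-Stop Hadoop work internally: each site is a plain dictionary, a scheduled bulk copy is modeled as "the standby catches up on every Nth write", and network latency and failures are ignored. All function and path names are hypothetical.

```python
def periodic_one_way(writes, sync_every):
    """Count entries missing or stale at the standby after each write.

    `writes` is a list of (path, value) pairs applied at the primary site.
    The standby only catches up on every `sync_every`-th write, which
    stands in for a scheduled one-way bulk copy.
    """
    primary, standby, lag = {}, {}, []
    for i, (path, value) in enumerate(writes, start=1):
        primary[path] = value
        if i % sync_every == 0:
            standby = dict(primary)  # bulk copy: standby catches up
        lag.append(sum(1 for p in primary if standby.get(p) != primary[p]))
    return lag


def ordered_replication(writes):
    """Apply every write at both sites in the same order.

    A stand-in for replication in which each change reaches all sites
    as part of the write itself.
    """
    site_a, site_b, lag = {}, {}, []
    for path, value in writes:
        for site in (site_a, site_b):
            site[path] = value
        lag.append(sum(1 for p in site_a if site_b.get(p) != site_a[p]))
    return lag


if __name__ == "__main__":
    writes = [(f"/logs/part-{i}", i) for i in range(1, 7)]
    print(periodic_one_way(writes, sync_every=3))  # [1, 2, 0, 1, 2, 0]
    print(ordered_replication(writes))             # [0, 0, 0, 0, 0, 0]
```

In the periodic model the standby is behind by up to `sync_every - 1` writes between copies, which is the window in which a failover would lose data; in the ordered model the two sites never differ.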

  10. Multiple Clusters: One Cluster Approach
     • Example Applications
        • HBase
        • RT Query
        • MapReduce
     • Poor Resource Management
        • Data Locality Issues
        • Network Use
        • Complex

  11. Multiple Clusters: Creating Multiple Clusters
     • Example Applications
        • HBase
        • RT Query
        • MapReduce
     • Need to share data between clusters
        • DistCp / stale data
        • Inefficient use of storage and/or network
        • Some clusters may not be available

  12. Cluster Zones: Zoning for Optimal Efficiency
     • 100% HDFS Consistency

  13. Multi Datacenter Hadoop
     • Disaster Recovery
     • Absolute Consistency
     • Maximum Resource Use
     • Lower Recovery Time/Point
     • WAN Replication
     • Replicate Only What You Want
     • Better Utilization of Power/Cooling
     • Lower TCO
     • LAN Speed Performance

  14. Architecture of a Non-Stop Hadoop

  15. Technical Use Cases
     • Eliminate Performance Bottleneck
        • HBase issues
     • Multi Data-Center Ingest
        • Information doesn't need to be sent to one DC and then copied back to the other using DistCp
        • Parallel ingest methods don’t require redirected data streams
        • Ingest data at, or close to, the source
        • Global analysis (logs, click streams, etc.)
     • Cluster Zones
        • Efficient use of resources based on application profile
        • HBase, MapReduce, Spark, etc.
     • Maximize Data Center Resource Utilization
        • All data centers can be used to run different jobs concurrently
     • Disaster Recovery
        • Data is as current as possible (no periodic syncs)
        • Virtually zero downtime to recover from a regional data center failure
        • Regulatory compliance

  16. Non-Stop Hadoop Demonstration

  17. Q & A

  18. Thank you
