
SAMSON Platform Architecture

Streaming Big Data. TELEFÓNICA I+D.



Presentation Transcript


  1. SAMSON Platform Architecture: Streaming Big Data. TELEFÓNICA I+D

  2. Index • 01 SAMSON Platform • 02 File-system • 03 Streaming MapReduce • 04 Eco-system • 05 Architecture

  3. 01 SAMSON Platform

  4. Overview • Samson is a distributed processing engine designed for efficient analytics over data streams • Internal distributed file-system optimized for shared data processing • Provides an extension of the MapReduce framework: • More efficient MapReduce • Joins • Streaming MapReduce, which allows incremental processing of data feeds • Uses existing big data storage solutions, such as Apache HDFS or MongoDB, for fetching and storing data • Built to be deployed on Ubuntu, Red Hat Linux and in virtual machines

  5. Samson 0.6 DEB • Samson 0.6 RPM • User guide • Available for VM

  6. SAMSON Platform Architecture

  7. Key Platform Components • File-system • Streaming MapReduce • Eco-system • Architecture

  8. 02 File-system

  9. [Diagram: cluster nodes, each with HD and Cores] Distributed big-data platform for high-performance processing over unbounded streams of data • SAMSON distributed file-system • MapReduce for streamed processing

  10. SAMSON distributed file-system

  11. SAMSON distributed file-system: We periodically receive a set of documents and want to compute the accumulated word count each time an update arrives. [Figure: first, second and third input, each followed by a Reduce step; MapReduce: 6, 12, 18; Redistribution: 6, 6, 6]
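Reading the figure's numbers as the work done per update (an interpretation of the slide), plain MapReduce reprocesses the whole history (6, 12, 18) while the streaming approach only processes the new input (6 each time). A minimal C++ sketch of both, assuming the accumulated word count is kept as a per-word map between updates; `WordCountState`, `recompute_all` and `apply_update` are hypothetical names, not SAMSON API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical accumulated state: word -> count seen so far.
using WordCountState = std::map<std::string, long>;

// Map + reduce one document into `counts` (words split on whitespace).
void count_words(const std::string& doc, WordCountState& counts) {
    std::istringstream in(doc);
    std::string word;
    while (in >> word) ++counts[word];
}

// Batch approach: recount every document received so far on each update,
// so the work grows with the total history.
WordCountState recompute_all(const std::vector<std::string>& history,
                             std::size_t& docs_processed) {
    WordCountState counts;
    for (const auto& doc : history) {
        count_words(doc, counts);
        ++docs_processed;
    }
    return counts;
}

// Incremental approach (hypothetical): only the new batch is processed and
// merged into the stored state, so work per update tracks the batch size.
void apply_update(WordCountState& state, const std::vector<std::string>& batch,
                  std::size_t& docs_processed) {
    for (const auto& doc : batch) {
        count_words(doc, state);
        ++docs_processed;
    }
}
```

Both functions produce the same counts; only the amount of input touched per update differs.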

  12. 03 Streaming MapReduce

  13. [Diagram: the DELILAH client connected to SAMSON; labels: Upload, Download, Run operations, 4 Gb]

  14. [Diagram: the DELILAH client connected to SAMSON; labels: Run operations, Upload new operations & data types] SAMSON Open API for 3rd party developers

  15. [Diagram: SAMSON]

  16. Map Operation [Diagram labels: Operation, SAMSON]
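A map operation transforms each input key-value record independently. A minimal, generic C++ sketch of that contract, assuming an emit callback similar to the `writer->emit` call used in SAMSON parsers; `run_map`, `Emit` and `words_map` are hypothetical names, not SAMSON's module API:

```cpp
#include <cassert>
#include <functional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key-value record type.
template <typename K, typename V>
using KV = std::pair<K, V>;

// Callback through which a map function emits output records.
template <typename K, typename V>
using Emit = std::function<void(const K&, const V&)>;

// Apply `fn` to every input record independently; each call may emit zero or
// more outputs. Records do not depend on each other, so the input could be
// split into blocks and mapped on different cores or workers.
template <typename K2, typename V2, typename K1, typename V1, typename F>
std::vector<KV<K2, V2>> run_map(const std::vector<KV<K1, V1>>& input, F fn) {
    std::vector<KV<K2, V2>> output;
    Emit<K2, V2> emit = [&output](const K2& k, const V2& v) { output.emplace_back(k, v); };
    for (const auto& record : input) fn(record.first, record.second, emit);
    return output;
}

// Example map function (hypothetical): key = line id, value = text; emits (word, 1).
void words_map(const int& /*line_id*/, const std::string& text,
               const Emit<std::string, int>& emit) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) emit(word, 1);
}
```

Usage: `run_map<std::string, int>(lines, words_map)` turns `{(0, "to be"), (1, "or not")}` into four `(word, 1)` records.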

  17. Reduce Operation [Diagram labels: Operation, SAMSON, Input, State, Output]
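The Reduce Operation slide labels Input, State and Output: in the streaming case, a reduce combines each key's new input with the state stored from earlier runs. A minimal C++ sketch, assuming the state is one value per key and that one output record is emitted per key touched by the batch; `run_stateful_reduce` is a hypothetical name:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stateful reduce: `state` persists between runs. Each run groups
// the new input by key, combines it with the stored state, writes the new
// state back and emits one (key, new state) record per touched key.
template <typename K, typename V, typename S, typename F>
std::vector<std::pair<K, S>> run_stateful_reduce(
    const std::vector<std::pair<K, V>>& input,
    std::map<K, S>& state,  // persisted state, keyed like the input
    F combine) {            // S combine(const S& old_state, const std::vector<V>& values)
    std::map<K, std::vector<V>> grouped;  // group the new batch by key
    for (const auto& [k, v] : input) grouped[k].push_back(v);

    std::vector<std::pair<K, S>> output;
    for (const auto& [k, values] : grouped) {
        auto it = state.find(k);
        S old_state = (it != state.end()) ? it->second : S{};  // new keys start empty
        S new_state = combine(old_state, values);
        state[k] = new_state;
        output.emplace_back(k, new_state);
    }
    return output;
}
```

Keys absent from a batch keep their state untouched, which is what makes incremental updates cheap.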

  18. 04 Eco-system

  19. Top-level view… [Diagram labels: delilah, samsonPush, samsonPop, samsonClient, module] • module: 3rd-party C++ shared library providing new data types and new operations; tools are provided for simplified development • delilah: console-based client to upload data, download data, run commands and monitor the platform • samsonClient: C++ library to develop new plugins; examples: samsonPush and samsonPop, binaries to stream data into and out of SAMSON

  20. Delilah client (delilah)

  21. SAMSON Module example…

```cpp
// Standard headers used by the excerpt; the SAMSON module headers are not shown on the slide.
#include <cstdlib>   // atoll
#include <cstring>   // strcmp
#include <vector>

class parser_cdrs : public samson::system::SimpleParser
{
    std::vector<char*> words;   // Vector used to store the words parsed from each line

    // Output key and value (declarations missing from the slide). The C++ type
    // names are assumed from "out system.UInt simple_mobility.Position" below.
    samson::system::UInt key;
    samson::simple_mobility::Position value;

    void parseLine( char * line , samson::KVWriter *writer )
    {
        // Split the line into words
        split_in_words( line, words );

        // Expected format: USER_ID CDR X Y time
        if( words.size() < 5 )
            return; // Not enough fields for a valid record

        if( strcmp( words[1] , "CDR" ) != 0 )
            return; // Not a CDR record

        // Set the key (user id)
        key.value = atoll( words[0] );

        // Set the position (x, y, time)
        value.set( atoll( words[2] ) , atoll( words[3] ) , atoll( words[4] ) );

        // Emit the key-value pair to output 0
        writer->emit( 0 , &key, &value );
    }
};
```

```
Module simple_mobility
{
    title "Simple mobility example"
    author "Andreu Urruela"
    version "0.1.1"
}

data UserArea
{
    system.String name;
    system.UInt x;
    system.UInt y;
    system.UInt radius;
}

data Position
{
    system.UInt x;
    system.UInt y;
    system.TimeUnix time;
}

…

parser parser_cdrs
{
    out system.UInt simple_mobility.Position
    helpLine "Parse input CDRs to get user-position"
}
```
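The parser's checks can be exercised without a SAMSON installation. This minimal C++ sketch reproduces the logic of `parseLine` with the standard library only, assuming that `split_in_words` splits on whitespace; `Position` and `parse_cdr_line` are hypothetical stand-ins for the generated module types:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the module's Position data type (x, y, time).
struct Position {
    std::uint64_t x = 0;
    std::uint64_t y = 0;
    std::uint64_t time = 0;  // Unix time
};

// Mirrors parser_cdrs::parseLine outside SAMSON: returns the (user id, position)
// pair the parser would emit, or nothing for lines it would skip.
std::optional<std::pair<std::uint64_t, Position>> parse_cdr_line(const std::string& line) {
    std::istringstream in(line);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);  // split on whitespace

    // Expected format: USER_ID CDR X Y time
    if (words.size() < 5) return std::nullopt;   // too few fields
    if (words[1] != "CDR") return std::nullopt;  // not a CDR record

    Position p;
    p.x = std::strtoull(words[2].c_str(), nullptr, 10);
    p.y = std::strtoull(words[3].c_str(), nullptr, 10);
    p.time = std::strtoull(words[4].c_str(), nullptr, 10);
    std::uint64_t user = std::strtoull(words[0].c_str(), nullptr, 10);
    return std::pair<std::uint64_t, Position>{user, p};
}
```

Like the original, this does no validation of the numeric fields beyond what `strtoull` tolerates.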

  22. SAMSON ecosystem tools

  23. Stream MapReduce • Ecosystem • Demo • File system • Architecture

  24. 05 Architecture

  25. Communication protocols… (delilah) Goal, Solution and Why? for platform messages, data serialization and monitoring: • Flexibility • Back compatibility • Maximum data compression (no field separator) • Best for fast sequential processing • Easy job distribution • Proprietary serialization format • No recompilation needed • Best tools for querying (XPath)
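The slide lists "Maximum data compression (no field separator)" among its goals but does not describe the format itself. As an illustration only, the C++ sketch below assumes a variable-length integer encoding (7 data bits per byte) for the `Position` type from the module example: because both sides know the schema, fields are written back to back and decoded by position, with no separators or field tags. `encode_varint`, `serialize_position` and the varint choice are assumptions, not SAMSON's documented format:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical in-memory form of the module's Position type.
struct Position { std::uint64_t x, y, time; };

// Append v using 7 data bits per byte; the high bit means "more bytes follow".
void encode_varint(std::uint64_t v, std::vector<std::uint8_t>& out) {
    while (v >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(v | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(v));
}

// Read one varint starting at `pos` and advance `pos` past it.
std::uint64_t decode_varint(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    std::uint64_t v = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        if (pos >= in.size()) throw std::runtime_error("truncated varint");
        std::uint8_t byte = in[pos++];
        v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if (!(byte & 0x80)) return v;
    }
    throw std::runtime_error("varint too long");
}

// Fields are written in schema order with no separators or tags: the reader
// knows a Position is exactly three integers.
void serialize_position(const Position& p, std::vector<std::uint8_t>& out) {
    encode_varint(p.x, out);
    encode_varint(p.y, out);
    encode_varint(p.time, out);
}

Position deserialize_position(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    Position p;
    p.x = decode_varint(in, pos);
    p.y = decode_varint(in, pos);
    p.time = decode_varint(in, pos);
    return p;
}
```

The trade-off is that the schema must be known to read the data, which a module's data definitions provide.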

  26. Worker [Diagram: Engine library (independent development) containing the runtime engine and notification system plus the Process Manager, Network Manager, Memory Manager and Disk Manager, running on the cores]

  27. Engine library • Disk Manager / Network Manager • Controllers to access the local disk and network connections • Asynchronous notifications using the engine notification system • Multiple threads are used if required • Memory Manager (our retain-release model…, similar to Objective-C) • Simple system to control memory usage • Used to optimize memory allocation under heavy load • Process Manager (similar to Apple’s Grand Central Dispatch library) • System to control the execution of independent “heavy” tasks • Automatic creation / destruction of threads • Optional “fork” mode with a shared-memory system to get the output • Runtime Engine & Notification system • Inspired by the message-passing system implemented in Objective-C • Single loop to run all state-update operations • Thread protection to interact with the Disk/Network/Memory/Process Managers
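The Runtime Engine bullets describe a single loop that runs all state-update operations, with thread protection for interactions with the managers. A minimal C++ sketch of that pattern, assuming notifications are (channel, payload) pairs; `NotificationEngine`, `post` and `run` are hypothetical names, not the Engine library's actual interface:

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical notification: a channel name plus an integer payload.
struct Notification {
    std::string channel;
    long payload = 0;
};

// Any thread may post, but listeners only run on the single thread that calls
// run(), so the state they update needs no extra locking.
class NotificationEngine {
public:
    using Listener = std::function<void(const Notification&)>;

    // In this sketch, listeners are registered before run() starts.
    void add_listener(const std::string& channel, Listener l) {
        listeners_[channel].push_back(std::move(l));
    }

    // Thread-safe: e.g. a disk manager thread reporting a finished write.
    void post(Notification n) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(n));
        }
        cv_.notify_one();
    }

    void stop() { post({"__quit", 0}); }

    // Single loop: dispatch notifications in FIFO order until stop() is seen.
    void run() {
        for (;;) {
            Notification n;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                n = std::move(queue_.front());
                queue_.pop_front();
            }
            if (n.channel == "__quit") return;
            auto it = listeners_.find(n.channel);
            if (it == listeners_.end()) continue;  // nobody listens on this channel
            for (auto& l : it->second) l(n);
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<Notification> queue_;
    std::map<std::string, std::vector<Listener>> listeners_;
};
```

Manager threads would call `post`; only the thread inside `run` touches listener state, which is the point of the single-loop design.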

  28. Worker [Diagram: Block Manager (disk–memory balancer) on top of the multi-core Engine library (independent development), which contains the runtime engine and notification system plus the Process, Network, Memory and Disk Managers, running on the cores]

  29. Block Manager • Maintains a reference to all blocks of data contained in a Worker (on disk or in memory) • Keeps them in a list sorted by when they will be used (by future operations) • Low-priority blocks are flushed to disk first • High-priority blocks are loaded from disk first • Connected to the Disk Manager inside the engine through the Engine Notification System [Diagram: the Block Manager schedules write operations and read operations to the Disk Manager] • Important: since the order of blocks changes continuously as new processing operations are scheduled, the Block Manager is notified of the new order and reacts accordingly.
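The Block Manager's policy (keep the blocks needed soonest in memory, flush low-priority blocks first, load high-priority blocks first) can be sketched as follows. This is a minimal C++ illustration, assuming priority is the index of the next scheduled operation that uses a block and that memory holds a prefix of the priority order; `BlockManager`, `IoPlan` and `rebalance` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical block descriptor. next_use is the position of the earliest
// scheduled operation that needs the block: smaller means higher priority.
struct Block {
    std::size_t size = 0;
    std::uint64_t next_use = 0;
    bool in_memory = false;
};

struct IoPlan {
    std::vector<int> writes;  // flush to disk, lowest priority first
    std::vector<int> reads;   // load from disk, highest priority first
};

class BlockManager {
public:
    explicit BlockManager(std::size_t memory_budget) : budget_(memory_budget) {}

    void add(int id, std::size_t size, std::uint64_t next_use, bool in_memory) {
        blocks_[id] = Block{size, next_use, in_memory};
    }

    // Called whenever the scheduler reorders future operations.
    void set_next_use(int id, std::uint64_t next_use) { blocks_.at(id).next_use = next_use; }

    // Fill the memory budget in priority order, then plan the disk operations
    // needed to reach that layout.
    IoPlan rebalance() {
        std::vector<int> order;
        for (const auto& entry : blocks_) order.push_back(entry.first);
        std::sort(order.begin(), order.end(), [this](int a, int b) {
            return blocks_.at(a).next_use < blocks_.at(b).next_use;
        });

        IoPlan plan;
        std::size_t used = 0;
        bool full = false;
        for (int id : order) {  // highest priority first
            Block& b = blocks_.at(id);
            bool keep = !full && used + b.size <= budget_;
            if (keep) used += b.size; else full = true;
            if (keep && !b.in_memory) plan.reads.push_back(id);
            if (!keep && b.in_memory) plan.writes.push_back(id);
            b.in_memory = keep;
        }
        std::reverse(plan.writes.begin(), plan.writes.end());  // lowest priority first
        return plan;
    }

private:
    std::size_t budget_;
    std::map<int, Block> blocks_;
};
```

Calling `set_next_use` and then `rebalance` is how the sketch models reacting to a new order of blocks.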

  30. Worker [Diagram: Queues Manager and StreamOperations Manager above the Block Manager (disk–memory balancer) and the multi-core Engine library (runtime engine and notification system; Process, Network, Memory and Disk Managers; cores). Flow labels: Input data, txt_cdrs, Operation A, cdrs, Operation B, users, Operation C, priority]

  31. Queue & StreamOperations Manager [Diagram labels: Queues Manager, StreamOperations Manager, txt_cdrs, Operation A, cdrs, Operation B, users, Operation C, priority] • They contain references to all the blocks held in queues and stream operations • Both systems are connected to the Block Manager to inform it about the priority of blocks • The StreamOperations Manager is connected to the Process Manager to schedule 3rd-party operations in the Engine subsystem
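The queue and stream-operation wiring in the diagram (txt_cdrs, Operation A, cdrs, Operation B, users) can be sketched as a small dataflow loop. A minimal C++ illustration, assuming each operation reads one queue and writes one queue, and running operations inline instead of handing them to a Process Manager; `StreamOperationsManager` and its methods are hypothetical, and the roles given to Operations A and B in the usage example are invented for illustration:

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Record = std::string;
using DataBlock = std::vector<Record>;

// Hypothetical stream operation: consumes blocks from one queue and pushes
// its results to another queue.
struct StreamOperation {
    std::string name;
    std::string input_queue;
    std::string output_queue;
    std::function<DataBlock(const DataBlock&)> process;
};

class StreamOperationsManager {
public:
    void add_operation(StreamOperation op) { ops_.push_back(std::move(op)); }

    // New data arrives in a queue (e.g. streamed in by a client).
    void push(const std::string& queue_name, DataBlock block) {
        pending_.emplace_back(queue_name, std::move(block));
    }

    // Process pending blocks until the pipeline is quiet. Operations run inline
    // here; in SAMSON they are scheduled through the Process Manager.
    void run_until_idle() {
        while (!pending_.empty()) {
            std::pair<std::string, DataBlock> item = std::move(pending_.front());
            pending_.pop_front();
            const std::string& qname = item.first;
            const DataBlock& block = item.second;
            queues_[qname].push_back(block);  // keep the block in its queue
            for (const auto& op : ops_) {
                if (op.input_queue != qname) continue;
                DataBlock out = op.process(block);
                if (!out.empty()) pending_.emplace_back(op.output_queue, std::move(out));
            }
        }
    }

    const std::vector<DataBlock>& queue(const std::string& name) { return queues_[name]; }

private:
    std::vector<StreamOperation> ops_;
    std::deque<std::pair<std::string, DataBlock>> pending_;
    std::map<std::string, std::vector<DataBlock>> queues_;
};
```

For example, with Operation A keeping lines tagged `CDR` from `txt_cdrs` into `cdrs` and Operation B extracting user ids into `users`, pushing one block into `txt_cdrs` and calling `run_until_idle()` fills both downstream queues. An operation whose output queue is also its input queue would loop forever in this sketch.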
