Big Data: Microsoft Also Has an Elephant in the Race

MICROSOFT HRVATSKA. Big Data: Microsoft Also Has an Elephant in the Race. Luka Lovošević, Antonio Faletar, Microsoft Hrvatska. Contents: Introduction to Big Data, Overview of the MS platform, Hadoop, Demo. What is Big Data? Data that matters to you but that you cannot process with traditional tools.


Presentation Transcript


  1. MICROSOFT HRVATSKA. Big Data: Microsoft Also Has an Elephant in the Race. Luka Lovošević, Antonio Faletar, Microsoft Hrvatska

  2. Contents: Introduction to Big Data; Overview of the MS platform; Hadoop; Demo

  3. What is Big Data?

  4. What is Big Data? Data that matters to you but that you cannot process with traditional tools. VOLUME (quantity), VARIETY (structure), VELOCITY (speed, real-time)

  5. Data sources: time and location, RFID, logs, text, telemetry, social networks, smart homes, sensors

  6. Big Data algorithms: similar items (e.g. web shop), real-time analysis, analysis of related terms, frequent itemsets, clustering (grouping), web advertising, recommender systems, social network analysis
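
One entry in this list, frequent itemsets, can be illustrated with a small sketch. It is not taken from the presentation: the function name `frequentPairs`, the `minSupport` threshold, and the basket data are all made up for illustration, and it only counts pairs in memory rather than running on a cluster.

```javascript
// Count how often each pair of items occurs together in a basket and keep
// the pairs that reach a minimum support. All names and data are illustrative.
function frequentPairs(baskets, minSupport) {
  var counts = new Map(); // "itemA|itemB" -> number of baskets containing both
  baskets.forEach(function (basket) {
    // De-duplicate and sort so every pair has exactly one canonical key.
    var items = Array.from(new Set(basket)).sort();
    for (var i = 0; i < items.length; i++) {
      for (var j = i + 1; j < items.length; j++) {
        var key = items[i] + "|" + items[j];
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  });
  var result = [];
  counts.forEach(function (count, key) {
    if (count >= minSupport) {
      result.push({ pair: key.split("|"), count: count });
    }
  });
  return result;
}

// Made-up web shop baskets.
var baskets = [
  ["bread", "milk", "butter"],
  ["bread", "milk"],
  ["milk", "butter"],
  ["bread", "milk", "eggs"]
];
var pairs = frequentPairs(baskets, 2);
console.log(pairs);
```

With a support threshold of 2, only the pairs bread/milk and butter/milk survive. On real data the pair counting is the expensive part, which is why it is usually distributed.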

  7. Microsoft Big Data platform

  8. Microsoft Big Data platform: self-service BI tools; SQL Server StreamInsight; SQL Server 2012 Parallel Data Warehouse; Hadoop: HDInsight (Windows or Azure)

  9. A bit more about Hadoop

  10. What is Hadoop? A platform for processing large volumes of data; Apache, open source; Google GFS and MapReduce; highly scalable and distributed; commodity hardware. [Timeline figure: labels Apache project, Yahoo!, Enterprise Hadoop; axis years 2004, 2006, 2008, 2010, 2012, 2013]

  11. Hadoop architecture. MapReduce layer (distributed processing): job tracker, task trackers. HDFS layer (distributed storage): name node, data nodes

  12. MapReduce [Diagram: data and four nodes]

  13. MapReduce Program

      // Map Reduce function in JavaScript
      var map = function (key, value, context) {
        // Split the input line on anything that is not a letter.
        var words = value.split(/[^a-zA-Z]/);
        for (var i = 0; i < words.length; i++) {
          if (words[i] !== "") {
            // Emit (word, 1) for every non-empty token.
            context.write(words[i].toLowerCase(), 1);
          }
        }
      };

      var reduce = function (key, values, context) {
        // Sum all counts emitted for this word.
        var sum = 0;
        while (values.hasNext()) {
          sum += parseInt(values.next(), 10);
        }
        context.write(key, sum);
      };

      [Diagram: four nodes]
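
To see what the word count program on this slide computes, it can be run without a cluster. The following is a minimal single-process sketch: `runLocalJob` and the two context objects are hypothetical helpers written for this illustration, not part of Hadoop or HDInsight. They only mimic the `context.write` and `hasNext()`/`next()` shapes that the slide's functions rely on, and the input lines are made up.

```javascript
// Map and reduce functions in the style of the slide's JavaScript example.
var map = function (key, value, context) {
  var words = value.split(/[^a-zA-Z]/);
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== "") {
      context.write(words[i].toLowerCase(), 1);
    }
  }
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical local driver (an assumption, not a Hadoop API): runs map over
// every input line, groups emitted pairs by key, then runs reduce per key.
function runLocalJob(lines, mapFn, reduceFn) {
  var groups = new Map(); // key -> list of emitted values (the "shuffle" step)
  var mapContext = {
    write: function (key, value) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  };
  lines.forEach(function (line, offset) {
    mapFn(offset, line, mapContext);
  });

  var output = {};
  var reduceContext = {
    write: function (key, value) { output[key] = value; }
  };
  Array.from(groups.keys()).sort().forEach(function (key) {
    var list = groups.get(key);
    var pos = 0;
    // Minimal iterator exposing the hasNext()/next() shape used by reduce.
    var values = {
      hasNext: function () { return pos < list.length; },
      next: function () { return list[pos++]; }
    };
    reduceFn(key, values, reduceContext);
  });
  return output;
}

// Made-up input lines.
var counts = runLocalJob(
  ["Big data, big elephant", "The elephant runs the race"],
  map,
  reduce
);
console.log(counts);
```

In a real job the map calls run in parallel on the nodes that hold the data, and the grouping step moves pairs between nodes. Here everything happens in one process, so only the data flow is shown.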

  14. MapReduce example

  15. Tools for successful Hadooping

  16. Pig: data processing and shaping; ETL tool; MapReduce

  17. Hive: structuring data; SQL syntax; ODBC, Excel …; MapReduce

  18. Mahout: library of ready-made algorithms; machine learning (e.g. clustering, recommendation, …); MapReduce

  19. HDInsight Hadoop: Hadoop for Windows Server; Hadoop for Windows Azure; programming in .NET; security, HA & management; virtualization support; integration with Microsoft BI tools; same experience on-premise and in the cloud

  20. Demo: Windows Azure HDInsight

  21. Hadoop 2.0: HortonWorks Stinger initiative; Tez (interactive) vs. batch; streaming (Storm project), etc.

  22. Conclusion: Big data trend; Hadoop de facto standard; Windows Azure HDInsight; open source

  23. Questions?

  24. Thank you!
