
Other formats for data



Presentation Transcript


  1. Other formats for data Linked lists, Hash tables, JSON, Big Data, Hadoop & MapReduce. REST. Parallel processing exercise Homework: Plans for group sorting. Prepare for RSA talk. Postings

  2. Linked list • Big array for data • Array of arrays: think of rows • Each row has information + one or more pointers to other rows. Various ways: • Forward pointing list: next item • Forward and back: next and previous item • Tree: first child item and next sibling • or first child, next sibling, parent • or first child, next sibling, parent1, parent2
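
A minimal sketch of the "array of rows" idea for a forward-pointing list, written in JavaScript to match the examples later in the deck. The row contents, the field names `info` and `next`, and the use of -1 to mean "no next row" are illustrative assumptions, not something the slide specifies.

```javascript
// Each row holds some information plus a pointer (an index) to the next row.
// The rows are stored out of order on purpose: the pointers define the order.
const rows = [
  { info: "C", next: -1 }, // row 0: last item, -1 means "no next row"
  { info: "A", next: 2 },  // row 1: first item, points to row 2
  { info: "B", next: 0 },  // row 2: second item, points to row 0
];

// Follow the next pointers from a starting row and collect the information.
function walk(table, start) {
  const result = [];
  for (let i = start; i !== -1; i = table[i].next) {
    result.push(table[i].info);
  }
  return result;
}

const order = walk(rows, 1);
console.log(order.join(" -> ")); // A -> B -> C
```

A forward-and-back list would add a `prev` field to each row in the same way.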

  3. Family example: name, a parent, 1st child, next sibling

  4. Exercise • Make your family tree • each row has a name, parent1, (optionally include second parent), first child, next sibling • you need to start somewhere • Put down Not defined for things not in the table. • Put down -1 for cases of no children, no next sibling
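
One way the table for this exercise might look in code. The family members below are hypothetical, and the field names are assumptions; the conventions ("Not defined" for a parent who is not in the table, -1 for no children or no next sibling) follow the exercise.

```javascript
// Hypothetical family; numeric pointers are row indices in this same array.
const family = [
  { name: "Ann",  parent: "Not defined", firstChild: 1,  nextSibling: -1 }, // row 0
  { name: "Bob",  parent: 0,             firstChild: 3,  nextSibling: 2 },  // row 1
  { name: "Cara", parent: 0,             firstChild: -1, nextSibling: -1 }, // row 2
  { name: "Dan",  parent: 1,             firstChild: -1, nextSibling: 4 },  // row 3
  { name: "Eve",  parent: 1,             firstChild: -1, nextSibling: -1 }, // row 4
];

// List the children of a row: start at firstChild, then follow nextSibling.
function childrenOf(table, row) {
  const names = [];
  for (let c = table[row].firstChild; c !== -1; c = table[c].nextSibling) {
    names.push(table[c].name);
  }
  return names;
}

console.log(childrenOf(family, 0)); // children of Ann
console.log(childrenOf(family, 1)); // children of Bob
```

Only one child pointer is stored per row; the rest of the children are reached through the sibling chain.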

  5. Hash tables • Problem: how to find elements in a table? • The table has no intrinsic order. If it were sorted, you could use binary search. • Binary search: compare the value (or the key) to the middle value; if it is less, search the lower half; if it is greater, search the upper half; keep going… • Aside: Meyer family geography game
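
The binary search described above can be sketched as follows, assuming the data is already in a sorted array. The function name and the sample values are illustrative.

```javascript
// Binary search over a sorted array; returns the index of target, or -1.
function binarySearch(sorted, target) {
  let low = 0;
  let high = sorted.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (sorted[mid] === target) return mid;
    if (target < sorted[mid]) {
      high = mid - 1; // less than the middle value: search the lower half
    } else {
      low = mid + 1;  // greater than the middle value: search the upper half
    }
  }
  return -1; // the range is empty: target is not present
}

const sortedPrices = [20, 50, 100, 150];
console.log(binarySearch(sortedPrices, 100)); // 2
console.log(binarySearch(sortedPrices, 75));  // -1
```

Each comparison halves the remaining range, which is why this only works when the table has an order.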

  6. Hash table approach • Have key-value pairs. • The task: find whether a given key is in the table. • Assume there is a hash function that takes the key as input and outputs a hash, which corresponds to a slot in the table. • The function takes fixed time to compute. • Go to that slot. If it is empty, store the key-value pair there. If it is not empty, compare the keys: if they match, then …. If not, check the next position and continue. • http://en.wikibooks.org/wiki/Data_Structures/Hash_Tables
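
A small sketch of this approach (often called linear probing). The table size, the toy hash function, and the decision to overwrite the value when the keys match are all assumptions made for illustration; the slide leaves the matching case open.

```javascript
const SIZE = 8; // assumed table size, kept small for illustration
const slots = new Array(SIZE).fill(null);

// Toy hash function (an assumption): sum of character codes modulo table size.
function hash(key) {
  let sum = 0;
  for (const ch of key) sum += ch.charCodeAt(0);
  return sum % SIZE;
}

// Store a pair. On a key match this sketch overwrites the value (assumption).
function put(key, value) {
  let i = hash(key);
  for (let probes = 0; probes < SIZE; probes++) {
    if (slots[i] === null || slots[i].key === key) {
      slots[i] = { key, value };
      return true;
    }
    i = (i + 1) % SIZE; // slot holds a different key: check the next position
  }
  return false; // every slot is taken by other keys
}

// Look up a key; returns undefined if the key is not in the table.
function get(key) {
  let i = hash(key);
  for (let probes = 0; probes < SIZE; probes++) {
    if (slots[i] === null) return undefined; // empty slot: key is absent
    if (slots[i].key === key) return slots[i].value;
    i = (i + 1) % SIZE;
  }
  return undefined;
}

put("table", 100);
put("desk", 150);
put("chair", 50);
put("lamp", 20);
console.log(get("desk")); // 150
```

This sketch has no delete operation; removing entries from a probed table needs extra care and is left out here.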

  7. Associative array • Normal arrays use numeric indices, typically starting with 0. • An associative array uses keys (such as strings) in place of numeric indices. Consider a set of 4 products: table, desk, chair, lamp. An associative array could be used to store the prices: table=>100, desk=>150, chair=>50, lamp=>20
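
In JavaScript, a plain object can serve as the associative array for the product prices. This is a sketch using the four products from the slide; the variable name and the changed lamp price are illustrative.

```javascript
// Product names are the keys; prices are the values.
const prices = { table: 100, desk: 150, chair: 50, lamp: 20 };

console.log(prices["desk"]); // 150, looked up by key rather than by position
console.log(prices.chair);   // 50, dot notation works for simple key names

prices.lamp = 25; // modify an existing entry (hypothetical new price)

// Visit every key-value pair.
for (const [product, price] of Object.entries(prices)) {
  console.log(product + " => " + price);
}
```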

  8. Key-value pairs • So-called key-value pairs are a generalization of the associative array and are used in other systems. • At its most general, there can be more than one value for a given key, and either the underlying software OR your program needs to take care of this situation.
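
One way a program might handle several values for the same key is to keep a list of values per key. This is a sketch of that choice, not the only option; the `add` helper and the sample data are hypothetical.

```javascript
// Map from key to an array of every value stored under that key.
const store = new Map();

function add(key, value) {
  if (!store.has(key)) store.set(key, []); // first value for this key
  store.get(key).push(value);
}

add("chair", 50);
add("chair", 65); // a second value for the same key
add("lamp", 20);

console.log(store.get("chair")); // both chair values, in insertion order
```

The alternative is to let a later value replace the earlier one, which is what a plain object assignment does.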

  9. JSON • http://www.json.org/ • Format (syntax) for information • usually more compact than XML • available in many languages • objects: name / value pairs • create using curly braces. Use dot notation to access and modify • arrays • create using square brackets. Use square brackets with indices to access and modify.

  10. Example var course = {"name":"Topics", "teacher": "Jeanine Meyer", "days": "MR"}; course.name =>"Topics" course.teacher => "Jeanine Meyer" course.days => "MR"

  11. Example var list = { "class_list": [ {"firstname":"Groucho", "lastname": "Marx"}, {"firstname":"Harpo", "lastname": "Marx"}, {"firstname":"Zeppo", "lastname": "Marx"}, {"firstname":"Curly", "lastname": "Stooge"} ]}; list.class_list[2].firstname => "Zeppo"
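
The two examples above are written as JavaScript literals. When JSON arrives as text (from a file or a Web service), it is converted with the built-in `JSON.parse`, and `JSON.stringify` goes the other way. A short sketch, using a shortened version of the class list:

```javascript
// JSON as it would arrive: a string, not yet a JavaScript object.
const text =
  '{"class_list":[{"firstname":"Groucho","lastname":"Marx"},' +
  '{"firstname":"Harpo","lastname":"Marx"}]}';

const data = JSON.parse(text);               // text -> object
const second = data.class_list[1].firstname; // property name, then index, then property name
console.log(second); // Harpo

// Modify the object, then turn it back into JSON text.
data.class_list.push({ firstname: "Zeppo", lastname: "Marx" });
const out = JSON.stringify(data);
console.log(out);
```

The array is reached through its property name (`class_list`) first; indexing the outer object directly would not find it.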

  12. Big Data • buzzword more than a specific product • Data that is • large in volume: Volume • rapidly changing [or the application requires up-to-date values]: Velocity • in different formats: Variety • PLUS not necessarily all owned by the organization attempting to use it. • in this case, can only query: no changes/updates, deletions or additions

  13. Note • A company / organization can store data in its own CLOUD (on its own servers) or in a cloud service offered by a vendor and still have total control. • Could even be a relational database • Very large databases may be just key-value pairs

  14. Cloud … can refer to one, some or all of the following • where the programs are • where the data is • where the processors (aka computers) are for doing the calculations

  15. REST • Representational State Transfer • a "standard" / framework / style of communicating with Web services • typically, get information in the form of XML or JSON or something else • Posting opportunity: find a specific service that provides REST connections….
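
A sketch of what a REST request for JSON might look like from JavaScript. The base URL, the `/courses/...` path, and the function names are hypothetical; a real service documents its own URLs. It assumes an environment where the standard `fetch` function is available (browsers and recent versions of Node.js).

```javascript
// Hypothetical base URL; substitute a real service's documented endpoint.
const BASE_URL = "https://api.example.com";

// In the REST style, the URL identifies the resource being requested.
function courseUrl(courseId) {
  return BASE_URL + "/courses/" + encodeURIComponent(courseId);
}

// Defined but not called here, because the endpoint above does not exist.
async function getCourse(courseId) {
  const response = await fetch(courseUrl(courseId), {
    method: "GET",                           // GET retrieves a representation
    headers: { Accept: "application/json" }, // ask for JSON rather than XML
  });
  if (!response.ok) throw new Error("HTTP " + response.status);
  return response.json(); // parse the JSON body into an object
}

console.log(courseUrl("topics 101"));
```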

  16. Parallel processing / distributed processing • Large amounts (volumes) of data • Multiple processors • How to speed up accomplishment of tasks? • Embarrassingly parallel refers to tasks that are easy to parallelize • Take a list of numbers (say, prices) and increase each by 10% • ?
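
The price example can be sketched as follows. The list is split into chunks, one per (hypothetical) processor, and each chunk is handled without looking at any other chunk. Note the assumption: this code runs the chunks one after another on a single processor; it only shows why the work could be handed out in parallel. The prices and helper names are illustrative.

```javascript
const prices = [100, 150, 50, 20];

// Increase one price by 10%, rounded to the nearest cent.
const raise = (p) => Math.round(p * 110) / 100;

// Split a list into roughly equal chunks, one per processor.
function split(list, parts) {
  const size = Math.ceil(list.length / parts);
  const chunks = [];
  for (let i = 0; i < list.length; i += size) {
    chunks.push(list.slice(i, i + size));
  }
  return chunks;
}

const chunks = split(prices, 2);
// Each chunk needs nothing from the others, so each could go to its own processor.
const partial = chunks.map((chunk) => chunk.map(raise));
// Putting the results back together is just concatenation.
const raised = partial.flat();
console.log(raised); // [ 110, 165, 55, 22 ]
```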

  17. What about • Tasks in which some parts can be done in parallel, but some cannot • How to devise ways to take advantage of multiple processors

  18. Parallel exercise • Divide into groups of 5 • Each take a deck of cards • Shuffle • Devise plan to sort into order • suits hearts, spades, diamonds, clubs, • each suit A, 2, …. J, Q, K
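
One possible plan, sketched in code: deal the deck into four piles by suit, have a different person sort each pile, then stack the piles in suit order. This is only one candidate, offered as a starting point for comparison. The reversed deck stands in for a shuffle so the result is repeatable, and the piles are sorted one after another here rather than truly in parallel.

```javascript
const SUITS = ["hearts", "spades", "diamonds", "clubs"];
const RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];

function sortDeck(deck) {
  // Step 1: deal the cards into one pile per suit.
  const piles = new Map(SUITS.map((suit) => [suit, []]));
  for (const card of deck) piles.get(card.suit).push(card);

  // Step 2: sort each pile by rank. The piles are independent,
  // so four people could each do one at the same time.
  for (const pile of piles.values()) {
    pile.sort((a, b) => RANKS.indexOf(a.rank) - RANKS.indexOf(b.rank));
  }

  // Step 3: stack the sorted piles in suit order.
  return SUITS.flatMap((suit) => piles.get(suit));
}

const ordered = SUITS.flatMap((suit) => RANKS.map((rank) => ({ suit, rank })));
const mixed = ordered.slice().reverse(); // stand-in for a shuffled deck
const sorted = sortDeck(mixed);
console.log(sorted.slice(0, 3)); // first three cards after sorting
```

Steps 1 and 3 are done by one person in this plan, so they limit the speed-up; that is the part worth improving.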

  19. Hadoop • open source utilities for distributed computing • http://hadoop.apache.org/ • Includes MapReduce

  20. MapReduce A MapReduce job • map sets up tasks to be done in parallel • reduce combines the results • there may be a local combine step and then a reduce across all outputs • Requires a file system • Data is in key/value pairs
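
The map and reduce steps can be illustrated with a word count written in plain JavaScript. This is an in-memory sketch of the idea only: it does not use Hadoop or its API, it runs on one processor, and all function names are hypothetical.

```javascript
// map: one input record -> a list of [key, value] pairs.
function mapFn(line) {
  return line.toLowerCase().split(/\s+/).filter(Boolean).map((word) => [word, 1]);
}

// reduce: one key plus all of its values -> a combined result.
function reduceFn(key, values) {
  return values.reduce((total, n) => total + n, 0);
}

function mapReduce(records, mapper, reducer) {
  // Map phase: each record is processed independently (parallelizable).
  const groups = new Map();
  for (const record of records) {
    for (const [key, value] of mapper(record)) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value); // group the values by key
    }
  }
  // Reduce phase: each key is combined independently of the others.
  const result = {};
  for (const [key, values] of groups) result[key] = reducer(key, values);
  return result;
}

const counts = mapReduce(["big data big", "data in pairs"], mapFn, reduceFn);
console.log(counts); // { big: 2, data: 2, in: 1, pairs: 1 }
```

In a distributed setting the records and the grouped pairs would live in files spread across machines, which is where the file system requirement comes in.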

  21. Applications • What are applications that use multiple processors for a [big] gain in speed?

  22. Homework • Come up with improved parallel sorting • Postings: more on Hadoop, MapReduce, Big Data, etc.
