Typhoon: An Ultra-Available Archive and Backup System Utilizing Linear-Time Erasure Codes



  1. Typhoon: An Ultra-Available Archive and Backup System Utilizing Linear-Time Erasure Codes

  2. Typhoon: An Ultra-Available Archive and Backup System Utilizing Linear-Time Erasure Codes Part of OceanStore...

  3. [Figure: system diagram with labels POOL, OceanStore, Cache, Naming/Location, Client]

  4. Erasure Codes • Erasure code: a form of data coding that allows lost portions of data to be recovered • The idea is similar to ECC, except that the decoder must be told which portions of the data are missing • Reed-Solomon codes are a common type of erasure code, but they are computationally expensive and are usually implemented in hardware
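
The point that an erasure decoder must be told which portion is missing can be illustrated with the simplest possible erasure code, a single XOR parity block. This is only a sketch and is not the code Typhoon uses; `BLOCK_SIZE`, `make_parity`, and `recover_block` are hypothetical names chosen for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 8  /* hypothetical block size, chosen only for the demo */

/* Compute one parity block as the XOR of all data blocks.
   `blocks` holds n_blocks consecutive blocks of BLOCK_SIZE bytes. */
void make_parity(const unsigned char *blocks, size_t n_blocks,
                 unsigned char *parity)
{
    memset(parity, 0, BLOCK_SIZE);
    for (size_t b = 0; b < n_blocks; ++b)
        for (size_t i = 0; i < BLOCK_SIZE; ++i)
            parity[i] ^= blocks[b * BLOCK_SIZE + i];
}

/* Rebuild the block at index `missing` in place. The caller has to say
   which block was lost; the code cannot detect that on its own. */
void recover_block(unsigned char *blocks, size_t n_blocks,
                   const unsigned char *parity, size_t missing)
{
    assert(missing < n_blocks);
    unsigned char *out = blocks + missing * BLOCK_SIZE;
    memcpy(out, parity, BLOCK_SIZE);
    for (size_t b = 0; b < n_blocks; ++b) {
        if (b == missing)
            continue;
        for (size_t i = 0; i < BLOCK_SIZE; ++i)
            out[i] ^= blocks[b * BLOCK_SIZE + i];
    }
}
```

A single parity block tolerates only one lost block; Reed-Solomon and Tornado Codes generalize the idea to many losses.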

  5. Tornado Codes: A Linear-Time Probabilistic Family of Erasure Codes • Tornado Codes are linear-time, but rely on probabilistic assumptions to “guarantee” that the decoding process will succeed • A rate-1/2 erasure code doubles the size of a file • Ideally, any half of the encoded file can be used to recreate the original data • Tornado Codes require slightly more than half of the encoded file, thus trading network bandwidth for speed • The inventors of Tornado Codes report that 5% overhead is typical
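
As a worked example of the numbers above, assume a 1 MiB file split into 512-byte nodes, and assume the 5% figure is the extra fraction $\varepsilon$ of nodes needed beyond the original file size. The file size and that reading of the 5% are assumptions made for the illustration.

```latex
\begin{align*}
k &= \frac{1\,048\,576\ \text{bytes}}{512\ \text{bytes/node}} = 2048\ \text{data nodes} \\
n &= \frac{k}{R} = \frac{2048}{1/2} = 4096\ \text{encoded nodes} \\
\text{ideal code:}\quad & k = 2048\ \text{nodes needed}\ (50\%\ \text{of}\ n) \\
\text{Tornado Code:}\quad & (1+\varepsilon)\,k = 1.05 \times 2048 \approx 2150\ \text{nodes needed}\ (\approx 52.5\%\ \text{of}\ n)
\end{align*}
```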

  6. Overview of Encoding Process • The file is divided into nodes of equal size (e.g., 512 bytes) • Data nodes are associated with check nodes using a series of bipartite graphs • The content of a check node is the XOR of its neighbors • The bipartite graphs are created to satisfy mathematical constraints that “guarantee” the recovery process will successfully recover the file [Figure: data file divided into data nodes, connected to check nodes]
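
The encoding rule on this slide (each check node is the XOR of its neighboring data nodes) can be sketched for a single bipartite graph as follows. This is a minimal illustration, not the Typhoon implementation: the `check_edges` structure, `MAX_DEGREE`, and `encode_layer` are hypothetical, and the graph is supplied by the caller rather than generated from the degree distributions that real Tornado Codes require.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NODE_SIZE 512   /* node size taken from the slide's example */
#define MAX_DEGREE 8    /* hypothetical cap on neighbors per check node */

/* One check node and the indices of the data nodes it is connected to. */
struct check_edges {
    size_t degree;
    size_t neighbor[MAX_DEGREE];
};

/* Fill each check node with the XOR of its neighboring data nodes.
   `data` holds n_data nodes and `check` receives n_check nodes, each
   NODE_SIZE bytes long and stored back to back. */
void encode_layer(const unsigned char *data, size_t n_data,
                  const struct check_edges *graph, size_t n_check,
                  unsigned char *check)
{
    for (size_t c = 0; c < n_check; ++c) {
        unsigned char *out = check + c * NODE_SIZE;
        memset(out, 0, NODE_SIZE);
        assert(graph[c].degree <= MAX_DEGREE);
        for (size_t e = 0; e < graph[c].degree; ++e) {
            size_t d = graph[c].neighbor[e];
            assert(d < n_data);
            const unsigned char *in = data + d * NODE_SIZE;
            for (size_t i = 0; i < NODE_SIZE; ++i)
                out[i] ^= in[i];
        }
    }
}
```

The work is proportional to the number of edges times the node size, which is where the linear-time behavior comes from when the graphs are sparse.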

  7. Overview of Encoding Process • Once a file is encoded, the data nodes and check nodes are randomly distributed to a set of recipients [Figure: data file, data nodes, check nodes]

  8. MMX: SIMD or Marketing? • There are eight MMX registers • Data in the 64-bit registers can be divided into four different sizes: eight 8-bit bytes, four 16-bit words, two 32-bit doublewords, or one 64-bit quadword

  9. MMX: SIMD or Marketing? • There are eight MMX registers • Data in registers can be divided into four different sizes • MMX has 57 instructions for 6 types of operations: • ADD • SUBTRACT • MULTIPLY • MULTIPLY THEN ADD • COMPARISON • LOGICAL (AND, AND NOT, OR, XOR)

  10. MMX: SIMD or Marketing? • There are eight MMX registers • Data in registers can be divided into four different sizes • MMX has 57 instructions for 6 types of operations

      char array1[512];
      char array2[512];
      for(int i=0; i<512; ++i)
          array1[i]=array1[i] ^ array2[i];

      MMX is 2.3 times faster than this (1.9 times without pipeline scheduling)

  11. MMX: SIMD or Marketing? • There are eight MMX registers • Data in registers can be divided into four different sizes • MMX has 57 instructions for 6 types of operations

      char array1[512];
      char array2[512];
      long * array1ptr=(long*)array1;
      long * array2ptr=(long*)array2;
      for(int i=0; i<512/sizeof(long); ++i)
          array1ptr[i]=array1ptr[i] ^ array2ptr[i];

      MMX is 50% faster than this (22% without pipeline scheduling)

  12. MMX: SIMD or Marketing? • There are eight MMX registers • Data in registers can be divided into four different sizes • MMX has 57 instructions for 6 types of operations

      char array1[512];
      char array2[512];
      long * array1ptr=(long*)array1;
      long * array2ptr=(long*)array2;
      /* i counts longs, so advance by 32 bytes' worth of longs per call;
         the result is written back into array1 */
      for(int i=0; i<512/sizeof(long); i+=32/sizeof(long))
          xor32fast(array1ptr+i, array2ptr+i, array1ptr+i);

  13. MMX: SIMD or Marketing?

      inline void xor32bytes(long * array1reg, long* array2reg, long* destreg)
      {
        _asm
        {
          mov eax, [array1reg]
          mov ecx, [array2reg]
          movq mm0, [eax]
          movq mm1, [ecx]
          movq mm2, [eax+8]
          movq mm3, [ecx+8]
          movq mm4, [eax+16]
          movq mm5, [ecx+16]
          movq mm6, [eax+24]
          movq mm7, [ecx+24]
          pxor mm0, mm1        ; 64-bit xor
          pxor mm2, mm3        ; 64-bit xor
          pxor mm4, mm5        ; 64-bit xor
          pxor mm6, mm7        ; 64-bit xor
          mov ecx, [destreg]
          movq [ecx], mm0      ; store result
          movq [ecx+8], mm2    ; store result
          movq [ecx+16], mm4   ; store result
          movq [ecx+24], mm6   ; store result
        }
      }

  14. MMX: SIMD or Marketing?

      inline void xor32fast(long * array1reg, long* array2reg, long* destreg)
      {
        _asm
        {
          mov eax, [array1reg]
          mov ebx, [array2reg]
          mov ecx, [destreg]
          movq mm0, [eax]      ; load 1a   U
          movq mm1, [ebx]      ; load 1b   U
          movq mm2, [eax+8]    ; load 2a   U
          pxor mm0, mm1        ; xor 1     V
          movq mm3, [ebx+8]    ; load 2b   U
          movq [ecx], mm0      ; store 1   U
          pxor mm2, mm3        ; xor 2     V
          movq mm4, [eax+16]   ; load 3a   U
          movq mm5, [ebx+16]   ; load 3b   U
          movq mm6, [eax+24]   ; load 4a   U
          pxor mm4, mm5        ; xor 3     V
          movq mm7, [ebx+24]   ; load 4b   U
          movq [ecx+8], mm2    ; store 2   U
          pxor mm6, mm7        ; xor 4     V
          movq [ecx+16], mm4   ; store 3   U
          movq [ecx+24], mm6   ; store 4   U
        }
      }
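
For readers without an MMX-capable compiler, the same 32-byte XOR step can be sketched in portable C. This is an assumed equivalent written for illustration, not code from the presentation, and it makes no claim about matching the measured speedups; `xor32_portable` and `xor_buffer` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* XOR 32 bytes of a and b into dest, 64 bits at a time. memcpy keeps the
   accesses free of alignment and aliasing problems; dest may equal a or b. */
void xor32_portable(const unsigned char *a, const unsigned char *b,
                    unsigned char *dest)
{
    for (size_t off = 0; off < 32; off += sizeof(uint64_t)) {
        uint64_t x, y;
        memcpy(&x, a + off, sizeof x);
        memcpy(&y, b + off, sizeof y);
        x ^= y;
        memcpy(dest + off, &x, sizeof x);
    }
}

/* XOR buffer b into buffer a; len must be a multiple of 32 bytes. */
void xor_buffer(unsigned char *a, const unsigned char *b, size_t len)
{
    assert(len % 32 == 0);
    for (size_t off = 0; off < len; off += 32)
        xor32_portable(a + off, b + off, a + off);
}
```

Calling `xor_buffer(array1, array2, 512)` on two 512-byte arrays performs the same work as the loop on slide 12.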

  15. Overview of Encoding Process • Server sends storage announcement to a particular set of servers • Set can be determined/specified using multicast groups, a server list, or some form of DNS address lookup [Figure label: UDP]

  16. Overview of Encoding Process • Server sends storage announcement to a particular set of servers • Set can be determined/specified using multicast groups, a server list, or some form of DNS address lookup [Figure label: Multicast]

  17. Overview of Encoding Process • Server encodes file • During encoding process, the data nodes and check nodes are [randomly] distributed to other servers

  18. Overview of Decoding Process • A set of nodes is received, ideally with random distribution • Check nodes can be used to recover missing data nodes • Only a check node that is missing exactly one neighbor can recreate a data node • The structure of the graph ensures [w.h.p.] that the decoding process will succeed • The graph is designed so that there is always at least one check node that is missing only one neighbor • Data nodes can also be used to recover check nodes, but this is not important [Figure: data file with data nodes and check nodes; legend: node received, node not received]
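
The recovery rule on this slide can be sketched as a simple loop over one bipartite graph: find a received check node with exactly one missing neighbor, rebuild that neighbor, and repeat until nothing changes. This is an illustrative sketch under stated assumptions, not the Typhoon decoder. The `check_edges` structure, `MAX_DEGREE`, and `peel_decode` are hypothetical, and the rescanning loop is written for clarity rather than for the linear running time that a real decoder achieves with a work queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEGREE 8   /* hypothetical cap on neighbors per check node */

struct check_edges {
    size_t degree;
    size_t neighbor[MAX_DEGREE];
};

/* Recover missing data nodes in place and return how many remain missing.
   have_data[d] and have_check[c] say which nodes were received. */
size_t peel_decode(unsigned char *data, bool *have_data, size_t n_data,
                   const unsigned char *check, const bool *have_check,
                   const struct check_edges *graph, size_t n_check,
                   size_t node_size)
{
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t c = 0; c < n_check; ++c) {
            if (!have_check[c])
                continue;
            /* Count missing neighbors and remember the last one seen. */
            size_t n_missing = 0, target = 0;
            for (size_t e = 0; e < graph[c].degree; ++e) {
                size_t d = graph[c].neighbor[e];
                assert(d < n_data);
                if (!have_data[d]) { ++n_missing; target = d; }
            }
            if (n_missing != 1)
                continue;
            /* Missing node = check node XOR all received neighbors. */
            unsigned char *out = data + target * node_size;
            memcpy(out, check + c * node_size, node_size);
            for (size_t e = 0; e < graph[c].degree; ++e) {
                size_t d = graph[c].neighbor[e];
                if (d == target)
                    continue;
                for (size_t i = 0; i < node_size; ++i)
                    out[i] ^= data[d * node_size + i];
            }
            have_data[target] = true;
            progress = true;
        }
    }
    size_t left = 0;
    for (size_t d = 0; d < n_data; ++d)
        if (!have_data[d])
            ++left;
    return left;
}
```

A nonzero return value means the loop stalled because every remaining check node was missing two or more neighbors, which is the failure case the graph design is meant to make unlikely.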

  32. Overview of Decoding Process • Server sends file request announcement to a particular set of servers • Retrieves data from multiple servers simultaneously • Recovery process can be performed in parallel with receive (network-based RAID-1) • Depending on data loss pattern, a particular subset of the servers can be selected • Fastest servers (closest servers, or least utilized servers) • Operational Servers (i.e., some portion of the set is not functioning) • All servers might be needed in some cases, such as network congestion / packet loss
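
One way to picture the server-selection step is to rank the servers that answered by response time and take the fastest ones until they hold enough encoded nodes. The sketch below assumes that policy; the `server` record, its fields, and `pick_servers` are hypothetical and do not come from Typhoon.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-server record; none of these fields come from Typhoon. */
struct server {
    int id;
    bool operational;     /* false if the server did not answer */
    double latency_ms;    /* measured response time */
    size_t nodes_held;    /* encoded nodes stored on this server */
};

/* qsort comparator: ascending latency. */
static int by_latency(const void *a, const void *b)
{
    const struct server *x = (const struct server *)a;
    const struct server *y = (const struct server *)b;
    return (x->latency_ms > y->latency_ms) - (x->latency_ms < y->latency_ms);
}

/* Sort servers by latency, then pick the fastest operational ones until they
   hold at least nodes_needed nodes. Writes the chosen ids to out (capacity n)
   and returns how many were chosen, or 0 if the operational servers cannot
   supply enough nodes. */
size_t pick_servers(struct server *s, size_t n, size_t nodes_needed, int *out)
{
    assert(s != NULL && out != NULL);
    qsort(s, n, sizeof *s, by_latency);
    size_t chosen = 0, total = 0;
    for (size_t i = 0; i < n && total < nodes_needed; ++i) {
        if (!s[i].operational)
            continue;
        out[chosen++] = s[i].id;
        total += s[i].nodes_held;
    }
    return total >= nodes_needed ? chosen : 0;
}
```

A return value of 0 corresponds to the case on the slide where the preferred subset is not enough and every reachable server has to be contacted or the request retried.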

  33. Architecture [Figure: system diagram with labels POOL, OceanStore, Cache, Naming/Location, Client]

  34. Architecture • What did we implement? • Client, Cache, Naming and Location Mechanism, Replication Mechanism, filestore • What did we test? • Communication • Explicit communication: unicast request • Implicit communication: multicast request • Network • Distributed servers throughout the Berkeley domain • Simulated network delay by randomizing response time • Caching • None, to model the worst case • Simulation • Strained the Typhoon system by creating requests at the same rate as a 24-hour NFS trace replayed over a 3-hour period

  35. Benefits of Typhoon • Data is ultra-available: up to half of the servers can fail before availability is affected • Fast file retrieval: data can be retrieved simultaneously from multiple servers • The system can choose to use the fastest machines in a set of servers • Load balancing can be achieved because slow or heavily utilized servers are not used • Information can be dispersed geographically • Increases the accessibility of data in the event of a major disaster, such as an earthquake • Can benefit people who travel to remote locations, since data may be closer to them • Multicast can be used to reduce latency • Low-overhead algorithms: the algorithms for encoding and decoding are linear-time • The disk overhead of the system can be adjusted (it typically doubles the size of a file)

  36. Conclusion • Tornado Codes are significantly faster than Cauchy Reed-Solomon codes • A Typhoon-based system can match the request rate of a loaded NFS server • Typhoon is a viable solution for increasing the reliability and accessibility of data

  37. Architecture • What did we implement? • Client, Cache, Naming and Location Mechanism, Replication Mechanism, filestore • What did we test? • Communication • Explicit communication: TCP request, TCP response • Implicit communication: multicast request, TCP response • Network • Distributed servers throughout the Berkeley domain • Simulated network delay by randomizing response time • Caching • None, to model the worst case • Simulation • Strained the Typhoon system by creating requests at the same rate as a 24-hour NFS trace replayed over a 3-hour period