Encoded Bitmap Indexing and Compressed Bitmaps

Yashvardhan Sharma, Faculty, CS&IS, BITS-Pilani.

Presentation Transcript


  1. Encoded Bitmap Indexing and Compressed Bitmaps Yashvardhan Sharma Faculty, CS&IS BITS-Pilani Data Warehousing

  2. Outline • Problems with Simple Bitmap Indexes • Encoded Bitmap Indexes • Compression of Bitmaps • Byte Aligned Bitmap Code (BBC) • Word Aligned Hybrid Code (WAH)

  3. Pure Bitmap Index • Consists of a collection of bitmap vectors, each created to represent one distinct value of the indexed column. • A query with more than one condition can be answered by Boolean operations on the respective bitmaps. Properties: • Suited to low-cardinality columns. • Uses bitwise operations. • Easy to build and to extend with a new indexed value. • The whole bitmap segment is locked while the index is updated. • Needs less space to store indexes, so more indexes can be cached in memory.
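
The idea of one bitmap vector per distinct value, combined with Boolean operations, can be sketched in a few lines of Python. This is only an illustration: the table data, the column names, and the helper functions `build_bitmap_index` and `rows_of` are assumptions made for the example, and each bitmap is held in a plain Python integer (bit i stands for row i).

```python
from collections import defaultdict

def build_bitmap_index(column):
    """Return {value: bitmap}; bit i of a bitmap is set when row i holds that value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bitmap):
    """List the row numbers whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical six-row table with two low-cardinality columns.
region = ["N", "S", "N", "E", "S", "N"]
grade = ["A", "B", "A", "A", "B", "B"]

region_idx = build_bitmap_index(region)
grade_idx = build_bitmap_index(grade)

# WHERE region = 'N' AND grade = 'A'  -> bitwise AND of two bitmaps
both = region_idx["N"] & grade_idx["A"]
# WHERE region = 'S' OR region = 'E'  -> bitwise OR of two bitmaps
either = region_idx["S"] | region_idx["E"]

print(rows_of(both), rows_of(either))
```

Each query condition costs one bitwise operation over whole bitmaps, which is why several conditions combine cheaply.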

  4. Problems with Bitmap Indexes • Space-inefficient for attributes with high cardinality (the bitmap vectors become sparse). • Increases the complexity of the software. Solution: • Bitmap encoding • Bitmap compression

  5. Encoded Bitmap Index • Consists of a set of bitmap vectors, a lookup table, and a set of retrieval Boolean functions. • Each distinct value of a column is encoded using a number of bits, each of which is stored in a bitmap vector. • The lookup table stores the mapping between column values and their encoded representation. Properties: • Uses space efficiently. • Efficient for wide range queries. • A good encoding scheme is difficult to find. • Inefficient for equality queries.

  6. Simple Bitmap Indexing

  7. Simple Bitmap Indexing Advantages and disadvantages • Dynamic • Performance and stability under update • Costs less time and space than B-trees • Multiple indexes can work together efficiently • Required bytes • As cardinality increases, space and time complexity increase rapidly

  8. Encoded Bitmap Indexes • Why shift from SBI to EBI? • The restriction of SBIs is that they are not suitable for high-cardinality attributes. • EBIs offer the advantage of a drastic reduction in space requirements. • The main idea of EBI is to encode the attribute domain. • Let us see this through an example.

  9. Encoded Bitmap Indexes • We assume that the attribute domain of table T is {a, b, c}. • The encoding scheme of an EBI is stored in a separate table called the mapping table, which assigns a fixed-length binary code to each value that an SBI would index separately. • This reduces the number of bitmap vectors: we use only ceil(log₂ 3) = 2 encoded bitmap vectors instead of 3 simple bitmap vectors. • This means that 2 bits are used to encode the domain {a, b, c}.
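
A minimal sketch of building such an index for the domain {a, b, c} follows. The function name `build_encoded_index`, the sample column, and the choice to hand out codes in domain order starting at 0 are all assumptions made for illustration; a real encoding would be chosen to suit the expected queries.

```python
from math import ceil, log2

def build_encoded_index(column, domain):
    """Return (mapping, vectors): the value -> code table and one bitmap per code bit."""
    k = max(1, ceil(log2(len(domain))))          # number of encoded bitmap vectors
    mapping = {value: code for code, value in enumerate(domain)}
    vectors = [0] * k                             # vectors[j] collects bit j of each row's code
    for row, value in enumerate(column):
        code = mapping[value]
        for j in range(k):
            if (code >> j) & 1:
                vectors[j] |= 1 << row
    return mapping, vectors

# Hypothetical five-row column over the domain {a, b, c}.
column = ["a", "b", "c", "a", "c"]
mapping, vectors = build_encoded_index(column, ["a", "b", "c"])

print(mapping)                                    # a -> 0 (00), b -> 1 (01), c -> 2 (10)
print([format(v, "05b") for v in vectors])        # two vectors instead of three
```

Bit i of each vector belongs to row i, so the pair of vectors read column-wise gives back the 2-bit code of every row.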

  10. Encoded bitmap indexing

  11. Encoded Bitmap Indexes • We assume a fact table SALES with N tuples and a dimension table PRODUCT with 12,000 different products. • A simple bitmap index (SBI) on PRODUCT requires 12,000 bitmap vectors, each N bits long. • With encoded bitmap indexing (EBI) we need only ceil(log₂ 12,000) = 14 bitmap vectors plus a mapping table, a very significant reduction in space complexity.
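
The arithmetic can be checked with a short calculation. The helper name `vectors_needed` and the value chosen for N are assumptions for illustration only; the mapping table is left out of the size estimate.

```python
def vectors_needed(cardinality):
    """ceil(log2(cardinality)) computed with integers, at least one vector."""
    return max(1, (cardinality - 1).bit_length())

n_rows = 1_000_000                      # hypothetical N for the SALES fact table
products = 12_000

sbi_bits = products * n_rows            # one N-bit vector per product
ebi_bits = vectors_needed(products) * n_rows

print(vectors_needed(products))         # number of encoded bitmap vectors
print(sbi_bits // ebi_bits)             # how many times smaller the EBI vectors are
```

Since 2^13 = 8,192 < 12,000 ≤ 16,384 = 2^14, fourteen bits are enough to give every product its own code.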

  12. Encoded Bitmap Indexing • Retrieval function • k-variable minterm • Notation: XY = X AND Y, X + Y = X OR Y, B' = NOT B
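
A retrieval function built from minterms might be evaluated as sketched below. The encoding (a = 00, b = 01, c = 10), the sample vectors, and the helper names `minterm` and `retrieve` are assumptions for illustration; bitmaps are Python integers with bit i standing for row i.

```python
def minterm(vectors, code, n_rows):
    """AND together B_j where the code bit is 1 and NOT B_j where it is 0."""
    all_rows = (1 << n_rows) - 1
    result = all_rows
    for j, vec in enumerate(vectors):
        result &= vec if (code >> j) & 1 else (~vec & all_rows)
    return result

def retrieve(vectors, codes, n_rows):
    """OR the minterms of every selected code (an unreduced retrieval function)."""
    out = 0
    for code in codes:
        out |= minterm(vectors, code, n_rows)
    return out

# Hypothetical column [a, b, c, a, c] encoded as a=00, b=01, c=10.
# vectors[0] is B0 (low code bit), vectors[1] is B1 (high code bit).
vectors = [0b00010, 0b10100]
n_rows = 5

rows_a = minterm(vectors, 0b00, n_rows)              # B1'B0'
rows_a_or_c = retrieve(vectors, [0b00, 0b10], n_rows)  # B1'B0' + B1B0'
reduced = ~vectors[0] & ((1 << n_rows) - 1)           # the same function reduced to B0'

print(format(rows_a, "05b"), format(rows_a_or_c, "05b"), format(reduced, "05b"))
```

The last line illustrates why a well-chosen encoding helps range-like selections: the two minterms reduce to a single term, so only one bitmap vector has to be read.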

  13. Maintenance of Encoded Bitmap Indexes • Updates without domain expansion • Updates with domain expansion

  14. Encoded bitmap indexing • NULL values and non-existing tuples • Reserve the zero code for non-existing tuples

  15. Applications and variations of encoding indexing • Hierarchy encoding • Total order preserving • Using encoding indexes for range encoding

  16. Compression of Bitmap Indexes • The cardinality of many queried attributes is very high. • A basic bitmap index then generates too many bitmaps, and operations take too long. • The data structures used to represent bitmaps should be designed to support efficient search operations. • Compression is the best technique for improving the effectiveness of basic bitmaps. • Possible methods include LZ77 (gzip), BBC, WAH, and WBC.

  17. WAH (Word Aligned Hybrid Code) • Hybrid between run-length encoding (RLE) and a literal scheme. • Stores compressed data in words. • The MSB of a word distinguishes a literal word (0) from a fill word (1). • The lower bits of a literal word contain bit values from the bitmap. • The second MSB of a fill word is the fill bit, and the lower bits store the fill length. • Word alignment requires every fill to cover an integer multiple of the number of bits held by a literal word (word size minus one).
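
The word layout described above can be sketched as a small encoder. This is a simplified illustration, not a reference implementation: the function name `wah_encode` is made up, the fill length is assumed to count groups of (word size minus one) bits, and a trailing partial group is simply padded with zeros instead of being kept as a separate active word.

```python
def wah_encode(bits, word_size=32):
    """Encode a sequence of 0/1 values into a list of WAH words (integers)."""
    group = word_size - 1                  # payload bits carried by one literal word
    fill_flag = 1 << (word_size - 1)       # MSB = 1 marks a fill word
    fill_bit_pos = word_size - 2           # second MSB holds the fill bit
    max_len = (1 << fill_bit_pos) - 1      # largest fill length that fits
    all_ones = (1 << group) - 1

    # Assumption: pad the last group with zeros so every group is complete.
    padded = list(bits) + [0] * (-len(bits) % group)
    words = []
    for start in range(0, len(padded), group):
        chunk = padded[start:start + group]
        value = int("".join(map(str, chunk)), 2)
        if value in (0, all_ones):
            fill_bit = 1 if value else 0
            prev = words[-1] if words else 0
            same_fill = (prev & fill_flag) and ((prev >> fill_bit_pos) & 1) == fill_bit
            if same_fill and (prev & max_len) < max_len:
                words[-1] += 1             # extend the previous fill by one group
            else:
                words.append(fill_flag | (fill_bit << fill_bit_pos) | 1)
        else:
            words.append(value)            # literal word, MSB stays 0
    return words

# Hypothetical bitmap: one mixed group, 62 zeros, then 31 ones.
bitmap = [1] + [0] * 30 + [0] * 62 + [1] * 31
encoded = wah_encode(bitmap)
print([format(w, "08X") for w in encoded])
```

The 124 input bits shrink to three 32-bit words: one literal, one zero-fill covering two groups, and one one-fill covering one group.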

  18. WAH Encoding Example

  19. BBC (Byte Aligned Bitmap Code) • Based on the idea of run-length encoding, which represents consecutive identical bits (a fill or gap) by their bit value and their length. • First the bit sequence is divided into bytes, and then the bytes are grouped into runs. • A run consists of a fill followed by a tail of literal bytes; the fill length is expressed as a number of bytes. • Byte alignment limits fill lengths to an integer multiple of bytes.

  20. BBC • Based on the basic idea of run-length encoding • Organizes bits into bytes • Runs in BBC have the form [fill] [tail] • A fill is a zero-fill or a one-fill, depending on whether all its bits are zero or one. • Two types: one-sided or two-sided

  21. BBC Variants • Type-1 Run • Type-2 Run • Type-3 Run • Type-4 Run

  22. Type-1 Run • 0-3 bytes in the fill and 0-15 literal bytes in the tail • The header is a single byte • The eight bits of the header are 1 [fill bit] [fill length (2 bits)] [tail length (4 bits)]. • The literal tail follows the header byte

  23. Example of Type-1 run • Hexadecimal representation of bitmap: 00 00 8A 37 • BBC Type-1 run representation: header 1 [0] [10] [0010], tail 8A 37 • Hex: A2 8A 37
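
The Type-1 header layout can be checked against this example with a short sketch. The function name `bbc_type1` and its argument order are assumptions for illustration; the bit layout is the one given on the Type-1 slide.

```python
def bbc_type1(fill_bit, fill_len, tail):
    """Pack a Type-1 run: 1 | fill bit | 2-bit fill length | 4-bit tail length, then the tail."""
    if not (0 <= fill_len <= 3 and 0 <= len(tail) <= 15):
        raise ValueError("Type-1 needs a fill of 0-3 bytes and a tail of 0-15 bytes")
    header = 0x80 | (fill_bit << 6) | (fill_len << 4) | len(tail)
    return bytes([header]) + bytes(tail)

# Bitmap 00 00 8A 37: a zero-fill of two bytes followed by two literal bytes.
run = bbc_type1(fill_bit=0, fill_len=2, tail=b"\x8a\x37")
print(run.hex(" ").upper())
```

Four bitmap bytes become three: the two fill bytes are folded into the header.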

  24. Type-2 Run • 0-3 bytes in the fill and a single-byte tail with exactly one bit different from the fill bit • The header is a single byte • The eight bits of the header are 0 1 [fill bit] [fill length (2 bits)] [odd bit position (3 bits)]. • The single-byte tail is not stored

  25. Example of Type-2 run • Hexadecimal representation of bitmap: 00 00 00 02 • BBC Type-2 run representation: header 0 1 [0] [11] [001] • Hex: 59
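
A sketch of the Type-2 header follows. The function name `bbc_type2` is made up, and the odd bit position is assumed to be counted from the least significant bit starting at 0, which is the convention that reproduces the header in this example.

```python
def bbc_type2(fill_bit, fill_len, tail_byte):
    """Pack a Type-2 run: 0 1 | fill bit | 2-bit fill length | 3-bit odd bit position."""
    if not 0 <= fill_len <= 3:
        raise ValueError("Type-2 needs a fill of 0-3 bytes")
    fill_byte = 0xFF if fill_bit else 0x00
    diff = tail_byte ^ fill_byte              # bits of the tail that differ from the fill
    if diff == 0 or diff & (diff - 1):
        raise ValueError("tail must differ from the fill in exactly one bit")
    pos = diff.bit_length() - 1               # assumption: position 0 is the lowest bit
    header = 0x40 | (fill_bit << 5) | (fill_len << 3) | pos
    return bytes([header])                    # the tail itself is not stored

# Bitmap 00 00 00 02: a zero-fill of three bytes, then a byte with only bit 1 set.
run = bbc_type2(fill_bit=0, fill_len=3, tail_byte=0x02)
print(run.hex().upper())
```

Because the tail can be rebuilt from the fill bit and the stored position, four bitmap bytes are reduced to a single header byte.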

  26. Type-3 Run • More than 3 bytes in the fill and 0-15 literal bytes in the tail • A multibyte counter represents the fill length • The header is a single byte • The eight bits of the header are 0 0 1 [fill bit] [tail length (4 bits)]. • The bytes of the multibyte counter follow the header byte • The literal tail follows the multibyte counter • Each multibyte counter byte has the form 1 [7 bits of significant information], except the last byte • The actual number of bytes in the fill is the counter value plus 4.

  27. Example of Type-3 run • Hexadecimal representation of bitmap: 00 00 00 00 00 00 00 00 00 F3 • BBC Type-3 run representation: header 0 0 1 [0] [0001], counter 05 (9 fill bytes minus 4), tail F3 • Hex: 21 05 F3
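
The Type-3 layout, including the multibyte counter, can be sketched as follows. The names `multibyte_counter` and `bbc_type3` are made up, and the order of the 7-bit groups (most significant group first) is an assumption, since the slides do not state it; the example here needs only one counter byte, so it does not depend on that choice.

```python
def multibyte_counter(value):
    """Split value into 7-bit groups; every byte except the last has its top bit set."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append(value & 0x7F)
        value >>= 7
    groups.reverse()                          # assumption: most significant group first
    return bytes([g | 0x80 for g in groups[:-1]] + [groups[-1]])

def bbc_type3(fill_bit, fill_len, tail):
    """Pack a Type-3 run: 0 0 1 | fill bit | 4-bit tail length, counter, then the tail."""
    if fill_len < 4 or len(tail) > 15:
        raise ValueError("Type-3 needs a fill of more than 3 bytes and a tail of 0-15 bytes")
    header = 0x20 | (fill_bit << 4) | len(tail)
    return bytes([header]) + multibyte_counter(fill_len - 4) + bytes(tail)

# Nine zero bytes followed by the literal byte F3.
run = bbc_type3(fill_bit=0, fill_len=9, tail=b"\xf3")
print(run.hex(" ").upper())
```

Storing the fill length minus 4 works because any fill of 3 bytes or fewer would have been written as a Type-1 run instead.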

  28. Type-4 Run • More than 3 bytes in the fill and a single-byte tail with exactly one bit different from the fill bit • The single-byte tail is not stored • The eight bits of the header are 0 0 0 1 [fill bit] [odd bit position (3 bits)]. • The bytes of the multibyte counter follow the header byte • Each multibyte counter byte has the same form as in a Type-3 run

  29. Example of Type-4 run • Hexadecimal representation of bitmap: 00 00 00 00 00 00 00 01 • BBC Type-4 run representation: header 0 0 0 1 [0] [001], counter 03 (7 fill bytes minus 4) • Hex: 11 03

  30. Improvement achieved and Comparisons BBC: • Performs bitwise logical operations efficiently compared with other compression schemes. • Compresses almost as well as gzip. • Best suited to range queries. • Suitable for OLAP applications.

  31. WAH: • Performs logical operations about 12 times faster than BBC while using only 60% more space. • Compared with the uncompressed scheme, WAH is faster while still using less space. • All the features of BBC are available in this scheme.

  32. Factors for performance difference between BBC and WAH • In WAH one test is sufficient to determine the type of a word, while in BBC more than three tests are required to decide the run type. • WAH accesses whole words while BBC accesses individual bytes, so BBC needs more time to load data. • BBC can encode shorter fills more compactly than WAH, but BBC starts a new run for a short fill.
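
The difference in decoding effort can be illustrated with two small classifiers. Both function names are made up for this sketch, and the BBC tests simply follow the header prefixes listed on the Type-1 to Type-4 slides (1, 01, 001, 0001).

```python
def wah_word_type(word, word_size=32):
    """WAH: a single test on the most significant bit decides the word type."""
    return "fill" if (word >> (word_size - 1)) & 1 else "literal"

def bbc_run_type(header):
    """BBC: the header prefix is examined bit by bit until the run type is known."""
    if header & 0x80:
        return 1          # 1xxxxxxx
    if header & 0x40:
        return 2          # 01xxxxxx
    if header & 0x20:
        return 3          # 001xxxxx
    if header & 0x10:
        return 4          # 0001xxxx
    raise ValueError("header does not match any of the four run types")

# Headers taken from the four run examples in this deck.
print([bbc_run_type(h) for h in (0xA2, 0x59, 0x21, 0x11)])
print(wah_word_type(0x80000002), wah_word_type(0x40000000))
```

A Type-4 header is recognised only after three earlier tests have failed, and the fill length and tail still have to be extracted afterwards, whereas a WAH word is classified by its first bit alone.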
