
Introduction to Spatial Databases

Donghui Zhang, CCIS, Northeastern University


Presentation Transcript


  1. Introduction to Spatial Databases • Donghui Zhang • CCIS, Northeastern University

  2. What is a spatial database? • A database system that is optimized to store and query spatial objects: • Point: a hotel, a car • Line: a road segment • Polygon: landmarks, layout of VLSI • [Figures: Road Network, Satellite Image, VLSI Layout]

  3. Are spatial databases useful? • Geographical Information Systems • e.g. data: road network and places of interest. • e.g. usage: driving directions, emergency calls, standalone applications. • Environmental Systems • e.g. data: land cover, climate, rainfall, and forest fires. • e.g. usage: find the total rainfall. • Corporate Decision-Support Systems • e.g. data: store locations and customer locations. • e.g. usage: determine the optimal location for a new store. • Battlefield Soldier Monitoring Systems • e.g. data: locations of soldiers (with or without medical equipment). • e.g. usage: for each soldier with medical equipment, monitor the soldiers who may need his or her help.

  4. Shortest-Path Query • Fastest-Path Query • [Screenshot: MapQuest.com]

  5. Driving directions as you go. • Find nearest Wal-Mart or hospital. • NN Query

  6. Range query • [Screenshot: ArcGIS 9.2, ESRI]

  8. Aggregation query

  10. Optimal Location query

  12. NN(Bob) = George • [Figure: locations of George, John, Bob, Bill and Mike]

  13. Who will seek help from me? • RNN(Bob) = {John, Mike} • RNN query • [Figure: locations of George, John, Bob, Bill and Mike]
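
The NN and RNN queries on slides 12 and 13 can be sketched with a plain linear scan. The coordinates below are hypothetical, chosen only so that the results agree with the slides; the function names `nn` and `rnn` are likewise illustrative.

```python
import math

# Hypothetical coordinates, picked only to reproduce the answers on the slides.
people = {
    "Bob": (0.0, 0.0),
    "George": (1.0, 0.0),
    "Bill": (1.5, 0.0),
    "John": (-2.0, 0.0),
    "Mike": (0.0, -2.5),
}

def nn(name, pts):
    """Nearest neighbor of `name`, found by comparing against everyone else."""
    return min((p for p in pts if p != name),
               key=lambda p: math.dist(pts[name], pts[p]))

def rnn(name, pts):
    """Reverse nearest neighbors: everyone whose nearest neighbor is `name`."""
    return {p for p in pts if p != name and nn(p, pts) == name}

print(nn("Bob", people))            # George
print(sorted(rnn("Bob", people)))   # ['John', 'Mike']
```

Note that NN is not symmetric: George is Bob's nearest neighbor, yet George is not in RNN(Bob) because, with these coordinates, Bill is closer to George than Bob is.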

  14. And beyond the “space” … • 2004 NBA dataset*: each player has 17 attributes • “Spatial Data”: an object is a point in a 17-dimensional space • Who are the best players? • i.e. those not “dominated” by any other player. • Skyline query • * www.databaseBasketball.com
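
As a minimal sketch of what "not dominated" means, the following uses two made-up attributes per player instead of the 17 in the dataset, and assumes that larger values are better. The data and the names `dominates` and `skyline` are illustrative only.

```python
def dominates(p, q):
    """p dominates q if p is at least as good in every attribute and
    strictly better in at least one (here, larger is better)."""
    return (all(a >= b for a, b in zip(p, q))
            and any(a > b for a, b in zip(p, q)))

def skyline(points):
    """Quadratic scan: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (points, rebounds) per game; not taken from the NBA dataset.
players = [(25, 5), (20, 10), (18, 9), (10, 12), (25, 4)]
print(skyline(players))   # [(25, 5), (20, 10), (10, 12)]
```

The same two functions work unchanged for 17-dimensional tuples, since `zip` compares the attributes pairwise.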

  20. Research goals in spatial databases • Support spatial database queries efficiently! • range query, aggregation query, NN query, RNN query, optimal-location query, fastest-path query, skyline query, … • Which statement is the best in a large spatial database? (a) Both an O(n²) algorithm and an O(n) algorithm are efficient. (b) An O(n²) algorithm is not efficient, but an O(n) algorithm is. (c) Neither an O(n²) algorithm nor an O(n) algorithm is efficient. Answer: (c)! Even a linear algorithm is not efficient!

  21. Research goals in spatial databases • Example of a linear algorithm: to find my nearest Wal-Mart, compare my location with all Wal-Marts in the world. • Example of a quadratic algorithm: to find the skyline of NBA players, compare every player against all other players (to see if he is dominated). • Sample scenario: • Disk page size: 8 KB. • Database size: 1 GB = 131,072 disk pages. • Let each disk I/O take 10⁻³ second. • O(n): 131 seconds ≈ 2 minutes. (Not efficient!) • O(n²): ≈ 200 days! (Out of the question!)
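
The numbers in the sample scenario can be checked with a few lines of arithmetic. This sketch assumes, as the slide does, that n is the number of disk pages and that every step costs one disk I/O; the variable names are illustrative.

```python
PAGE_SIZE = 8 * 1024     # 8 KB per disk page
DB_SIZE = 1024 ** 3      # 1 GB database
IO_TIME = 1e-3           # seconds per disk I/O

n = DB_SIZE // PAGE_SIZE                     # number of disk pages
linear_seconds = n * IO_TIME                 # O(n): read every page once
quadratic_days = n * n * IO_TIME / 86_400    # O(n^2), converted to days

print(n)                        # 131072
print(round(linear_seconds))    # 131
print(round(quadratic_days))    # 199
```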

  22. How can you do better than O(n)? • Answer: use (disk-based) index structures! • However, 1-dim index structures, e.g. the B+-tree, are not efficient. • E.g. to search for hotels in Boston…

  23. A 1-dim index is not good enough • Suppose a B+-tree exists on X.
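
One way to see the problem, as a rough sketch: a sorted list of X keys searched with `bisect` stands in for the B+-tree on X, and the data is a hypothetical 100 × 100 grid of points. The index narrows the search to a vertical strip, but every point in that strip must still be checked against the Y condition.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data: one point at every integer position in [0, 100) x [0, 100).
points = sorted((x, y) for x in range(100) for y in range(100))
xs = [p[0] for p in points]   # sorted X keys, standing in for a B+-tree on X

def range_query_x_index(x_lo, x_hi, y_lo, y_hi):
    """Use the X ordering to find the strip, then filter on Y one by one."""
    lo, hi = bisect_left(xs, x_lo), bisect_right(xs, x_hi)
    candidates = points[lo:hi]
    answers = [p for p in candidates if y_lo <= p[1] <= y_hi]
    return candidates, answers

candidates, answers = range_query_x_index(10, 20, 10, 20)
print(len(candidates), len(answers))   # 1100 121
```

In this toy setup about nine out of ten fetched candidates are discarded, which is the waste a multi-dimensional index aims to avoid.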

  25. Content • The R-tree • Range Query • Aggregation Query • NN Query • Skyline Query • Highlights of Our Research

  26. R-Tree Motivation • [Figure: objects a to m plotted on x and y axes from 0 to 10] • Range query: find the objects in a given range. • E.g. find all hotels in Boston. • No index: scan through all objects. NOT EFFICIENT!

  27. R-Tree: Clustering by Proximity

  28. R-Tree

  30. Range Query • [Figure: objects a to m on x and y axes from 0 to 10, grouped into MBRs E1 to E7, and the corresponding R-tree: the root holds entries E1 and E2, and the leaf nodes E3 to E7 hold the objects.]
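
A range query over an R-tree can be sketched as a recursive descent that skips any subtree whose MBR does not intersect the query rectangle. The `Node` class, the hand-built two-level tree and its coordinates are assumptions made for illustration; they are not the tree from the figure.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (x_lo, y_lo, x_hi, y_hi)
    children: list = field(default_factory=list)   # child nodes (internal node)
    points: list = field(default_factory=list)     # (x, y) objects (leaf node)

def intersects(a, b):
    """True if rectangles a and b overlap (touching edges count)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def contains(rect, p):
    """True if point p lies inside rectangle rect."""
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]

def range_query(node, rect):
    """Return all objects inside rect, pruning subtrees whose MBR misses it."""
    if not intersects(node.mbr, rect):
        return []                                  # whole subtree skipped
    found = [p for p in node.points if contains(rect, p)]
    for child in node.children:
        found.extend(range_query(child, rect))
    return found

# Hypothetical tree: three leaves under one root.
leaf1 = Node((1, 1, 3, 3), points=[(1, 1), (2, 3), (3, 2)])
leaf2 = Node((6, 7, 8, 8), points=[(6, 7), (8, 8)])
leaf3 = Node((7, 1, 9, 2), points=[(7, 1), (9, 2)])
root = Node((1, 1, 9, 8), children=[leaf1, leaf2, leaf3])

print(range_query(root, (2, 0, 7, 3)))   # [(2, 3), (3, 2), (7, 1)]
```

In this example `leaf2` is never scanned, because its MBR lies entirely above the query rectangle.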

  32. Aggregation Query • Given a range, find some aggregate value of objects in this range. • COUNT, SUM, AVG, MIN, MAX • E.g. find the total number of hotels in Massachusetts. • Straightforward approach: reduce to a range query. • Better approach: along with each index entry, store aggregate of the sub-tree.

  33. Aggregation Query • [Figure: objects a to m on x and y axes from 0 to 10, grouped into MBRs E1 to E7, and the R-tree with a count stored in each index entry: E1:8 and E2:5 in the root, and a count of 2 or 3 for each of E3 to E7.]

  34. Aggregation Query • [Figure: the aggregate R-tree of slide 33, with one subtree marked "Subtree pruned!"]
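
The "better approach" for COUNT can be sketched as follows, assuming each node stores the number of objects in its subtree. A subtree whose MBR lies entirely inside the query range is answered from the stored count without being visited, and a subtree whose MBR misses the range contributes zero. The `Node` class and the small tree are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (x_lo, y_lo, x_hi, y_hi)
    count: int                                     # objects stored in this subtree
    children: list = field(default_factory=list)   # child nodes (internal node)
    points: list = field(default_factory=list)     # (x, y) objects (leaf node)

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def covers(outer, inner):
    """True if rectangle inner lies entirely inside rectangle outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def contains(rect, p):
    return rect[0] <= p[0] <= rect[2] and rect[1] <= p[1] <= rect[3]

def range_count(node, rect):
    """COUNT of objects inside rect, using stored counts where possible."""
    if not intersects(node.mbr, rect):
        return 0                       # disjoint: nothing to count
    if covers(rect, node.mbr):
        return node.count              # fully inside: no need to descend
    if node.points:                    # partially overlapping leaf
        return sum(contains(rect, p) for p in node.points)
    return sum(range_count(c, rect) for c in node.children)

# Hypothetical tree: three leaves under one root.
leaf1 = Node((1, 1, 3, 3), 3, points=[(1, 1), (2, 3), (3, 2)])
leaf2 = Node((6, 7, 8, 8), 2, points=[(6, 7), (8, 8)])
leaf3 = Node((7, 1, 9, 2), 2, points=[(7, 1), (9, 2)])
root = Node((1, 1, 9, 8), 7, children=[leaf1, leaf2, leaf3])

print(range_count(root, (0, 0, 5, 5)))     # 3
print(range_count(root, (0, 0, 10, 10)))   # 7
```

The second call returns at the root without touching any leaf. SUM, MIN and MAX could be handled the same way by storing the corresponding aggregate; AVG would need both SUM and COUNT.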

  36. Nearest Neighbor (NN) Query • Given a query location q, find the nearest object. • E.g. given a hotel, find its nearest bar. • [Figure: query point q and object a]

  37. A Useful Metric: MINDIST • Minimum distance between q and an MBR. • It is a lower bound of d(o, q) for every object o in E1. • [Figure: query point q, MBR E1, and MINDIST(q, E1)]
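
For axis-aligned rectangles and Euclidean distance, MINDIST can be computed per axis, as in this small sketch. The tuple layout `(x_lo, y_lo, x_hi, y_hi)` is an assumption of the example.

```python
import math

def mindist(q, mbr):
    """Minimum Euclidean distance from point q to rectangle
    mbr = (x_lo, y_lo, x_hi, y_hi); zero if q lies inside the rectangle."""
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])   # horizontal gap, if any
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])   # vertical gap, if any
    return math.hypot(dx, dy)

print(mindist((0, 0), (3, 4, 6, 8)))   # 5.0
print(mindist((4, 5), (3, 4, 6, 8)))   # 0.0
```

Because every object in the MBR lies inside the rectangle, none of them can be closer to q than this value, which is what makes it a safe lower bound.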

  38. NN Basic Algorithm • Keep a heap H of index entries and objects, ordered by MINDIST. • Initially, H contains the root. • While H ≠ ∅ • Extract the element with minimum MINDIST. • If it is an index entry, insert its children into H. • If it is an object, return it as NN. • End while • [Figure: query point q and MBR E1]
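
The loop above can be sketched with Python's `heapq`. The `Node` class, the small tree and the tie-breaking counter are assumptions of this example; for an object, the heap key is simply its distance to q.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (x_lo, y_lo, x_hi, y_hi)
    children: list = field(default_factory=list)   # child nodes (internal node)
    points: list = field(default_factory=list)     # (x, y) objects (leaf node)

def mindist(q, mbr):
    dx = max(mbr[0] - q[0], 0.0, q[0] - mbr[2])
    dy = max(mbr[1] - q[1], 0.0, q[1] - mbr[3])
    return math.hypot(dx, dy)

def nearest_neighbor(root, q):
    """Best-first search: always expand the heap element with the smallest key."""
    tie = itertools.count()            # tie-breaker so heap items never compare
    heap = [(mindist(q, root.mbr), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if not isinstance(item, Node):
            return item, dist          # an object at the top of the heap is the NN
        for child in item.children:
            heapq.heappush(heap, (mindist(q, child.mbr), next(tie), child))
        for p in item.points:
            heapq.heappush(heap, (math.dist(q, p), next(tie), p))
    return None, math.inf              # empty tree

# Hypothetical tree: three leaves under one root.
leaf1 = Node((1, 1, 3, 3), points=[(1, 1), (2, 3), (3, 2)])
leaf2 = Node((6, 7, 8, 8), points=[(6, 7), (8, 8)])
leaf3 = Node((7, 1, 9, 2), points=[(7, 1), (9, 2)])
root = Node((1, 1, 9, 8), children=[leaf1, leaf2, leaf3])

print(nearest_neighbor(root, (6, 3)))
```

Returning at the first object popped is safe because every entry still in the heap has a key at least as large, and each key is a lower bound on the distance to anything beneath that entry.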

  39. NN Query Example • [Figure: objects a to m, MBRs E1 to E7 and the query point on x and y axes from 0 to 10; the R-tree annotated with MINDIST values; an Action/Heap table. Action so far: visit the root; the heap holds E1 and E2.]

  40. NN Query Example • [Same figure; the Action/Heap table shows the root visit followed by one "follow" step.]

  41. NN Query Example • [Same figure; the Action/Heap table shows the root visit followed by two "follow" steps.]

  42. NN Query Example • [Same figure; the Action/Heap table shows the root visit followed by three "follow" steps; objects i, j and k are now in the heap.]

  43. NN Query Example • [Same figure; final action: report i and terminate.]

  45. Skyline of Manhattan • Which buildings can we see? • Those that are not dominated (a building is dominated if it is both farther away and shorter than another building).

  46. A skyline example: best hotels • Which one is better? • i or h? (i, because its price and distance dominate those of h) • i or k?

  47. A skyline example: best hotels • The skyline: a, i, k.

  48. Branch-and-Bound Skyline (BBS) • Assume all points are indexed in an R-tree. • mindist(MBR) = the L1 distance between its lower-left corner and the origin.

  49. Branch-and-Bound Skyline (BBS) • Each heap entry keeps the mindist of the MBR.

  50. Example of BBS • Process entries in ascending order of their mindists.
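
A compact sketch of the idea on slides 48 to 50, assuming smaller values are better in both dimensions (for example price and distance). Entries are processed in ascending mindist, an entry is discarded if a skyline point already dominates it, and any point that survives joins the skyline. The `Node` class, the data and the pruning details are assumptions of this example rather than the slides' exact procedure.

```python
import heapq
import itertools
from dataclasses import dataclass, field

@dataclass
class Node:
    mbr: tuple                                     # (x_lo, y_lo, x_hi, y_hi)
    children: list = field(default_factory=list)   # child nodes (internal node)
    points: list = field(default_factory=list)     # (x, y) objects (leaf node)

def dominates(p, q):
    """Smaller is better: p is no worse in both dimensions and not equal to q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q

def bbs_skyline(root):
    tie = itertools.count()            # tie-breaker so heap items never compare
    heap = [(root.mbr[0] + root.mbr[1], next(tie), root)]
    skyline = []
    while heap:
        _, _, item = heapq.heappop(heap)
        # For an MBR, test its lower-left corner; for an object, the object itself.
        corner = item.mbr[:2] if isinstance(item, Node) else item
        if any(dominates(s, corner) for s in skyline):
            continue                   # entry (and everything in it) is dominated
        if not isinstance(item, Node):
            skyline.append(item)       # popped and not dominated: a skyline point
            continue
        for child in item.children:    # key: L1 distance of lower-left corner
            heapq.heappush(heap, (child.mbr[0] + child.mbr[1], next(tie), child))
        for p in item.points:          # key: L1 distance of the point itself
            heapq.heappush(heap, (p[0] + p[1], next(tie), p))
    return skyline

# Hypothetical hotels as (price, distance) pairs, in three leaves.
leaf_a = Node((1, 6, 3, 9), points=[(1, 9), (2, 6), (3, 8)])
leaf_b = Node((4, 2, 6, 5), points=[(4, 3), (5, 5), (6, 2)])
leaf_c = Node((7, 1, 9, 7), points=[(8, 1), (9, 4), (7, 7)])
root = Node((1, 1, 9, 9), children=[leaf_a, leaf_b, leaf_c])

print(sorted(bbs_skyline(root)))   # [(1, 9), (2, 6), (4, 3), (6, 2), (8, 1)]
```

Processing in ascending mindist is what makes a single pass sufficient: any point that dominates another has a strictly smaller coordinate sum, so it is reached first.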
