Csci6330 Spatial Database Management Yan Huang huangyan@unt.edu http://www.cs.unt.edu/~huangyan/6330
Today’s Outline • Introduction • Syllabus • Course description • Next class’s topic
Csci6330 Spatial Database Management • Instructor: Yan Huang • Office: NTRP F251 • Office Tel: 940-369-8353 • Email: huangyan@unt.edu • Office Hours: T, Th 8:55-9:55AM, or by appointment. • Class Time and Location: TR, 10:00am-11:20am, NTRP B190 • Class Website: http://www.cs.unt.edu/~huangyan/6330
Self Introduction • Name • Major • Previous study and work experience • Computer background: hardware, software • Why you are taking this course • Expectations for this course
Textbook • Textbook (required): • Spatial Databases: A Tour • By Shashi Shekhar and Sanjay Chawla • Available in the Book Store • Reference Books • Spatial Databases: With Application to GIS • By Philippe Rigaux, Michel Scholl, and Agnes Voisard • GIS: A Computing Perspective, Taylor and Francis • By M.R. Worboys
Background • Prerequisite: csci4350 or csci5350 • Entity-Relationship Model • File Organizations • Index Structures for Files • Basic Queries in SQL • Query Processing • Query Optimization
Course Description • This course treats a specific advanced topic of current research interest in the area of spatial data and information. • The main objective of this class is to study research methods and literature in spatial database systems. • Core research skills of literature analysis, innovation, evaluation of new ideas, and communication are emphasized via homework assignments and projects.
Topics • Introduction to Spatial Databases • Spatial Concepts and Data Models • Spatial Query Languages: SQL3, OGIS • Spatial Storage and Indexing: R-tree, Grid files • Query Processing and Optimization • Strategies for range query, nearest neighbor query • Spatial joins (e.g. tree matching), cost models for new strategies, impact on rule-based optimization. • Spatial Networks: • Query language for graphs, graph algorithms, access methods. • Introduction to Spatial Data Mining • Spatial auto-correlation, co-location patterns, spatial outliers, classification methods • Trends in Spatial Databases: mobile wireless applications, spatio-temporal data.
Homework • Assignments • Include research paper critique and presentation • Assignments are due at the beginning of class on the due date • All assignments must have your name, student ID and course name/number • You are encouraged to type your answers • Late homework • Submit to my office on paper • 30% penalty if submitted within the first 24 hours • 70% penalty if submitted 24 hours or more late
Examinations/Grading • There will be no exams for this course • Grading • Class participation and 10 quizzes 20% • Paper analysis and presentation 30% • Final Project 50% • Presentation 20% • Final Report 30%
Paper Review and Presentation • Each student is expected to write a critique of TWO papers, P and Q • Present an overview of them in class • This homework will help us learn literature analysis skills, whether for starting a thesis or simply for understanding a research paper.
Steps for Paper Review and Presentation • A1. Identify the papers P and Q you would present. Make a home page for the course assignment. Identify the papers R and S for which you would like to lead discussion • A2. Submit the narratives and slides for papers P and Q. An electronic copy will be linked to the course web page to make them available to the audience. The written critique should be about 2000 words long and the oral presentation should consist of about 30 transparencies addressing the following questions: • Problem statement • List the major contributions of the paper • What are the key concepts behind the approach in this paper • What is the validation methodology • List the assumptions made by the authors • If you were to rewrite this paper today, what would you preserve and what would you revise? Briefly justify. • A3. Make a 45-minute oral presentation in class on papers P and Q. • A4. Lead the group discussion when papers R and S are presented.
Term Paper/Project • P0. Identify a team of two for your project and create a webpage for the project • P1. Define possible questions • P2. Identify key sources, types of evidence • P3. Summary of key readings • P4. Possible outline or overall structure for paper / project. • P5. Formal proposal: Includes questions or thesis, key resources, and proposed structure revised with feedback from peers and the instructor. • P6. Rough draft. Follow the outlines from the research papers covered in the course. For example, the draft may have 5 sections: • Problem Statement, Significance of the problem • Related Work and Our Contributions • Proposed Approach • Validation of listed contributions (experimental, analytical) • Conclusions and Future Work • P7. Oral presentation in the class using 20 transparencies
Honor System • All work is to be done under the provisions of the University of North Texas Honor System. • Students are encouraged to discuss the interpretation of an assignment • However, the actual solution to problems must be one's own • In the research report, related work should be properly cited
Helpful Comments • Read the textbook and papers before class • Participate in class discussion • Review textbook and papers regularly after class • Start working on the assignments soon after they are handed out. • Come to my office hours for questions and project discussion
Chapter 1: Introduction to Spatial Databases • 1.1 Overview • 1.2 Application domains • 1.3 Compare a SDBMS with a GIS • 1.4 Categories of Users • 1.5 An example of an SDBMS application • 1.6 A Stroll through a spatial database • 1.6.1 Data Models, 1.6.2 Query Language, 1.6.3 Query Processing, 1.6.4 File Organization and Indices, 1.6.5 Query Optimization, 1.6.6 Data Mining
Learning Objectives • Learning Objectives (LO) • LO1: Understand the value of SDBMS • Application domains • Users • How is it different from a DBMS? • LO2: Understand the concept of spatial databases • LO3: Learn about the Components of SDBMS • Mapping Sections to learning objectives • LO1 - 1.1, 1.2, 1.4 • LO2 - 1.3, 1.5 • LO3 - 1.6
Value of SDBMS • Traditional (non-spatial) database management systems provide: • Persistence across failures • Concurrent access to data • Scalability to search queries on very large datasets which do not fit inside the main memory of a computer • They are efficient for non-spatial queries, but not for spatial queries • Non-spatial queries: • List the names of all bookstores with more than ten thousand titles. • List the names of the top ten customers, in terms of sales, in the year 2001 • Use an index to narrow down the search • Spatial Queries: • List the names of all bookstores within ten miles of Minneapolis • List all customers who live in Tennessee and its adjoining states • List all the customers who reside within fifty miles of the company headquarters
Value of SDBMS – Spatial Data Examples • Examples of non-spatial data • Names, phone numbers, email addresses of people • Examples of spatial data • Census Data • NASA satellite imagery - terabytes of data per day • Weather and Climate Data • Rivers, Farms, ecological impact • Medical Imaging • Exercise: Identify spatial and non-spatial data items in • A phone book • A product catalog
Value of SDBMS – Users, Application Domains • Many important application domains have spatial data and queries. Some examples follow: • Army Field Commander: Has there been any significant enemy troop movement since last night? • Insurance Risk Manager: Which homes are most likely to be affected in the next great flood on the Mississippi? • Medical Doctor: Based on this patient's MRI, have we treated somebody with a similar condition? • Molecular Biologist: Is the topology of the amino acid biosynthesis gene in the genome found in any other sequence feature map in the database? • Astronomer: Find all blue galaxies within 2 arcmin of quasars. • Exercise: List two ways you have used spatial data. Which software did you use to manipulate spatial data?
Due and Next Class • Paper review selection and webpage • Team formation and project webpage • Prepare for an open-book quiz with 2 questions based on today’s topic • Read chapter 1.1, 1.2, 1.4 if you have not • Next class: chapter 1.3, 1.5, 1.6 of the book
Agenda Today • Quiz with two questions • Paper selection and team formation • Two papers to present and two papers to lead discussion • Team of two or one • Topics • Concepts of spatial database • Components of a spatial database • Summary of chapter 1 • Next class’s topic
Quiz • Give a scenario and a query from your own life that involves spatial data. • Give a possible reason why spatial analysis using spatial queries on customer data may be important to vendors such as Best Buy and Walmart.
Learning Objectives • Learning Objectives (LO) • LO1 : Understand the value of SDBMS • LO2: Understand the concept of spatial databases • What is a SDBMS? • How is it different from a GIS? • LO3: Learn about the Components of SDBMS • Sections for LO2 • Section 1.5 provides an example SDBMS • Section 1.1 and 1.3 compare SDBMS with DBMS and GIS
What is a SDBMS ? • A SDBMS is a software module that • can work with an underlying DBMS • supports spatial data models, spatial abstract data types (ADTs) and a query language from which these ADTs are callable • supports spatial indexing, efficient algorithms for processing spatial operations, and domain specific rules for query optimization • Example: Oracle Spatial data cartridge, ESRI SDE • can work with Oracle 8i DBMS • Has spatial data types (e.g. polygon), operations (e.g. overlap) callable from SQL3 query language • Has spatial indices, e.g. R-trees • IBM: Spatial Option • Informix: Spatial Datablade
SDBMS Example • Consider a spatial dataset with: • County boundary (dashed white line) • Census block - name, area, population, boundary (dark line) • Water bodies (dark polygons) • Satellite Imagery (gray scale pixels) • Storage in a SDBMS table: create table census_blocks ( name string, area float, population number, boundary polygon ); Fig 1.2
Modeling Spatial Data in Traditional DBMS • A row in the table census_blocks (Figure 1.3) • Question: Is Polyline datatype supported in DBMS? Figure 1.3
Spatial Data Types and Traditional Databases • Traditional relational DBMS • Support simple data types, e.g. numbers, strings, dates • Modeling spatial data types is tedious • Example: Figure 1.4 shows modeling of a polygon using numbers • Three new tables: polygon, edge, points • Note: A polygon is a polyline whose last point and first point are the same • A simple unit square is represented as 16 rows across 3 tables • Simple spatial operators, e.g. area(), require joining tables • Tedious and computationally inefficient • Question: Name post-relational database management systems which facilitate modeling of spatial data types, e.g. polygon.
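To make the "16 rows across 3 tables" point concrete, here is a minimal Python sketch of a unit square stored as flat rows. The column layouts and the names `polygon_rows`, `edge_rows`, `point_rows` and `polygon_area` are assumptions for illustration, not the exact schema of Figure 1.4. Computing the area requires looking up rows across all three tables, which is the join the slide refers to.

```python
# Hypothetical flat-table encoding of a unit square (layouts are assumptions).
polygon_rows = [  # (polygon_id, edge_name): 4 rows
    ("A", "e1"), ("A", "e2"), ("A", "e3"), ("A", "e4"),
]
edge_rows = [  # (edge_name, endpoint), start point listed first: 8 rows
    ("e1", "p1"), ("e1", "p2"),
    ("e2", "p2"), ("e2", "p3"),
    ("e3", "p3"), ("e3", "p4"),
    ("e4", "p4"), ("e4", "p1"),
]
point_rows = [  # (endpoint, x, y): 4 rows
    ("p1", 0.0, 0.0), ("p2", 1.0, 0.0), ("p3", 1.0, 1.0), ("p4", 0.0, 1.0),
]


def polygon_area(polygon_id):
    """Area via the shoelace formula, joining polygon -> edge -> point."""
    points = {name: (x, y) for name, x, y in point_rows}
    endpoints = {}
    for edge_name, point_name in edge_rows:
        endpoints.setdefault(edge_name, []).append(points[point_name])
    twice_area = 0.0
    for pid, edge_name in polygon_rows:
        if pid != polygon_id:
            continue
        (x1, y1), (x2, y2) = endpoints[edge_name]
        twice_area += x1 * y2 - x2 * y1  # contribution of one directed edge
    return abs(twice_area) / 2.0


total_rows = len(polygon_rows) + len(edge_rows) + len(point_rows)
print(total_rows, polygon_area("A"))
```

With a native polygon type, the same computation would be a single operator call on one column value.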
Evolution of DBMS technology Fig 1.5
Spatial Data Types and Post-relational Databases • Post-relational DBMS • Support user defined abstract data types • Spatial data types (e.g. polygon) can be added • Choice of post-relational DBMS • Object oriented (OO) DBMS • Object relational (OR) DBMS • A spatial database is a collection of spatial data types, operators, indices, processing strategies, etc. and can work with many post-relational DBMS as well as programming languages like Java, Visual Basic etc.
An Example : OO-Model • Class Highway • tuple (highway_name: string, highway_type: string, • sections: list(Section)) • Class Section • tuple (section_name: string, • number_lanes: integer, • city_start: City, • city_end: City, • geometry: Line) • Class City • tuple (city_name: string, • population: integer, • geometry: Region)
An Example : OO-Model • Method • Method length in class Line: real (computes the length of a line) • Query: Length of Interstate 99 • Sum (Select s.geometry->length() • from h in All_highways, s in h.sections • where h.highway_name = 'I99')
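The OO model and query above can be mirrored in a small Python sketch. It is only an analogy to the object query language on the slide: the classes are simplified (the `city_start` and `city_end` attributes are omitted), a `Line` is assumed to be a list of (x, y) vertices, and the helper `highway_length` and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Line:
    vertices: list  # assumed representation: list of (x, y) tuples

    def length(self):
        """Sum of the straight-line lengths of consecutive segments."""
        return sum(
            hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(self.vertices, self.vertices[1:])
        )


@dataclass
class Section:
    section_name: str
    number_lanes: int
    geometry: Line


@dataclass
class Highway:
    highway_name: str
    highway_type: str
    sections: list = field(default_factory=list)


def highway_length(all_highways, name):
    """Analogue of: Sum(Select s.geometry->length() ... where h.highway_name = name)."""
    return sum(
        s.geometry.length()
        for h in all_highways
        if h.highway_name == name
        for s in h.sections
    )


all_highways = [
    Highway("I99", "interstate", [
        Section("s1", 4, Line([(0, 0), (3, 4)])),
        Section("s2", 4, Line([(3, 4), (3, 10)])),
    ]),
    Highway("US1", "us-route", [Section("t1", 2, Line([(0, 0), (1, 0)]))]),
]
print(highway_length(all_highways, "I99"))
```

The point of the model is that `length()` belongs to the spatial type itself, so the query only navigates objects and calls the method.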
How is a SDBMS different from a GIS ? • GIS is software to visualize and analyze spatial data using spatial analysis functions such as • Search: thematic search, search by region, (re-)classification • Location analysis: buffer, corridor, overlay • Terrain analysis: slope/aspect, catchment, drainage network • Flow analysis: connectivity, shortest path • Distribution: change detection, proximity, nearest neighbor • Spatial analysis/Statistics: pattern, centrality, autocorrelation, indices of similarity, topology: hole description • Measurements: distance, perimeter, shape, adjacency, direction • GIS uses SDBMS • to store, search, query, share large spatial data sets
How is a SDBMS different from a GIS ? • SDBMS focuses on • Efficient storage, querying, sharing of large spatial datasets • Provides simpler set-based query operations • Example operations: search by region, overlay, nearest neighbor, distance, adjacency, perimeter etc. • Uses spatial indices and query optimization to speed up queries over large spatial datasets. • SDBMS may be used by applications other than GIS • Astronomy, Genomics, Multimedia information systems, ...
Evolution of acronym “GIS” • Geographic Information Systems (1980s) • Geographic Information Science (1990s) • Geographic Information Services (2000s) Fig 1.1
Three meanings of the acronym GIS • Geographic Information Services • Web-sites and service centers for casual users, e.g. travelers • Example: Service (e.g. AAA, mapquest) for route planning • Geographic Information Systems • Software for professional users, e.g. cartographers • Example: ESRI Arc/View software • Geographic Information Science • Concepts, frameworks, theories to formalize use and development of geographic information systems and services • Example: design spatial data types and operations for querying
Learning Objectives • Learning Objectives (LO) • LO1 : Understand the value of SDBMS • LO2: Understand the concept of spatial databases • LO3: Learn about the Components of SDBMS • Architecture choices • SDBMS components: • data model, query languages, • query processing and optimization • File organization and indices • Data Mining • Chapter Sections • 1.5 second half • 1.6 – entire section
Components of a SDBMS • Recall: a SDBMS is a software module that • can work with an underlying DBMS • supports spatial data models, spatial ADTs and a query language from which these ADTs are callable • supports spatial indexing, algorithms for processing spatial operations, and domain specific rules for query optimization • Components include • spatial data model, query language, query processing, file organization and indices, query optimization, etc. • Figure 1.6 shows these components • We discuss each component briefly in chapter 1.6 and in more detail in later chapters.
Three Layer Architecture Fig 1.6
1.6.1 Spatial Taxonomy, Data Models • Spatial Taxonomy: • multitude of descriptions available to organize space. • Topology models homeomorphic relationships, e.g. overlap, adjacent • Euclidean space models distance and direction in a plane • Graphs model connectivity, shortest path • Spatial data models • rules to identify objects and properties of space • Object model helps manage identifiable things, e.g. mountains, cities, land-parcels etc. • Field model helps manage continuous and amorphous phenomena, e.g. temperature, satellite imagery, snowfall etc. • More details in chapter 2.
1.6.2 Spatial Query Language • Spatial query language • Spatial data types, e.g. point, linestring, polygon, … • Spatial operations, e.g. overlap, distance, nearest neighbor, … • Callable from a query language (e.g. SQL3) of underlying DBMS • SELECT S.name • FROM Senator S • WHERE S.district.Area() > 300 • Standards • SQL3 (a.k.a. SQL 1999) is a standard for query languages • OGIS is a standard for spatial data types and operators • Both standards enjoy wide support in industry • More details in chapters 2 and 3
Multi-scan Query Example • Find all senators who serve a district of area greater than 300 square miles and who own a business within the district • Spatial join example • SELECT S.name FROM Senator S, Business B • WHERE S.district.Area() > 300 AND Within(B.location, S.district) Fig 1.7
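The spatial join above can be sketched as a nested loop in Python. This is an illustrative sketch under simplifying assumptions: districts are axis-aligned rectangles, business locations are points, and the class `Rect`, the function name and the sample data are all hypothetical. Like the SQL shown on the slide, it tests only the area and the Within predicate and does not check who owns the business.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self):
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def contains(self, point):
        """Stand-in for Within(point, region) on a rectangular region."""
        x, y = point
        return self.xmin <= x <= self.xmax and self.ymin <= y <= self.ymax


def senators_with_business_in_large_district(senators, businesses, min_area=300):
    """Nested-loop join: one output row per (senator, business) match."""
    return [
        name
        for name, district in senators
        if district.area() > min_area
        for location in businesses
        if district.contains(location)
    ]


senators = [
    ("Ann", Rect(0, 0, 20, 20)),   # area 400, contains (5, 5)
    ("Bob", Rect(30, 0, 40, 10)),  # area 100, too small
    ("Cy", Rect(50, 0, 80, 20)),   # area 600, contains no business
]
businesses = [(5, 5), (35, 5), (100, 100)]
print(senators_with_business_in_large_district(senators, businesses))
```

A nested loop examines every senator-business pair; the query processing strategies discussed in Section 1.6.3 exist to avoid that cost on large tables.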
Multi-scan Query Example • Find the name of all female senators who own a business • Non-Spatial Join example • SELECT S.name FROM Senator S, Business B • WHERE S.soc-sec = B.soc-sec AND S.gender = ‘Female’ Fig 1.7
1.6.3 Query Processing • Efficient algorithms to answer spatial queries • Common Strategy - filter and refine • Filter Step: Query Region overlaps with MBRs of B, C and D • Refine Step: Query Region overlaps with B and C Fig 1.8
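A minimal Python sketch of filter and refine follows. To keep the exact geometry test short, the objects are assumed to be disks and the query region an axis-aligned rectangle; the coordinates are made up so that the outcome matches the slide's description (B, C and D pass the filter, only B and C survive refinement) and are not taken from Figure 1.8. All function names are hypothetical.

```python
from math import hypot


def mbr_of_disk(cx, cy, r):
    """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
    return (cx - r, cy - r, cx + r, cy + r)


def rects_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disk_overlaps_rect(cx, cy, r, rect):
    """Exact test: distance from the center to the nearest rectangle point."""
    nx = min(max(cx, rect[0]), rect[2])
    ny = min(max(cy, rect[1]), rect[3])
    return hypot(cx - nx, cy - ny) <= r


def filter_and_refine(objects, query):
    # Filter step: cheap MBR test, may produce false positives.
    candidates = [
        name for name, (cx, cy, r) in objects.items()
        if rects_overlap(mbr_of_disk(cx, cy, r), query)
    ]
    # Refine step: exact geometry test on the surviving candidates only.
    answers = [
        name for name in candidates
        if disk_overlaps_rect(*objects[name], query)
    ]
    return candidates, answers


objects = {  # name -> (center_x, center_y, radius)
    "A": (20, 20, 2),
    "B": (5, 5, 1),
    "C": (11, 5, 2),
    "D": (12, 12, 2.5),
}
query = (0, 0, 10, 10)
candidates, answers = filter_and_refine(objects, query)
print(candidates, answers)
```

D is the false positive: the corner of its MBR reaches into the query region, but the disk itself does not.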
Query Processing of Join Queries • Example - Determining pairs of intersecting rectangles • (a): Two sets R and S of rectangles, (b): A rectangle with 2 opposite corners marked, (c): Rectangles sorted by smallest X coordinate value • Plane sweep filter identifies 5 pairs out of 12 for the refinement step • Details of plane sweep algorithm on page 15 Fig 1.9
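The plane sweep filter can be sketched in Python as follows. This is one simple variant, not necessarily the exact algorithm on page 15 of the textbook: rectangles from both sets are sorted by smallest x coordinate, and each new rectangle is compared only against rectangles from the other set that are still "active" in x. Rectangles are assumed to be (xmin, ymin, xmax, ymax) tuples; the function name and sample coordinates are hypothetical and do not reproduce Figure 1.9.

```python
def plane_sweep_pairs(r_rects, s_rects):
    """Return sorted (r_name, s_name) pairs whose rectangles intersect."""
    events = [(rect[0], "R", name, rect) for name, rect in r_rects.items()]
    events += [(rect[0], "S", name, rect) for name, rect in s_rects.items()]
    events.sort(key=lambda e: e[0])  # sweep from left to right by xmin

    active = []  # rectangles whose x-extent still reaches the sweep line
    pairs = []
    for xmin, tag, name, rect in events:
        # Drop rectangles that end before the sweep line.
        active = [a for a in active if a[3][2] >= xmin]
        for _, other_tag, other_name, other in active:
            # x-overlap is guaranteed by the sweep, so test y-overlap only.
            if other_tag != tag and other[1] <= rect[3] and rect[1] <= other[3]:
                pairs.append((name, other_name) if tag == "R" else (other_name, name))
        active.append((xmin, tag, name, rect))
    return sorted(pairs)


R = {"R1": (0, 0, 2, 2), "R2": (5, 5, 7, 7)}
S = {"S1": (1, 1, 3, 3), "S2": (6, 0, 8, 1), "S3": (10, 10, 11, 11)}
print(plane_sweep_pairs(R, S))
```

In a filter-and-refine join the rectangles would be MBRs, so each reported pair is a candidate that still needs an exact geometry test.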
1.6.4 File Organization and Indices • A difference between GIS and SDBMS assumptions • GIS algorithms: dataset is loaded in main memory (Fig. 1.10(a)) • SDBMS: dataset is on secondary storage, e.g. disk (Fig. 1.10(b)) • SDBMS uses space filling curves and spatial indices • to efficiently search large disk-resident spatial datasets • Memory access: 10ns, disk access: 10,000,000ns (as of 2000) Fig 1.10
Organizing spatial data with space filling curves • Issue: • Sorting is not naturally defined on spatial data • Many efficient search methods are based on sorting datasets • Space filling curves • Impose an ordering on the locations in a multi-dimensional space • Examples: row-order (Fig. 1.11(a)), z-order (Fig. 1.11(b)) • Allow use of traditional efficient search methods on spatial data Fig 1.11
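As a rough illustration of how a space filling curve turns a 2-D location into a sortable key, here is a Python sketch of row-order and z-order on a small grid. The z-value is computed by interleaving the bits of the two coordinates; which coordinate takes the higher bit of each pair, and the function names `row_order` and `z_value`, are assumptions and may differ from the convention in Figure 1.11.

```python
def row_order(x, y, width):
    """Row-order key: scan each row left to right, row after row."""
    return y * width + x


def z_value(x, y, bits=2):
    """Z-order key: interleave the bits of x and y (x in the higher position)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i + 1)
        z |= ((y >> i) & 1) << (2 * i)
    return z


# Order the cells of a 2 x 2 grid by z-value.
cells = [(x, y) for x in range(2) for y in range(2)]
z_sorted = sorted(cells, key=lambda c: z_value(*c))
print(z_sorted)
print(z_value(2, 1), z_value(3, 3), row_order(1, 2, width=4))
```

Once every location has such a one-dimensional key, ordinary sorted files and B-trees can be applied to the data.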
Spatial Indexing: Search Data-Structures • Choice for spatial indexing: • B-tree is a hierarchical collection of ranges of linear keys, e.g. numbers • B-tree index is used for efficient search of traditional data • B-tree can be used with a space filling curve on spatial data • R-tree provides even better search performance • R-tree is a hierarchical collection of rectangles • More details in chapter 4 Fig 1.12: B-tree Fig. 1.13: R-tree
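The idea of an R-tree as a hierarchical collection of rectangles can be sketched with a hand-built two-level tree in Python. This shows only the search side: whole subtrees are skipped when their bounding rectangle misses the query. Node splitting and insertion are omitted, and the `Node` class, the `search` function and the sample rectangles are hypothetical rather than taken from Figure 1.13.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    mbr: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # child nodes (internal node)
    entries: list = field(default_factory=list)   # (name, rect) pairs (leaf)


def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(node, query):
    """Return names of stored rectangles that overlap the query rectangle."""
    if not overlaps(node.mbr, query):
        return []  # prune: nothing below this node can match
    hits = [name for name, rect in node.entries if overlaps(rect, query)]
    for child in node.children:
        hits.extend(search(child, query))
    return hits


leaf1 = Node(mbr=(0, 0, 4, 4), entries=[("a", (0, 0, 1, 1)), ("b", (3, 3, 4, 4))])
leaf2 = Node(mbr=(6, 6, 10, 10), entries=[("c", (6, 6, 7, 7)), ("d", (9, 9, 10, 10))])
root = Node(mbr=(0, 0, 10, 10), children=[leaf1, leaf2])

print(search(root, (2.5, 2.5, 3.5, 3.5)))
```

For this query the second leaf is never opened, because its bounding rectangle does not overlap the query region.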