
Geospatial Data Mining at University of Texas at Dallas



Presentation Transcript


  1. Geospatial Data Mining at University of Texas at Dallas
     Dr. Bhavani Thuraisingham (Computer Science), Dr. Latifur Khan (Computer Science), Dr. Fang Qiu (GIS)
     Students: Shaofei Chen (GIS), Mohammad Farhan (CS), Shantnu Jain (GIS), Lei Wang (CS)
     Post Doc: Dr. Chuanjun Li
     This research is partly funded by Raytheon.

  2. Outline
     • Ontology-Driven Modeling and Mining of Geospatial Data
     • Ontology
     • Case Study: Dataset
     • ASTER Dataset
     • Process of Our Approach
     • SVM Classifiers
     • Region Growing
     • Graph of Regions: Near Neighboring Regions
     • Ontology-Driven Rule Mining
     • High-Level Concept Detection
     • Output
     • Related Work
     • Future Work

  3. Ontology-Driven Modeling and Mining of Geospatial Data
     • An ontology will be represented as a directed acyclic graph (DAG) in which each node represents a concept.
     • Interrelationships are represented by labeled arcs (links). Several kinds of interrelationship are used to build an ontology, such as specialization (Is-a), instantiation (Instance-of), and component membership (Part-of).
     [Figure: example ontology fragment with Is-a and Part-of links among Urban, Residential, Multi-family Home, Single Family Home, and Apartment]
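The DAG structure described on this slide can be sketched in Python. This is a minimal illustration, not the project's implementation: the `Ontology` class, its method names, and the lowercase link labels are hypothetical, and the Is-a links in the usage lines are one assumed reading of the Urban/Residential example in the figure. Part-of or Instance-of links would be added the same way with a different label.

```python
from collections import defaultdict, deque


class Ontology:
    """Concepts are DAG nodes; each labeled arc points from a child concept to a parent."""

    def __init__(self):
        self.concepts = set()
        self.parents = defaultdict(list)  # child -> [(link_label, parent), ...]

    def add_link(self, child, label, parent):
        """Add child -label-> parent, rejecting any link that would create a cycle."""
        self.concepts.update((child, parent))
        self.parents[child].append((label, parent))
        if not self.is_acyclic():
            self.parents[child].pop()
            raise ValueError(f"{child} -{label}-> {parent} would create a cycle")

    def is_acyclic(self):
        """Kahn's algorithm: every node can be removed in order iff there is no cycle."""
        indegree = {c: 0 for c in self.concepts}
        for links in self.parents.values():
            for _, parent in links:
                indegree[parent] += 1
        queue = deque(c for c, d in indegree.items() if d == 0)
        removed = 0
        while queue:
            node = queue.popleft()
            removed += 1
            for _, parent in self.parents.get(node, []):
                indegree[parent] -= 1
                if indegree[parent] == 0:
                    queue.append(parent)
        return removed == len(self.concepts)

    def ancestors(self, concept, label=None):
        """Concepts reachable upward from `concept`, optionally following one link label only."""
        found, stack = set(), [concept]
        while stack:
            for link, parent in self.parents.get(stack.pop(), []):
                if (label is None or link == label) and parent not in found:
                    found.add(parent)
                    stack.append(parent)
        return found


# Hypothetical reading of the slide's example fragment.
onto = Ontology()
onto.add_link("Residential", "is-a", "Urban")
for home in ("Apartment", "Single Family Home", "Multi-family Home"):
    onto.add_link(home, "is-a", "Residential")
print(sorted(onto.ancestors("Apartment")))
```

Checking acyclicity on every insertion keeps the graph a valid DAG at all times; for a hand-built ontology of a few dozen concepts the repeated check is cheap.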

  4. Ontology-Driven Modeling and Mining of Geospatial Data
     • We will develop domain-dependent ontologies that provide for the specification of fine-grained concepts.
     • The USGS taxonomy can be extended with additional concepts to support finer-grained classification; for example, the concept "Residential" can be further divided into "Apartment", "Single Family House", and "Multi-family House".
     • Generic ontologies provide concepts at a coarser grain.

  5. Case Study: Dataset
     • ASTER (Advanced Spaceborne Thermal Emission and Reflection Radiometer) data is used to create detailed maps of land surface temperature, emissivity, reflectivity, and elevation.
     • ASTER obtains high-resolution images of the Earth (15 to 90 meters per pixel) in 14 wavelength bands of the electromagnetic spectrum, ranging from visible to thermal infrared light.

  6. Case Study: Dataset & Features
     • The remote sensing data used in this study is an ASTER image acquired on 31 December 2005.
     • The image covers the northern part of Dallas, with Dallas/Fort Worth International Airport located in its southwest portion.
     • ASTER data has 14 channels spanning the visible through thermal infrared regions of the electromagnetic spectrum, providing detailed information on surface temperature, emissivity, reflectance, and elevation.
     • ASTER comprises the following three radiometers:
       • Visible and Near Infrared Radiometer (VNIR, bands 1 through 3), with a wavelength range of 0.52–0.86 μm.

  7. Case Study: Dataset & Features
       • Short Wavelength Infrared Radiometer (SWIR, bands 4 through 9), with a wavelength range of 1.60–2.43 μm; it covers the mid-infrared region and is used to extract surface features.
       • Thermal Infrared Radiometer (TIR, bands 10 through 14), covering 8.125–11.65 μm; it is important for heat-related research, such as identifying mineral resources and observing atmospheric conditions through their thermal infrared characteristics.
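For per-pixel classification, bands from the three radiometers have to be placed on one grid. The lookup below records which subsystem each ASTER band belongs to, together with that subsystem's nominal ground resolution (15 m VNIR, 30 m SWIR, 90 m TIR). The `Subsystem` type, `subsystem_for_band`, and `upsample_factor` are illustrative names, and resampling everything to the 15 m VNIR grid is an assumption; the slides do not say how the study combined the bands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subsystem:
    name: str
    first_band: int
    last_band: int
    resolution_m: int  # nominal ground sampling distance in meters


ASTER = (
    Subsystem("VNIR", 1, 3, 15),
    Subsystem("SWIR", 4, 9, 30),
    Subsystem("TIR", 10, 14, 90),
)


def subsystem_for_band(band: int) -> Subsystem:
    for sub in ASTER:
        if sub.first_band <= band <= sub.last_band:
            return sub
    raise ValueError(f"ASTER has no band {band}")


def upsample_factor(band: int, target_m: int = 15) -> int:
    """How many target-grid pixels one pixel of `band` spans along each axis (assumed 15 m target)."""
    return subsystem_for_band(band).resolution_m // target_m


print(subsystem_for_band(5).name, upsample_factor(12))
```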

  8. ASTER Dataset: Technical Challenges
     • Testing is done per pixel.
     • Goal: region-based classification and identification of high-level concepts.
     • Solution:
       • Group adjacent pixels that belong to the same class.
       • Identify high-level concepts using ontology-based rule mining.

  9. Process of Our Approach
     Training image pixels, testing image pixels → SVM classifier → classified pixels → region growing → graph of regions → shortest path tree → graph of near neighboring regions → ontology-driven rule mining → high-level concepts

  10. Process of Our Approach (flowchart repeated from slide 9)

  11. SVM Classifiers: Atomic Concepts

  12. SVM Classifiers: Atomic Concepts
     [Figure: different class distributions of the training and test sets]

  13. SVM Classifiers: Atomic Concepts
     [Figure: accuracy of various classifiers]
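Slides 11 to 13 report SVM results on atomic concepts, but their figures did not survive. As a sketch of what a per-pixel classifier does, the code below trains a binary linear SVM with the Pegasos stochastic sub-gradient method in plain Python. Pegasos, the regularization constant `lam`, the iteration count, and the two-feature pixel values are all stand-ins chosen for illustration; the study's kernel, parameters, and training tool are not given.

```python
import random


def train_linear_svm(samples, labels, lam=0.01, iterations=5000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    A constant 1.0 feature is appended to every sample so its weight acts as a bias."""
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    for t in range(1, iterations + 1):
        i = rng.randrange(len(data))
        x, y = data[i], labels[i]
        eta = 1.0 / (lam * t)
        margin = y * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: gradient of the regularizer
        if margin < 1.0:  # hinge loss is active: step toward the misfit sample
            w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w, sample):
    score = sum(wj * xj for wj, xj in zip(w, list(sample) + [1.0]))
    return 1 if score >= 0.0 else -1


# Made-up, standardized (red, near-infrared) pixel values; +1 = Water, -1 = Forest.
water = [(-0.2, -1.0), (0.1, -0.9), (0.0, -1.1)]
forest = [(0.0, 1.0), (-0.1, 0.9), (0.2, 1.1)]
w = train_linear_svm(water + forest, [1] * 3 + [-1] * 3)
print(predict(w, (0.0, -0.95)), predict(w, (0.05, 1.05)))
```

A k-class problem combines k(k-1)/2 such binary classifiers, one per pair of classes, as discussed on slides 40 and 41.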

  14. Process of Our Approach (flowchart repeated from slide 9)

  15. Region Growing

  16. Region Growing

  17. Region Growing

  18. Region Growing
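The region-growing slides (15 to 18) are image-only. The idea stated on slide 8, grouping adjacent pixels that belong to the same class, can be sketched as a breadth-first flood fill over the classified label map. The 4-connectivity, the `grow_regions` name, and the tiny label grid are assumptions for illustration; the study may use 8-connectivity or additional criteria.

```python
from collections import deque


def grow_regions(labels):
    """labels: list of rows of class names.
    Returns (region id per pixel, {region_id: (class_name, [pixel coordinates])})."""
    rows, cols = len(labels), len(labels[0])
    region_of = [[None] * cols for _ in range(rows)]
    regions = {}
    for r in range(rows):
        for c in range(cols):
            if region_of[r][c] is not None:
                continue  # already absorbed into an earlier region
            rid, cls = len(regions), labels[r][c]
            region_of[r][c] = rid
            pixels, queue = [], deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                # 4-connected neighbors with the same class join this region
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and region_of[ny][nx] is None and labels[ny][nx] == cls):
                        region_of[ny][nx] = rid
                        queue.append((ny, nx))
            regions[rid] = (cls, pixels)
    return region_of, regions


classified = [
    ["Water", "Water", "Forest"],
    ["Road", "Road", "Forest"],
    ["Water", "Road", "Forest"],
]
grid, regions = grow_regions(classified)
for rid, (cls, pixels) in regions.items():
    print(rid, cls, len(pixels))
```

The two Water pixels in the top row and the single Water pixel in the bottom-left corner become separate regions because they do not touch under 4-connectivity.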

  19. Process of Our Approach (flowchart repeated from slide 9)

  20. Graph of Regions: Near Neighboring Regions
     • After region growing, we generate a graph by treating each region as a node and the distance between two regions as the edge between the corresponding nodes.
     • We generate a shortest path tree (SPT) of this graph for each source region.
     • Near neighboring regions are then determined.
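The region graph and the per-source shortest path tree can be sketched with Dijkstra's algorithm. The slides do not define "distance between two regions", so this sketch assumes edges only between touching regions, weighted by centroid distance, and treats a region as a near neighbor when its shortest-path distance from the source is at most a hypothetical threshold `max_distance`. All function names and coordinates are illustrative.

```python
import heapq
import math


def region_graph(centroids, adjacent_pairs):
    """Assumed construction: an edge joins touching regions, weighted by centroid distance."""
    graph = {rid: {} for rid in centroids}
    for a, b in adjacent_pairs:
        d = math.dist(centroids[a], centroids[b])
        graph[a][b] = graph[b][a] = d
    return graph


def shortest_path_tree(graph, source):
    """Dijkstra; the parent links form the shortest path tree rooted at `source`."""
    dist, parent = {source: 0.0}, {source: None}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, w in graph[node].items():
            nd = d + w
            if nd < dist.get(nbr, math.inf):
                dist[nbr], parent[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent


def near_neighbors(graph, source, max_distance):
    dist, _ = shortest_path_tree(graph, source)
    return {r for r, d in dist.items() if r != source and d <= max_distance}


centroids = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (6.0, 8.0), 3: (0.0, 10.0)}
graph = region_graph(centroids, [(0, 1), (1, 2), (2, 3), (0, 3)])
dist, parent = shortest_path_tree(graph, 0)
print(dist, parent)
print(near_neighbors(graph, 0, 6.0))
```

Restricting edges to touching regions is what makes the tree informative: on a complete Euclidean graph every shortest path would simply be the direct edge.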

  21. Shortest Path Tree
     [Figure: shortest path tree]

  22. Process of Our Approach (flowchart repeated from slide 9)

  23. Ontology Driven Rule Mining
     [Figure: concept hierarchy rooted at RootNode; concepts shown: DeepForest, CountrySide, City, Forest, Grass, BareLand, Water, OpenPlaces, Road, Building, Park, Athletic Field, Garden, Water Cross, Lake, Reservoir]

  24. Ontology-Driven Modeling and Mining of Geospatial Data
     • Ontology-based pruning and retrieval:
       • The ontology will facilitate mining of information at various levels of abstraction.
       • Using the ontology and a set of atomic concepts, we will infer a set of high-level concepts (e.g., apartment, single family house, multi-level house).
       • We will exploit possible influence relations between concepts based on the given ontology hierarchy.

  25. Ontology-Driven Modeling and Mining of Geospatial Data
     • To determine or improve the accuracy of high-level concept classifier learning, two forms of influence are taken into consideration: boosting and confusion.
     • The boosting factor is the co-occurrence of regions based on topology (spatial relationships) such as adjacency, connectivity, orientation, hierarchy, or combinations thereof embedded in the ontology. For example, for the concept "City", the specific concepts "Building", "Road", and "Open Space" will co-exist.
     • The confusion factor is the influence between concepts that cannot coexist.

  26. Rules: From Ontology
     • Class(A1) = Building ∧ Class(A2) = Road ∧ Class(A3) = Open Place ∧ NextTo(A1, A2, D) ∧ NextTo(A2, A3, D) ⇒ City(A1 ∪ A2 ∪ A3)
     • Class(A1) = Forest ∧ Class(A2) = Water ∧ Class(A3) = Bare Land ∧ NextTo(A1, A2, D) ∧ NextTo(A2, A3, D) ⇒ Deep Forest(A1 ∪ A2 ∪ A3)
     • Class(A1) = Forest ∧ Class(A2) = Water ∧ NextTo(A1, A2, D) ⇒ Deep Forest(A1 ∪ A2)
     • Class(A1) = Forest ∧ Class(A2) = Bare Land ∧ NextTo(A1, A2, D) ⇒ Deep Forest(A1 ∪ A2)
     • Class(A1) = Building ∧ Class(A2) = Open Place ∧ NextTo(A1, A2, D) ⇒ City(A1 ∪ A2)
     Note: D denotes distance; Ai is a region, and Class(Ai) is the concept of that region.
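The rules on slide 26 can be applied mechanically once each region has an atomic class and pairwise distances are known. In this sketch, NextTo(A1, A2, D) is read as "the distance between A1 and A2 is at most D", which is an assumption, and each multi-region rule is treated as a chain in which consecutive regions must be next to each other. The rule table transcribes the five listed rules; `apply_rules`, the region ids, and the distances are hypothetical.

```python
from itertools import permutations

# Chain rules transcribed from slide 26: (atomic classes in order, inferred high-level concept).
RULES = [
    (("Building", "Road", "Open Place"), "City"),
    (("Forest", "Water", "Bare Land"), "Deep Forest"),
    (("Forest", "Water"), "Deep Forest"),
    (("Forest", "Bare Land"), "Deep Forest"),
    (("Building", "Open Place"), "City"),
]


def next_to(region_distance, a, b, max_distance):
    """Assumed semantics of NextTo(a, b, D): known distance between a and b is at most D."""
    return region_distance.get(frozenset((a, b)), float("inf")) <= max_distance


def apply_rules(region_class, region_distance, max_distance):
    """Yield (concept, region set) for every chain of distinct regions that satisfies a rule.
    region_class: {region_id: atomic concept}; region_distance: {frozenset({a, b}): distance}."""
    for pattern, concept in RULES:
        for chain in permutations(region_class, len(pattern)):
            classes_match = all(region_class[r] == cls for r, cls in zip(chain, pattern))
            if classes_match and all(
                next_to(region_distance, a, b, max_distance) for a, b in zip(chain, chain[1:])
            ):
                yield concept, set(chain)  # A1 ∪ A2 (∪ A3)


region_class = {0: "Building", 1: "Road", 2: "Open Place", 3: "Forest", 4: "Water"}
region_distance = {
    frozenset((0, 1)): 20.0,
    frozenset((1, 2)): 35.0,
    frozenset((0, 2)): 300.0,
    frozenset((3, 4)): 10.0,
}
matches = list(apply_rules(region_class, region_distance, max_distance=50.0))
print(matches)
```

The sketch reports every match. Because slide 34 notes that the ordering of rules matters, a fuller system would also need a policy for regions that satisfy more than one rule.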

  27. Ontology Driven Rule Mining: Pseudocode

  28. Implementation
     • Software:
       • ArcGIS 9.1
       • For programming, we use the Visual Basic 6.0 environment embedded in ArcGIS.
     • As of today:
       • 8 rules
       • Two-level taxonomy

  29. Output: Training Set

  30. Output: Test Set

  31. Output: City Concept

  32. Output: Deep Forest Concept

  33. Related Work
     • Classification (machine learning):
       • Wilson, G. M. (2004). Landcover classification of the City of Rocks National Reserve using ASTER satellite imagery. Upper Columbia Basin Network, Inventory and Monitoring Program, Project Number UCBN-000001. National Park Service, Moscow, ID. 19 pp.
     • SVM:
       • Melgani, F., and L. Bruzzone. Classification of hyperspectral remote-sensing images with support vector machines.
       • Zhu, G., and D. G. Blumberg (2002). Classification using ASTER data and SVM algorithms: the case study of Beer Sheva, Israel.
       • Huang, C., L. S. Davis, and J. R. G. Townshend (2002). An assessment of support vector machines for land cover classification.

  34. Rules: From Ontology
     • Technical challenges:
       • Sparse test dataset
       • Adjacency is difficult to determine
       • The size of an area should be included in the rules
       • Finer-grained classification is required, e.g., concepts such as Lake and River rather than a single Water concept
       • The ordering of rules will play a role

  35. Future Work
     • Develop a full-fledged prototype (by January 31, 2007)
     • Improve the accuracy of SVM classification (by January 31, 2007)
       • Hierarchical SVM
     • Generate rules automatically (by June 30, 2007)
       • RIPPER (semi-automatically)
       • Association rule mining

  36. Confusion Matrix (7 Classes; rows: actual, columns: predicted)
     Actual \ Predicted  Water  Bare Lands  Grass  Forests  Buildings  Open Places  Roads
     Water               23392           0      0        0          1            0      0
     Bare Lands              0        3685     10        5          1            3      3
     Grass                   0          10    407       76          0            2      0
     Forests                 3           5     46     1022          1            2      0
     Buildings               0           1      4        3        218            2      0
     Open Places             0          13      1        5          4          638      7
     Roads                   0           0      0        0          0            6     63

  37. Observations: Hierarchical SVM
     • Different classes have different true recognition rates (TR) and false recognition rates (FR).
     • If there is a class for which TR is high and FR is low, classification into this class can be accepted with high confidence.
     • Classes with low TR and high FR can be considered for a new, possibly better, classifier.
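The TR/FR comparison can be computed directly from the 7-class confusion matrix on slide 36. The slides do not define TR and FR, so this sketch assumes TR is per-class recall (correct predictions divided by the class's actual count, a row sum) and FR is the fraction of predictions of that class that are wrong (based on the column sum). Ranking classes by TR minus FR is likewise an illustrative choice, and the names are hypothetical.

```python
CLASSES = ["Water", "Bare Lands", "Grass", "Forests", "Buildings", "Open Places", "Roads"]

# Slide 36: rows = actual class, columns = predicted class.
CONFUSION = [
    [23392, 0, 0, 0, 1, 0, 0],
    [0, 3685, 10, 5, 1, 3, 3],
    [0, 10, 407, 76, 0, 2, 0],
    [3, 5, 46, 1022, 1, 2, 0],
    [0, 1, 4, 3, 218, 2, 0],
    [0, 13, 1, 5, 4, 638, 7],
    [0, 0, 0, 0, 0, 6, 63],
]


def recognition_rates(matrix):
    """Assumed definitions: TR = correct / actual count; FR = wrong / predicted count."""
    k = len(matrix)
    rates = []
    for i in range(k):
        row_total = sum(matrix[i])
        col_total = sum(matrix[j][i] for j in range(k))
        correct = matrix[i][i]
        rates.append((correct / row_total, (col_total - correct) / col_total))
    return rates


rates = dict(zip(CLASSES, recognition_rates(CONFUSION)))
for name, (tr, fr) in rates.items():
    print(f"{name:12s} TR={tr:.4f} FR={fr:.4f}")

# Candidate for the first-level "accept with high confidence" class.
best = max(rates, key=lambda c: rates[c][0] - rates[c][1])
print(best)
```

Under these definitions Water ranks first, which is consistent with Water being removed in the 6-class matrix on slide 38.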

  38. Confusion Matrix (6 Classes; rows: actual, columns: predicted)
     Actual \ Predicted  Bare Lands  Grass  Forests  Buildings  Open Places  Roads
     Bare Lands                3685     10        5          1            3      3
     Grass                       10    407       76          0            2      0
     Forests                      5     46     1022          1            2      0
     Buildings                    1      4        3        218            2      0
     Open Places                 13      1        5          4          638      7
     Roads                        0      0        0          0            6     63

  39. Confusion Matrix (5 Classes; rows: actual, columns: predicted)
     Actual \ Predicted  Bare Lands  Grass  Forests  Open Places  Roads
     Bare Lands                3685     10        5            3      3
     Grass                       10    407       76            2      0
     Forests                      5     46     1022            2      0
     Open Places                 13      1        5          638      7
     Roads                        0      0        0            6     63

  40. Suppose there are k classes
     • One multi-class classifier
     • Originally k(k-1)/2 binary SVMs (one per pair of classes)
     [Figure: a single classifier of k(k-1)/2 binary SVMs over Class 1, Class 2, Class 3, …, Class k; the class with high TR and low FR is marked]

  41. Suppose there are k classes
     • One multi-class classifier
     • Originally k(k-1)/2 binary SVMs
     • Then (k-1)(k-2)/2 binary SVMs
     [Figure: the first classifier (k(k-1)/2 binary SVMs over Class 1 … Class k) accepts Class 1, which has high TR and low FR; the second classifier ((k-1)(k-2)/2 binary SVMs) handles Class 2 … Class k]
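The counts on slides 40 and 41 follow from one-vs-one decomposition, and the two-level scheme can be sketched as a cascade: the first classifier's answer is kept only when it is the high-confidence class; otherwise the sample goes to a second classifier over the remaining classes. Peeling off Water and then Buildings mirrors the 7-, 6-, and 5-class matrices on slides 36, 38, and 39. The pairwise `nearest` decider is a toy nearest-centroid stand-in for a trained binary SVM, and all function names and centroid values are hypothetical.

```python
from itertools import combinations

CLASSES = ["Water", "Bare Lands", "Grass", "Forests", "Buildings", "Open Places", "Roads"]


def ovo_pairs(classes):
    """One binary SVM per unordered pair of classes: k(k-1)/2 pairs."""
    return list(combinations(classes, 2))


def level_sizes(classes, peeled):
    """Binary-SVM count per level when the classes in `peeled` are accepted one level at a time."""
    remaining = list(classes)
    sizes = [len(ovo_pairs(remaining))]
    for cls in peeled:
        remaining.remove(cls)
        sizes.append(len(ovo_pairs(remaining)))
    return sizes


def ovo_vote(decide, classes, sample):
    """Majority vote; decide(a, b, sample) stands in for the binary SVM trained on a vs. b."""
    votes = dict.fromkeys(classes, 0)
    for a, b in ovo_pairs(classes):
        votes[decide(a, b, sample)] += 1
    return max(classes, key=votes.__getitem__)


def cascade(levels, sample):
    """levels: [(classify, accepted_class), ...]; the final level uses accepted_class=None."""
    for classify, accepted in levels:
        label = classify(sample)
        if accepted is None or label == accepted:
            return label
    raise ValueError("the last level must accept every prediction")


print(level_sizes(CLASSES, ["Water", "Buildings"]))

# Toy stand-in decider: hypothetical 1-D class centroids, the nearer centroid wins.
centroid = {"A": 0.0, "B": 1.0, "C": 2.0}


def nearest(a, b, x):
    return a if abs(x - centroid[a]) <= abs(x - centroid[b]) else b


levels = [
    (lambda x: ovo_vote(nearest, ["A", "B", "C"], x), "A"),  # first classifier: 3 pairwise deciders
    (lambda x: ovo_vote(nearest, ["B", "C"], x), None),      # second classifier: 1 decider
]
print(cascade(levels, 0.1), cascade(levels, 1.2))
```

For k = 7 this gives 21, then 15, then 10 binary SVMs, matching k(k-1)/2 and (k-1)(k-2)/2 on the slides.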

  42. Challenges: Hierarchical SVM
     • The same set of parameters will not yield the same classification rates for classifiers at different levels.
     • Classification accuracy might not be sensitive to parameters.
     • How can high TR and low FR be achieved for some classes?
