A PARALLEL FORMULATION OF THE SPATIAL AUTO-REGRESSION MODEL FOR MINING LARGE GEO-SPATIAL DATASETS

HPDM 2004 Workshop at SIAM Data Mining Conference Barış M. Kazar, Shashi Shekhar, David J. Lilja, Daniel Boley Army High Performance Computing and Research Center (AHPCRC) Minnesota Supercomputing Institute (MSI) Digital Technology Center (DTC) University of Minnesota 04.24.2004.


Presentation Transcript


  1. A PARALLEL FORMULATION OF THE SPATIAL AUTO-REGRESSION MODEL FOR MINING LARGE GEO-SPATIAL DATASETS
     HPDM 2004 Workshop at SIAM Data Mining Conference, 04.24.2004
     Barış M. Kazar, Shashi Shekhar, David J. Lilja, Daniel Boley
     Army High Performance Computing and Research Center (AHPCRC), Minnesota Supercomputing Institute (MSI), Digital Technology Center (DTC), University of Minnesota

  2. Overview
     • Motivation
     • Classical and New Data-Mining Techniques
     • Problem Definition
     • Our Approach
     • Experimental Results
     • Conclusions and Future Work

  3. Motivation
     • Widespread use of spatial databases
     • Mining spatial patterns
       • The 1855 Asiatic Cholera in London [Griffith]
       • Fair Lending [NYT, R. Nader]
         • Correlation of bank locations with loan activity in poor neighborhoods
       • Retail Outlets [NYT, Walmart, McDonald's, etc.]
         • Determining locations of stores by relating neighborhood maps with customer databases
       • Crime Hot Spot Analysis [NYT, NIJ CML]
         • Explaining clusters of sexual assaults by locating addresses of sex offenders
       • Ecology [Uygar]
         • Explaining location of bird nests based on structural environmental variables

  4. Key Concept: Neighborhood Matrix (W)
     • Given:
       • Spatial framework
       • Attributes
     • [Figure: space with 4-neighborhood; binary W; row-normalized W, with the 6th row labeled in each matrix]
     • W allows other neighborhood definitions
       • distance based
       • 8 and more neighbors
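The neighborhood matrix on this slide can be sketched in a few lines. The grid size (4-by-4), the row-by-row cell numbering, and the function names `binary_w` and `row_normalize` are assumptions made for illustration; the slide's figure is not recoverable from the transcript.

```python
def binary_w(rows, cols):
    """Binary 4-neighborhood matrix for a rows x cols grid.

    Cells are numbered row by row (an assumed convention).
    w[i][j] is 1.0 when cell j is directly above, below, left or right of cell i.
    """
    n = rows * cols
    w = [[0.0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    w[i][rr * cols + cc] = 1.0
    return w


def row_normalize(w):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


w_bin = binary_w(4, 4)          # hypothetical 4-by-4 grid, so W is 16-by-16
w_norm = row_normalize(w_bin)

# 1-based column positions of the non-zero entries in the 6th row
print([j + 1 for j, v in enumerate(w_norm[5]) if v > 0])  # [2, 5, 7, 10]
```

Under these assumptions the 6th cell is an interior cell with four neighbors, so each non-zero entry of its row-normalized row is 0.25.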

  5. Solving Spatial Auto-regression Model (Classical and New Data-Mining Techniques)
     • Model (standard SAR form): y = ρWy + xβ + ε
     • ρ = 0, ε = 0: Least Squares Problem
     • β = 0, ε = 0: Eigenvalue Problem
     • General case: Computationally expensive
       • Maximum Likelihood Estimation
       • Need parallel implementation to scale up
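One reason the general case is expensive is the log-determinant term ln|I − ρW| that appears in the likelihood of the spatial auto-regression model. The slides do not spell out the likelihood; the sketch below only illustrates the standard linear-algebra identity ln|I − ρW| = Σ ln(1 − ρλᵢ), where λᵢ are the eigenvalues of W. The function name `log_det_term` and the 2-by-2 example are hypothetical.

```python
import math


def log_det_term(rho, eigenvalues):
    """ln|I - rho*W| computed from the eigenvalues of W.

    Valid when every factor 1 - rho*lambda is positive, which holds for
    |rho| < 1 when the eigenvalues lie in [-1, 1].
    """
    return sum(math.log(1.0 - rho * lam) for lam in eigenvalues)


# Toy check: W = [[0, 1], [1, 0]] has eigenvalues +1 and -1,
# and det(I - rho*W) = 1 - rho**2.
eigs = [1.0, -1.0]
value = log_det_term(0.5, eigs)
expected = math.log(1.0 - 0.5 ** 2)
print(abs(value - expected) < 1e-12)  # True
```

Once the eigenvalues are known, the term costs one pass over them for each candidate ρ, which is why it pays to compute them a single time up front.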

  6. Related Work & Our Contributions
     • Related work: Li, 1996
       • Limitations: Solved 1-D problem
     • Our Contributions
       • Parallel solution for 2-D problems
       • Portable software
         • Fortran 77
         • An application of hybrid parallelism
           • MPI messaging system
           • Compiler directives of OpenMP

  7. A Serial Solution
     • [Figure: three-stage pipeline. Stage A: Compute Eigenvalues; the eigenvalues of W feed Stage B: Golden Section Search and Calculate ML Function; followed by Stage C: Least Squares]
     • Compute Eigenvalues (Stage A)
       • Produces dense W neighborhood matrix
       • Forms synthetic data y
       • Makes W symmetric
       • Householder transformation
         • Convert dense symmetric matrix to tri-diagonal matrix
       • QL transformation
         • Compute all eigenvalues of tri-diagonal matrix
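Stage B names a golden section search but the transcript does not show it. A minimal sketch of the generic technique follows, assuming a unimodal objective on a closed interval; the toy objective, the interval [0, 1], and the name `golden_section_max` are stand-ins, not the authors' ML function or code.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618


def golden_section_max(f, lo, hi, tol=1e-6):
    """Return an approximate maximizer of a unimodal function f on [lo, hi]."""
    a, b = lo, hi
    c = b - INV_PHI * (b - a)   # left interior point
    d = a + INV_PHI * (b - a)   # right interior point
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:
            # The maximum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The maximum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical stand-in objective with its maximum at 0.3.
best = golden_section_max(lambda rho: -(rho - 0.3) ** 2, 0.0, 1.0)
print(round(best, 4))  # 0.3
```

Each iteration reuses one of the two previous function values, so only one new evaluation of the objective is needed per step.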

  8. Serial Response Times (sec)
     • Stage A is the bottleneck; Stages B and C contribute very little to the response time

  9. Problem Definition
     • Given:
       • A sequential solution procedure: “Serial Dense Matrix Approach” for one-dimensional geo-spaces
     • Find:
       • Parallel formulation of the Serial Dense Matrix Approach for multi-dimensional geo-spaces
     • Constraints:
       • ε ~ N(0, σ²I), IID
       • Reasonably efficient parallel implementation
       • Parallel platform
       • Size of W (large vs. small and dense vs. sparse)
     • Objective:
       • Portable & scalable software

  10. Our Approach – Parallel Spatial Auto-Regression
     • Function vs. Data Partitioning
       • Function partitioning: Each processor works on the same data with different instructions
       • Data partitioning (applied): Each processor works on different data with the same instructions
     • Implementation Platform:
       • Fortran with MPI & OpenMP APIs
       • No machine-specific compiler directives
         • Portability
         • Helps software development and technology transfer
     • Other Performance Tuning
       • Static terms computed once

  11. Data Partitioning in a Smaller Scale
     • 4 processors are used; the chunk size can be determined by the user
     • W is 16-by-16 and partitioned across processors
     • [Figure: indices 1 to 16 of W assigned to processors P1 to P4 under two schemes: contiguous, and round-robin with chunk size 1]
     • Work per processor (round-robin vs. contiguous): P1 (40 vs. 58), P2 (36 vs. 42), P3 (32 vs. 26), P4 (28 vs. 10)
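The per-processor figures on this slide can be reproduced with a short sketch, assuming a triangular loop in which row i of the 16-by-16 matrix costs 17 − i work units and rows are the unit of assignment. That cost model and all function names are assumptions; they are chosen because they yield exactly the numbers listed.

```python
def row_work(n):
    """Assumed triangular workload: row i (1-based) costs n - i + 1 units."""
    return [n - i + 1 for i in range(1, n + 1)]


def contiguous(n, p):
    """Owner of each row when rows are split into p equal contiguous blocks."""
    block = n // p
    return [min(i // block, p - 1) for i in range(n)]


def round_robin(n, p, chunk=1):
    """Owner of each row when chunks of rows are dealt out cyclically."""
    return [(i // chunk) % p for i in range(n)]


def load_per_processor(owners, work, p):
    """Total work assigned to each of the p processors."""
    loads = [0] * p
    for owner, units in zip(owners, work):
        loads[owner] += units
    return loads


work = row_work(16)
rr = load_per_processor(round_robin(16, 4), work, 4)
ct = load_per_processor(contiguous(16, 4), work, 4)
print(rr)  # [40, 36, 32, 28]
print(ct)  # [58, 42, 26, 10]
```

Both schemes distribute the same 136 units, but the heaviest processor carries 40 units under round-robin and 58 under contiguous partitioning, which is the load-balancing effect the slide illustrates.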

  12. Data Partitioning & Synchronization
     • [Figure: three-stage pipeline. Stage A: Compute Eigenvalues; the eigenvalues of W feed Stage B: Golden Section Search and Calculate ML Function; followed by Stage C: Least Squares]
     • A: Contiguous for rectangular loops & round-robin with chunk size 4
     • B: Contiguous
     • C: Contiguous
     • The arrows (A → B → C) are also synchronization points for the parallel solution
     • There are synchronization points within the boxes as well

  13. Experimental Design

  14. Experimental Results – Effect of Load Balancing

  15. Experimental Results – Effect of Problem Size

  16. Experimental Results – Effect of Chunk Size
     • There is a critical value of the chunk size at which the speedup reaches its maximum
     • This value is higher for dynamic scheduling, to compensate for the scheduling overhead
     • The workload is more evenly distributed across processors at the critical chunk size

  17. Experimental Results – Effect of # of Processors

  18. Summary
     • Developed a parallel formulation of the spatial auto-regression model
     • Estimates maximum likelihood of regular square tessellation 1-D and 2-D planar surface partitionings for location prediction problems
     • Used dense eigenvalue computation and hybrid parallel programming

  19. Future Work
     • Understand the reasons for inefficiencies
       • Algebraic cost model for speedup measurements on different architectures
     • Fine-tune the implemented parallel formulation
     • Consider alternate parallel formulations
     • Parallelize other serial solutions using sparse-matrix techniques
       • Chebyshev polynomial approximation
       • Markov Chain Monte Carlo estimator

  20. Acknowledgments & Final Word
     • Army High Performance Computing Research Center (AHPCRC)
     • Minnesota Supercomputing Institute (MSI)
     • Digital Technology Center (DTC)
     • Spatial Database Group Members
     • ARCTiC Labs Group Members
     • Dr. Sanjay Chawla
     • Dr. Kelley Pace
     • Dr. James LeSage
     THANK YOU VERY MUCH. Questions?
