Date: 2012-12-10 Armin Ashoury Rad, Youngjib Ham, and Yuhyun Song

Evaluation, Prediction, and Visualization of Spatio-Temporal Crime Patterns in Washington D.C. Area

Presentation Transcript


  1. Evaluation, Prediction, and Visualization of Spatio-Temporal Crime Patterns in Washington D.C. Area • Date: 2012-12-10 • Armin Ashoury Rad, Youngjib Ham, and Yuhyun Song

  2. Introduction • Data mining is the intersection of statistics and computer science for exploring huge data sets • Using the crime dataset for the Washington D.C. area, provide users with useful information such as the safety factor of a location

  3. Problem Statements and Objective of the Project • A large share of crime in the area near Washington D.C. involves Larceny, Larceny auto, and Larceny F/auto. Focusing on Larceny incidents, offer a safety factor and visual information about crime history. • Predict the likely type of crime and find associations between types of crime • Objective: Share information with users about the spatial distribution of the different crimes that occurred in the Washington D.C. area

  4. Outline • Classification rules on crime data in Washington D.C. • Detection of association rules between types of crime • Supervised and unsupervised spatio-temporal clustering of stolen cars

  5. Classification Rules: General Idea • We can predict the type of crime using the trained classifier. • Use 70% of the data as a training set and the remaining 30% as a test set, and choose the classifier giving the lowest misclassification rate. • Apply three different classification models to crime data from Washington D.C. from 2006 to 2009 • Variables used: • Class variable: type of crime (Larceny, Larceny F/auto, and Larceny auto excluded) • Latitude, Longitude, Month, Date • Random Forest, QDA, and KNN implemented
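The evaluation protocol on this slide (hold out 30% of the records, then keep the classifier with the lowest test misclassification rate) can be sketched as below. This is a minimal illustration, not the authors' code: it assumes each record is a `(features, crime_type)` tuple and that the split is a uniform random shuffle with a fixed seed; the function names and the `fit(train) -> predict` factory convention are hypothetical.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# One incident: (features such as latitude, longitude, month, date; crime type)
Record = Tuple[Sequence[float], str]
Predictor = Callable[[Sequence[float]], str]


def train_test_split(records: List[Record], train_frac: float = 0.7,
                     seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Shuffle a copy of the records, then cut it into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def misclassification_rate(predict: Predictor, test: List[Record]) -> float:
    """Fraction of test records whose predicted crime type is wrong."""
    if not test:
        raise ValueError("test set is empty")
    wrong = sum(1 for x, y in test if predict(x) != y)
    return wrong / len(test)


def pick_best(fitters: Dict[str, Callable[[List[Record]], Predictor]],
              train: List[Record], test: List[Record]) -> Tuple[str, float]:
    """Fit every candidate on the training set; return the one with the lowest test error."""
    rates = {name: misclassification_rate(fit(train), test) for name, fit in fitters.items()}
    best = min(rates, key=rates.get)
    return best, rates[best]
```

Each entry in `fitters` would wrap one of the three models (random forest, QDA, KNN); only the test-set error decides the winner, so the training error never enters the comparison.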

  6. Classifiers: Random Forest, KNN, and QDA • Random Forest • Main idea: Grow an ensemble of decision trees that vote for the most popular class • K-Nearest Neighbor • Main idea: A method for classifying objects based on the closest training examples in the feature space. k-NN is a type of instance-based learning. • An object is classified by a majority vote of its neighbors: it is assigned to the class most common among its k nearest neighbors • QDA • Main idea: Quadratic discriminant analysis (QDA) is closely related to linear discriminant analysis (LDA); both assume that the measurements from each class are normally distributed. Unlike LDA, however, QDA does not assume that the covariance of each class is identical.
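KNN and QDA are short enough to sketch directly. The code below is a stdlib-only illustration, not the implementation used in the study: KNN uses Euclidean distance on raw features, and the QDA sketch is restricted to two features (for example latitude and longitude) so that each class's 2x2 covariance can be inverted in closed form. Random forest is left out because a faithful version does not fit in a short example.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Tuple[Sequence[float], str]  # (feature vector, crime type)


def knn_predict(train: List[Record], x: Sequence[float], k: int = 5) -> str:
    """Majority vote among the k training records closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order on ties, so a tie goes to the
    # label whose closest member was seen first (i.e. is nearest to x).
    return votes.most_common(1)[0][0]


def fit_qda_2d(train: List[Record]) -> Dict[str, Tuple[float, ...]]:
    """Per-class mean, 2x2 covariance (maximum likelihood), and prior."""
    by_class: Dict[str, List[Sequence[float]]] = {}
    for x, y in train:
        by_class.setdefault(y, []).append(x)
    params = {}
    for label, pts in by_class.items():
        n = len(pts)
        mx = sum(p[0] for p in pts) / n
        my = sum(p[1] for p in pts) / n
        sxx = sum((p[0] - mx) ** 2 for p in pts) / n
        syy = sum((p[1] - my) ** 2 for p in pts) / n
        sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / n
        det = sxx * syy - sxy * sxy
        if det <= 0:
            raise ValueError(f"singular covariance for class {label!r}")
        # Each class keeps its own covariance: this is what separates QDA from LDA.
        params[label] = (mx, my, sxx, syy, sxy, det, n / len(train))
    return params


def qda_predict(params: Dict[str, Tuple[float, ...]], x: Sequence[float]) -> str:
    """Pick the class with the largest quadratic discriminant score."""
    def score(p: Tuple[float, ...]) -> float:
        mx, my, sxx, syy, sxy, det, prior = p
        dx, dy = x[0] - mx, x[1] - my
        # (x - mu)^T Sigma^{-1} (x - mu) via the closed-form inverse of a 2x2 matrix
        maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        return -0.5 * math.log(det) - 0.5 * maha + math.log(prior)
    return max(params, key=lambda label: score(params[label]))
```

KNN has no fitting step, which is why the whole training set is passed at prediction time; QDA compresses each class into a mean, a covariance, and a prior.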

  7. Confusion matrix from random forest • The total misclassification rate when using Longitude, Latitude, Month, and Date is 57.05% • Use the random forest algorithm to decide which variables are important for classifying the type of crime.
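The total misclassification rate quoted on this slide is the share of off-diagonal counts in the confusion matrix. A minimal sketch of that bookkeeping, assuming crime types are plain string labels (the function names are illustrative, not from the original analysis):

```python
from typing import List, Sequence, Tuple


def confusion_matrix(actual: Sequence[str],
                     predicted: Sequence[str]) -> Tuple[List[str], List[List[int]]]:
    """Rows = actual crime type, columns = predicted crime type, entries = counts."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    index = {lab: i for i, lab in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return labels, matrix


def total_misclassification_rate(matrix: List[List[int]]) -> float:
    """1 - (sum of the diagonal) / (sum of all entries)."""
    total = sum(map(sum, matrix))
    if total == 0:
        raise ValueError("empty confusion matrix")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1 - correct / total
```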

  8. Variable Selection in Random Forest • Location variables are the key variables for classifying the type of crime • Location attributes (Longitude, Latitude) are more important than time variables

  9. Comparison: KNN vs. QDA • [Figures: KNN on training set; QDA on training set] • KNN performs better than QDA

  10. KNN’s apparent misclassification rate using test data • Apparent misclassification rate: 0.659168

  11. From the Classification Rules… • From the random forest, geographical information about a crime is more important than its time attributes. • KNN showed better performance than random forest and QDA for classifying the type of crime. However, it produced a large misclassification rate on the test data when predicting the type of crime.

  12. Association rules analysis • A different set of crimes happens every day • When X occurs, Y also occurs on the same day • Police can use this information to: • Understand why some kinds of crime happen simultaneously • Gain insight into crime patterns: • Which crimes happen together • Take action: • Predict specific kinds of crime • Proactively take steps against crimes with high probability

  13. Use of Association Rules • Definition • An item: a kind of crime • A transaction: the set of crimes that happen on one day • Data • Area • Washington D.C. area • Duration • 2006 ~ 2009 (until Sep 27) • 1366 days • Crimes data
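Following the definitions above (an item is a crime type, a transaction is the set of crime types reported on one day), the incident table can be turned into transactions roughly as follows. The input format, `(date, crime_type)` pairs, is an assumption; the slides do not give the dataset's column names.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, FrozenSet, Iterable, Set, Tuple


def daily_transactions(incidents: Iterable[Tuple[date, str]]) -> Dict[date, FrozenSet[str]]:
    """Map each day to the set of crime types reported that day.

    Repeated incidents of one type on the same day collapse into a single item,
    because a transaction records which crimes occurred, not how many.
    Days with no incidents at all produce no transaction.
    """
    by_day: Dict[date, Set[str]] = defaultdict(set)
    for day, crime_type in incidents:
        by_day[day].add(crime_type)
    return {day: frozenset(types) for day, types in sorted(by_day.items())}
```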

  14. Use of Association Rules • Data trends

  15. Mining association rules • Brute-force approach: • List all possible association rules • Prune the rules that fail the minsup and minconf thresholds • Computationally expensive • Reducing the number of candidates • Apriori principle • Probably the best-known algorithm • Find all itemsets that have minimum support • Use the frequent itemsets to generate rules • Compute candidate k-itemsets by merging frequent (k-1)-itemsets
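The level-wise search described above (keep itemsets that meet minimum support, build candidate k-itemsets by merging frequent (k-1)-itemsets, and prune any candidate with an infrequent subset) can be sketched in a few lines. This is a plain Python illustration, not the ARMADA MATLAB tool the authors modified; for clarity it rescans every transaction for each candidate instead of using an optimized counting structure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Sequence


def apriori(transactions: Sequence[FrozenSet[str]],
            min_sup: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions containing it) >= min_sup."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    current: Dict[FrozenSet[str], float] = {}
    for item in {i for t in transactions for i in t}:
        s = support(frozenset([item]))
        if s >= min_sup:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Merge pairs of frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:] if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a frequent k-itemset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_sup:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent
```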

  16. Mining association rules • Apriori principle • Frequent itemset generation • Confidence (A → B) ≥ minConf • Support (A → B) ≥ minSup • The property • If {A, B} is a frequent itemset, then {A} and {B} must be frequent itemsets as well • In general: if X is a frequent k-itemset, then all (k-1)-item subsets of X are also frequent • Modified the ‘ARMADA’ data mining tool from MATLAB Central (http://www.mathworks.com/matlabcentral/fileexchange/3016)
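For reference, the standard definitions behind the minSup and minConf thresholds, writing T for the set of daily transactions, N for the number of transactions, and σ(X) for the number of transactions that contain itemset X:

```latex
\[ \sigma(X) = \bigl\lvert \{\, t \in T : X \subseteq t \,\} \bigr\rvert \]
\[ s(A \rightarrow B) = \frac{\sigma(A \cup B)}{N}, \qquad
   c(A \rightarrow B) = \frac{\sigma(A \cup B)}{\sigma(A)} \]
\[ X \subseteq Y \;\Longrightarrow\; \sigma(Y) \le \sigma(X) \]
```

The last line is why the Apriori property holds: adding items to an itemset can only remove transactions from its count, so every subset of a frequent itemset is at least as frequent as the itemset itself.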

  17. Association rules analysis • Types of association rules • Actionable rules • Contain high-quality, actionable information • Trivial rules • Contain information that is already well known and familiar • Inexplicable rules • Have no explanation and do not suggest an action • Trivial and inexplicable rules occur most often

  18. Association rules analysis • Example rules • {ASSAULT OFFENSES, BURGLARY, LARCENY} → {ROBBERY} (s=0.76, c=1.0) • {LARCENY AUTO} → {LARCENY F/AUTO} (s=0.89, c=1.0) • {ROBBERY} → {LARCENY AUTO} (s=0.87, c=0.89) • {LARCENY} → {BURGLARY} (s=0.79, c=0.91) • High support and high confidence: • Such obvious rules tend to be uninteresting
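Once frequent itemsets are known, rules like the ones listed here come from splitting each itemset into a non-empty antecedent and consequent and keeping the splits whose confidence meets minConf. A brute-force sketch for a single itemset, assuming transactions are frozensets of crime-type strings as in the daily-transaction definition; the names are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Tuple

# (antecedent, consequent, support, confidence)
Rule = Tuple[FrozenSet[str], FrozenSet[str], float, float]


def rules_from_itemset(itemset: FrozenSet[str], transactions: Sequence[FrozenSet[str]],
                       min_conf: float) -> List[Rule]:
    """All rules A -> B with A | B == itemset, A and B non-empty, confidence >= min_conf."""
    def count(items: FrozenSet[str]) -> int:
        return sum(1 for t in transactions if items <= t)

    both = count(itemset)
    if both == 0:
        return []  # no transaction contains the itemset, so no rule has any support
    support = both / len(transactions)
    rules: List[Rule] = []
    for size in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), size):
            a = frozenset(ante)
            conf = both / count(a)  # count(a) >= both > 0 because a is a subset of itemset
            if conf >= min_conf:
                rules.append((a, itemset - a, support, conf))
    return rules
```

Every rule generated from one itemset shares the same support; only the confidence differs, which is why the slide's examples can have equal support but different confidence.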

  19. Association rules analysis • Example rules • {ASSAULT OFFENSES, BURGLARY, HOMICIDE OFFENSES} → {SEX OFFENSES ABUSE} (s=0.16, c=0.63) • {ARSON, ASSAULT OFFENSES, HOMICIDE OFFENSES} → {SEX OFFENSES ABUSE} (s=0.02, c=0.67) • {HOMICIDE OFFENSES, SEX OFFENSES ABUSE} → {ARSON} (s=0.02, c=0.12) • Simple interesting patterns: • ARSON, ASSAULT OFFENSES, SEX OFFENSES ABUSE, and HOMICIDE OFFENSES are likely to happen together

  20. Maps and Charts • Web app coded in ASP.NET • For maps, we used Google Fusion Tables and embedded them in the app using iframes. • All charts were prepared using the Google Charts tool

  21. Safety factor • Clustering all data points using k-means with 10 clusters • Creating random points close to the actual data points • Re-clustering the random points using the previous centroids • K-means clustering package in R
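The three clustering steps on this slide can be sketched as follows: Lloyd's k-means on (latitude, longitude), a random point generated near each incident, and those random points assigned to the nearest of the previously fitted centroids. The authors used an R k-means package; this Python version is only a stand-in, and the uniform jitter, its radius, the seeds, and the function names are assumptions. The slides do not describe how the per-cluster results become the safety factor, so the sketch stops at per-cluster counts.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (latitude, longitude)


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))


def kmeans(points: Sequence[Point], k: int = 10, iters: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # start from k randomly chosen data points
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; an empty group keeps its centroid.
        new = [
            (sum(q[0] for q in g) / len(g), sum(q[1] for q in g) / len(g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def jitter(points: Sequence[Point], radius: float, seed: int = 1) -> List[Point]:
    """One random point within +/- radius (hypothetical, in degrees) of each incident."""
    rng = random.Random(seed)
    return [(lat + rng.uniform(-radius, radius), lon + rng.uniform(-radius, radius))
            for lat, lon in points]


def counts_per_cluster(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Re-cluster points against fixed centroids and count the members of each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return counts
```

Keeping the centroids fixed in the last step means the random points are labeled with the original 10 clusters rather than forming new ones, which matches "re-cluster random points using previous centroid".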

  22. Framework • Google Maps • MarkerClusterer • Geocoder • 500 points • Google Street View • Google Earth

  23. Prediction model and result • Spatio-temporal regression model • Predicting the number of crimes in each category one step ahead, using today’s number of crimes in our region and its neighbors • Prediction using a Poisson regression model • glm function in R
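A minimal sketch of the one-step-ahead model, assuming for one crime category the form log E[y(r, t+1)] = b0 + b1 * own(r, t) + b2 * neighbors(r, t), where own is today's count in region r and neighbors is today's total in the adjacent regions. The exact covariates and neighborhood definition are not given in the slides, so both are assumptions, as are all function names. The fit uses iteratively reweighted least squares starting from mu = y + 0.1, similar to R's `glm` with a Poisson family; it is a stand-in for the authors' R code, not a copy of it.

```python
import math
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson(X: Sequence[Sequence[float]], y: Sequence[int],
                max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Poisson regression with a log link, fitted by iteratively reweighted least squares.

    Each row of X must start with 1.0 for the intercept.
    """
    n, p = len(X), len(X[0])
    mu = [yi + 0.1 for yi in y]  # starting means, like R's poisson() family
    eta = [math.log(m) for m in mu]
    beta: List[float] = []
    for _ in range(max_iter):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted least squares with weights mu: (X^T W X) beta = X^T W z
        xtwx = [[sum(X[i][j] * X[i][k] * mu[i] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(X[i][j] * mu[i] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(new_beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
        done = bool(beta) and max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta


def predict_next(beta: Sequence[float], own_today: float, neighbors_today: float) -> float:
    """Expected count tomorrow for one region; beta = (intercept, own, neighbors)."""
    return math.exp(beta[0] + beta[1] * own_today + beta[2] * neighbors_today)
```

Usage: build one row `[1.0, own(r, t), neighbors(r, t)]` per region and day with target `y(r, t+1)`, fit once per crime category, then call `predict_next` with today's counts to get tomorrow's expected count.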

  24. Prediction on app • Google script • Graphical user interface

  25. Organized Larcenies 3D Clustering • Organized larceny: a person or group that steals more than one car of one specific model from one region in a short period of time • Normalized latitude, longitude, and time of crimes • 3D clustering using k-means with 20 clusters • k-means from an R package
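Latitude, longitude, and time are on very different scales, so they are normalized before the 3D k-means; otherwise the time axis would dominate the distances. A sketch, assuming min-max scaling to [0, 1] and time measured in days since the first incident; the slides do not say which normalization was used, so treat both choices (and the function names) as assumptions.

```python
from datetime import datetime
from typing import List, Sequence, Tuple


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def normalize_incidents(
        incidents: Sequence[Tuple[float, float, datetime]]) -> List[Tuple[float, float, float]]:
    """Turn (lat, lon, timestamp) records into normalized 3D vectors for k-means."""
    t0 = min(ts for _, _, ts in incidents)
    days = [(ts - t0).total_seconds() / 86400 for _, _, ts in incidents]
    lats = min_max([lat for lat, _, _ in incidents])
    lons = min_max([lon for _, lon, _ in incidents])
    times = min_max(days)
    return list(zip(lats, lons, times))
```

The resulting vectors can be passed to any k-means routine with k = 20, as on the slide.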

  26. Two-step clustering • Two clustering procedures • First on the location of larcenies • 10 clusters using k-means • Next on the time of larcenies • 10 clusters for each of the location clusters • 100 clusters in total

  27. YouTube video

  28. Conclusion • Summary • Extracting previously unknown, valid, and comprehensible information from the crime dataset (2006 ~ 2009) for the Washington D.C. area • Evaluation and prediction of crime patterns • Classification • Association analysis • Cluster analysis • Visualization of the results • Web-based app

  29. Conclusion • Contributions • Predicting crime patterns • Proactively taking measures against high-probability crimes • Helping use limited police manpower effectively • Future work • Predicting crime trends in other cities • Studying crime datasets with a wider range of variables • Demographics, education, etc.

More Related