Data Mining

Eamonn Keogh. Data Mining. What is data mining? Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information.


Presentation Transcript


  1. Eamonn Keogh Data Mining

  2. What is data mining? • Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information. • In my lab, we tend to look at data and problems that no one else looks at.

  3. Data Mining People • Eamonn Keogh • Vagelis Hristidis • Vassilis Tsotras • Chinya Ravishankar • Michael Pazzani • Christian Shelton (AI) • Stefano Lonardi (Bioinformatics)

  4. My PhD Students • Jessica Lin (Ph.D. 2005, George Mason University) • Chotirat (Ann) Ratanamahatana (Ph.D. 2005, Chulalongkorn University) • Li Wei (Ph.D. 2006, Google) • Xiaopeng Xi (Ph.D. 2007, Yahoo) • Dragomir Yankov (Ph.D. 2008, Yahoo) • Lexiang Ye (Ph.D. 2010, Google) • Xiaoyue (Elaine) Wang (Ph.D. 2010, Nokia) • Jin-Wien Shieh (Ph.D. 2010, Microsoft) • Qiang Zhu (Ph.D. 2011, stumbleupon.com) • Abdullah Mueen (Ph.D. 2012, Microsoft) • Bilson Campana (Ph.D., going to Google at Xmas) • Thanawin (Art) Rakthanmanon (Ph.D. ongoing) • Bing Hu (Ph.D. ongoing) • Yuan Hao (Ph.D. ongoing) • Jesin Zakaria (Ph.D. ongoing) • Yipeng Chen (Ph.D. ongoing)

  5. [Figure: stinging nettles and false nettles]

  6. [Figure: Shapelet Dictionary (shapelet I, 5.1) and Leaf Decision Tree (node I with yes/no branches; classes: false nettles, stinging nettles)]

  7. Decision Tree for Arrowheads. [Figure: Shapelet Dictionary with shapelets I (Clovis, 11.24) and II (Avonlea, 85.47); Arrowhead Decision Tree with nodes I and II and classes Clovis, Mix, Avonlea; training data (subset)] The shapelet decision tree classifier achieves an accuracy of 80.0%; the accuracy of the rotation-invariant one-nearest-neighbor classifier is 68.0%. Of course, this is a decision tree, and we eventually want to do clustering. However, in general, features that are good for classification are good for clustering. To do: on a small labeled subset of data, learn a dictionary of shapelets, then code the large unlabeled dataset with reference to that dictionary.
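The idea of coding data with reference to a shapelet dictionary can be sketched as follows. This is a minimal illustration, not the method used to produce the results above: it assumes plain Euclidean distance with no normalization or rotation invariance, and the function names, toy series, threshold, and class labels are all hypothetical.

```python
import math

def subsequence_distance(series, shapelet):
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series."""
    m = len(shapelet)
    best = math.inf
    for start in range(len(series) - m + 1):
        d = math.sqrt(sum((series[start + j] - shapelet[j]) ** 2 for j in range(m)))
        best = min(best, d)
    return best

def encode(series, dictionary):
    """Code one series as its vector of distances to each shapelet
    in the dictionary (usable as features for clustering)."""
    return [subsequence_distance(series, s) for s in dictionary]

def classify(series, shapelet, threshold, near_label, far_label):
    """One decision-tree node: branch on whether the series contains
    a window within `threshold` of the shapelet."""
    if subsequence_distance(series, shapelet) <= threshold:
        return near_label
    return far_label

# Toy data (hypothetical): a single "bump" shapelet and two short series.
bump = [0.0, 1.0, 0.0]
with_bump = [0.0, 0.0, 1.0, 0.0, 0.0]
flat = [0.0, 0.0, 0.0, 0.0, 0.0]

print(encode(with_bump, [bump]), encode(flat, [bump]))
print(classify(with_bump, bump, 0.5, "class A", "class B"))
print(classify(flat, bump, 0.5, "class A", "class B"))
```

Each unlabeled series becomes a short vector of distances, one per dictionary entry, and those vectors can then be handed to any clustering algorithm.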

  8. There now exist perhaps tens of millions of digitized pages of historical manuscripts, dating back to the 12th century, that feature one or more heraldic shields. The images are often stained, faded or torn.

  9. Wouldn’t it be great if we could automatically hyperlink all similar shields to each other? For example, here we could link two occurrences of the Von Sax family shield. To do this, we need to consider shape, color and texture. Let’s just consider shape for now… Manesse Codex: an illuminated manuscript in codex form, copied and illustrated between 1304 and 1340 in Zurich.

  10. Indexing and Mining Rock Art. Rock art is found on every continent except Antarctica. To date, computer science has had little impact on the analysis of rock art. Australia may have 100 million examples. A decade ago, Walt et al. summed up the state of petroglyph research by noting, “Complete-site and cross-site research thus remains impossible, incomplete, or impressionistic.”

  11. If we assume that we have high-quality binary images of rock art, then we can do clustering, classification, indexing and motif discovery. [Figure: example motifs: Atlatls, Anthropomorphs, Bighorn Sheep] One challenge is designing distance measures. For example, we would like to find [image] and [image] similar, even though one is solid and one is hollow. *Zhu, Wang, Keogh, Lee (2009). Augmenting the Generalized Hough Transform to Enable the Mining of Petroglyphs. SIGKDD 2009.

  12. Why Insects Matter I: Because they eat/destroy $40 billion+ worth of food each year. One example crop/insect: the apple maggot (Rhagoletis pomonella). • Surround WP Crop Protectant against insects: derived from kaolin clay, a natural mineral, it forms a barrier that acts to control insect pests. Effective and safe, but very expensive. • Apple maggots cause two types of injury: dimpling and tunneling. Dimpling occurs around the site where eggs are laid, causing the flesh to stop growing, resulting in a sunken, misshapen, dimpled area. Tunneling, done by the larvae (maggots) eating in the fruit, causes the pulp to break down, discolor, and start to rot. The tunnels are often enlarged by bacterial decay. Damaged fruit eventually becomes soft and rotten and cannot be used. • Carbaryl is an insecticide that is widely used agriculturally. Effective, but likely a human carcinogen, and it kills honey bees and other pollinators [1]. [1] http://npic.orst.edu/factsheets/carbgen.pdf [2] http://www.maine.gov/agriculture/pesticides/gotpests/bugs/factsheets/apple-maggot-cornell.pdf

  13. Why Insects Matter II: Because they kill over one million people each year.

  14. Our Sensor • One second of audio from our sensor. • The Common Eastern Bumble Bee (Bombus impatiens) takes about one tenth of a second to pass the laser. [Figure: audio waveform, annotated: background noise; bee begins to cross laser; bee has passed through the laser]

  15. [Figure: frequency spectrum, 0 to 1000 Hz, with a peak at 705 Hz; second frequency axis, 100 to 800 Hz, labeled Culex quinquefasciatus, Aedes aegypti and Bombus impatiens] • Almost certainly an Aedes aegypti
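Locating a spectral peak such as the one at 705 Hz can be sketched with a direct discrete Fourier transform. This is only an illustration under stated assumptions: the sample rate, the 0.2-second clip length, and the synthetic sine wave standing in for sensor audio are all hypothetical, and `peak_frequency` is not a function from the original work.

```python
import math

def peak_frequency(signal, sample_rate, max_hz=1000.0):
    """Return the frequency (Hz) of the strongest DFT bin between
    the first non-DC bin and max_hz, using a direct (slow) DFT."""
    n = len(signal)
    resolution = sample_rate / n  # spacing between DFT bins in Hz
    best_freq, best_power = 0.0, -1.0
    k = 1  # skip bin 0 (the DC offset)
    while k * resolution <= max_hz and k <= n // 2:
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = k * resolution, power
        k += 1
    return best_freq

# Synthetic stand-in for sensor audio (assumed values): a 705 Hz tone,
# 0.2 seconds long, sampled at 4000 Hz, so bins fall every 5 Hz.
sample_rate = 4000
tone = [0.1 * math.sin(2 * math.pi * 705 * i / sample_rate) for i in range(800)]

print(peak_frequency(tone, sample_rate))
```

In practice a fast Fourier transform from a numerical library would replace the direct sum. The detected peak would then be compared against each species' known wingbeat frequencies.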

  16. Eamonn Keogh, Computer Science & Engineering Department, University of California – Riverside, eamonn@cs.ucr.edu
