
Basic Data Mining Techniques

hafwen

Presentation Transcript


  1. Basic Data Mining Techniques

  2. Contents • Query Tools • Statistical Techniques • Visualization Techniques • Case-Based Learning (K-Nearest Neighbor)

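  3. Query Tools and Statistical Techniques • Customers are a telecom company's greatest asset • Customer behavior is recorded in the call records of the telephone switch • Understanding customer behavior has become a trend among telecom companies • Case: selling telephone lines • Offering additional telephone lines to companies whose current lines are already saturated is a recurring business opportunity • When will a customer need an additional connection line?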

  4. Selling Telephone Lines • Switch call records • Call duration • Convert to categories • Sort by time • Count occupied lines
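The steps listed on slide 4 (take the switch call records, use each call's duration, order by time, count occupied lines) can be sketched as a sweep over call start and end events. This is a minimal sketch, assuming each call record is a hypothetical (start_minute, duration_minutes) pair; the record format, the sample data and the `max_lines_in_use` name are illustrative and not taken from the presentation. A customer whose peak regularly reaches the number of lines it already has would be a candidate for an extra line.

```python
from typing import Iterable, Tuple

def max_lines_in_use(calls: Iterable[Tuple[int, int]]) -> int:
    """Return the peak number of simultaneous calls (occupied lines).

    Each call is a hypothetical (start_minute, duration_minutes) pair and
    occupies a line on the half-open interval [start, start + duration).
    """
    events = []
    for start, duration in calls:
        events.append((start, 1))              # a line becomes busy
        events.append((start + duration, -1))  # the line is released
    # Sort by time; at equal times a release (-1) sorts before a seizure (+1),
    # so a call ending at t and another starting at t do not overlap.
    events.sort()
    in_use = peak = 0
    for _, delta in events:
        in_use += delta
        peak = max(peak, in_use)
    return peak

# Hypothetical call records for one customer: (start minute, duration).
calls = [(0, 10), (5, 10), (8, 3), (15, 5), (20, 2)]
print(max_lines_in_use(calls))  # 3
```

Sorting the events is the "order by time" step; the running counter is the "count occupied lines" step.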

  5. Query Tools and Statistical Techniques Naive Predictions
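A naive prediction ignores the attributes of a record and predicts the same outcome for every customer. Here it is assumed to be the most frequent outcome in the data (the majority class); the slide's own definition is not preserved in this transcript. The function names and the sample outcomes below are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence

def naive_prediction(labels: Sequence[Hashable]) -> Hashable:
    """Return the most frequent label; it is predicted for every customer."""
    return Counter(labels).most_common(1)[0][0]

def naive_accuracy(labels: Sequence[Hashable]) -> float:
    """Fraction of records that the constant naive prediction gets right."""
    guess = naive_prediction(labels)
    return sum(1 for y in labels if y == guess) / len(labels)

# Hypothetical outcomes: did each customer buy the magazine?
history = ["no", "no", "yes", "no", "yes", "no"]
print(naive_prediction(history))          # no
print(round(naive_accuracy(history), 3))  # 0.667
```

This accuracy is the baseline that a technique such as k-nearest neighbor has to beat on the same records to be worth using.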

  6. Query Tools and Statistical Techniques

  7. Query Tools and Statistical Techniques

  8. Query Tools and Statistical Techniques

  9. Query Tools and Statistical Techniques

  10. Query Tools and Statistical Techniques

  11. Visualization Techniques (Scatter Diagram) Music Magazine

  12. Distance between Data Points
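The distance formula from slide 12 is not preserved in this transcript. One common choice for numeric attributes is Euclidean distance after rescaling each attribute to the range [0, 1], so that an attribute with large units does not dominate. The sketch below assumes that choice; the attributes (age, yearly income) and the values are hypothetical.

```python
import math
from typing import List, Sequence

def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Rescale every attribute (column) to [0, 1] so that no single attribute,
    e.g. income in currency units, dominates the distance."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two records."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical records: (age, yearly income)
records = [[20, 10000], [40, 30000], [30, 20000]]
scaled = min_max_scale(records)
print(scaled)  # [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
print(euclidean(scaled[0], scaled[2]))
```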

  13. K-Nearest Neighbor • Records that are close to each other live in each other's neighborhood • Customers of the same type (cluster) will show the same behavior • Do as your neighbors do • Not really a learning technique • Disadvantages: • Inefficiency • It is difficult to explain why k-nearest neighbor performs better than naïve prediction
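"Do as your neighbors do" can be sketched as a majority vote among the k records closest to a new customer. This is a minimal sketch, assuming numeric attribute vectors, Euclidean distance and k = 3; the slide does not fix these choices, and `knn_predict` plus the sample customers are hypothetical. There is no training step, which is why the technique is "not really a learning technique", and every prediction scans the whole data set, which is the inefficiency noted above.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Tuple

# A stored record: (numeric attribute vector, known behavior label)
Record = Tuple[Sequence[float], Hashable]

def knn_predict(train: List[Record], query: Sequence[float], k: int = 3) -> Hashable:
    """Predict the query's behavior from the majority of its k nearest records."""
    # No model is built beforehand: each prediction sorts all stored records
    # by their distance to the query.
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical customers described by two scaled attributes
train = [
    ([0.1, 0.2], "buys"), ([0.2, 0.1], "buys"), ([0.15, 0.25], "buys"),
    ([0.8, 0.9], "no"), ([0.9, 0.8], "no"), ([0.85, 0.75], "no"),
]
print(knn_predict(train, [0.2, 0.2]))  # buys
print(knn_predict(train, [0.8, 0.8]))  # no
```

`math.dist` requires Python 3.8 or later.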

  14. K-Nearest Neighbor

  15. Result of the K-Nearest Neighbor Process (figure; values shown without labels: 67.1%, 70.2%, 55.3%, 85.4%, 91.9%)

  16. Movie Recommendation

  17. Movie Recommendation

  18. K-Nearest Neighbors for 0*3*6 • C1: 1 0 0 1 0 0 1 • M1: 0 1 1 1 0 0 1 • Distance = 3 or Similarity = 4 • C1: 1 0 0 1 0 0 1 • M2: 0 1 1 1 0 1 1 • Distance = 4 or Similarity = 3
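The numbers on slide 18 are consistent with distance being the number of positions where the two 0/1 profiles differ (Hamming distance) and similarity being the number of positions where they agree, so similarity = 7 − distance for these 7-position profiles. This reading is inferred from the numbers; the function names are hypothetical, while the profiles C1, M1 and M2 are copied from the slide.

```python
from typing import Sequence

def distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two 0/1 profiles differ (Hamming distance)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))

def similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where the profiles agree: length minus distance."""
    return len(a) - distance(a, b)

C1 = [1, 0, 0, 1, 0, 0, 1]
M1 = [0, 1, 1, 1, 0, 0, 1]
M2 = [0, 1, 1, 1, 0, 1, 1]

print(distance(C1, M1), similarity(C1, M1))  # 3 4
print(distance(C1, M2), similarity(C1, M2))  # 4 3
```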

  19. K-Nearest Neighbors for 0*3*6 • If Similarity_Threshold is 6, then 7 neighbors (M3, M13, M14, M16, M19, M20, M25) are selected. (figure: Similarity)
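Slide 19 picks neighbors by a similarity threshold rather than a fixed k, so the number of neighbors varies from customer to customer (seven for 0*3*6). The sketch below assumes that "threshold is 6" means similarity ≥ 6, which the slide does not state explicitly. The profile `c1` is C1 from slide 18; the members "A", "B" and "C" are hypothetical, since the per-member similarities from the slide's chart are not preserved.

```python
from typing import Dict, List, Sequence

def similarity(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two 0/1 profiles agree (as on slide 18)."""
    return sum(x == y for x, y in zip(a, b))

def select_neighbors(target: Sequence[int],
                     members: Dict[str, Sequence[int]],
                     threshold: int) -> List[str]:
    """Every member whose similarity to the target reaches the threshold."""
    return [name for name, profile in members.items()
            if similarity(target, profile) >= threshold]

c1 = [1, 0, 0, 1, 0, 0, 1]          # customer profile C1 from slide 18
members = {                          # hypothetical member profiles
    "A": [1, 0, 0, 1, 0, 0, 1],      # similarity 7
    "B": [1, 0, 0, 1, 0, 1, 1],      # similarity 6
    "C": [0, 1, 1, 1, 0, 0, 1],      # similarity 4 (same profile as M1)
}
print(select_neighbors(c1, members, threshold=6))  # ['A', 'B']
```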

  20. Summary of These 7 Neighbors: Liked Movies • Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038 • Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184 • Neighbor 3: none • Neighbor 4: 402 193 228 179 227 111 204 364 • Neighbor 5: 280 • Neighbor 6: 193 • Neighbor 7: 186 189 193 214 239 179 227 263 240

  21. Liked Movies for 0*3*6 • Count = 03 Movie = Crouching Tiger, Hidden Dragon (193) • Count = 02 Movie = Rush Hour (184) • Count = 02 Movie = Snake Eyes (240) • Count = 02 Movie = Life Is Beautiful (442) • Count = 02 Movie = The Blair Witch Project (518) • Count = 02 Movie = The Truman Show (111) • Count = 02 Movie = Enemy of the State (179) • Count = 02 Movie = The Mummy (227)
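The recommendation step counts how often each movie appears in the liked lists of the selected neighbors and keeps the movies that appear at least twice. The neighbor lists below are copied from slide 20; `recommend` and `min_count` are hypothetical names. Counting raw occurrences reproduces exactly the eight movies and counts on slide 21.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Liked-movie IDs of the 7 selected neighbors, copied from slide 20.
neighbors = [
    "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038",
    "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184",
    "",  # Neighbor 3: none
    "402 193 228 179 227 111 204 364",
    "280",
    "193",
    "186 189 193 214 239 179 227 263 240",
]

def recommend(neighbor_lists: Sequence[str], min_count: int = 2) -> List[Tuple[str, int]]:
    """Movies liked at least min_count times, most frequent first."""
    counts = Counter(mid for movies in neighbor_lists for mid in movies.split())
    return [(mid, c) for mid, c in counts.most_common() if c >= min_count]

for movie_id, count in recommend(neighbors):
    print(f"Count = {count:02d} Movie = {movie_id}")
```

Movies 442 and 518 reach a count of 2 only because they are listed twice for Neighbor 2. Counting distinct neighbors instead, e.g. with `set(movies.split())`, would give them a count of 1.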

  22. Data Mining Tool & Query Tool • Suppose there is a large database containing millions of records that describe customers' purchases • Who bought which product on what date? • What is the average turnover in July? • What is an optimal segmentation of clients? • What are the most important trends in customer behavior? • If you know exactly what you are looking for, use a query tool • If you know only vaguely what you are looking for, use a data mining tool
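The first two questions on slide 22 have exact answers that a query returns directly, while segmentation and trend questions have no single query. This is a minimal sketch using Python's built-in `sqlite3` module, assuming a hypothetical `purchases` table with made-up rows. "Average turnover in July" is read here as the mean sale amount in July, which is an assumption.

```python
import sqlite3

# Hypothetical purchase records; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchases (customer TEXT, product TEXT, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "jazz magazine", "2024-07-03", 12.0),
        ("bob", "rock magazine", "2024-07-15", 8.0),
        ("alice", "rock magazine", "2024-08-01", 8.0),
    ],
)

# "Who bought which product on what date?"
who_bought = conn.execute(
    "SELECT customer, product, day FROM purchases ORDER BY day"
).fetchall()

# "What is the average turnover in July?" (read here as the mean sale amount)
(july_avg,) = conn.execute(
    "SELECT AVG(amount) FROM purchases WHERE strftime('%m', day) = '07'"
).fetchone()

print(who_bought)
print(july_avg)  # 10.0
conn.close()
```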

  23. Data Mining Tool & Query Tool
