PKDD Discovery Challenge (not only) on Financial Data


  1. PKDD Discovery Challenge (not only) on Financial Data • Petr Berka, Laboratory for Intelligent Systems, University of Economics, Prague, berka@vse.cz

  2. Cups, Challenges, Competitions • KDD Cups (since 1997) • KDD Sisyphus at ECML 1998 • PKDD Discovery Challenges (since 1999) • COIL Competition 2000 • PAKDD Challenge 2000 • PT Challenge 2000, 2001 • JSAI KDD Challenge 2001 • EUNITE Competition 2001, 2002 • . . . DMLL Workshop, ICML 2002 Petr Berka, LISp, 2002

  3. PKDD Discovery Challenge Idea • Realistic data mining conditions • collaborative rather than competitive nature • rather vague specification of the problem • Differences from real KDD projects • short time for analysis (2-3 months) • only indirect access to domain and data experts during the KDD process

  4. Challenge Settings • Data and their full description available on the web for all participants • Submissions evaluated by domain experts (but no ranking, no winners and losers) • Workshop at PKDD to present the results and discuss them with domain experts • Results and experts' comments available on the web (after the workshop)

  5. PKDD Challenges http://lisp.vse.cz/challenge • 1999, Prague • financial data, thrombosis data • 2000, Lyon • financial data, modified thrombosis data • 2001, Freiburg • modified thrombosis data • 2002, Helsinki • atherosclerosis data, hepatitis data

  6. Financial Challenge Background • Czech bank offering private accounts • Available data for pilot study (29000 clients) • personal characteristics • basic info about accounts • transactions for three months • Proposed tasks • segmentation (defining different types of clients w.r.t. debt) • early detection of debts

  7. Financial Challenge Data

  8. Contributions • Method oriented • show a method/system working on the data • Problem oriented (prototype solutions) • loan and/or credit card description • loan and/or credit card classification • initial exploration • relation between branches • client segmentation

  9. Description of loans • Relations between loan category and account characteristics [Coufal et al, 1999 - GUHA] [Mikšovský et al, 1999 - EXCEL]

  10. Classification of loans • Detecting risky clients before they are granted a loan [Mikšovský et al, 1999 - C5.0] • decision tree to find the relevance of attributes • decision tree for classification (using misclassification costs)
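
The role of misclassification costs can be sketched independently of C5.0. The snippet below is only an illustration, not the method used in the cited contribution: it assumes a model already supplies an estimated probability that a client is risky, and it picks the decision with the lower expected cost. The function name and the cost values are hypothetical.

```python
def min_cost_decision(p_bad: float,
                      cost_false_grant: float,
                      cost_false_reject: float) -> str:
    """Choose 'grant' or 'reject' by minimum expected cost.

    p_bad             -- estimated probability that the client is risky
    cost_false_grant  -- hypothetical cost of granting a loan to a risky client
    cost_false_reject -- hypothetical cost of rejecting a good client
    """
    # Granting is only costly when the client turns out to be risky.
    expected_cost_grant = p_bad * cost_false_grant
    # Rejecting is only costly when the client would have been good.
    expected_cost_reject = (1.0 - p_bad) * cost_false_reject
    return "reject" if expected_cost_reject < expected_cost_grant else "grant"


if __name__ == "__main__":
    # With equal costs a 20% risk is acceptable; with a 10:1 cost ratio it is not.
    print(min_cost_decision(0.2, 1.0, 1.0))
    print(min_cost_decision(0.2, 10.0, 1.0))
```

Raising the cost of a wrongly granted loan shifts the decision threshold toward rejecting clients at lower estimated risk, which is the effect misclassification costs are meant to have.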

  11. Credit Cards Promotion • Description - find characteristics of a card holder • deviation detection • Classification - predict score for "card value" • k-nearest neighbour [Putten, 1999]
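
A k-nearest-neighbour score prediction can be sketched as follows. This is a generic illustration, not the setup from [Putten, 1999]: the feature vectors, the Euclidean distance, and the plain averaging of neighbour scores are all assumptions.

```python
import math


def knn_score(query, examples, k=3):
    """Predict a score as the mean score of the k nearest examples.

    query    -- feature vector of the client to score
    examples -- list of (feature_vector, score) pairs with known scores
    k        -- number of neighbours to average over
    """
    # Sort the labelled examples by Euclidean distance to the query.
    nearest = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    return sum(score for _, score in nearest) / len(nearest)


if __name__ == "__main__":
    # Hypothetical two-feature clients with known "card value" scores.
    known = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0), ((10.0, 10.0), 9.0)]
    print(knn_score((0.0, 0.0), known, k=2))
```

In practice the features would need to be scaled to comparable ranges before computing distances, otherwise attributes with large numeric ranges dominate the neighbour selection.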

  12. Clients Segmentation • Description - segmentation of clients according to transactions [Hotho, Maedche, 2000] • Kohonen map + decision trees • Rule #1 for Cluster 3: If ATTR5 > 9945 and ATTR13 > 0 Then -> Cluster 3 (115, 0.983)
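
Rule #1 for Cluster 3 translates directly into a predicate over a client record. The sketch below assumes a client is represented as a dictionary keyed by the attribute names from the slide; the meaning of ATTR5 and ATTR13 is not given in the transcript, and the pair (115, 0.983) is left uninterpreted.

```python
def matches_rule_1_cluster_3(client: dict) -> bool:
    """Return True when the client satisfies Rule #1 for Cluster 3.

    The thresholds are copied from the slide; the dictionary layout
    is an assumption made for this illustration.
    """
    return client["ATTR5"] > 9945 and client["ATTR13"] > 0


if __name__ == "__main__":
    # Hypothetical client records.
    print(matches_rule_1_cluster_3({"ATTR5": 12000, "ATTR13": 2}))
    print(matches_rule_1_cluster_3({"ATTR5": 12000, "ATTR13": 0}))
```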

  13. Challenge Organizing Lessons • Getting and preparing real data is difficult • The time for analysis should be as long as possible • The response rate was rather low (~ 10%) • No synergy effect observed

  14. DM Lessons (1/4) • Cooperate with experts • domain experts • data experts • . . . • … and with users

  15. DM Lessons (2/4) • Use knowledge-intensive preprocessing methods • … • compute age and sex from birth_number • set flags for different types of operations • compute monthly characteristics of transactions (sum, avg, min, max), e.g. the time-weighted balance $\mathrm{lbalance} = \frac{1}{30} \sum_i \mathrm{balance}(i) \cdot \mathrm{days}(i)$ • …
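
The preprocessing steps on this slide can be sketched in a few lines. The code below is a minimal illustration under stated assumptions: birth_number is taken to be a six-digit string encoded as YYMMDD for men and YY(MM+50)DD for women with all birth years in 19xx, and the balance history is given as (balance, days) pairs, where days is the number of days the balance was held within a 30-day month. All function names are hypothetical.

```python
from datetime import date


def decode_birth_number(birth_number: str):
    """Split a birth_number into (birth date, sex).

    Assumed encoding: YYMMDD for men, YY(MM+50)DD for women, years in 19xx.
    """
    yy = int(birth_number[0:2])
    mm = int(birth_number[2:4])
    dd = int(birth_number[4:6])
    sex = "F" if mm > 50 else "M"
    if mm > 50:
        mm -= 50  # undo the offset that marks female clients
    return date(1900 + yy, mm, dd), sex


def age_on(birth: date, ref: date) -> int:
    """Age in completed years on the reference date."""
    years = ref.year - birth.year
    if (ref.month, ref.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


def monthly_characteristics(amounts):
    """Sum, average, minimum and maximum of one month's transaction amounts."""
    return {
        "sum": sum(amounts),
        "avg": sum(amounts) / len(amounts),
        "min": min(amounts),
        "max": max(amounts),
    }


def time_weighted_balance(segments, period_days=30):
    """(1 / period_days) * sum of balance(i) * days(i) over all segments."""
    return sum(balance * days for balance, days in segments) / period_days


if __name__ == "__main__":
    birth, sex = decode_birth_number("706213")
    print(birth, sex, age_on(birth, date(1999, 1, 1)))
    print(monthly_characteristics([100.0, 300.0, 200.0]))
    print(time_weighted_balance([(1000.0, 10), (4000.0, 20)]))
```

Weighting each balance by the number of days it was held keeps a short spike from distorting the monthly figure the way a simple average over transactions would.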

  16. DM Lessons (3/4) • Make the results understandable [Werner, Fogarty 2001]

  17. DM Lessons (4/4) • Show some (even preliminary) results soon • experts are interested in solutions, not in the application of sophisticated methods

  18. Discovery Challenge Benefits • Experts • deeper insight into the data • Participants • experience with analyzing large real data • motivation for further research • ML/KDD Community • prototype tasks/solutions (like the MiningMart project?) • Organizers • … invitation to DMLL Workshop :-)

  19. Thank You

  20. Contributions