
Djamel A. Zighed and Nicolas Nicoloyannis ERIC Laboratory University of Lyon 2 (France)



Presentation Transcript


  1. Djamel A. Zighed and Nicolas Nicoloyannis ERIC Laboratory University of Lyon 2 (France) zighed@univ-lyon2.fr Prague Sept. 04

  2. About the Computer Science Department • In Lyon, there are 3 universities with 100,000 students • Lumière University Lyon 2 has 22,000 students • Lyon 2 is mainly a liberal arts university • The Faculty of Economics has three departments, among them the computer science one • We belong to this department • We have Bachelor, Master and PhD programs for 300 students

  3. ERIC Lab at the University • Faculties of the University of Lyon 2: Economics, Sociology, Linguistics, Law • Research centers of the university: ERIC (Knowledge Engineering Research Center) • The budget of ERIC does not depend on the university; it is provided by the national Ministry of Education • We have a large autonomy in decision making

  4. ERIC Lab • Founded in 1995 • 11 professors (N. Nicoloyannis, director) • 15 PhD students • Grants+contracts+WK+…=200K€/year • Research topics • Data mining (theory, tools and applications) • Data warehouse management (T,T,A)

  5. Data Mining (T,T,A) • Theory: induction graphs; learning and classification • Tools: SIPINA, a platform for data mining • Applications: medical fields, chemical applications, human sciences, … • Data mining TTA for complex data

  6. Data mining on complex data • An example : Breast cancer diagnosis

  7. Motivations • Contingency table • Association measure: it measures the strength of the relationship between X and Y

  8. Motivations • Contingency table • Association measure: it measures the strength of the relationship between X and Y

  9. Motivations • Contingency table • Association measure: it measures the strength of the relationship between X and Y

  10. Motivations • Contingency table • Association measure: it measures the strength of the relationship between X and Y • According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?

  11. Motivations • Contingency table • Association measure: it measures the strength of the relationship between X and Y • According to a specific association measure, can we improve the strength of the relationship by merging some rows and/or some columns?

  12. An example

  13. Goal: find the groupings that maximize the association between attributes • For the preceding example, maximizing Tschuprow's t gives the answer: yes, we can improve the association by reducing the size of the contingency table
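To make the claim on slide 13 concrete, here is a minimal sketch of Tschuprow's t and of a single row merge. The table values are hypothetical (the table from the talk is not preserved in this transcript), and the function names `tschuprow_t` and `merge_rows` are illustrative choices, not names from SIPINA.

```python
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's t for a contingency table given as a list of rows of counts."""
    r, c = len(table), len(table[0])
    if r < 2 or c < 2:
        return 0.0  # t is undefined for a single row or column; treated as 0 here
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(c)]
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    # t^2 = chi2 / (n * sqrt((r - 1) * (c - 1)))
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))


def merge_rows(table, i, k):
    """Return a new table in which rows i and k are summed into one row."""
    merged = [a + b for a, b in zip(table[i], table[k])]
    return [row for idx, row in enumerate(table) if idx not in (i, k)] + [merged]


# Hypothetical 3 x 2 table: rows 0 and 1 have the same column profile (1:2).
table = [[10, 20], [20, 40], [40, 10]]
t_before = tschuprow_t(table)
t_after = tschuprow_t(merge_rows(table, 0, 1))
print(t_before, t_after)
```

Merging two rows with proportional profiles leaves the chi-square statistic unchanged while the normalizing term shrinks, so t increases in this example.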

  14. Extension • Contingency table • According to a specific association measure, can we find the optimal reduced contingency table?

  15. Optimal solution (exhaustive search) • Goal: find the best cross partition on T • $\mathcal{P}_X(T)$: the set of all partitions brought about over X • $\mathcal{P}_Y(T)$: the set of all partitions brought about over Y • $\#\mathcal{P}_X(T)$: the size of the set $\mathcal{P}_X(T)$ • $\#\mathcal{P}_Y(T)$: the size of the set $\mathcal{P}_Y(T)$ • The number of cases we have to check is $\lambda = \#\mathcal{P}_X(T) \times \#\mathcal{P}_Y(T)$
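The size of this search space can be sketched under one assumption that the slide does not state: both variables are nominal, so any grouping of categories is admissible. The number of partitions of r categories is then the Bell number B(r). The helper names below are illustrative.

```python
def bell_number(n):
    """Number of partitions of a set of n elements, computed with the Bell triangle."""
    row = [1]
    for _ in range(n - 1):
        new_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


def search_space_size(r, c):
    """Cross partitions to check for an r x c table (nominal rows and columns assumed)."""
    return bell_number(r) * bell_number(c)


print(search_space_size(6, 6))  # 203 * 203 = 41209
```

This count includes the degenerate partitions that merge everything into a single row or column. For ordinal variables, where only adjacent categories may be merged, the count would be smaller.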

  16. Optimal solution (exhaustive search)

  17. Optimal solution (exhaustive search) • According to a specific association measure, can we find the optimal reduced contingency table? • Yes, but the solution is intractable in the real world because of the high time complexity

  18. Heuristic • Proceed successively to the grouping of the 2 (row or column) values that maximizes the increase in the association criterion.
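The heuristic on slide 18 can be sketched as a greedy loop. This is a sketch under assumptions that the slide does not spell out: categories are nominal (any pair may be merged), the criterion is Tschuprow's t, at least 2 rows and 2 columns are kept, and the loop stops when no single merge increases the criterion. All names are illustrative.

```python
from itertools import combinations
from math import sqrt


def tschuprow_t(table):
    """Tschuprow's t for a contingency table given as a list of rows of counts."""
    r, c = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(c)]
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected
    return sqrt(chi2 / (n * sqrt((r - 1) * (c - 1))))


def merge(table, i, k, axis):
    """Sum rows (axis=0) or columns (axis=1) i and k into a single one."""
    if axis == 1:
        table = [list(col) for col in zip(*table)]  # transpose to reuse the row logic
    merged = [a + b for a, b in zip(table[i], table[k])]
    out = [row for idx, row in enumerate(table) if idx not in (i, k)] + [merged]
    if axis == 1:
        out = [list(col) for col in zip(*out)]  # transpose back
    return out


def greedy_grouping(table, measure=tschuprow_t):
    """Repeatedly apply the single row or column merge that most increases the measure."""
    current, best = [list(row) for row in table], measure(table)
    while True:
        candidates = []
        for axis, size in ((0, len(current)), (1, len(current[0]))):
            if size > 2:  # assumption: never go below 2 rows or 2 columns
                for i, k in combinations(range(size), 2):
                    candidates.append(merge(current, i, k, axis))
        if not candidates:
            return current, best
        challenger = max(candidates, key=measure)
        if measure(challenger) <= best:  # assumption: stop when nothing improves
            return current, best
        current, best = challenger, measure(challenger)


# Hypothetical 3 x 2 table; the two rows with the same profile get merged.
grouped, value = greedy_grouping([[10, 20], [20, 40], [40, 10]])
print(grouped, value)
```

Each step evaluates every pair of rows and every pair of columns, so the work per step is quadratic in the table dimensions rather than exponential as in the exhaustive search.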

  19. Complexity

  20. Simulation • Goal: how far is the quasi-optimal solution from the true optimum? • The comparison is tractable only for tables no larger than 6 × 6 • Simulation design: randomly generate 200 tables; analyze the distribution of the deviations between optima and quasi-optima • Generating the tables: 10,000 cases distributed over the c × r cells of the table with a uniform distribution (worst case)
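The table generation step can be sketched as follows. It assumes one reading of the slide: each of the 10,000 cases is assigned independently and uniformly at random to one of the cells. The function name, the seed and the 6 × 6 size used in the demonstration are illustrative.

```python
import random


def random_table(r, c, n_cases=10000, rng=None):
    """Distribute n_cases over an r x c table, each case landing in a uniformly chosen cell."""
    rng = rng or random.Random()
    table = [[0] * c for _ in range(r)]
    for _ in range(n_cases):
        table[rng.randrange(r)][rng.randrange(c)] += 1
    return table


# One 6 x 6 table with a fixed seed for reproducibility (the seed is arbitrary).
demo = random_table(6, 6, rng=random.Random(0))
print(sum(sum(row) for row in demo))  # 10000
```

Repeating the call 200 times would give the set of tables described in the simulation design. Under uniform sampling the rows and columns are independent, so little association exists to be recovered, which is consistent with the slide calling it the worst case.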

  21. Quasi-optimal solution

  22. Quasi-optimal solution

  23. Conclusion • Implementation in a new approach to decision tree induction. • Zighed, D. A., G. Ritschard, W. Erray and V.-M. Scuturici (2003), Abogodaï, a New Approach for Decision Trees, in Lavrac, N., D. Gamberger, L. Todorovski and H. Blockeel (eds), Knowledge Discovery in Databases: PKDD 2003, LNAI 2838, Berlin: Springer, 495-506. • Zighed, D. A., G. Ritschard, W. Erray and V.-M. Scuturici (2003), Decision tree with optimal joint partitioning, to appear in Journal of Information Intelligent Systems, Kluwer (2004). • Divisive top-down approach • Extension to the multidimensional case
