Evaluating a clustering solution: An application in the tourism market

Evaluating a clustering solution: An application in the tourism market. Advisor: Dr. Hsu. Graduate: Yung-Chu Lin. Outline: Motivation; Objective; The various paradigms; The number of clusters; Utility concepts; Proposed approach; A tourism market application; Conclusion.


Presentation Transcript


  1. Evaluating a clustering solution: An application in the tourism market • Advisor: Dr. Hsu • Graduate: Yung-Chu Lin • IDS Lab Seminar

  2. Outline • Motivation • Objective • The various paradigms • The number of clusters • Utility concepts • Proposed approach • A tourism market application • Conclusion

  3. Motivation • To evaluate a clustering solution

  4. Objective • Propose a framework for evaluating a clustering solution • Advocate a multimethodological approach

  5. The various paradigms • Statistical methods • Measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression • Machine Learning • Tree classification algorithm (C4.5) and propositional rules (CN2) • Combining methodologies sets the stage for dealing with rich and complex problems

  6. Statistical methodologies • Association between two nominal variables • Cramér statistic
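
As a minimal sketch of the Cramér statistic for two nominal variables, assuming the standard definition V = sqrt(chi2 / (n * (min(rows, columns) - 1))): the function names and the contingency-table layout (a list of rows of counts) are illustrative choices, not taken from the presentation.

```python
from math import sqrt


def chi_square(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows.

    Assumes every row total and column total is positive.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def cramers_v(table):
    """Cramér statistic: 0 means no association, 1 means perfect association."""
    n = sum(sum(row) for row in table)
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi_square(table) / (n * k))
```

Here rows could stand for clusters and columns for the categories of one questionnaire variable; a value near 0 suggests the variable does little to separate the clusters.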

  7. Statistical methodologies (cont’d) • Uncertainty Coefficient
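
The uncertainty coefficient can be sketched as the share of one variable's entropy that is explained by the other, assuming the usual definition U(row | column) = I(row; column) / H(row), where I is the mutual information and H the Shannon entropy. The function names and table layout below are illustrative, not from the presentation.

```python
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of a distribution given as raw counts."""
    n = sum(counts)
    return -sum(c / n * log(c / n) for c in counts if c > 0)


def mutual_information(table):
    """I(row; column) = H(row) + H(column) - H(row, column)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    joint = [cell for row in table for cell in row]
    return entropy(row_totals) + entropy(col_totals) - entropy(joint)


def uncertainty_coefficient(table):
    """U(row | column): fraction of row-variable entropy explained by the columns.

    Assumes the row variable takes at least two values, so H(row) > 0.
    """
    row_totals = [sum(row) for row in table]
    return mutual_information(table) / entropy(row_totals)
```

Unlike the Cramér statistic, this measure is asymmetric: transposing the table gives U(column | row), which can differ.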

  8. Statistical methodologies (cont’d) • Mutual Information • ANOVA • MANOVA • CART • Discriminant Analysis • Logistic Regression

  9. Machine learning methodologies • Decision Trees • Provide a hierarchical process and model of classification • Non-backtracking, greedy optimisation algorithm • Propositional Rules • Provide logic models • Represented by “if condition then cluster” • Neural Networks • Naive Bayes
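
To illustrate the “if condition then cluster” form of propositional rules, here is a minimal sketch of an ordered rule list. The attribute name `stays_per_year`, the thresholds, and the helper names are hypothetical and chosen only for illustration; they are not rules reported in the presentation.

```python
def matches(conditions, record):
    """True when the record satisfies every attribute test of a rule."""
    return all(test(record[attribute]) for attribute, test in conditions.items())


def classify(record, rules, default):
    """Return the cluster of the first rule that fires, else the default cluster."""
    for conditions, cluster in rules:
        if matches(conditions, record):
            return cluster
    return default


# Hypothetical rule list: each entry reads "if <conditions> then <cluster>".
EXAMPLE_RULES = [
    ({"stays_per_year": lambda v: v >= 5}, "Heavy users"),
    ({"stays_per_year": lambda v: v >= 1}, "Regular users"),
]

cluster = classify({"stays_per_year": 6}, EXAMPLE_RULES, default="First-time users")
```

An ordered list like this is one common way to read induced rules; unordered rule sets, where several rules may fire and vote, are another.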

  10. The number of clusters • May be set a priori • May be an outcome of the clustering process itself • The best number is obtained by comparing measures of model fit for alternative numbers of clusters

  11. The number of clusters (cont’d) • Mixture Model • Akaike Information Criterion (AIC)
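
Comparing fit across alternative numbers of clusters can be sketched with the standard form of the criterion, AIC = 2k - 2 ln L, where k is the number of free parameters and ln L the maximised log-likelihood of the mixture model. The function names and the log-likelihood figures below are made up for illustration; they are not results from the presentation.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


def best_number_of_clusters(fits):
    """Pick the cluster count with the lowest AIC.

    `fits` maps a number of clusters to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda k: aic(*fits[k]))


# Hypothetical fits of mixture models with 2, 3 and 4 components.
example_fits = {
    2: (-1250.0, 9),
    3: (-1180.0, 14),
    4: (-1176.0, 19),
}
chosen = best_number_of_clusters(example_fits)
```

In this made-up example the fourth component improves the likelihood only slightly, so the penalty on extra parameters favours three clusters.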

  12. Utility concepts • The main question in evaluating a clustering is a question about utility • Utility is evaluated by judgement

  13. Proposed approach [Figure: diagram of the proposed approach; surviving label: "preprocess"]

  14. Proposed approach (cont’d) • The choice of discriminant and classification methodologies depends on the nature of the variables • Regarding discrimination, complementary dimensions offer a new perspective and understanding • An integration of methodologies and techniques based on the Statistical and Machine Learning paradigms

  15. A tourism market application • The clustering solution • Evaluation of clustering solution

  16. Database • The answers to a questionnaire: Portuguese clients of Pousadas de Portugal • 49 questions → 200 variables • 2500 Portuguese clients

  17. Clustering • Model sample: 1647 clients (65%); Validation sample: 897 clients (35%) • Combines an a priori approach with a K-Means procedure • 4 variables expressing the frequency and type of Pousadas • CH, CSUP, C and B types • 3 clusters (First-time users, Regular users and Heavy users) • Model: 18%, 60% and 22% • Validation: 16%, 62% and 22%
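
The model/validation comparison of cluster shares can be sketched as follows: fit K-Means on the model sample, assign validation cases to the nearest fitted centroid, and compare the proportions per cluster. This is a plain textbook K-Means written with the standard library only, under the assumption that cases are numeric tuples; it is not the procedure or software used in the presentation, and all names are illustrative.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, k, iterations=100, seed=0):
    """Fit k centroids to a list of numeric tuples (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its group; keep it if the group is empty.
        new_centroids = [
            tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def cluster_shares(points, centroids):
    """Proportion of points assigned to each centroid."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]
```

Calling `cluster_shares` once with the model sample and once with the validation sample, using the same centroids, gives two lists of proportions that can be compared side by side, in the spirit of the Model and Validation percentages on the slide.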

  18. Clustering (cont’d) • 2 clusters (Heavy users and Regular users) • Model: 16 Pousadas and 5 Pousadas • Validation: 17 Pousadas and 4 Pousadas

  19. A tourism market application • The clustering solution • Evaluation of clustering solution

  20. Evaluation of clustering solution

  21. Analysis of association between clusters and clustering base • Measures the proportion of correctly classified cases • Model: 82.6%; Validation: 91.5% • Uses the linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation
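
The proportion of correctly classified cases can be computed from paired lists of actual cluster labels and labels predicted by a classifier. The sketch below is generic: the function names are illustrative and the labels in the usage note are made up, not data from the study.

```python
from collections import Counter


def hit_rate(actual, predicted):
    """Proportion of cases whose predicted cluster equals the actual cluster."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = sum(a == p for a, p in zip(actual, predicted))
    return hits / len(actual)


def confusion_counts(actual, predicted):
    """Counts of (actual, predicted) label pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(actual, predicted))
```

For example, `hit_rate(["heavy", "regular", "heavy", "regular"], ["heavy", "regular", "regular", "regular"])` evaluates to 0.75, and `confusion_counts` on the same lists shows that the single error is a heavy user classified as regular.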

  22. Analysis of association between clusters and clustering base (cont’d)

  23. Analysis of association between clusters and other variables • Chi-square tests gauge the strength of association between clusters and variables • Rule induction procedures discriminate and classify on the basis of attributes significantly associated with clusters • Rule induction provides a better comprehension of the factors discriminating the clusters • C4.5 and CN2 are evaluated on both the Model sample and the Validation sample

  24. Analysis of association between clusters and other variables (cont’d) • Memorises a group (beam) of the best solutions

  25. Analysis of association between clusters and other variables (cont’d)

  26. Analysis of association between clusters and other variables (cont’d)

  27. Analysis of association between clusters and other variables (cont’d)

  28. Global evaluation • Discriminant Analysis and Logistic Regression clearly show the differences between clusters • Chi-square tests the association between variables and clusters • C4.5 and CN2 provide a more complex and richer perspective

  29. Conclusion • Identifying significant associations characterising the clustered entities guided the discriminant and classification analysis • Propositional rule induction is suitable for discriminating purposes • A multimethodological approach should consider not only inference but also descriptive analysis
