Evaluating a clustering solution: An application in the tourism market Advisor: Dr. Hsu Graduate: Yung-Chu Lin IDS Lab Seminar
Outline
• Motivation
• Objective
• The various paradigms
• The number of clusters
• Utility concepts
• Proposed approach
• A tourism market application
• Conclusion
Motivation
• To evaluate a clustering solution
Objective
• Propose a framework for evaluating a clustering solution
• Advocate a multimethodological approach
The various paradigms
• Statistical methods
  • Measures of association, association tests, Automatic Interaction Detection (AID), Classification and Regression Trees (CART), Discriminant Analysis and Logistic Regression
• Machine Learning
  • Tree classification algorithm (C4.5) and propositional rules (CN2)
• Combining methodologies sets the stage for dealing with rich and complex problems
Statistical methodologies
• Association between two nominal variables
• Cramér's V statistic
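As a minimal sketch of the association measure named on this slide, Cramér's V can be computed from a contingency table as the square root of chi-square divided by n times (min(rows, columns) - 1). The function name `cramers_v` and the two small tables are illustrative assumptions, not data from the study; the sketch also assumes every row and column total is non-zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two nominal variables.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical counts: rows = clusters, columns = categories of a nominal variable.
independent_table = [[10, 20], [20, 40]]   # columns split identically in each row
perfect_table = [[30, 0], [0, 30]]         # each row maps to exactly one column

v_independent = cramers_v(independent_table)
v_perfect = cramers_v(perfect_table)
print(v_independent, v_perfect)
```

Values near 0 indicate little association between cluster membership and the variable; values near 1 indicate a strong one.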
Statistical methodologies (cont’d)
• Uncertainty Coefficient
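The uncertainty coefficient can be sketched as mutual information divided by the entropy of the variable being explained, U(row | column) = I(row; column) / H(row). The helper names and the example tables below are assumptions for illustration only, and the sketch assumes the row variable has at least two non-empty categories.

```python
import math

def entropy(counts):
    """Shannon entropy (natural log) of a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def uncertainty_coefficient(table):
    """U(row | column): share of the row variable's entropy explained by the column variable."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    h_rows = entropy(row_totals)
    h_cols = entropy(col_totals)
    h_joint = entropy([count for row in table for count in row])
    mutual_information = h_rows + h_cols - h_joint
    return mutual_information / h_rows

# Hypothetical tables: rows = clusters, columns = categories of another variable.
u_independent = uncertainty_coefficient([[10, 20], [20, 40]])
u_perfect = uncertainty_coefficient([[30, 0], [0, 30]])
print(u_independent, u_perfect)
```

Unlike Cramér's V, this measure is asymmetric: swapping rows and columns generally changes the result.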
Statistical methodologies (cont’d)
• Mutual Information
• ANOVA
• MANOVA
• CART
• Discriminant Analysis
• Logistic Regression
Machine learning methodologies
• Decision Trees
  • Provide a hierarchical process and model of classification
  • Non-backtracking, greedy optimisation algorithm
• Propositional Rules
  • Provide logic models
  • Represented as “if condition then cluster”
• Neural Networks
• Naive Bayes
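To make the “if condition then cluster” form concrete, the sketch below represents a rule as a set of attribute tests plus a predicted cluster and reports how many records it covers and how often it is right. The attribute names (`stays_per_year`, `books_ahead`) and the records are hypothetical; only the cluster labels come from the slides.

```python
def rule_matches(conditions, record):
    """True when the record satisfies every attribute test in the rule."""
    return all(record.get(attr) == value for attr, value in conditions.items())

def evaluate_rule(conditions, predicted_cluster, records):
    """Return (coverage, accuracy) of the rule 'if conditions then predicted_cluster'."""
    covered = [r for r in records if rule_matches(conditions, r)]
    correct = [r for r in covered if r["cluster"] == predicted_cluster]
    coverage = len(covered) / len(records)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

# Hypothetical records; attribute names and values are illustrative assumptions.
records = [
    {"stays_per_year": "high", "books_ahead": "yes", "cluster": "Heavy users"},
    {"stays_per_year": "high", "books_ahead": "no", "cluster": "Heavy users"},
    {"stays_per_year": "low", "books_ahead": "yes", "cluster": "Regular users"},
    {"stays_per_year": "high", "books_ahead": "yes", "cluster": "Regular users"},
]

# Rule: if stays_per_year = high then cluster = Heavy users
coverage, accuracy = evaluate_rule({"stays_per_year": "high"}, "Heavy users", records)
print(coverage, accuracy)
```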
The number of clusters
• May be set a priori
• May be an outcome of the clustering process itself
• The best number is obtained by comparing measures of model fit for alternative numbers of clusters
The number of clusters (cont’d)
• Mixture Model
• Akaike Information Criterion (AIC)
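A minimal sketch of choosing the number of clusters with AIC, defined as 2k - 2 ln L for a model with k free parameters and maximised likelihood L. The log-likelihoods and parameter counts below are made-up placeholders standing in for fitted mixture models; they are not results from the study.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: lower values indicate a better trade-off."""
    return 2 * n_parameters - 2 * log_likelihood

# Hypothetical fits: number of clusters -> (log-likelihood, number of free parameters).
fits = {
    2: (-1250.0, 9),
    3: (-1200.0, 14),
    4: (-1197.0, 19),
}

scores = {k: aic(log_lik, n_params) for k, (log_lik, n_params) in fits.items()}
best_k = min(scores, key=scores.get)
print(scores, best_k)
```

In this illustration the fourth cluster improves the likelihood only slightly, so the penalty for its extra parameters makes the three-cluster model the preferred one.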
Utility concepts
• The main question in evaluating a clustering is a question about utility
• Utility is evaluated by judgement
Proposed approach
[Figure: diagram of the proposed approach, including a “preprocess” step]
Proposed approach (cont’d)
• The choice of discriminant and classification methodologies depends on the nature of the variables
• Regarding discrimination, complementary dimensions offer a new perspective and understanding
• An integration of methodologies and techniques based on the Statistical and Machine Learning paradigms
A tourism market application
• The clustering solution
• Evaluation of clustering solution
Database
• The answers to a questionnaire given to Portuguese clients of Pousadas de Portugal
• 49 questions, 200 variables
• 2500 Portuguese clients
Clustering
• Model sample: 1647 clients (65%); Validation sample: 897 clients (35%)
• Uses an a priori approach and a K-Means procedure
• 4 variables expressing the frequency and type of Pousadas
  • CH, CSUP, C and B types
• 3 clusters (First-time users, Regular users and Heavy users)
  • Model: 18%, 60% and 22%
  • Validation: 16%, 62% and 22%
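The K-Means step can be sketched as alternating between assigning each client to the nearest centroid and recomputing centroids as cluster means. Everything below is an assumption for illustration: the six clients, their visit counts for the four Pousada types (CH, CSUP, C, B), and the choice of initial centroids; the study's actual data and settings are not reproduced here.

```python
def kmeans(points, centroids, n_iterations=10):
    """Plain K-Means with squared Euclidean distance and fixed initial centroids."""
    clusters = []
    for _ in range(n_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical visit counts per client for the Pousada types (CH, CSUP, C, B).
clients = [
    (0, 0, 1, 0), (0, 1, 0, 0),   # few visits
    (1, 1, 2, 1), (1, 2, 2, 1),   # moderate visits
    (4, 3, 5, 4), (5, 4, 4, 4),   # many visits
]
initial_centroids = [clients[0], clients[2], clients[4]]  # seeds chosen a priori

centroids, clusters = kmeans(clients, initial_centroids)
cluster_sizes = [len(c) for c in clusters]
print(centroids, cluster_sizes)
```

Comparing cluster proportions between a model sample and a validation sample, as the slide does, is one simple check that the solution is stable.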
Clustering (cont’d)
• 2 clusters (Heavy users and Regular users)
  • Model: 16 Pousadas and 5 Pousadas
  • Validation: 17 Pousadas and 4 Pousadas
Evaluation of clustering solution
Analysis of association between clusters and clustering base
• Measure the degree of correct classification
  • Model: 82.6%; Validation: 91.5%
• Find the linear combinations of the clustering base variables that maximise the ratio of between-cluster to within-cluster variation
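The between/within criterion can be illustrated for a single score. Discriminant analysis searches over linear combinations of the base variables for the one that maximises this ratio; the sketch below only evaluates the ratio for one already-computed score, and the values are hypothetical.

```python
def between_within_ratio(groups):
    """Ratio of between-cluster to within-cluster sum of squares for one score."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    within = sum((value - m) ** 2 for g, m in zip(groups, means) for value in g)
    return between / within

# Hypothetical scores of one linear combination, grouped by cluster.
well_separated = [[1, 2, 3], [7, 8, 9]]
overlapping = [[1, 5, 9], [2, 6, 10]]

ratio_separated = between_within_ratio(well_separated)
ratio_overlapping = between_within_ratio(overlapping)
print(ratio_separated, ratio_overlapping)
```

A larger ratio means the clusters differ more on that score relative to the spread inside each cluster.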
Analysis of association between clusters and other variables
• Chi-square assesses the strength of association between clusters and variables
• Rule induction procedures discriminate and classify on the basis of attributes significantly associated with clusters
• Rule induction provides a better comprehension of the facts discriminating the clusters
• C4.5 and CN2 are evaluated on both the Model sample and the Validation sample
Analysis of association between clusters and other variables (cont’d)
• Memorize a group/beam of the best solutions
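Keeping a beam of the best solutions can be sketched as a search that, at each step, extends every retained rule condition by one attribute test and keeps only the top few candidates. This is a simplified illustration, not the CN2 implementation: candidates are scored by plain accuracy on the covered records, and the attributes (`season`, `group`) and records are hypothetical.

```python
def accuracy(conditions, target, records):
    """Share of records covered by the conditions that belong to the target cluster."""
    covered = [r for r in records if all(r[a] == v for a, v in conditions)]
    if not covered:
        return 0.0
    return sum(r["cluster"] == target for r in covered) / len(covered)

def beam_search(records, target, attributes, beam_width=2, max_length=2):
    """Grow conjunctions of attribute tests, keeping only the best `beam_width` each round."""
    selectors = sorted({(a, r[a]) for r in records for a in attributes})
    beam = [()]   # start from the empty condition, which covers every record
    best = ()
    for _ in range(max_length):
        candidates = set()
        for conditions in beam:
            used = {a for a, _ in conditions}
            for selector in selectors:
                if selector[0] not in used:
                    candidates.add(tuple(sorted(conditions + (selector,))))
        # Rank by accuracy (descending); break ties deterministically.
        ranked = sorted(candidates, key=lambda c: (-accuracy(c, target, records), c))
        beam = ranked[:beam_width]
        if beam and accuracy(beam[0], target, records) > accuracy(best, target, records):
            best = beam[0]
    return best

# Hypothetical records; attribute names and values are illustrative assumptions.
records = [
    {"season": "summer", "group": "family", "cluster": "Heavy users"},
    {"season": "summer", "group": "couple", "cluster": "Regular users"},
    {"season": "winter", "group": "family", "cluster": "Heavy users"},
    {"season": "winter", "group": "couple", "cluster": "Regular users"},
]

best_rule = beam_search(records, "Heavy users", ["season", "group"])
print(best_rule)
```

The beam trades completeness for speed: a wider beam explores more candidate rules, while a width of 1 reduces to a purely greedy search.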
Global evaluation
• Discriminant Analysis and Logistic Regression clearly show the differences between clusters
• Chi-square tests association between variables and clusters
• C4.5 and CN2 provide a more complex and richer perspective
Conclusion
• Identifying significant associations characterising the clustered entities guided discriminant and classification analysis
• Propositional rule induction is suitable for discriminating purposes
• A multimethodological approach should consider not only inference but also descriptive analysis