Compound classification and stream data analysis. Michał Woźniak Department of Systems and Computer Networks. Machine Learning Team. http://www.kssk.pwr.wroc.pl/machine-learning-team/1469/?lang=en. Machine learning and data mining Methods of improving and stabilizing weak classifiers
Compound classification and stream data analysis Michał Woźniak Department of Systems and Computer Networks
Machine Learning Team
http://www.kssk.pwr.wroc.pl/machine-learning-team/1469/?lang=en
• Machine learning and data mining
• Methods of improving and stabilizing weak classifiers
• Hybrid and compound classification
• Information fusion and combined classifiers (multiple classifier systems, classifier ensembles)
• Big data analytics
• Data stream classification and concept drift (novelty) detection
• One-class classification
• Imbalanced data analysis
• Active learning
• Distributed and parallel computing systems for data mining
• Applications to real-life problems
Classification
We have been using computers for automatic classification for many years, e.g., for speech recognition and for the analysis of fingerprints and DNA sequences.
Hybrid classification
• According to Wolpert's "no free lunch" theorem, no single pattern recognition algorithm is appropriate for all the tasks we deal with, as each classifier has its own domain of competence.
• If it is not possible to build a universal classifier, perhaps we can reuse the valuable components and ideas behind the classifiers we already have.
Hybrid classification
• A hybrid classifier is a classifier system that merges components of different individual classifiers in order to exploit their strengths and improve overall performance.
Hybridization levels
• Use different (usually distributed) data sources for classifier training
• Merge different data types and knowledge representations into one unified representation
• Use trained models but take additional knowledge into consideration, e.g., additional constraints
• Use trained models to reach a common decision based on combined classifier approaches
Data and knowledge hybridization
• Data and knowledge consistency, i.e., consistency of the knowledge representation and consistency between data and knowledge.
• Data privacy, especially when decisions are made on the basis of distributed data.
• Data and knowledge unification.
• Cost of data acquisition, especially when a maximum cost limit is fixed for a given task.
Classifier hybridization
• Forming a valuable ensemble of classifiers based on diversity measures and the cost of classifier exploitation.
• Developing combination rules based on voting.
• Developing common support functions, e.g., based on simple averaging, trained combination rules, or parametric probability estimation.
Classifier ensemble
Classifier ensemble - motivations
• Avoiding the selection of the worst classifier.
• Improving the performance of the best individual classifier.
• Computational complexity.
• Distributed computational environment.
• No free lunch.
Classifier ensemble - architecture
Classifier ensemble - topology
Classifier ensemble - diversity
An ideal ensemble consists of classifiers with high accuracy and high diversity, i.e., classifiers that are mutually complementary.
Classifier ensemble - diversity
• Manipulate input
• Manipulate model
• Manipulate output
Classifier ensemble - diversity
It is hard to say what diversity means, but we want to measure it, because "To measure is to know." Lord Kelvin (1824-1907)
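One way to put a number on diversity is a pairwise disagreement rate. The sketch below is only an illustration under assumptions: it compares raw label outputs, the function names and the sample predictions are hypothetical, and it is not claimed to be the measure used by the author.

```python
from itertools import combinations


def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers output different labels."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    differing = sum(1 for a, b in zip(pred_a, pred_b) if a != b)
    return differing / len(pred_a)


def mean_pairwise_disagreement(predictions):
    """Average disagreement over all pairs of ensemble members."""
    pairs = list(combinations(predictions, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


# Hypothetical outputs of three classifiers on the same five samples.
ensemble_outputs = [
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
]
diversity = mean_pairwise_disagreement(ensemble_outputs)
```

A value of 0 would mean that all members always agree; larger values indicate members that err or decide differently, which is what an ensemble needs in addition to accuracy.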
Ensemble pruning
• More does not necessarily mean better, especially in the case of combined classifiers.
• Zhou et al. {Zhou:2002} presented an analysis for regression problems in which they formulated the condition under which removing one model from an ensemble improves the ensemble performance.
Ensemble pruning
• Ranking-based methods
• Clustering-based methods
• Optimization-based pruning
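As a minimal sketch of the ranking-based idea, the snippet below orders ensemble members by a single criterion and keeps the top few. The criterion (validation accuracy), the classifier names, and the scores are assumptions made for illustration; practical ranking methods often combine accuracy with diversity.

```python
def prune_by_ranking(val_accuracy, keep):
    """Return the names of the `keep` classifiers with the highest score."""
    if keep <= 0:
        raise ValueError("keep must be positive")
    ranked = sorted(val_accuracy, key=val_accuracy.get, reverse=True)
    return ranked[:keep]


# Hypothetical validation accuracies of four ensemble members.
val_accuracy = {"tree": 0.81, "knn": 0.77, "nb": 0.84, "svm": 0.79}
selected = prune_by_ranking(val_accuracy, keep=2)
```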
Combination rule
• Voting, Behaviour Knowledge Space, stacking
• Sum, min, max, median, product, weighted average, ranking (Borda count), mixture of experts, decision templates
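Two of the listed rule families can be sketched in a few lines: majority voting over predicted labels, and fixed rules (mean, min, max) applied to per-class support values. The function names and the example supports are hypothetical and serve only as an illustration.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label chosen by most classifiers (ties: first one seen)."""
    return Counter(labels).most_common(1)[0][0]


def mean(values):
    return sum(values) / len(values)


def combine_supports(supports, rule):
    """Fuse support vectors class by class, then return the winning class index.

    supports[k][c] is the support of classifier k for class c;
    rule reduces a list of numbers to one number (e.g. mean, min, max).
    """
    n_classes = len(supports[0])
    fused = [rule([s[c] for s in supports]) for c in range(n_classes)]
    return max(range(n_classes), key=fused.__getitem__)


# Hypothetical supports of three classifiers for two classes.
supports = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
labels = [max(range(2), key=s.__getitem__) for s in supports]  # [0, 1, 1]

by_vote = majority_vote(labels)             # two of three members say class 1
by_mean = combine_supports(supports, mean)  # the confident member wins
```

In this example voting and averaging disagree: voting follows the two weakly convinced members, while the mean rule follows the single strongly convinced one. This is why the choice of combination rule matters.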
Data stream classification
• Market-leading companies realize that smart analytic tools capable of analyzing the collected, fast-growing data can lead to business success.
• When designing such solutions, we have to take into account that in the modern world most data arrive continuously, so analytic tools must respect this and be able to interpret so-called data streams.
Data stream classification
Most traditional classifier design methods do not take the following points into consideration:
• The statistical dependencies between the observations of the given objects and their classifications can change.
• Data can flood the analyzer, which makes it impossible to label all records.
Concept drift
The appearance of concept drift can significantly degrade the accuracy of the classifier in use.
Concept drift
Concept drift
The following approaches can be considered to deal with the above problem:
• Rebuilding the classification model whenever new data become available. This is very expensive and often impossible in practice, especially when concept drift occurs rapidly.
• Detecting concept changes in new data and rebuilding the classifier if these changes are sufficiently significant.
• Adopting an incremental learning algorithm for the classification model.
Concept drift
The first algorithms designed to deal with drifting data were:
• STAGGER, proposed by Schlimmer and Granger,
• IB3, proposed by Aha,
• the suite of FLORA algorithms by Widmer and Kubat.
Since then a plethora of solutions have been proposed, and the growing interest in this domain has resulted in an increasing number of publications.
Concept drift
We can divide these algorithms into four main groups:
• Online learners
• Instance-based solutions (also called sliding-window-based solutions)
• Ensemble approaches
• Drift detection algorithms
Concept drift – online learners
This group comprises algorithms that continuously update the classifier parameters while processing the incoming data.
• Each object must be processed only once in the course of training.
• The system should consume only limited memory and processing time, irrespective of the execution time and the amount of data processed.
• The training process can be paused at any time, and its accuracy should not be lower than that of a classifier trained on batch data collected up to the given time.
Concept drift – online learners
• Classifiers that fulfill these requirements work very fast and can adapt their model in a very flexible manner.
• Among others, the following are the most popular online learners: Naïve Bayes, neural networks, and nearest neighbour.
• A more sophisticated solution is CVFDT (Concept-adapting Very Fast Decision Tree) by Hulten et al.
• Selected online learners have been incorporated into the Massive Online Analysis (MOA) framework {Bifet:2011}.
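The requirements for online learners (each example seen once, bounded memory, usable at any time) can be illustrated with an incremental Naïve Bayes over categorical features. This is a minimal sketch, not the MOA implementation; the class name, the method names, and the toy stream are assumptions made for this example.

```python
import math
from collections import defaultdict


class OnlineNaiveBayes:
    """Naive Bayes over categorical features, updated one example at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # feature_counts[(feature index, value, label)] -> number of occurrences
        self.feature_counts = defaultdict(int)
        self.values = defaultdict(set)  # distinct values seen per feature

    def learn_one(self, x, y):
        """Update the counts with one labelled example; x itself is not stored."""
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feature_counts[(i, v, y)] += 1
            self.values[i].add(v)

    def predict_one(self, x):
        """Return the most probable label using Laplace-smoothed estimates."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / total)
            for i, v in enumerate(x):
                count = self.feature_counts.get((i, v, y), 0)
                # "+ 1" in the denominator reserves mass for unseen values
                score += math.log((count + 1) / (n_y + len(self.values[i]) + 1))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


# Hypothetical stream of (features, label) pairs, processed exactly once.
stream = [
    (("sunny", "hot"), "no"),
    (("rainy", "mild"), "yes"),
    (("sunny", "mild"), "no"),
    (("rainy", "hot"), "yes"),
]
model = OnlineNaiveBayes()
for x, y in stream:
    model.learn_one(x, y)
```

Memory grows with the number of distinct feature values and classes, not with the number of processed examples, and `predict_one` can be called between any two updates.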
Concept drift – sliding windows
• This group consists of algorithms that incorporate a forgetting mechanism.
• This approach is based on the assumption that recently arrived data are the most relevant, because they contain characteristics of the current context.
• However, their relevance diminishes with the passage of time.
Concept drift – sliding windows
• Therefore, narrowing the range of data to those that were most recently read may help form a dataset that embodies the actual context.
• There are three possible strategies here:
• selecting the instances by means of a sliding window that cuts off older instances {Widmer:1996};
• weighting the data according to their relevance;
• applying bagging and boosting algorithms that focus on misclassified instances {Bifet:2009, Chu:2004}.
Concept drift – sliding windows
• When dealing with a sliding window, the main question is how to adjust the window size.
• A shorter window allows focusing on the emerging context, though the data may not be representative of a longer-lasting context.
• A wider window may result in mixing instances that represent different contexts.
Concept drift – sliding windows
• Therefore, certain advanced algorithms adjust the window size dynamically depending on the detected state (e.g., FLORA2 {Widmer:1996} and ADWIN2 {Bifet:2007}).
• In more sophisticated algorithms, multiple windows may even be used {Lazarescu:2004}.
• In object weighting algorithms the relevance of the instance is used to calculate its weight, which is usually inversely proportional to the time that has passed since the instance was read {Klinkenberg:1998, Koychev:2000}.
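The window-size trade-off and the age-based weighting described above can be sketched as follows. This is an illustration under assumptions: the "classifier" is deliberately trivial (it predicts the majority label in the window), the weight formula 1 / (1 + age) is one possible reading of "inversely proportional", and all names are hypothetical. It does not reproduce FLORA2 or ADWIN2.

```python
from collections import Counter, deque


class SlidingWindowMajority:
    """Predicts the most frequent label among the last `size` instances."""

    def __init__(self, size):
        # A deque with maxlen forgets the oldest instance automatically.
        self.window = deque(maxlen=size)

    def learn_one(self, label):
        self.window.append(label)

    def predict(self):
        return Counter(self.window).most_common(1)[0][0]


def inverse_age_weights(ages):
    """Weight each instance by 1 / (1 + age), so newer instances weigh more."""
    return [1.0 / (1 + age) for age in ages]


# Hypothetical stream whose concept changes from "a" to "b" after six instances.
stream = ["a"] * 6 + ["b"] * 4
short, wide = SlidingWindowMajority(3), SlidingWindowMajority(10)
for label in stream:
    short.learn_one(label)
    wide.learn_one(label)

weights = inverse_age_weights([0, 1, 3])
```

After the change, the short window already reflects the new concept, while the wide window still mixes both contexts and keeps predicting the old label.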
Concept drift – ensemble approach
• This group consists of algorithms that incorporate a set of elementary classifiers {Wang:2003, Stanley:2003, Tsymbal:2008}.
• It has been shown that a collective decision can increase classification accuracy because the knowledge that is distributed among the classifiers may be more comprehensive. This premise is true if the set consists of diverse members {Shipp:2002}.
• In static environments, diversity may refer to the classifier model, the feature set, or the instances used in training.
Concept drift – ensemble approach
• Dynamic combiners, where individual classifiers are trained in advance and their relevance to the current context is evaluated dynamically while processing subsequent data. The drawback of this approach is that all contexts must be available in advance; the emergence of new, unknown contexts may result in a lack of experts.
• Updating the ensemble members, where the ensemble consists of online classifiers that are updated incrementally on the incoming data.
• Dynamically changing the ensemble line-up, e.g., individual classifiers are evaluated dynamically and the worst one is replaced by a new one trained on the most recent data.
Concept drift – ensemble approach
• Among the most popular ensemble approaches, the following are worth noting:
• the Streaming Ensemble Algorithm (SEA) {Street:2001},
• the Accuracy Weighted Ensemble (AWE) {Wang:2003}.
• Both algorithms keep a fixed-size set of classifiers. Incoming data are collected in data chunks, which are used to train new classifiers.
• The Dynamic Weighted Majority (DWM) algorithm {Kolter:2003} modifies the weights and updates the ensemble in a more flexible manner: the weight of a classifier is reduced when the classifier makes an incorrect decision.
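The weight-reduction step mentioned for DWM can be sketched as below. This covers only the weighted vote and the penalty for wrong members; the full algorithm also creates and removes members, which is omitted here. The penalty factor `beta`, the function names, and the example predictions are assumptions made for illustration.

```python
def weighted_vote(predictions, weights):
    """Return the label with the largest total weight."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


def update_weights(predictions, weights, true_label, beta=0.5):
    """Multiply the weight of every mistaken member by beta, then rescale
    so that the largest weight equals 1."""
    updated = [w * beta if p != true_label else w
               for p, w in zip(predictions, weights)]
    top = max(updated)
    return [w / top for w in updated]


# Hypothetical ensemble of three members; only the first one is right.
predictions = ["a", "b", "b"]
weights = [1.0, 1.0, 1.0]
first_decision = weighted_vote(predictions, weights)

for _ in range(2):  # two rounds in which the true label is "a"
    weights = update_weights(predictions, weights, true_label="a")
later_decision = weighted_vote(predictions, weights)
```

Initially the two wrong members outvote the correct one; after two penalised mistakes their combined weight falls below that of the correct member and the ensemble decision changes.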
Concept drift - detection
• Not all classification algorithms dealing with concept drift require drift detection. Some evolving systems continuously adjust the model to incoming data {Zliobaite:2010}.
• This technique is called implicit drift detection {Kuncheva:2008}, as opposed to explicit drift detection methods that raise a signal to indicate change.
Concept drift - detection
• The detector can be based on changes in the probability distribution of the instances {Gaber:2006, Salganicoff:1993} or on classification accuracy {Baena-Garcia:2006}.
• Many detection algorithms rely on knowledge of the object labels after classification in order to detect concept drift; however, as pointed out in {Zliobaite:2010}, such an approach does not fit real scenarios.
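An explicit detector based on classification accuracy can be sketched with a simple error-rate rule in the spirit of the DDM test of Gama et al.: the running error rate p and its standard deviation s are tracked, and drift is signalled when p + s exceeds the best level seen so far by a margin. The sketch is not claimed to be the detector from the cited works; the class name, the thresholds, and the synthetic outcome stream are assumptions. Note that it needs the true label of every classified object, which is exactly the limitation raised above.

```python
import math


class ErrorRateDriftDetector:
    """Signals drift when the running error rate rises well above its best level."""

    def __init__(self, min_samples=30, drift_level=3.0):
        self.min_samples = min_samples
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.best = math.inf  # lowest p + s observed so far
        self.best_p = 0.0
        self.best_s = 0.0

    def update(self, mistake):
        """Feed one outcome (True = misclassified); return True on drift."""
        self.n += 1
        self.errors += int(mistake)
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_samples:
            return False  # too few samples for a stable estimate
        if p + s < self.best:
            self.best, self.best_p, self.best_s = p + s, p, s
        return p + s > self.best_p + self.drift_level * self.best_s


# Synthetic outcomes: 10% errors for 200 samples, then every sample is wrong.
outcomes = [i % 10 == 0 for i in range(1, 201)] + [True] * 50
detector = ErrorRateDriftDetector()
drift_at = next(
    (i for i, m in enumerate(outcomes, start=1) if detector.update(m)),
    None,
)
```

After a drift signal a typical reaction would be to call `reset()` and rebuild or replace the classifier using the most recent data.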
Concept drift - detection
• Concept drift detection algorithms can be divided into three types, depending on the assumption about the amount of costly knowledge regarding the true class labels available to the algorithm.
Concept drift - detection
[Figure: data model, classification system, detector, and expert operating on a stream of samples]
Concept drift - detection
[Figure: data model, detector, and classification system operating on a stream of samples]
Team's recent works
Classifier ensemble
• Wozniak M., Grana M., Corchado E., A Survey of Multiple Classifier Systems as Hybrid Systems, Information Fusion (http://dx.doi.org/10.1016/j.inffus.2013.04.006)
• Krawczyk B., Wozniak M., Diversity measures for one-class classifier ensembles, Neurocomputing (http://dx.doi.org/10.1016/j.neucom.2013.01.053)
• Krawczyk B., Filipczuk P., Wozniak M., Obuchowicz A., Diversity-Based Classifier Selection for Breast Cancer Cytological Image Analysis, Biomedical Engineering: Applications, Basis and Communications (in press)
• Krawczyk B., Wozniak M., Schaefer G., Cost-sensitive decision tree ensembles for effective imbalanced classification, Applied Soft Computing (http://dx.doi.org/10.1016/j.asoc.2013.08.014)
• Jackowski K., Krawczyk B., Wozniak M., Application of Adaptive Splitting and Selection Classifier to the SPAM Filtering Problem, Cybernetics and Systems, Volume 44, Issue 6-7, 2013, 569-588.
• Filipczuk P., Krawczyk B., Wozniak M., Classifier Ensemble for an Effective Cytological Image Analysis, Pattern Recognition Letters, Volume 34, Issue 14, 15 October 2013, 1748–1757.
• Wozniak M., Krawczyk B., Combined Classifier Based on Feature Space Partitioning, International Journal of Applied Mathematics and Computer Science (AMCS), Vol. 22, No. 4, 2012, 855–866.
• Kurzynski M., Wozniak M., Combining classifiers Under Probabilistic Model – Experimental Comparative Analysis of Methods, Expert Systems, Volume 29, Issue 4, 2012, 374-393.
Team's recent works
Data stream classification
• Sobolewski P., Wozniak M., Concept drift detection and model selection with simulated recurrence and ensembles of statistical detectors, Journal of Universal Computer Science, vol. 19, no. 4 (2013), 462-483.
• Cal P., Wozniak M., Parallel Hoeffding Decision Tree for Streaming Data, Advances in Intelligent Systems and Computing, Volume 217, 2013, pp. 27-35.
• Jackowski K., Fixed-size ensemble classifier system evolutionarily adapted to a recurring context with an unlimited pool of classifiers, Pattern Analysis and Applications, http://dx.doi.org/10.1007/s10044-013-0318-x
• Sobolewski P., Wozniak M., Comparable Study of Statistical Tests for Virtual Concept Drift Detection, R. Burduk et al. (Eds.): CORES 2013, AISC 226, Springer, 2013, pp. 329–337.
• Kurlej B., Wozniak M., Active Learning Approach to Concept Drift Problem, Logic Journal of IGPL, Vol. 20, no. 1, 2012, 550-559.
• Sobolewski P., Wozniak M., Data with shifting concept classification using simulated recurrence, Lecture Notes in Computer Science, 2012, vol. 7196, 403-412.
• Cal P., Wozniak M., Drift detection and model selection algorithms: concept and experimental evaluation, Lecture Notes in Computer Science, 2012, vol. 7209, 558-568.
• Wozniak M., A Hybrid Decision Tree Training Method using Data Streams, Knowledge and Information Systems (KAIS) journal, Vol. 29, no. 2, 2011.
ENGINE connections
• Cooperation with the teams from Delft, Salamanca, Fontainebleau, and Granada (proposed) under the framework of WT2.4, WT2.8, WT2.10
• Organizing scientific events:
• CORES Conference (WT3.2)
• Workshop on Nonstationary Models of Pattern Recognition and Classifier Combinations (WT3.7)
• Workshop on Machine Learning in Life Sciences (WT3.8)
• Workshop on Solving Classification Problems Embedded in the Nature of Data (WT3.9)
• Participation in conferences (WT3.14)
• Hiring an expert in Compound Pattern Recognition (9 M) and a post-doc in Compound Pattern Recognition (9 M) (WT4.6)
• Upgrade of the Distributed Computing and Data Mining Laboratory (WT5.2)
Expected outcome
• Conducting cutting-edge, high-impact research that shapes the field of ensemble techniques and data stream analysis.
• Increasing the research expertise of the ENGINE team.
• Increasing the visibility of the ENGINE team by boosting cooperation with foreign teams and increasing the number of published and cited works.