Ordinal Classification

Rob Potharst, Erasmus University Rotterdam
SIKS Advanced Course on Computational Intelligence, October 2001
What is ordinal classification?
Company: catering service Swift

• total liabilities / total assets: 1
• net income / net worth: 3
• …
• managers' work experience: 5
• market niche position: 3
• …

bankruptcy risk: + (acceptable)
Data set: 39 companies

Each row: the 12 ordinal attribute values of one company, followed by its class.

2 2 2 2 1 3 5 3 5 4 2 4   +
4 5 2 3 3 3 5 4 5 5 4 5   +
3 5 1 1 2 2 5 3 5 5 3 5   +
2 3 2 1 2 4 5 2 5 4 3 4   +
3 4 3 2 2 2 5 3 5 5 3 5   +
3 5 3 3 3 2 5 3 4 4 3 4   +
3 5 2 3 4 4 5 4 4 5 3 5   +
1 1 4 1 2 3 5 2 4 4 1 4   +
3 4 3 3 2 4 4 2 4 3 1 3   +
3 4 2 1 2 2 4 2 4 4 1 4   +
2 5 1 1 3 4 4 3 4 4 3 4   +
3 3 4 4 3 4 4 2 4 4 1 3   +
1 1 2 1 1 3 4 2 4 4 1 4   +
2 1 1 1 4 3 4 2 4 4 3 3   +
2 3 2 1 1 2 4 4 4 4 2 5   +
2 3 4 3 1 5 4 2 4 3 2 3   +
2 2 2 1 1 4 4 4 4 4 2 4   +
2 1 3 1 1 3 5 2 4 2 1 3   +
2 1 2 1 1 3 4 2 4 4 2 4   +
2 1 2 1 1 5 4 2 4 4 2 4   +
2 1 1 1 1 3 2 2 4 4 2 3   ?
1 1 3 1 2 1 3 4 4 4 3 4   ?
2 1 2 1 1 2 4 3 3 2 1 2   ?
1 1 1 1 1 1 3 2 4 4 2 3   ?
2 2 2 1 1 3 3 2 4 4 2 3   ?
2 2 1 1 1 3 2 2 4 4 2 3   ?
2 1 2 1 1 3 2 2 4 4 2 4   ?
1 1 4 1 3 1 2 2 3 3 1 2   ?
3 4 4 3 2 3 3 4 4 4 3 4   ?
3 1 3 3 1 2 2 3 4 4 2 3   ?
1 1 2 1 1 1 3 3 4 4 2 3   -
3 5 2 1 1 1 3 2 3 4 1 3   -
2 2 1 1 1 1 3 3 3 4 3 4   -
2 1 1 1 1 1 2 2 3 4 3 4   -
1 1 2 1 1 1 3 1 4 3 1 2   -
1 1 3 1 2 1 2 1 3 3 2 3   -
1 1 1 1 1 1 2 2 4 4 2 3   -
1 1 3 1 1 1 1 1 4 3 1 3   -
2 1 1 1 1 1 1 1 2 1 1 2   -

20: + (acceptable)
9: - (unacceptable)
10: ? (uncertain)

from: Greco, Matarazzo, Slowinski (1996)
Possible classifier

if man.exp. > 4, then class = '+'
if man.exp. < 4 and net.inc/net.worth = 1, then class = '-'
all other cases: class = '?'

• when applied to the data set of 39 companies: 3 mistakes
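A minimal Python sketch of this rule set; the column indices MAN_EXP and NET_INC_NET_WORTH are assumptions, since the slides do not fix the order of the 12 attributes:

```python
# Hypothetical column indices: the slides do not specify which of the
# 12 columns holds which attribute.
MAN_EXP = 6            # assumed: managers' work experience
NET_INC_NET_WORTH = 1  # assumed: net income / net worth

def classify(x):
    """x: tuple of 12 ordinal attribute values for one company."""
    if x[MAN_EXP] > 4:
        return '+'
    if x[MAN_EXP] < 4 and x[NET_INC_NET_WORTH] == 1:
        return '-'
    return '?'
```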
What is classification?

The act of assigning objects to classes, using the values of relevant features of those objects.

So we need:
• objects (individuals, cases), all belonging to some domain
• classes, their number and kind prescribed
• features (attributes, variables)
• a classifier (classification function) that assigns a class to any object
Building classifiers

= induction from a training set of examples:
• data without noise
• data with noise
Induction methods (especially from the AI world)

• decision trees: C4.5, CART (from 1984 on)
• neural networks: backpropagation (from 1986, with a false start from 1974)
• rule induction algorithms: CN2 (1989)
• newer methods: rough sets, fuzzy methods, decision lists, pattern-based methods, etc.
Decision tree: example

man.exp. < 3?
  y: gen.exp./sales = 1?
       y: tot.liab/cashfl = 1?
            y: class = -
            n: class = ?
       n: class = ?
  n: class = +

This tree classifies 37 out of 39 examples correctly.
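The same tree as nested conditionals, as a sketch (argument names are illustrative):

```python
def tree_classify(man_exp, gen_exp_sales, tot_liab_cashfl):
    """The example tree as nested if-statements."""
    if man_exp < 3:
        if gen_exp_sales == 1:
            if tot_liab_cashfl == 1:
                return '-'
            return '?'
        return '?'
    return '+'
```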
Ordinal classification

• features have an ordinal scale
• classes have an ordinal scale
• the ordering must be preserved!
Preservation of ordering

A classifier is monotone iff:

  if A ≤ B, then also class(A) ≤ class(B)
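As a sketch in Python, with examples as tuples of ordinal attribute values compared componentwise, and class labels assumed comparable (e.g. integers):

```python
def dominates(a, b):
    """Componentwise order on attribute vectors: a <= b."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(classify, X):
    """True iff classify preserves the ordering on the finite domain X:
    A <= B implies class(A) <= class(B)."""
    return all(classify(a) <= classify(b)
               for a in X for b in X if dominates(a, b))
```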
Relevance of ordinal classification

• selection problems
• credit worthiness
• pricing (e.g. real estate)
• etc.
Induction of monotone decision trees

• using C4.5 or CART: non-monotone trees
• needed: an algorithm that is guaranteed to generate only monotone trees
  • Makino, Ibaraki et al. (1996): only for 2-class problems, cumbersome
  • Potharst & Bioch (2000): for k-class problems, fast and efficient
The algorithm

try to split subset T:
  1) update D for subset T
  2) if D ∩ T is homogeneous
     then assign its class label to T and make T a leaf, definitively
     else split T into two non-empty subsets TL and TR using entropy
          try to split subset TL
          try to split subset TR
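A Python sketch of this recursion, with subsets stored as sets of attribute tuples. The helpers are assumptions: update_D is the update rule on the next slide, min_allowed/max_allowed follow two slides on, and best_entropy_split stands for trying each split Ai ≤ c and keeping the one with the lowest average entropy (see the entropy sketch further below):

```python
def try_to_split(T, D):
    """T: current subset of the instance space (set of tuples);
    D: dict mapping attribute tuple -> class label.
    Returns a leaf label or (split, left_subtree, right_subtree)."""
    update_D(T, D)                              # step 1: the update rule
    labels = {c for x, c in D.items() if x in T}
    if len(labels) == 1:                        # step 2: D ∩ T homogeneous
        return labels.pop()                     # make T a leaf, definitively
    split, TL, TR = best_entropy_split(T, D)    # assumed helper: lowest
                                                # average entropy split
    return (split, try_to_split(TL, D), try_to_split(TR, D))
```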
The update rule

update D for T:
  1) if min(T) is not in D then
     - add min(T) to D
     - class( min(T) ) = the maximal value allowed, given D
  2) if max(T) is not in D then
     - add max(T) to D
     - class( max(T) ) = the minimal value allowed, given D
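A sketch of this rule under the same representation; min_allowed and max_allowed are the bounds defined on the next two slides:

```python
def update_D(T, D):
    """The update rule: extend D at the corner elements of T.
    For an interval-shaped T stored as a set of tuples, Python's built-in
    min/max return exactly the corners min(T) and max(T), because the
    componentwise minimum is also the lexicographic minimum."""
    lo, hi = min(T), max(T)
    if lo not in D:
        D[lo] = max_allowed(lo, D)   # maximal class value allowed, given D
    if hi not in D:
        D[hi] = min_allowed(hi, D)   # minimal class value allowed, given D
```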
The minimal value allowed, given D

• For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.
• Let ↓x be the downset { y ∈ X | y ≤ x } of x
• Let y* be an element of D ∩ ↓x with the highest class value
• Then the minimal class value possible for x is class(y*).
The maximal value allowed, given D

• Let ↑x be the upset { y ∈ X | y ≥ x } of x
• Let y* be an element of D ∩ ↑x with the lowest class value
• Then the maximal class value possible for x is class(y*)
• if there is no such element, take the maximal class value (or the minimal, in the former case)
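Both bounds in Python, reusing dominates from the monotonicity sketch; NUM_CLASSES is an assumption (4 in the example that follows):

```python
NUM_CLASSES = 4  # assumed: class labels 0 .. NUM_CLASSES-1

def min_allowed(x, D):
    """Minimal class value possible for x, given D: the highest class
    among elements of D in the downset of x; 0 if there is none."""
    below = [c for y, c in D.items() if dominates(y, x)]
    return max(below) if below else 0

def max_allowed(x, D):
    """Maximal class value possible for x, given D: the lowest class
    among elements of D in the upset of x; the top class if there is none."""
    above = [c for y, c in D.items() if dominates(x, y)]
    return min(above) if above else NUM_CLASSES - 1
```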
Example

X: all vectors 000, 100, 010, …, 222
  attr. 1: values 0, 1, 2
  attr. 2: values 0, 1, 2
  attr. 3: values 0, 1, 2
  classes: 0, 1, 2, 3

D:
  001 → 0
  002 → 1
  112 → 2
  202 → 2
  212 → 3

Let us calculate the minimal and maximal possible value for x = 022:
  min-value: y* = 002, so the min-value = 1
  max-value: there is no y*, so the max-value = 3
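The same calculation with the sketch above:

```python
# Reproducing the slide's example for x = 022.
D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
x = (0, 2, 2)
print(min_allowed(x, D))  # 1: y* = 002 (class 1) is the best element below x
print(max_allowed(x, D))  # 3: no element of D lies above x, so the top class
```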
Tracing the algorithm

Try to split subset T = X:
  update D for X:
    min(X) = 000 is not in D; the max-value of 000 is 0, so add 000 with class 0 to D
    max(X) = 222 is not in D; the min-value of 222 is 3, so add 222 with class 3 to D
  D ∩ X is not homogeneous, so consider all the possible splits:
    A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

D is now:
  000 → 0
  001 → 0
  002 → 1
  112 → 2
  202 → 2
  212 → 3
  222 → 3
The entropy of each split

The split A1 ≤ 0 splits X into TL = [000, 022] and TR = [100, 222].

D ∩ TL: 000 → 0, 001 → 0, 002 → 1                  Entropy = 0.92
D ∩ TR: 112 → 2, 202 → 2, 212 → 3, 222 → 3         Entropy = 1

Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
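The numbers can be checked with a few lines of Python (plain Shannon entropy over the class labels of D ∩ TL and D ∩ TR):

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

# D ∩ TL = {000:0, 001:0, 002:1}; D ∩ TR = {112:2, 202:2, 212:3, 222:3}
eL = entropy([0, 0, 1])       # ≈ 0.92
eR = entropy([2, 2, 3, 3])    # = 1.0
print(3/7 * eL + 4/7 * eR)    # ≈ 0.97, as on the slide
```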
Going on with the trace

The split with the lowest entropy is A1 ≤ 0, so we go on with T = TL = [000, 022].

Try to split subset T = [000, 022]:
  update D for T:
    min(T) = 000 is already in D
    max(T) = 022 has min-value 1, so it is added to D with class 1

D is now:
  000 → 0
  001 → 0
  002 → 1
  022 → 1
  112 → 2
  202 → 2
  212 → 3
  222 → 3

D ∩ T is not homogeneous, so we go on to consider the following splits:
  A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1   ← A3 ≤ 1 has the lowest entropy
We now have the following tree:

A1 ≤ 0?
  y: A3 ≤ 1?
       y: ?
       n: ?
  n: ?
Going on…

The split A3 ≤ 1 splits T into TL = [000, 021] and TR = [002, 022].

We go on with T = TL = [000, 021]:
Try to split subset T = [000, 021]:
  min(T) = 000 is already in D
  max(T) = 021 has min-value 0, so it is added to D with class 0
  D ∩ T is homogeneous, so we stop and make T a leaf with class value 0

Next, we go on with T = TR = [002, 022], etc.
Finally…

A1 ≤ 0?
  y: A3 ≤ 1?
       y: class = 0
       n: class = 1
  n: A1 ≤ 1?
       y: class = 2
       n: A2 ≤ 0?
            y: class = 2
            n: class = 3
A monotone tree for the bankruptcy problem

• can be seen on p. 107 of the paper that was handed out with this course
• a tree with 6 leaves
• it uses the same attributes as those that an ordinal version of the rough set approach comes up with: see Viara Popova's lecture
Conclusions and remaining problems

• We described an efficient algorithm for the induction of monotone decision trees, provided the data set is monotone
• We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger
• What if there is noise in the data set?
• Is it possible to repair by pruning?