Ordinal Classification

Rob Potharst, Erasmus University Rotterdam
SIKS Advanced Course on Computational Intelligence, October 2001
What is ordinal classification?
Company: catering service Swift

• total liabilities / total assets: 1
• net income / net worth: 3
• …
• managers' work experience: 5
• market niche position: 3
• …

bankruptcy risk: + (acceptable)
Data set: 39 companies

Each row: the 12 ordinal attribute values of one company, followed by its class.

2 2 2 2 1 3 5 3 5 4 2 4   +
4 5 2 3 3 3 5 4 5 5 4 5   +
3 5 1 1 2 2 5 3 5 5 3 5   +
2 3 2 1 2 4 5 2 5 4 3 4   +
3 4 3 2 2 2 5 3 5 5 3 5   +
3 5 3 3 3 2 5 3 4 4 3 4   +
3 5 2 3 4 4 5 4 4 5 3 5   +
1 1 4 1 2 3 5 2 4 4 1 4   +
3 4 3 3 2 4 4 2 4 3 1 3   +
3 4 2 1 2 2 4 2 4 4 1 4   +
2 5 1 1 3 4 4 3 4 4 3 4   +
3 3 4 4 3 4 4 2 4 4 1 3   +
1 1 2 1 1 3 4 2 4 4 1 4   +
2 1 1 1 4 3 4 2 4 4 3 3   +
2 3 2 1 1 2 4 4 4 4 2 5   +
2 3 4 3 1 5 4 2 4 3 2 3   +
2 2 2 1 1 4 4 4 4 4 2 4   +
2 1 3 1 1 3 5 2 4 2 1 3   +
2 1 2 1 1 3 4 2 4 4 2 4   +
2 1 2 1 1 5 4 2 4 4 2 4   +
2 1 1 1 1 3 2 2 4 4 2 3   ?
1 1 3 1 2 1 3 4 4 4 3 4   ?
2 1 2 1 1 2 4 3 3 2 1 2   ?
1 1 1 1 1 1 3 2 4 4 2 3   ?
2 2 2 1 1 3 3 2 4 4 2 3   ?
2 2 1 1 1 3 2 2 4 4 2 3   ?
2 1 2 1 1 3 2 2 4 4 2 4   ?
1 1 4 1 3 1 2 2 3 3 1 2   ?
3 4 4 3 2 3 3 4 4 4 3 4   ?
3 1 3 3 1 2 2 3 4 4 2 3   ?
1 1 2 1 1 1 3 3 4 4 2 3   -
3 5 2 1 1 1 3 2 3 4 1 3   -
2 2 1 1 1 1 3 3 3 4 3 4   -
2 1 1 1 1 1 2 2 3 4 3 4   -
1 1 2 1 1 1 3 1 4 3 1 2   -
1 1 3 1 2 1 2 1 3 3 2 3   -
1 1 1 1 1 1 2 2 4 4 2 3   -
1 1 3 1 1 1 1 1 4 3 1 3   -
2 1 1 1 1 1 1 1 2 1 1 2   -

20: + (acceptable)
9: - (unacceptable)
10: ? (uncertain)

from: Greco, Matarazzo, Slowinski (1996)
Possible classifier

if man.exp. > 4, then class = '+'
if man.exp. < 4 and net.inc/net.worth = 1, then class = '-'
all other cases: class = '?'

• when applied to the data set of 39 companies: 3 mistakes
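A minimal Python sketch of this rule set; the column indices MAN_EXP and NET_INC_NET_WORTH are assumptions, since the slides do not fix the order of the 12 attributes:

```python
# Hypothetical column indices: the slides do not specify which of the
# 12 columns holds which attribute.
MAN_EXP = 6            # assumed: managers' work experience
NET_INC_NET_WORTH = 1  # assumed: net income / net worth

def classify(x):
    """x: tuple of 12 ordinal attribute values for one company."""
    if x[MAN_EXP] > 4:
        return '+'
    if x[MAN_EXP] < 4 and x[NET_INC_NET_WORTH] == 1:
        return '-'
    return '?'
```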
What is classification?

The act of assigning objects to classes, using the values of relevant features of those objects.

So we need:
• objects (individuals, cases), all belonging to some domain
• classes, their number and kind prescribed
• features (attributes, variables)
• a classifier (classification function) that assigns a class to any object
Building classifiers

= induction from a training set of examples:
• data without noise
• data with noise
Induction methods (especially from the AI world)

• decision trees: C4.5, CART (from 1984 on)
• neural networks: backpropagation (from 1986, with a false start from 1974)
• rule induction algorithms: CN2 (1989)
• newer methods: rough sets, fuzzy methods, decision lists, pattern-based methods, etc.
Decision tree: example

man.exp. < 3?
  y: gen.exp./sales = 1?
       y: tot.liab/cashfl = 1?
            y: class = -
            n: class = ?
       n: class = ?
  n: class = +

This tree classifies 37 out of 39 examples correctly.
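The same tree as nested conditionals, as a sketch (argument names are illustrative):

```python
def tree_classify(man_exp, gen_exp_sales, tot_liab_cashfl):
    """The example tree as nested if-statements."""
    if man_exp < 3:
        if gen_exp_sales == 1:
            if tot_liab_cashfl == 1:
                return '-'
            return '?'
        return '?'
    return '+'
```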
Ordinal classification

• features have an ordinal scale
• classes have an ordinal scale
• the ordering must be preserved!
Preservation of ordering

A classifier is monotone iff:

  if A ≤ B, then also class(A) ≤ class(B)
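As a sketch in Python, with examples as tuples of ordinal attribute values compared componentwise, and class labels assumed comparable (e.g. integers):

```python
def dominates(a, b):
    """Componentwise order on attribute vectors: a <= b."""
    return all(ai <= bi for ai, bi in zip(a, b))

def is_monotone(classify, X):
    """True iff classify preserves the ordering on the finite domain X:
    A <= B implies class(A) <= class(B)."""
    return all(classify(a) <= classify(b)
               for a in X for b in X if dominates(a, b))
```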
Relevance of ordinal classification

• selection problems
• credit worthiness
• pricing (e.g. real estate)
• etc.
Induction of monotone decision trees

• using C4.5 or CART: non-monotone trees
• needed: an algorithm that is guaranteed to generate only monotone trees
  • Makino, Ibaraki et al. (1996): only for 2-class problems, cumbersome
  • Potharst & Bioch (2000): for k-class problems, fast and efficient
The algorithm

try to split subset T:
  1) update D for subset T
  2) if D ∩ T is homogeneous
     then assign its class label to T and make T a leaf, definitively
     else split T into two non-empty subsets TL and TR using entropy
          try to split subset TL
          try to split subset TR
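A Python sketch of this recursion, with subsets stored as sets of attribute tuples. The helpers are assumptions: update_D is the update rule on the next slide, min_allowed/max_allowed follow two slides on, and best_entropy_split stands for trying each split Ai ≤ c and keeping the one with the lowest average entropy (see the entropy sketch further below):

```python
def try_to_split(T, D):
    """T: current subset of the instance space (set of tuples);
    D: dict mapping attribute tuple -> class label.
    Returns a leaf label or (split, left_subtree, right_subtree)."""
    update_D(T, D)                              # step 1: the update rule
    labels = {c for x, c in D.items() if x in T}
    if len(labels) == 1:                        # step 2: D ∩ T homogeneous
        return labels.pop()                     # make T a leaf, definitively
    split, TL, TR = best_entropy_split(T, D)    # assumed helper: lowest
                                                # average entropy split
    return (split, try_to_split(TL, D), try_to_split(TR, D))
```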
The update rule

update D for T:
  1) if min(T) is not in D then
     - add min(T) to D
     - class( min(T) ) = the maximal value allowed, given D
  2) if max(T) is not in D then
     - add max(T) to D
     - class( max(T) ) = the minimal value allowed, given D
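A sketch of this rule under the same representation; min_allowed and max_allowed are the bounds defined on the next two slides:

```python
def update_D(T, D):
    """The update rule: extend D at the corner elements of T.
    For an interval-shaped T stored as a set of tuples, Python's built-in
    min/max return exactly the corners min(T) and max(T), because the
    componentwise minimum is also the lexicographic minimum."""
    lo, hi = min(T), max(T)
    if lo not in D:
        D[lo] = max_allowed(lo, D)   # maximal class value allowed, given D
    if hi not in D:
        D[hi] = min_allowed(hi, D)   # minimal class value allowed, given D
```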
The minimal value allowed, given D

• For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.
• Let ↓x be the downset { y ∈ X | y ≤ x } of x
• Let y* be an element of D ∩ ↓x with the highest class value
• Then the minimal class value possible for x is class(y*).
The maximal value allowed, given D

• Let ↑x be the upset { y ∈ X | y ≥ x } of x
• Let y* be an element of D ∩ ↑x with the lowest class value
• Then the maximal class value possible for x is class(y*)
• if there is no such element, take the maximal class value (or the minimal, in the former case)
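Both bounds in Python, reusing dominates from the monotonicity sketch; NUM_CLASSES is an assumption (4 in the example that follows):

```python
NUM_CLASSES = 4  # assumed: class labels 0 .. NUM_CLASSES-1

def min_allowed(x, D):
    """Minimal class value possible for x, given D: the highest class
    among elements of D in the downset of x; 0 if there is none."""
    below = [c for y, c in D.items() if dominates(y, x)]
    return max(below) if below else 0

def max_allowed(x, D):
    """Maximal class value possible for x, given D: the lowest class
    among elements of D in the upset of x; the top class if there is none."""
    above = [c for y, c in D.items() if dominates(x, y)]
    return min(above) if above else NUM_CLASSES - 1
```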
Example

X: all vectors 000, 100, 010, …, 222
  attr. 1: values 0, 1, 2
  attr. 2: values 0, 1, 2
  attr. 3: values 0, 1, 2
  classes: 0, 1, 2, 3

D:
  001 → 0
  002 → 1
  112 → 2
  202 → 2
  212 → 3

Let us calculate the minimal and maximal possible value for x = 022:
  min-value: y* = 002, so the min-value = 1
  max-value: there is no y*, so the max-value = 3
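The same calculation with the sketch above:

```python
# Reproducing the slide's example for x = 022.
D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
x = (0, 2, 2)
print(min_allowed(x, D))  # 1: y* = 002 (class 1) is the best element below x
print(max_allowed(x, D))  # 3: no element of D lies above x, so the top class
```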
Tracing the algorithm

Try to split subset T = X:
  update D for X:
    min(X) = 000 is not in D; the max-value of 000 is 0, so add 000 with class 0 to D
    max(X) = 222 is not in D; the min-value of 222 is 3, so add 222 with class 3 to D
  D ∩ X is not homogeneous, so consider all the possible splits:
    A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

D is now:
  000 → 0
  001 → 0
  002 → 1
  112 → 2
  202 → 2
  212 → 3
  222 → 3
The entropy of each split

The split A1 ≤ 0 splits X into TL = [000, 022] and TR = [100, 222].

D ∩ TL: 000 → 0, 001 → 0, 002 → 1                  Entropy = 0.92
D ∩ TR: 112 → 2, 202 → 2, 212 → 3, 222 → 3         Entropy = 1

Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
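The numbers can be checked with a few lines of Python (plain Shannon entropy over the class labels of D ∩ TL and D ∩ TR):

```python
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

# D ∩ TL = {000:0, 001:0, 002:1}; D ∩ TR = {112:2, 202:2, 212:3, 222:3}
eL = entropy([0, 0, 1])       # ≈ 0.92
eR = entropy([2, 2, 3, 3])    # = 1.0
print(3/7 * eL + 4/7 * eR)    # ≈ 0.97, as on the slide
```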
Going on with the trace

The split with the lowest entropy is A1 ≤ 0, so we go on with T = TL = [000, 022].

Try to split subset T = [000, 022]:
  update D for T:
    min(T) = 000 is already in D
    max(T) = 022 has min-value 1, so it is added to D with class 1

D is now:
  000 → 0
  001 → 0
  002 → 1
  022 → 1
  112 → 2
  202 → 2
  212 → 3
  222 → 3

D ∩ T is not homogeneous, so we go on to consider the following splits:
  A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1   ← A3 ≤ 1 has the lowest entropy
We now have the following tree:

A1 ≤ 0?
  y: A3 ≤ 1?
       y: ?
       n: ?
  n: ?
Going on…

The split A3 ≤ 1 splits T into TL = [000, 021] and TR = [002, 022].

We go on with T = TL = [000, 021]:
Try to split subset T = [000, 021]:
  min(T) = 000 is already in D
  max(T) = 021 has min-value 0, so it is added to D with class 0
  D ∩ T is homogeneous, so we stop and make T a leaf with class value 0

Next, we go on with T = TR = [002, 022], etc.
Finally…

A1 ≤ 0?
  y: A3 ≤ 1?
       y: class = 0
       n: class = 1
  n: A1 ≤ 1?
       y: class = 2
       n: A2 ≤ 0?
            y: class = 2
            n: class = 3
A monotone tree for the bankruptcy problem

• can be seen on p. 107 of the paper that was handed out with this course
• a tree with 6 leaves
• it uses the same attributes as those that an ordinal version of the rough set approach comes up with: see Viara Popova's lecture
Conclusions and remaining problems

• We described an efficient algorithm for the induction of monotone decision trees, provided the data set is monotone
• We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger
• What if there is noise in the data set?
• Is it possible to repair by pruning?