
Ordinal Classification






Presentation Transcript


  1. Ordinal Classification Rob Potharst Erasmus University Rotterdam SIKS-Advanced Course on Computational Intelligence, October 2001

  2. What is ordinal classification?

  3. Company: catering service Swift
  • total liabilities / total assets: 1
  • net income / net worth: 3
  • … …
  • managers' work experience: 5
  • market niche-position: 3
  • … ...
  bankruptcy risk: + (acceptable)

  4. Data set: 39 companies
  2 2 2 2 1 3 5 3 5 4 2 4  +
  4 5 2 3 3 3 5 4 5 5 4 5  +
  3 5 1 1 2 2 5 3 5 5 3 5  +
  2 3 2 1 2 4 5 2 5 4 3 4  +
  3 4 3 2 2 2 5 3 5 5 3 5  +
  3 5 3 3 3 2 5 3 4 4 3 4  +
  3 5 2 3 4 4 5 4 4 5 3 5  +
  1 1 4 1 2 3 5 2 4 4 1 4  +
  3 4 3 3 2 4 4 2 4 3 1 3  +
  3 4 2 1 2 2 4 2 4 4 1 4  +
  2 5 1 1 3 4 4 3 4 4 3 4  +
  3 3 4 4 3 4 4 2 4 4 1 3  +
  1 1 2 1 1 3 4 2 4 4 1 4  +
  2 1 1 1 4 3 4 2 4 4 3 3  +
  2 3 2 1 1 2 4 4 4 4 2 5  +
  2 3 4 3 1 5 4 2 4 3 2 3  +
  2 2 2 1 1 4 4 4 4 4 2 4  +
  2 1 3 1 1 3 5 2 4 2 1 3  +
  2 1 2 1 1 3 4 2 4 4 2 4  +
  2 1 2 1 1 5 4 2 4 4 2 4  +
  2 1 1 1 1 3 2 2 4 4 2 3  ?
  1 1 3 1 2 1 3 4 4 4 3 4  ?
  2 1 2 1 1 2 4 3 3 2 1 2  ?
  1 1 1 1 1 1 3 2 4 4 2 3  ?
  2 2 2 1 1 3 3 2 4 4 2 3  ?
  2 2 1 1 1 3 2 2 4 4 2 3  ?
  2 1 2 1 1 3 2 2 4 4 2 4  ?
  1 1 4 1 3 1 2 2 3 3 1 2  ?
  3 4 4 3 2 3 3 4 4 4 3 4  ?
  3 1 3 3 1 2 2 3 4 4 2 3  ?
  1 1 2 1 1 1 3 3 4 4 2 3  -
  3 5 2 1 1 1 3 2 3 4 1 3  -
  2 2 1 1 1 1 3 3 3 4 3 4  -
  2 1 1 1 1 1 2 2 3 4 3 4  -
  1 1 2 1 1 1 3 1 4 3 1 2  -
  1 1 3 1 2 1 2 1 3 3 2 3  -
  1 1 1 1 1 1 2 2 4 4 2 3  -
  1 1 3 1 1 1 1 1 4 3 1 3  -
  2 1 1 1 1 1 1 1 2 1 1 2  -
  20: + (acceptable)   9: - (unacceptable)   10: ? (uncertain)
  from: Greco, Matarazzo, Slowinski (1996)

  5. Possible classifier
  if man.exp. > 4, then class = '+'
  if man.exp. < 4 and net.inc/net.worth = 1, then class = '-'
  all other cases: class = '?'
  • when applied to the dataset of 39: 3 mistakes
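The two hand-made rules above translate directly into a function. A minimal sketch in Python; the dictionary keys are hypothetical attribute names, since the slides do not fix an encoding of the company data:

```python
def classify(company):
    """Hand-made ordinal classifier from slide 5 (attribute names assumed)."""
    if company["man.exp"] > 4:
        return "+"
    if company["man.exp"] < 4 and company["net.inc/net.worth"] == 1:
        return "-"
    return "?"

# A company like Swift on slide 3: experienced managers, healthy net income.
print(classify({"man.exp": 5, "net.inc/net.worth": 3}))   # +
```

Note that the rules only cover part of the attribute space explicitly; every remaining case falls through to '?'.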

  6. What is classification? The act of assigning objects to classes, using the values of relevant features of those objects. So we need:
  • objects (individuals, cases), all belonging to some domain
  • classes, number and kind prescribed
  • features (attributes, variables)
  • a classifier (classification function) that assigns a class to any object

  7. Building classifiers = induction from a training set of examples:
  • data without noise
  • data with noise

  8. Induction methods (especially from the AI world)
  • decision trees: C4.5, CART (from 1984 on)
  • neural networks: backpropagation (from 1986, with a false start from 1974)
  • rule induction algorithms: CN2 (1989)
  • newer methods: rough sets, fuzzy methods, decision lists, pattern-based methods, etc.

  9. Decision tree: example
  man.exp. < 3
  ├─ n: +
  └─ y: gen.exp./sales = 1
       ├─ n: ?
       └─ y: tot.liab/cashfl = 1
            ├─ n: ?
            └─ y: -
  classifies 37 out of 39 examples correctly

  10. Ordinal classification
  • features have an ordinal scale
  • classes have an ordinal scale
  • the ordering must be preserved!

  11. Preservation of ordering
  A classifier is monotone iff: if A ≤ B, then also class(A) ≤ class(B)
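On a finite labelled dataset this property can be checked mechanically. A sketch, assuming examples are attribute tuples compared componentwise; the data used here are the five labelled examples that appear later, on slide 18:

```python
from itertools import combinations

def leq(a, b):
    """Componentwise order: a precedes b iff a[i] <= b[i] for every attribute i."""
    return all(x <= y for x, y in zip(a, b))

def is_monotone(dataset):
    """True iff A <= B always implies class(A) <= class(B).

    `dataset` maps attribute tuples to integer class labels."""
    for a, b in combinations(dataset, 2):
        if leq(a, b) and dataset[a] > dataset[b]:
            return False
        if leq(b, a) and dataset[b] > dataset[a]:
            return False
    return True

# The five labelled examples of slide 18 form a monotone dataset:
D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
print(is_monotone(D))                                  # True
print(is_monotone({(0, 0, 0): 1, (1, 1, 1): 0}))       # False
```

The check is quadratic in the number of examples, which is cheap at this scale.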

  12. Relevance of ordinal classification
  • selection problems
  • credit worthiness
  • pricing (e.g. real estate)
  • etc.

  13. Induction of monotone decision trees
  • using C4.5 or CART: non-monotone trees
  • needed: an algorithm that is guaranteed to generate only monotone trees
  • Makino, Ibaraki, etc. (1996): only for 2-class problems, cumbersome
  • Potharst & Bioch (2000): for k-class problems, fast and efficient

  14. The algorithm
  try to split subset T:
    1) update D for subset T
    2) if D ∩ T is homogeneous
       then assign a class label to T and make T a leaf definitively
       else split T into two non-empty subsets TL and TR using entropy;
            try to split subset TL;
            try to split subset TR

  15. The update rule
  update D for T:
    1) if min(T) is not in D then
       - add min(T) to D
       - class( min(T) ) = the maximal value allowed, given D
    2) if max(T) is not in D then
       - add max(T) to D
       - class( max(T) ) = the minimal value allowed, given D

  16. The minimal value allowed given D
  • For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.
  • Let ↓x be the downset { y ∈ X | y ≤ x } of x.
  • Let y* be an element in D ∩ ↓x with the highest class value.
  • Then the minimal class value possible for x is class(y*).

  17. The maximal value allowed given D
  • Let ↑x be the upset { y ∈ X | y ≥ x } of x.
  • Let y* be an element in D ∩ ↑x with the lowest class value.
  • Then the maximal class value possible for x is class(y*).
  • If there is no such element, then take the maximal class value (or the minimal, in the former case).

  18. Example
  X: 000, 100, 010, …, 222
  D:  001 → 0
      002 → 1
      112 → 2
      202 → 2
      212 → 3
  attr. 1: values 0, 1, 2;  attr. 2: values 0, 1, 2;  attr. 3: values 0, 1, 2;  classes: 0, 1, 2, 3
  Let us calculate the min and max possible value for x = 022:
  min value: y* = 002, so the min value = 1
  max value: there is no y*, so the max value = 3
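The definitions of slides 16 and 17 translate almost literally into code. A sketch that reproduces the numbers of this example, encoding examples as tuples over {0, 1, 2}:

```python
def leq(a, b):
    """Componentwise order: a precedes b iff a[i] <= b[i] for every attribute i."""
    return all(x <= y for x, y in zip(a, b))

def min_allowed(D, x, classes=(0, 1, 2, 3)):
    """Highest class value in D restricted to the downset of x (slide 16)."""
    vals = [c for y, c in D.items() if leq(y, x)]
    return max(vals) if vals else classes[0]

def max_allowed(D, x, classes=(0, 1, 2, 3)):
    """Lowest class value in D restricted to the upset of x (slide 17)."""
    vals = [c for y, c in D.items() if leq(x, y)]
    return min(vals) if vals else classes[-1]

# D from slide 18: attribute vectors mapped to class labels.
D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
x = (0, 2, 2)
print(min_allowed(D, x))   # 1: y* = 002 has the highest class below x
print(max_allowed(D, x))   # 3: no element of D lies above x
```

When the relevant intersection with D is empty, the functions fall back to the extreme class values, exactly as slide 17 prescribes.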

  19. Tracing the algorithm
  Try to split subset T = X:
  update D for X:
    min(X) = 000 is not in D; the max value of 000 is 0, so add 000 with class 0 to D
    max(X) = 222 is not in D; the min value of 222 is 3, so add 222 with class 3 to D
  D ∩ X is not homogeneous, so consider all the possible splits:
    A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1
  D is now: 000 → 0, 001 → 0, 002 → 1, 112 → 2, 202 → 2, 212 → 3, 222 → 3

  20. The entropy of each split
  The split A1 ≤ 0 splits X into TL = [000, 022] and TR = [100, 222].
  D ∩ TR:  112 → 2, 202 → 2, 212 → 3, 222 → 3;  entropy = 1
  D ∩ TL:  000 → 0, 001 → 0, 002 → 1;  entropy = 0.92
  Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
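The entropy values above can be recomputed in a few lines. A sketch; note that the exact average is 0.965, which the slide rounds to 0.97 by first rounding 0.92:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

# Class labels of D ∩ TL and D ∩ TR for the split A1 <= 0 (slide 20).
left, right = [0, 0, 1], [2, 2, 3, 3]
e_left, e_right = entropy(left), entropy(right)
avg = (3 * e_left + 4 * e_right) / 7
print(round(e_left, 2), e_right, round(avg, 3))   # 0.92 1.0 0.965
```

The algorithm computes this weighted average for every candidate split and keeps the one with the lowest value.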

  21. Going on with the trace
  The split with the lowest entropy is A1 ≤ 0, so we go on with T = TL = [000, 022].
  Try to split subset T = [000, 022]:
  update D for T:
    min(T) = 000 is already in D
    max(T) = 022 has minimum value 1, so it is added to D
  D is now: 000 → 0, 001 → 0, 002 → 1, 022 → 1, 112 → 2, 202 → 2, 212 → 3, 222 → 3
  D ∩ T is not homogeneous, so we go on to consider the following splits:
    A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1
  (of these, A3 ≤ 1 has the lowest entropy)

  22. We now have the following tree:
  A1 ≤ 0
  ├─ y: A3 ≤ 1
  │    ├─ y: ?
  │    └─ n: ?
  └─ n: ?

  23. Going on...
  The split A3 ≤ 1 splits T into TL = [000, 021] and TR = [002, 022].
  We go on with T = TL = [000, 021].
  Try to split subset T = [000, 021]:
    min(T) = 000 is already in D
    max(T) = 021 has minimum value 0, so it is added to D
  D ∩ T is homogeneous, so we stop and make T into a leaf with class value 0.
  Next, we go on with T = TR = [002, 022], etc.

  24. Finally...
  A1 ≤ 0
  ├─ y: A3 ≤ 1
  │    ├─ y: 0
  │    └─ n: 1
  └─ n: A1 ≤ 1
       ├─ y: 2
       └─ n: A2 ≤ 0
            ├─ y: 2
            └─ n: 3
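The whole trace can be reproduced with a compact implementation of the recursion of slide 14. This is a sketch under the slide-18 encoding (three attributes over {0, 1, 2}, classes 0..3), not the authors' code: intervals [lo, hi] play the role of T, the homogeneity test is done on D ∩ T, and ties between equally good splits are broken by taking the first one found. Run on the example data it grows exactly the tree of slide 24:

```python
from collections import Counter
from math import log2

CLASSES = [0, 1, 2, 3]

def leq(a, b):
    """Componentwise order on attribute tuples."""
    return all(x <= y for x, y in zip(a, b))

def min_allowed(D, x):
    """Highest class in D restricted to the downset of x (slide 16)."""
    vals = [c for y, c in D.items() if leq(y, x)]
    return max(vals) if vals else CLASSES[0]

def max_allowed(D, x):
    """Lowest class in D restricted to the upset of x (slide 17)."""
    vals = [c for y, c in D.items() if leq(x, y)]
    return min(vals) if vals else CLASSES[-1]

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())

def grow(D, lo, hi):
    """Try to split the interval T = [lo, hi]; returns a leaf label or a
    node (attribute index, threshold, left subtree, right subtree)."""
    # update rule (slide 15): force the corners of T into D
    if lo not in D:
        D[lo] = max_allowed(D, lo)
    if hi not in D:
        D[hi] = min_allowed(D, hi)
    in_T = {x: c for x, c in D.items() if leq(lo, x) and leq(x, hi)}
    if len(set(in_T.values())) == 1:          # D ∩ T homogeneous: make a leaf
        return next(iter(in_T.values()))
    # choose the binary split A_i <= v with the lowest average entropy
    best = None
    for i in range(len(lo)):
        for v in range(lo[i], hi[i]):
            L = [c for x, c in in_T.items() if x[i] <= v]
            R = [c for x, c in in_T.items() if x[i] > v]
            if L and R:
                e = (len(L) * entropy(L) + len(R) * entropy(R)) / len(in_T)
                if best is None or e < best[0]:
                    best = (e, i, v)
    _, i, v = best
    hi_L = hi[:i] + (v,) + hi[i + 1:]         # TL = [lo, hi with A_i = v]
    lo_R = lo[:i] + (v + 1,) + lo[i + 1:]     # TR = [lo with A_i = v + 1, hi]
    return (i, v, grow(D, lo, hi_L), grow(D, lo_R, hi))

D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2, (2, 0, 2): 2, (2, 1, 2): 3}
tree = grow(D, (0, 0, 0), (2, 2, 2))
print(tree)   # (0, 0, (2, 1, 0, 1), (0, 1, 2, (1, 0, 2, 3)))
```

Attributes are 0-indexed here, so the root (0, 0, ...) is the split A1 ≤ 0 of the trace, and the leaf labels 0, 1, 2, 2, 3 match slide 24. The side effect on D (adding 000 → 0, 222 → 3, 022 → 1, 021 → 0, ...) reproduces the update steps of slides 19, 21 and 23.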

  25. A monotone tree for the Bankruptcy problem
  • can be seen on p. 107 of the paper that was handed out with this course
  • a tree with 6 leaves
  • uses the same attributes as those that come up with an ordinal version of the rough set approach: see Viara Popova's lecture

  26. Conclusions and remaining problems
  • We described an efficient algorithm for the induction of monotone decision trees, in case we have a monotone dataset.
  • We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger.
  • What if we have noise in the dataset?
  • Is it possible to repair by pruning?
