
Chapter 6 The Structural Risk Minimization Principle




  1. Chapter 6 The Structural Risk Minimization Principle Junping Zhang jpzhang@fudan.edu.cn Intelligent Information Processing Laboratory, Fudan University March 23, 2004

  2. Objectives

  3. Structural risk minimization

  4. Two other induction principles

  5. The Scheme of the SRM induction principle

  6. Real-Valued functions

  7. Principle of SRM

  8. SRM

  9. Minimum Description Length and SRM inductive principles • The idea about the Nature of Random Phenomena • Minimum Description Length Principle for the Pattern Recognition Problem • Bounds for the MDL • SRM for the simplest Model and MDL • The Shortcoming of the MDL

  10. The idea about the Nature of Random Phenomena • Probability theory (1930s, Kolmogorov) • Formal inference • The axiomatization did not consider the nature of randomness • Axioms: probability measures are taken as given

  11. The idea about the Nature of Random Phenomena • The model of randomness: Solomonoff (1965), Kolmogorov (1965), Chaitin (1966). • Algorithmic (descriptive) complexity: the length of the shortest binary computer program • Up to an additive constant, it does not depend on the type of computer. • A universal characteristic of the object.

  12. A relatively long string describing an object is random • if the algorithmic complexity of the object is high • if the given description of the object cannot be compressed significantly. • MML (Wallace and Boulton, 1968) & MDL (Rissanen, 1978) • Algorithmic complexity as a main tool of inductive inference for learning machines
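
The compressibility criterion on this slide can be illustrated with an off-the-shelf compressor standing in for algorithmic complexity. This is only a sketch: zlib gives a crude, computable upper bound on Kolmogorov complexity, and the strings and names below are illustrative, not from the slides.

```python
import random
import zlib

def compression_ratio(bits: str) -> float:
    """Compressed length / original length: a crude, computable
    upper-bound proxy for algorithmic complexity per symbol."""
    raw = bits.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

structured = "01" * 5000                     # highly regular string
random.seed(0)
noisy = "".join(random.choice("01") for _ in range(10_000))  # no visible structure

# The regular string admits a much shorter description than the random one,
# so by the criterion on the slide only `noisy` would be called random.
print(compression_ratio(structured) < compression_ratio(noisy))  # True
```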

  13. Minimum Description Length Principle for the Pattern Recognition Problem • Given l pairs (xi, ωi) containing the vector xi and the binary value ωi • Consider two strings: the binary string ω1,…,ωl (string (146)) and the string of vectors x1,…,xl (string (147))

  14. Question • Q: Given (147), is the string (146) a random object? • A: Analyze the complexity of the string (146) in the spirit of the Solomonoff-Kolmogorov-Chaitin ideas

  15. Compress its description • Since the ωi, i = 1,…,l, are binary values, the string (146) can be described by l bits. • The training pairs were drawn randomly and independently, so the value ωi depends on the vector xi but not on the vector xj.
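
To make the compression idea concrete: transmitting the l labels verbatim costs l bits, but if a table from a code book predicts most of them, we only need the table's index plus the positions of its errors. The encoding below (table index + error count + error positions) is one possible scheme chosen for illustration; the slides' own encoding is not reproduced here.

```python
from math import comb, log2

def mdl_bits(l: int, n_tables: int, n_errors: int) -> float:
    """Bits to transmit l binary labels via a code book:
    index of the table used (log2 n_tables)
    + how many labels it gets wrong (log2 (l+1))
    + which labels to flip (log2 C(l, n_errors)).
    Illustrative encoding, not the slides' exact scheme."""
    bits = log2(n_tables) + log2(l + 1)
    if n_errors:
        bits += log2(comb(l, n_errors))
    return bits

# Transmitting l = 1000 labels verbatim costs 1000 bits; a table drawn
# from a code book of 2**10 tables that makes 10 errors costs far fewer.
print(mdl_bits(1000, 2**10, 10))
```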

  16. Model

  17. General case: the code book does not contain a perfect table.

  18. Randomness

  19. Bounds for the MDL • Q: Does the compression coefficient K(T) determine the probability of the test error when classifying (decoding) vectors x by the table T? • A: Yes

  20. Comparison between the MDL and ERM in the simplest model

  21. SRM for the simplest Model and MDL

  22. SRM for the simplest Model and MDL

  23. The power of the compression coefficient • To obtain a bound on the probability of error, only the compression coefficient needs to be known.

  24. The power of the compression coefficient • We need not know: • how many examples we used • how the structure of the code books was organized • which code book was used and how many tables were in this code book • how many errors were made by the table from the code book we used.

  25. MDL principle • To minimize the probability of error, one has to minimize the compression coefficient.
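
The selection rule can be sketched as follows. Using an illustrative encoding (table index + error count + error positions; my choice, not the slides'), K(T) trades code-book size against training errors, and MDL picks the candidate minimizing it.

```python
from math import comb, log2

def compression_coefficient(l: int, n_tables: int, n_errors: int) -> float:
    """K(T): compressed description length of the l training labels,
    divided by l, for a table T from a code book of n_tables tables
    that misclassifies n_errors of them (illustrative encoding)."""
    bits = log2(n_tables) + log2(l + 1)
    if n_errors:
        bits += log2(comb(l, n_errors))
    return bits / l

# Hypothetical candidates as (code-book size, training errors) for l = 1000:
# a huge code book that fits perfectly, a moderate one with a few errors,
# and a tiny one with many errors.
candidates = [(2**200, 0), (2**10, 10), (2**2, 120)]
best = min(candidates, key=lambda c: compression_coefficient(1000, *c))
print(best)  # the moderate code book with 10 errors wins
```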

  26. The shortcoming of the MDL • MDL uses code books with a finite number of tables. • If a set of functions depends continuously on parameters, one has to first quantize that set to construct the tables.

  27. Quantization • How do we make a 'smart' quantization for a given number of observations? • For a given set of functions, how can we construct a code book with a small number of tables but with good approximation ability?

  28. The shortcoming of the MDL • Finding a good quantization is extremely difficult, and this is the main shortcoming of the MDL principle. • The MDL principle works well when the problem of constructing reasonable code books has a good solution.
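
As a toy illustration of such a construction (my example, not from the slides): for one-dimensional threshold classifiers f_t(x) = [x >= t], the parameter t varies continuously, yet a sample of l points induces at most l + 1 distinct label tables. Quantizing t to the gaps between sorted sample points therefore yields a small code book that loses nothing on this sample.

```python
def threshold_code_book(points):
    """Quantize the continuum of threshold classifiers f_t(x) = [x >= t]
    on a finite sample: only cuts between neighbouring sorted points
    (plus one below and one above all points) give distinct tables."""
    xs = sorted(set(points))
    cuts = [xs[0] - 1.0]                                # all labels 1
    cuts += [(a + b) / 2 for a, b in zip(xs, xs[1:])]   # between neighbours
    cuts += [xs[-1] + 1.0]                              # all labels 0
    return {tuple(1 if p >= t else 0 for p in points) for t in cuts}

sample = [0.31, 0.05, 0.77, 0.42, 0.98, 0.16]
book = threshold_code_book(sample)
print(len(book))  # at most len(sample) + 1 = 7 distinct tables
```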

  29. Consistency of the SRM principle and asymptotic bounds on the rate of convergence • Q: • Is the SRM consistent? • What is the bound on the (asymptotic) rate of convergence?
