Advanced Computing Seminar Data Mining and Its Industrial Applications — Chapter 4 — Inductive Learning. Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr Knowledge and Software Engineering Lab Advanced Computing Research Centre School of Computer and Information Science
Advanced Computing Seminar: Data Mining and Its Industrial Applications, Chapter 4: Inductive Learning. Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr. Knowledge and Software Engineering Lab, Advanced Computing Research Centre, School of Computer and Information Science, University of South Australia. Chap4 Inductive Learning Zhongzhi Shi
Outline • Introduction • Machine learning • Version space and bias • Decision tree learning • Ripper algorithm • Summary
Basic Concepts • Data: stored on any medium in a certain format • Information: meaning assigned to concrete data • Knowledge: refined from information
Why Data Mining? • Rich data, poor knowledge • Data: Finance, Economic, Government, Post, Population, Life cycle • Knowledge: Pattern, Trends, Concept, Relation, Model, Association rules, Sequence • Decision making: E-commerce, Resource distribution, Trade, Business intelligence, E-science
Data Mining vs Knowledge Discovery • Data mining: extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data • Also known as: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining: A KDD Process • Data mining is the core of the knowledge discovery process • Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge
Data Warehouse Process • Metadata management • Data access • Systems integration
Data Mining Approach to Data Warehouse Design: Macro Picture • Figure: mapping rules, designed star schema, desired star schema • Attribute: Width, Type, NULL allowed, Name, Key • Numeric: Maximum, Minimum, Average, Standard deviation • Text fields: Number of spaces, Numerals used, Average length
Detailed picture
Knowledge Representation • Production system • Frame • Semantic networks • First order logic • Ontology
Production System • Rules: IF (conditions) THEN (conclusions) • Example: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
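A minimal sketch of how such IF-THEN rules might be applied by forward chaining. It assumes facts are plain strings; the names `RULES` and `forward_chain` are illustrative and do not come from the slides.

```python
# Each rule pairs a set of condition facts with one conclusion fact.
# The single rule below encodes the bird example from the slide.
RULES = [
    ({"has wings", "can fly"}, "is a bird"),
]


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # A rule fires when all of its conditions are already known.
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


derived = forward_chain({"has wings", "can fly"}, RULES)
partial = forward_chain({"has wings"}, RULES)
```

Here `derived` gains the conclusion "is a bird", while `partial` does not, because only one of the two conditions holds.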
Production System: MYCIN rule syntax
<rule> = IF <antecedent> THEN <action> (ELSE <action>)
<antecedent> = AND <condition>
<condition> = OR <condition> | <predicate> <associative-triple>
<associative-triple> = <attribute> <object> <value>
<action> = <consequent> | <procedure>
<consequent> = <associative-triple> <certainty-factor>
Frame Structure
FRAME FRAME-NAME
  SLOT-NAME-1: ASPECT-11 ASPECT-VALUE-11
               ASPECT-12 ASPECT-VALUE-12
               ......
               ASPECT-1m ASPECT-VALUE-1m
  ......
  SLOT-NAME-n: ASPECT-n1 ASPECT-VALUE-n1
               ASPECT-n2 ASPECT-VALUE-n2
               ......
               ASPECT-nm ASPECT-VALUE-nm
Semantic Networks • Node: objects • Arc: relationships
First Order Logic • Student(John) • Teacher(Markus) • Father(x,y) • Father(y,z) • Grandfather(x,z) :- Father(x,y), Father(y,z) • IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
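The Grandfather rule above can be read as a join over Father facts. The sketch below illustrates that reading; the facts about Tom, Bob and Sam are hypothetical and only serve the example.

```python
# Hypothetical ground facts: ("Tom", "Bob") stands for Father(Tom, Bob).
father = {("Tom", "Bob"), ("Bob", "Sam")}


def grandfathers(father_facts):
    """Apply Grandfather(x, z) :- Father(x, y), Father(y, z)."""
    return {
        (x, z)
        for (x, y1) in father_facts
        for (y2, z) in father_facts
        if y1 == y2  # the shared variable y links the two Father facts
    }


derived = grandfathers(father)
```

With the facts above, the only derivable pair is `("Tom", "Sam")`.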
Ontology Semantic Web: • Ontology • OWL • Ontology schema • Description Logic
The Essence of Learning • Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time. [Simon 1983] • Machine learning is the study of how to make machines acquire new knowledge and new skills, and reorganize existing knowledge.
The Essence of Learning • Figure: Environment, Learning Element, Knowledge Base, Performance Element, Feedback • The environment supplies the source information to the learning system. The level and quality of this information significantly affect the learning strategy.
The Essence of Learning • The environment = information source: Database, Text, Web pages, Image, Video, Space data
The Essence of Learning • The learning element uses this information to make improvements in an explicit knowledge base, and the performance element uses the knowledge base to perform its task. • Learning methods: Inductive learning, Analogical learning, Explanation learning, Genetic algorithm, Neural network
Paradigms for Machine Learning • The inductive paradigm: The most widely studied method for symbolic learning is that of inducing a general concept description from a sequence of instances of the concept and known counterexamples of the concept. The task is to build a concept description from which all the previous positive instances can be rederived by universal instantiation, but none of the previous negative instances can be rederived by the same process. • The analogical paradigm: Analogical reasoning is a strategy of inference that allows the transfer of knowledge from a known area into another area with similar properties.
Paradigms for Machine Learning • The analytic paradigm: These methods attempt to formulate a generalization after analyzing a few instances in terms of the system's knowledge. Mainly deductive rather than inductive mechanisms are used for such learning. • The genetic paradigm: Genetic algorithms have been inspired by a direct analogy to mutations in biological reproduction and Darwinian natural selection. In principle, genetic algorithms encode a parallel search through concept space, with each process attempting coarse-grain hill climbing. • The connectionist paradigm: Connectionist learning systems are also called "neural networks". Connectionist learning consists of readjusting weights in a fixed-topology network via specific learning algorithms.
The Essence of Learning • The knowledge base contains predefined concepts, domain constraints, heuristic rules, and so on. • Issues: Knowledge representation, Knowledge consistency, Knowledge redundancy
The Essence of Learning • The performance element: The learning element tries to improve the actions of the performance element. The performance element applies knowledge to solve problems and evaluates the learning effects.
On Concept • The term "concept" is a universal notion that reflects general, abstract, and essential features. For example, "triangle", "animal", and "computer" are all concepts. Horse, tiger, bird, and so on are examples of the concept "animal". A concept has two aspects, intension and extension. • Intension: the set of attributes that reflect the essential features of a concept. • Extension: the set of examples that satisfy the definition of a concept. • Figure examples: Fruit, Student
Concept Description • In general, a concept can be described by the concept name and a list of attribute-value pairs, that is, (Concept-name (Attribute1 Value1) (Attribute2 Value2) … (Attributen Valuen)) • In addition, a concept description can be represented in first order logic: each attribute is a predicate, and the concept name and attribute value are its arguments. The concept description is then expressed in predicate calculus.
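A small sketch of the two representations described above, under the assumption that a concept is stored as a name plus a dictionary of attribute-value pairs. The concept "apple" and its attributes are hypothetical, and `to_predicates` is an illustrative helper rather than anything defined in the slides.

```python
# Attribute-value form: (Concept-name (Attribute1 Value1) (Attribute2 Value2) ...)
concept_name = "apple"
attributes = {"color": "red", "shape": "round"}


def to_predicates(name, attrs):
    """Render each attribute as a predicate over (concept name, value)."""
    return [f"{attr}({name}, {value})" for attr, value in attrs.items()]


predicates = to_predicates(concept_name, attributes)
```

Each attribute becomes a predicate whose arguments are the concept name and the attribute value, for instance `color(apple, red)`.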
Attribute Types • Nominal attribute is one that takes on a finite, unordered set of mutually exclusive values. • Linear attribute • Structured attribute
Attribute Types • Nominal attribute is one that takes on a finite, unordered set of mutually exclusive values. • For example: • Color: red, green, blue • Traffic: airline, railway, ship
Attribute Types • Linear attribute • For example: • Age: 1, 2, …, 100 • Temperature: 20, 21, … • Distance: 1 km, 2 km, …
Attribute Types • Structured attribute • For example: • Tree structure: computer → Hardware (CPU, Memory), Software (Computing, Control)
Inductive Learning • From particular examples to a general conclusion, principle, or rule • Example: apple: eat; tomato: eat; banana: eat; … ⇒ fruit: eat
Inductive Learning • Given: • Premise statements. Consist of facts, specific observations, and intermediate generalizations that provide information about some objects, phenomena, processes, and so on. • Tentative inductive assertion. Provides an a priori hypothesis held about the objects in the premise statements. • Background knowledge. Contains general and domain-specific concepts for interpreting the premises, and inference rules relevant to the task of inference. • Find: Inductive assertion (hypothesis). It strongly or weakly implies the premise statements in the context of background knowledge and satisfies the preference criterion.
Inductive Learning • Simplest form: learn a function from examples • f is the target function • An example is a pair (x, f(x)) • Problem: find a hypothesis h such that h ≈ f, given a training set of examples • This is a highly simplified model of real learning: • It ignores prior knowledge • It assumes examples are given
Inductive Learning Method • Construct/adjust h to agree with f on the training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting (figure)
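As a rough illustration of curve fitting and consistency, the sketch below fits a straight line by ordinary least squares and checks whether it agrees with every training example. The choice of a linear hypothesis class, the data points, and the names `fit_line` and `is_consistent` are assumptions made for this example, not part of the slides.

```python
def fit_line(examples):
    """Least-squares line through (x, f(x)) pairs; returns hypothesis h."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def is_consistent(h, examples, tol=1e-9):
    """h is consistent if it agrees with f on all training examples."""
    return all(abs(h(x) - y) <= tol for x, y in examples)


# Examples drawn from a linear target f(x) = 2x + 1.
linear_examples = [(0, 1), (1, 3), (2, 5), (3, 7)]
h_linear = fit_line(linear_examples)

# Examples drawn from a quadratic target f(x) = x^2.
curved_examples = [(0, 0), (1, 1), (2, 4), (3, 9)]
h_curved = fit_line(curved_examples)
```

The line fitted to the linear examples is consistent with them. The line fitted to the quadratic examples is not, which shows that no hypothesis from this restricted class agrees with that training set exactly.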
Best-Hypothesis • Positive example ⇒ generalize • Negative example ⇒ specialize • Drawbacks: must check previous examples and backtrack
Hypothesis Space • Concept description • Extension: the set of examples that the hypothesis predicts to be positive • Bias: any preference for one hypothesis over another
Training Examples for EnjoySport
Sky    Temp  Humidity  Wind    Water  Forecast  EnjoySport
Sunny  Warm  Normal    Strong  Warm   Same      YES
Sunny  Warm  High      Strong  Warm   Same      YES
Rainy  Cold  High      Strong  Warm   Change    NO
Sunny  Warm  High      Strong  Cool   Change    YES
What is the general concept?
The more_general_than_or_equal_to relation • Definition: Let $h_j$ and $h_k$ be boolean-valued functions defined over $X$. Then $h_j$ is more_general_than_or_equal_to $h_k$ (written $h_j \geq_g h_k$) iff $(\forall x \in X)\,[(h_k(x) = 1) \rightarrow (h_j(x) = 1)]$ • In our case the most general hypothesis, that every day is a positive example, is represented by $\langle ?, ?, ?, ?, ?, ? \rangle$, and the most specific possible hypothesis, that no day is a positive example, is represented by $\langle \emptyset, \emptyset, \emptyset, \emptyset, \emptyset, \emptyset \rangle$.
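For conjunctive hypotheses over attribute tuples, the relation can be checked attribute by attribute. The sketch below assumes that "?" accepts any value, that the string "0" stands in for the empty constraint, and that every attribute has at least two possible values. The function names are illustrative.

```python
ANY = "?"   # accepts every value of the attribute
NONE = "0"  # stands for the empty constraint: accepts no value


def matches(h, x):
    """h(x) = 1 iff every constraint in h accepts the value in x."""
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    """True iff every instance matched by hk is also matched by hj."""
    if NONE in hk:
        return True  # hk matches no instance, so the implication holds
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


general = ("Sunny", "?", "?", "?", "?", "?")
specific = ("Sunny", "Warm", "?", "Strong", "?", "?")
most_general = (ANY,) * 6
most_specific = (NONE,) * 6
```

With these definitions, `general` is more general than or equal to `specific` but not the other way round, and `most_general` is more general than or equal to every hypothesis, including `most_specific`.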
Example of the Ordering of Hypotheses
Version Space Search
Version Space Example
Representing Version Space • The general boundary, G, of version space $VS_{H,E}$ is the set of its maximally general members • The specific boundary, S, of version space $VS_{H,E}$ is the set of its maximally specific members • Every member of the version space lies between these boundaries: $VS_{H,E} = \{h \in H \mid (\exists s \in S)\,(\exists g \in G)\,(g \geq h \geq s)\}$, where $x \geq y$ means x is more general than or equal to y
Candidate-elimination algorithm
1. Initialize H to be the whole space. Thus, the G set contains only the null description (the most general hypothesis), and the S set is consistent with the first observed positive training instance.
2. For each subsequent instance, i,
   BEGIN
     IF i is a positive instance, THEN
     BEGIN
       Retain in G only those generalizations which match i.
       Update S to generalize the elements in S as little as possible, so that they will match i.
     END
Candidate-elimination algorithm (continued)
     ELSE IF i is a negative instance, THEN
     BEGIN
       Retain in S only those generalizations which do not match i.
       Update G to specialize the elements in G as little as possible, so that they will not match i.
     END
   END
3. Repeat step 2 until G = S and this is a singleton set. When this occurs, H has collapsed to include only a single concept.
4. Output H.
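The steps of the algorithm can be sketched for conjunctive hypotheses over the EnjoySport attributes as follows. This is a simplified illustration, not the authors' implementation: S starts from the most specific hypothesis (which the first positive instance then generalizes), all examples are processed rather than stopping when G = S, and the attribute domains in `DOMAINS` (including the values Cloudy and Weak) are assumptions, since the training table only shows some of the values.

```python
ANY, NONE = "?", "0"  # "0" stands for the empty constraint


def matches(h, x):
    return all(c == ANY or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    if NONE in hk:
        return True
    return all(cj == ANY or cj == ck for cj, ck in zip(hj, hk))


def min_generalize(s, x):
    """Smallest generalization of s that matches instance x."""
    return tuple(v if c == NONE else (c if c == v else ANY)
                 for c, v in zip(s, x))


def min_specializations(g, x, domains):
    """Minimal specializations of g that no longer match instance x."""
    result = []
    for i, c in enumerate(g):
        if c == ANY:
            for value in domains[i]:
                if value != x[i]:
                    result.append(g[:i] + (value,) + g[i + 1:])
    return result


def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(NONE,) * n}
    G = {(ANY,) * n}
    for x, positive in examples:
        if positive:
            # Retain in G only hypotheses that match x; generalize S minimally.
            G = {g for g in G if matches(g, x)}
            S = {min_generalize(s, x) for s in S}
            S = {s for s in S if any(more_general_or_equal(g, s) for g in G)}
        else:
            # Retain in S only hypotheses that reject x; specialize G minimally.
            S = {s for s in S if not matches(s, x)}
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            # Keep only the maximally general members.
            G = {g for g in new_G
                 if not any(g2 != g and more_general_or_equal(g2, g)
                            for g2 in new_G)}
    return S, G


DOMAINS = [
    ("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
    ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change"),
]
EXAMPLES = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(EXAMPLES, DOMAINS)
```

On the four EnjoySport examples the boundaries do not collapse to a single concept: S ends with one hypothesis and G with two, so the version space still contains several candidate concepts.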
Converging Boundaries of the G and S sets