
Machine Learning II



Presentation Transcript


  1. Machine Learning II – Pusan National University, Dept. of Electronics, Electrical and Computer Engineering, Artificial Intelligence Lab, Minho Kim (karma@pusan.ac.kr)

  2. Bayes’ Rule • Please answer the following question on probability. • Suppose one is interested in a rare syntactic construction, perhaps parasitic gaps, which occurs on average once in 100,000 sentences. Joe Linguist has developed a complicated pattern matcher that attempts to identify sentences with parasitic gaps. It’s pretty good, but it’s not perfect: if a sentence has a parasitic gap, it will say so with probability 0.95; if it doesn’t, it will wrongly say it does with probability 0.005. Suppose the test says that a sentence contains a parasitic gap. What is the probability that this is true? • Sol) • G: the event of the sentence having a parasitic gap • T: the event of the test being positive
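The slide’s solution stops after defining the two events, so the rest of the derivation is reconstructed here from the numbers stated in the question, as a direct application of Bayes’ rule:

P(G \mid T) = \frac{P(T \mid G)\,P(G)}{P(T \mid G)\,P(G) + P(T \mid \neg G)\,P(\neg G)}
            = \frac{0.95 \times 0.00001}{0.95 \times 0.00001 + 0.005 \times 0.99999}
            \approx 0.002

Even with a fairly accurate matcher, the rarity of the construction keeps the posterior probability at roughly 0.2%.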

  3. Naïve Bayes- Introduction • Simple probabilistic classifiers based on applying Bayes' theorem • Strong (naive) independence assumptions between the features
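A standard way to write the classifier these two bullets describe (the formula itself does not appear in the transcript) is the maximum a posteriori decision rule, with the naive independence assumption factoring the likelihood over the features:

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)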

  4. Naïve Bayes – Train & Test (Classification) [slide figures: train, test]
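The train and test figures on this slide did not survive the transcript; the following is a minimal multinomial Naïve Bayes sketch in Python illustrating both steps. The toy spam/ham documents and function names are hypothetical, not from the slides.

from collections import Counter, defaultdict
import math

def train_nb(docs):
    """docs: list of (class_label, list_of_tokens). Returns priors and per-class token counts."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)   # per-class token frequencies
    vocab = set()
    for label, tokens in docs:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def classify_nb(tokens, class_counts, token_counts, vocab):
    """Pick the class maximizing log P(c) + sum_i log P(token_i | c).
    Add-one smoothing (introduced on a later slide) avoids log(0) for unseen tokens."""
    n_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / n_docs)          # log prior
        denom = sum(token_counts[label].values()) + len(vocab)  # smoothed denominator
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data: spam vs. ham messages.
train = [("spam", ["win", "money", "now"]),
         ("spam", ["free", "money"]),
         ("ham",  ["meeting", "at", "noon"])]
model = train_nb(train)
print(classify_nb(["free", "money", "now"], *model))   # expected: spam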

  5. Naïve Bayes Examples

  6. Naïve Bayes Examples

  7. Smoothing • A single zero probability makes the probability of the entire data zero • So… how do we estimate the likelihood of unseen data? • Laplace smoothing • Add 1 to every type count to get an adjusted count c*
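Using the slide’s notation, with c(w) the observed count of a type w, N the total number of tokens, and V the vocabulary size (N and V are not defined on this slide, so this is the usual textbook formulation), add-one smoothing gives:

P_{\text{Laplace}}(w) = \frac{c(w) + 1}{N + V}, \qquad c^{*} = (c(w) + 1)\,\frac{N}{N + V}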

  8. Laplace Smoothing Examples • Add 1 to every type count to get an adjusted count c*
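The worked examples on this slide are figures that are not included in the transcript, so here is a small hypothetical count table pushed through add-one smoothing in Python:

# Hypothetical unigram counts; N = 10 total tokens, V = 4 vocabulary types.
counts = {"the": 5, "cat": 3, "sat": 2, "mat": 0}
N = sum(counts.values())
V = len(counts)

for w, c in counts.items():
    p_mle = c / N                      # unsmoothed estimate (0 for the unseen "mat")
    p_lap = (c + 1) / (N + V)          # Laplace-smoothed estimate
    c_star = (c + 1) * N / (N + V)     # adjusted count c*
    print(f"{w}: MLE={p_mle:.3f}  Laplace={p_lap:.3f}  c*={c_star:.3f}")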

  9. Decision Tree • Flowchart-like structure • Each internal node represents a test on an attribute • Each branch represents an outcome of the test • Each leaf node represents a class label • A path from root to leaf represents a classification rule
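A minimal sketch of the structure described above, assuming a dictionary-based representation; the node classes and the example tree are illustrative, not taken from the slides.

from dataclasses import dataclass

# Internal nodes test an attribute, branches are attribute values, leaves hold class labels.
@dataclass
class Leaf:
    label: str

@dataclass
class Node:
    attribute: str
    branches: dict          # attribute value -> Leaf or Node

def classify(tree, example):
    """Follow the path from the root to a leaf; that path is the classification rule."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.label

# Hypothetical restaurant-style tree: split on Patrons, then on Hungry.
tree = Node("Patrons", {
    "None": Leaf("No"),
    "Some": Leaf("Yes"),
    "Full": Node("Hungry", {"Yes": Leaf("Yes"), "No": Leaf("No")}),
})
print(classify(tree, {"Patrons": "Full", "Hungry": "Yes"}))   # -> Yes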

  10. Information Gain • H(class): entropy of the class distribution at a particular node • H(class | A): conditional entropy, the average entropy of the conditional class distribution after we have partitioned the data according to the values of attribute A • Information gain: IG(A) = H(class) - H(class | A) • Simple rule in decision tree learning: at each internal node, split on the attribute A with the largest information gain IG(A) (or equivalently, with the smallest conditional entropy H(class | A))
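A direct Python rendering of these definitions, assuming examples are stored as dictionaries with a "class" key (the data representation is an assumption; the formulas follow the slide):

import math
from collections import Counter

def entropy(labels):
    """Entropy H of the class distribution in `labels`, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, class_key="class"):
    """IG(A) = H(class) - H(class | A) for a list of dict-shaped examples."""
    labels = [e[class_key] for e in examples]
    groups = Counter(e[attribute] for e in examples)
    conditional = 0.0
    for value, count in groups.items():
        subset = [e[class_key] for e in examples if e[attribute] == value]
        conditional += (count / len(examples)) * entropy(subset)
    return entropy(labels) - conditional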

  11. Root Node Example • For the training set: 6 positives, 6 negatives, so H(6/12, 6/12) = 1 bit • Consider the attributes Patrons and Type: Patrons has the highest IG of all attributes and so is chosen by the learning algorithm as the root • Information gain is then repeatedly applied at internal nodes until all leaves contain only examples from one class or the other
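As a numeric check of the root-node choice, the sketch below recomputes IG(Patrons) and IG(Type) from per-value class counts; the slide’s own data table is not in the transcript, so the counts of the standard restaurant training set are assumed here.

import math

def H(pos, neg):
    """Binary entropy of a pos/neg split, in bits; H(0, n) = H(n, 0) = 0."""
    total = pos + neg
    out = 0.0
    for c in (pos, neg):
        if c:
            out -= (c / total) * math.log2(c / total)
    return out

# Assumed (positives, negatives) per attribute value, as in the standard restaurant data.
splits = {
    "Patrons": [(0, 2), (4, 0), (2, 4)],          # None, Some, Full
    "Type":    [(1, 1), (2, 2), (2, 2), (1, 1)],  # French, Thai, Burger, Italian
}
root_H = H(6, 6)   # 6 positives, 6 negatives -> 1 bit
for attr, groups in splits.items():
    remainder = sum((p + n) / 12 * H(p, n) for p, n in groups)
    print(f"IG({attr}) = {root_H - remainder:.3f} bits")
# Under these counts, IG(Patrons) is about 0.541 bits and IG(Type) is 0 bits,
# so Patrons is chosen as the root.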
