Non-Bayes classifiers. Linear discriminants, neural networks.
Discriminant functions (1) Bayes classification rule: decide class 1 if P(ω1|x) > P(ω2|x), otherwise decide class 2. Instead we might try to find a function g(x) such that g(x) > 0 for class 1 and g(x) < 0 for class 2. g(x) is called a discriminant function; g(x) = 0 is the decision surface.
Discriminant functions (2) Linear discriminant function: g(x) = w^T x + w0. The decision surface g(x) = 0 is a hyperplane separating class 1 from class 2.
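A minimal sketch of classifying with a linear discriminant; the weight vector w and threshold w0 below are made-up illustrative values, not learned parameters:

```python
import numpy as np

# Example linear discriminant g(x) = w^T x + w0 in 2-D.
# w and w0 are arbitrary illustrative values (assumed, not learned).
w = np.array([1.0, -2.0])
w0 = 0.5

def g(x):
    """Linear discriminant function; g(x) = 0 is the decision hyperplane."""
    return w @ x + w0

x = np.array([3.0, 1.0])
label = 1 if g(x) > 0 else 2   # class 1 if g(x) > 0, class 2 otherwise
print(g(x), label)             # g(x) = 1.5 -> class 1
```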
Linear discriminant – perceptron cost function Replace x with the extended vector x' = [x^T, 1]^T and w with w' = [w^T, w0]^T. Thus now the decision function is w'^T x' and the decision surface is w'^T x' = 0. Perceptron cost function: J(w) = Σ_{x ∈ Y} δ_x w^T x, where Y is the set of misclassified training samples and δ_x = −1 if x belongs to class 1, δ_x = +1 if x belongs to class 2 (so every term in the sum is positive).
Linear discriminant – perceptron cost function Perceptron cost function: J(w) = Σ_{x ∈ Y} δ_x w^T x. The value of J(w) is proportional to the sum of distances of all misclassified samples to the decision surface. If the discriminant function separates the classes perfectly, then J(w) = 0. Otherwise J(w) > 0, and we want to minimize it. J(w) is continuous and piecewise linear, so we might try to use the gradient descent algorithm.
Linear discriminant – Perceptron algorithm Gradient descent: w(t+1) = w(t) − ρ_t ∂J(w)/∂w. At points where J(w) is differentiable, ∂J(w)/∂w = Σ_{x ∈ Y} δ_x x. Thus w(t+1) = w(t) − ρ_t Σ_{x ∈ Y} δ_x x. The perceptron algorithm converges when the classes are linearly separable, with some conditions on the learning rate ρ_t.
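A short sketch of the perceptron update above, using the extended vectors x' = [x^T, 1]^T and a constant learning rate ρ; the synthetic data and the value of ρ are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two linearly separable 2-D classes; delta = -1 for class 1, +1 for class 2,
# matching the cost J(w) = sum over misclassified x of delta_x * w^T x.
X1 = rng.normal(loc=[2, 2], scale=0.5, size=(50, 2))
X2 = rng.normal(loc=[-2, -2], scale=0.5, size=(50, 2))
X = np.vstack([X1, X2])
X = np.hstack([X, np.ones((X.shape[0], 1))])      # extend x with 1 to absorb w0
delta = np.hstack([-np.ones(50), np.ones(50)])    # -1 for class 1, +1 for class 2

w = np.zeros(3)
rho = 0.1                                         # constant learning rate (assumed)

for epoch in range(100):
    scores = X @ w
    misclassified = delta * scores >= 0           # delta_x * w^T x >= 0: wrong side
    if not misclassified.any():
        break                                     # perfect separation, J(w) = 0
    # Gradient of J at differentiable points: sum of delta_x * x over misclassified x
    grad = (delta[misclassified, None] * X[misclassified]).sum(axis=0)
    w = w - rho * grad

print("weights:", w, "misclassified:", int(misclassified.sum()))
```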
Sum of error squares estimation Let y(x) denote the desired output: 1 for one class and −1 for the other. We want to find a discriminant function w^T x whose output is similar to y(x). Use the sum of error squares as the similarity criterion: J(w) = Σ_i (y_i − x_i^T w)^2.
Sum of error squares estimation Minimize the mean square error by setting the gradient to zero: ∂J(w)/∂w = −2 Σ_i x_i (y_i − x_i^T w) = 0. Thus w = (Σ_i x_i x_i^T)^{−1} Σ_i x_i y_i.
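A sketch of the sum-of-error-squares estimate under the same extended-vector convention; np.linalg.solve applies the normal equations derived above, and the two-class data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic two-class data with desired outputs y = +1 (class 1) and -1 (class 2).
X1 = rng.normal(loc=[2, 2], scale=0.7, size=(50, 2))
X2 = rng.normal(loc=[-2, -2], scale=0.7, size=(50, 2))
X = np.hstack([np.vstack([X1, X2]), np.ones((100, 1))])   # extend x with 1 for w0
y = np.hstack([np.ones(50), -np.ones(50)])

# Setting the gradient of J(w) = sum_i (y_i - x_i^T w)^2 to zero gives the
# normal equations (sum_i x_i x_i^T) w = sum_i x_i y_i.
w = np.linalg.solve(X.T @ X, X.T @ y)

predictions = np.where(X @ w > 0, 1, -1)
print("w =", w, "training accuracy:", (predictions == y).mean())
```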
Artificial neuron. The figure above represents an artificial neuron computing y = f(Σ_i w_i x_i + w0), where f is a threshold (activation) function.
Artificial neuron. Threshold functions f: the step function, f(v) = 1 if v ≥ 0 and f(v) = 0 otherwise, and the logistic function, f(v) = 1 / (1 + e^(−a v)).
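A minimal sketch of the artificial neuron with the two threshold functions; the weights and the logistic slope a are illustrative assumptions:

```python
import numpy as np

def step(v):
    """Step threshold: 1 if v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def logistic(v, a=1.0):
    """Logistic threshold f(v) = 1 / (1 + exp(-a*v)); a controls steepness."""
    return 1.0 / (1.0 + np.exp(-a * v))

def neuron(x, w, w0, f=logistic):
    """Artificial neuron: y = f(w^T x + w0)."""
    return f(w @ x + w0)

x = np.array([0.5, -1.0])
w = np.array([2.0, 1.0])     # illustrative weights
print(neuron(x, w, w0=0.3, f=step), neuron(x, w, w0=0.3, f=logistic))
```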
Combining artificial neurons Multilayer perceptron with 3 layers.
Discriminating ability of multilayer perceptron Since a 3-layer perceptron can approximate any smooth function, it can approximate g(x) = P(ω1|x) − P(ω2|x), the optimal discriminant function of two classes.
Training of multilayer perceptron Figure: neurons with activation f arranged in two consecutive layers, r−1 and r.
Training and cost function Desired network output: y(i). Trained network output: ŷ(i). Cost function for one training sample: E(i) = ½ Σ_k (y_k(i) − ŷ_k(i))^2. Total cost function: J = Σ_i E(i). Goal of the training: find the values of the weights w which minimize the cost function J.
Gradient descent Denote by w_j^r the weight vector of neuron j in layer r. Gradient descent: w_j^r(new) = w_j^r(old) − μ ∂J/∂w_j^r. Since J = Σ_i E(i), we might want to update the weights after processing each training sample separately: w_j^r(new) = w_j^r(old) − μ ∂E(i)/∂w_j^r.
Gradient descent Chain rule for differentiating composite functions: ∂E(i)/∂w_j^r = (∂E(i)/∂v_j^r)(∂v_j^r/∂w_j^r), where v_j^r(i) = (w_j^r)^T y^{r−1}(i) is the input to neuron j of layer r and y^{r−1}(i) is the output of layer r−1. Denote δ_j^r(i) = ∂E(i)/∂v_j^r, so that ∂E(i)/∂w_j^r = δ_j^r(i) y^{r−1}(i).
Backpropagation If r = L (the output layer), then δ_j^L(i) = (ŷ_j(i) − y_j(i)) f'(v_j^L(i)). If r < L, then δ_j^r(i) = f'(v_j^r(i)) Σ_k δ_k^{r+1}(i) w_{kj}^{r+1}.
Backpropagation algorithm • Initialization: initialize all weights with random values. • Forward computations: for each training vector x(i) compute all v_j^r(i) and y_j^r(i). • Backward computations: for each i, j and r = L, L−1, …, 2 compute δ_j^r(i). • Update weights: w_j^r(new) = w_j^r(old) − μ Σ_i δ_j^r(i) y^{r−1}(i). A compact per-sample sketch of the algorithm follows below.
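A compact sketch of backpropagation for a 3-layer perceptron (one hidden layer) with logistic activations, trained per-sample; the layer sizes, learning rate μ, number of epochs, and XOR-style data are all assumptions for illustration, and training may occasionally stall in a local minimum for an unlucky random seed:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(v):            # logistic threshold function
    return 1.0 / (1.0 + np.exp(-v))

def f_prime(v):      # derivative of the logistic function
    s = f(v)
    return s * (1.0 - s)

# Toy data: XOR-like problem, inputs in {0,1}^2, desired outputs in {0,1}.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

# Initialization: random weights; the bias is absorbed as an extra input fixed to 1.
W1 = rng.normal(size=(3, 3))   # hidden layer: 3 neurons, 2 inputs + bias
W2 = rng.normal(size=(1, 4))   # output layer: 1 neuron, 3 hidden outputs + bias
mu = 0.5                       # learning rate (assumed)

for epoch in range(5000):
    for x, y in zip(X, Y):
        # Forward computations: v^r and y^r for each layer.
        y0 = np.append(x, 1.0)            # input with bias term
        v1 = W1 @ y0
        y1 = np.append(f(v1), 1.0)        # hidden outputs with bias term
        v2 = W2 @ y1
        y2 = f(v2)                        # network output

        # Backward computations: delta at the output layer, then propagate back.
        delta2 = (y2 - y) * f_prime(v2)
        delta1 = f_prime(v1) * (W2[:, :3].T @ delta2)

        # Update weights: w <- w - mu * delta * (previous layer output).
        W2 -= mu * np.outer(delta2, y1)
        W1 -= mu * np.outer(delta1, y0)

def forward(x):
    y1 = np.append(f(W1 @ np.append(x, 1.0)), 1.0)
    return f(W2 @ y1)[0]

print("outputs after training:", np.round([forward(x) for x in X], 2))
```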
MLP issues • What is the best network configuration? • How to choose a proper learning parameter μ? • When should training be stopped? • Should another threshold function f or cost function J be chosen?