CES 514 – Data Mining Lecture 8 classification (contd…)
Example: PEBLS • PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg) • Works with both continuous and nominal features • For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM) • Each record is assigned a weight factor • Number of nearest neighbors: k = 1
Example: PEBLS • Distance between nominal attribute values: • d(Single, Married) = | 2/4 – 0/4 | + | 2/4 – 4/4 | = 1 • d(Single, Divorced) = | 2/4 – 1/2 | + | 2/4 – 1/2 | = 0 • d(Married, Divorced) = | 0/4 – 1/2 | + | 4/4 – 1/2 | = 1 • d(Refund=Yes, Refund=No) = | 0/3 – 3/7 | + | 3/3 – 4/7 | = 6/7
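The four distances above can be checked with a short sketch of MVDM, which sums, over the classes, the absolute difference between the class fractions of the two attribute values. The per-class counts below are inferred from the fractions on the slide; the class names "Yes" and "No" and the function name `mvdm` are assumptions made for illustration.

```python
from fractions import Fraction

# Class counts per attribute value, inferred from the fractions on the slide.
# The class labels "Yes"/"No" are hypothetical names for the two classes.
MARITAL = {
    "Single": {"Yes": 2, "No": 2},
    "Married": {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}
REFUND = {
    "Yes": {"Yes": 0, "No": 3},
    "No": {"Yes": 3, "No": 4},
}


def mvdm(counts_a, counts_b):
    """Modified value difference metric between two nominal values.

    Each argument maps a class label to the number of records that have
    the attribute value and belong to that class.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    classes = set(counts_a) | set(counts_b)
    return sum(
        abs(Fraction(counts_a.get(c, 0), n_a) - Fraction(counts_b.get(c, 0), n_b))
        for c in classes
    )


print(mvdm(MARITAL["Single"], MARITAL["Married"]))
print(mvdm(REFUND["Yes"], REFUND["No"]))
```

Exact fractions are used so the results can be compared directly with the hand computation.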
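Example: PEBLS • Distance between record X and record Y: $\Delta(X, Y) = w_X \, w_Y \sum_{i=1}^{d} d(X_i, Y_i)^2$ • where: $w_X = \dfrac{\text{number of times X is used for prediction}}{\text{number of times X predicts correctly}}$ • $w_X \approx 1$ if X makes accurate predictions most of the time • $w_X > 1$ if X is not reliable for making predictions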
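As a rough sketch of how the record weights enter the distance, the code below assumes the usual PEBLS form: the product of the two record weights times the sum of squared per-attribute distances, with each weight equal to (times used for prediction) / (times predicted correctly). The attribute names, the lookup table `VALUE_DISTANCES`, and the function names are hypothetical; the two distances in the table are the ones computed on the previous slide.

```python
from fractions import Fraction

# Per-attribute value distances taken from the MVDM slide; the table layout
# and the attribute names "Marital" and "Refund" are illustrative choices.
VALUE_DISTANCES = {
    ("Marital", frozenset({"Single", "Married"})): Fraction(1),
    ("Refund", frozenset({"Yes", "No"})): Fraction(6, 7),
}


def value_distance(attribute, a, b):
    """Distance between two values of one nominal attribute."""
    if a == b:
        return Fraction(0)
    return VALUE_DISTANCES[(attribute, frozenset({a, b}))]


def record_weight(times_used, times_correct):
    """Weight of a record: 1 if always right, larger if unreliable."""
    return Fraction(times_used, times_correct)


def pebls_distance(x, y, w_x, w_y):
    """Weighted distance between two records given as attribute dicts."""
    total = sum(value_distance(attr, x[attr], y[attr]) ** 2 for attr in x)
    return w_x * w_y * total


x = {"Marital": "Single", "Refund": "Yes"}
y = {"Marital": "Married", "Refund": "No"}
print(pebls_distance(x, y, record_weight(4, 4), record_weight(4, 4)))
print(pebls_distance(x, y, record_weight(4, 2), record_weight(4, 4)))
```

An unreliable record (weight above 1) is pushed further away from every query, so it is less likely to be chosen as the single nearest neighbor.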
Support Vector Machines • Find a linear hyperplane (decision boundary) that will separate the data • One possible solution • Another possible solution • Other possible solutions • Which one is better, B1 or B2? How do you define "better"? • Find the hyperplane that maximizes the margin (e.g., B1 is better than B2)
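Support Vector Machines • We want to maximize the margin: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$ • Which is equivalent to minimizing: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$ • Subject to the following constraints: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for every training record $(\mathbf{x}_i, y_i)$, with $y_i \in \{-1, +1\}$ • This is a constrained optimization problem • It is solved with numerical approaches (e.g., quadratic programming)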
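To make the objective and constraints concrete, here is a minimal sketch that compares two candidate hyperplanes on a toy data set. It does not solve the quadratic program; it only checks the constraints y_i(w·x_i + b) ≥ 1 and evaluates the margin 2/||w||. The data points, the two candidates (named B1 and B2 after the figure), and the function names are all made up for illustration.

```python
import math

# Toy, linearly separable data (hypothetical values).
POINTS = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
LABELS = [1, 1, 1, -1, -1, -1]


def satisfies_constraints(w, b, points, labels):
    """True if y_i * (w . x_i + b) >= 1 holds for every training record."""
    return all(
        y * (sum(wj * xj for wj, xj in zip(w, x)) + b) >= 1
        for x, y in zip(points, labels)
    )


def margin_width(w):
    """Width of the margin, 2 / ||w||."""
    return 2 / math.hypot(*w)


# Two hypothetical candidate boundaries, both feasible.
B1 = ((0.25, 0.25), 0.0)
B2 = ((0.5, 0.0), 0.0)

for name, (w, b) in (("B1", B1), ("B2", B2)):
    print(name, satisfies_constraints(w, b, POINTS, LABELS), margin_width(w))
```

Both candidates separate the data, but B1 has the smaller ||w|| and therefore the wider margin, which is the sense in which it is "better".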
Overview of optimization • Simplest optimization problem: • Maximize f(x) (one variable) • If the function has nice properties (such as differentiability), then we can use calculus to solve the problem. • Solve the equation f’(x) = 0. Suppose a root is a. If f’’(a) < 0, then a is a local maximum. • Tricky issues: • How to solve the equation f’(x) = 0? • What if there are many solutions? Each is a “local” optimum.
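A small worked example of the calculus recipe, using a function chosen here purely for illustration:

```latex
\begin{aligned}
f(x)   &= -x^2 + 4x + 1 \\
f'(x)  &= -2x + 4 = 0 \;\Longrightarrow\; a = 2 \\
f''(a) &= -2 < 0 \;\Longrightarrow\; a = 2 \text{ is a maximum, with } f(2) = 5
\end{aligned}
```

Because this f is a downward-opening parabola, the single local maximum is also the global one; for a general f that need not be the case.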
How to solve g(x) = 0 • Even polynomial equations can be very hard to solve. • A quadratic has a closed-form solution. What about higher degrees? • Numerical techniques (iterative): • bisection • secant • Newton-Raphson, etc. • Challenges: • choosing the initial guess • rate of convergence
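Two of the listed techniques can be sketched in a few lines. The cubic g(x) = x³ − 2x − 5, the bracketing interval [2, 3], the starting guess, and the tolerances are example choices, not values from the lecture.

```python
def bisection(g, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of g in [lo, hi]; g(lo) and g(hi) must differ in sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid  # root lies in the left half
        else:
            lo, g_lo = mid, g_mid  # root lies in the right half
    return (lo + hi) / 2


def newton_raphson(g, dg, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - g(x) / g'(x) starting from the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")


def g(x):
    return x**3 - 2 * x - 5


def dg(x):
    return 3 * x**2 - 2


print(bisection(g, 2.0, 3.0))
print(newton_raphson(g, dg, 2.0))
```

Bisection halves the bracket each step and cannot fail once a sign change is bracketed; Newton-Raphson usually needs far fewer iterations but depends on a good initial guess and on g'(x) staying away from zero.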
Functions of several variables • Consider a function of two variables, F(x,y). • To find the maximum of F(x,y), we solve the equations $\partial F / \partial x = 0$ and $\partial F / \partial y = 0$. • If we can solve this system of equations, then we have found a critical point of F: a candidate for a local maximum or minimum. • We can solve the equations using numerical techniques similar to the one-dimensional case.
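When is the solution maximum or minimum? • Hessian: $H = \begin{pmatrix} F_{xx} & F_{xy} \\ F_{yx} & F_{yy} \end{pmatrix}$, the matrix of second partial derivatives of F • if the Hessian is positive definite at a critical point a, then a is a local minimum. • if the Hessian is negative definite at a, then a is a local maximum. • if the Hessian is indefinite (it has both positive and negative eigenvalues), then a is a saddle point; if it is only semidefinite, the test is inconclusive.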
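For two variables the definiteness test reduces to the signs of the determinant and of F_xx. The sketch below applies that standard 2×2 criterion; the function name and the example functions in the comments are illustrative.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Classify a critical point of F(x, y) from its second derivatives.

    The arguments are the entries of the symmetric 2x2 Hessian evaluated
    at the critical point (fxy is assumed equal to fyx).
    """
    det = fxx * fyy - fxy * fxy
    if det > 0 and fxx > 0:
        return "local minimum"  # positive definite
    if det > 0 and fxx < 0:
        return "local maximum"  # negative definite
    if det < 0:
        return "saddle point"  # indefinite
    return "inconclusive"  # semidefinite: second-order test says nothing


print(classify_critical_point(2, 0, 2))    # F = x^2 + y^2 at (0, 0)
print(classify_critical_point(-2, 0, -2))  # F = -(x^2 + y^2) at (0, 0)
print(classify_critical_point(2, 0, -2))   # F = x^2 - y^2 at (0, 0)
print(classify_critical_point(0, 0, 0))    # F = x^4 + y^4 at (0, 0)
```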
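Application - linear regression • Problem: given (x1, y1), …, (xn, yn), find the best linear relation between x and y. • Assume y = Ax + B. To find A and B, we minimize the sum of squared errors $E(A, B) = \sum_{i=1}^{n} (y_i - A x_i - B)^2$ • Since this is a function of two variables, we can solve by setting $\partial E / \partial A = 0$ and $\partial E / \partial B = 0$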
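Setting both partial derivatives of the squared error to zero gives two linear equations in A and B with a closed-form solution. A minimal sketch, with made-up data points and a hypothetical function name `fit_line`:

```python
def fit_line(points):
    """Least-squares fit of y = A*x + B to a list of (x, y) pairs.

    Solves the two equations dE/dA = 0 and dE/dB = 0, where E is the
    sum of squared errors, in closed form.
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_xx = sum(x * x for x, _ in points)
    denom = n * sum_xx - sum_x * sum_x
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y - a * sum_x) / n
    return a, b


# Points that lie exactly on y = 2x + 1 (illustrative data).
print(fit_line([(0, 1), (1, 3), (2, 5), (3, 7)]))
```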
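Constrained optimization • Maximize f(x,y) subject to g(x,y) = c • Using a Lagrange multiplier λ, the problem is reformulated in terms of: h(x,y,λ) = f(x,y) + λ(g(x,y) – c) • Now, solve the equations: $\partial h / \partial x = 0$, $\partial h / \partial y = 0$, $\partial h / \partial \lambda = 0$ (the last equation recovers the constraint g(x,y) = c)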
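A small worked example of the Lagrange multiplier method, with f, g, and c chosen here for illustration: maximize f(x,y) = xy subject to x + y = 10.

```latex
\begin{aligned}
h(x, y, \lambda) &= xy + \lambda (x + y - 10) \\
\frac{\partial h}{\partial x} &= y + \lambda = 0 \\
\frac{\partial h}{\partial y} &= x + \lambda = 0 \\
\frac{\partial h}{\partial \lambda} &= x + y - 10 = 0 \\
\Longrightarrow\quad x = y &= 5, \qquad \lambda = -5, \qquad f(5, 5) = 25
\end{aligned}
```

The first two equations force x = y, and the constraint then fixes both at 5.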
Support Vector Machines (contd) • What if the problem is not linearly separable?
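Support Vector Machines • What if the problem is not linearly separable? • Introduce slack variables $\xi_i \ge 0$, one per training record • Need to minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i$, where C > 0 trades margin width against training errors • Subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for all i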
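One simple way to see the slack formulation at work is to eliminate each slack variable as ξ_i = max(0, 1 − y_i(w·x_i + b)) and minimize the resulting objective by subgradient descent. This is a sketch under that assumption, not the quadratic-programming approach mentioned earlier; the data, the penalty C, the step size, and the function names are all illustrative.

```python
def decision_value(w, b, x):
    """Signed score w . x + b."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_soft_margin_svm(points, labels, C=1.0, lr=0.01, epochs=1000):
    """Minimize ||w||^2 / 2 + C * sum(max(0, 1 - y * (w . x + b)))."""
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the ||w||^2 / 2 term
        grad_b = 0.0
        for x, y in zip(points, labels):
            if y * decision_value(w, b, x) < 1:  # slack is positive here
                for j in range(dim):
                    grad_w[j] -= C * y * x[j]
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label in {-1, +1}."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Toy data (hypothetical values).
POINTS = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
LABELS = [1, 1, 1, -1, -1, -1]

w, b = train_soft_margin_svm(POINTS, LABELS)
print([predict(w, b, x) for x in POINTS])
```

A larger C penalizes slack more heavily and pushes the solution toward the hard-margin one; a smaller C tolerates more margin violations in exchange for a wider margin.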
Nonlinear Support Vector Machines • What if decision boundary is not linear?
Nonlinear Support Vector Machines • Transform data into higher dimensional space
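A minimal illustration of the idea, using a made-up one-dimensional data set: no single threshold on x separates the classes, but after the hypothetical mapping Φ(x) = (x, x²) a linear rule in the new space does. The data, the mapping, and the hand-picked weights are assumptions for this sketch.

```python
def phi(x):
    """Map a 1-D input to the 2-D feature space (x, x^2)."""
    return (x, x * x)


def linear_classify(w, b, z):
    """Linear decision rule in the transformed space: sign(w . z + b)."""
    score = sum(wj * zj for wj, zj in zip(w, z)) + b
    return 1 if score > 0 else -1


def threshold_separable_1d(xs, labels):
    """True if one threshold on x separates the two classes.

    That is the case exactly when the labels, read in order of
    increasing x, change at most once.
    """
    ordered = [y for _, y in sorted(zip(xs, labels))]
    changes = sum(1 for prev, cur in zip(ordered, ordered[1:]) if prev != cur)
    return changes <= 1


# Class +1 sits between the two groups of class -1 (illustrative data).
XS = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
LABELS = [-1, -1, 1, 1, 1, -1, -1]

# Hand-picked separator in the new space: score = 1 - x^2.
W, B = (0.0, -1.0), 1.0

print(threshold_separable_1d(XS, LABELS))
print([linear_classify(W, B, phi(x)) for x in XS])
```

The boundary 1 − x² = 0 is a straight line in the (x, x²) plane, yet it corresponds to the nonlinear boundary |x| = 1 in the original space.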
Artificial Neural Networks (ANN) • Output Y is 1 if at least two of the three inputs are equal to 1.
Artificial Neural Networks (ANN) • Model is an assembly of inter-connected nodes and weighted links • Output node computes a weighted sum of its input values, using the weights of its links • The sum is compared against some threshold t • Perceptron Model: $Y = I\left(\sum_i w_i X_i - t > 0\right)$ or $Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right)$
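The "at least two of three inputs" function can be realized by a single perceptron. The weights of 0.3 and the threshold of 0.4 below are one choice that works and are assumptions of this sketch, as is the function name.

```python
from itertools import product


def perceptron_output(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum - threshold > 0 else 0


# Illustrative parameters: equal weights and a threshold between one and
# two active inputs.
WEIGHTS = (0.3, 0.3, 0.3)
THRESHOLD = 0.4

for inputs in product((0, 1), repeat=3):
    print(inputs, perceptron_output(inputs, WEIGHTS, THRESHOLD))
```

Any threshold strictly between 0.3 and 0.6 would give the same outputs with these weights.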
General Structure of ANN • Training an ANN means learning the weights of the neurons
Algorithm for learning ANN • Initialize the weights (w0, w1, …, wk) • Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples • Objective function: $E = \sum_i \left[ Y_i - f(\mathbf{w}, \mathbf{X}_i) \right]^2$ • Find the weights wi that minimize the above objective function • e.g., backpropagation algorithm
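For a single output node, the initialize-then-adjust loop can be sketched with the classic perceptron update rule, which is a simpler stand-in for backpropagation and is used here only because the example has no hidden layer. The training set (the "at least two of three inputs" function), the learning rate, and the epoch limit are illustrative assumptions.

```python
from itertools import product


def predict(weights, bias, x):
    """Threshold unit: 1 if the weighted sum plus bias is positive."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def squared_error(examples, weights, bias):
    """Sum over examples of (target - prediction)^2."""
    return sum((t - predict(weights, bias, x)) ** 2 for x, t in examples)


def train_perceptron(examples, learning_rate=1.0, max_epochs=200):
    """Perceptron rule: w <- w + rate * (target - prediction) * x."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, target in examples:
            error = target - predict(weights, bias, x)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Training set: output is 1 when at least two of the three inputs are 1.
EXAMPLES = [(x, 1 if sum(x) >= 2 else 0) for x in product((0, 1), repeat=3)]

weights, bias = train_perceptron(EXAMPLES)
print(weights, bias, squared_error(EXAMPLES, weights, bias))
```

The rule is guaranteed to stop making mistakes only when the classes are linearly separable, as they are here; multi-layer networks need a gradient-based method such as backpropagation instead.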
WEKA implementations • WEKA has implementations of the major data mining algorithms, including: • decision trees (CART, C4.5, etc.) • naïve Bayes algorithm and its variants • nearest neighbor classifier • linear classifier • Support Vector Machine • clustering algorithms • boosting algorithms, etc.