M.Tech. (CS), Semester III, Course B50. Functional Brain Signal Processing: EEG & fMRI, Lesson 7. Kaushik Majumdar, Indian Statistical Institute, Bangalore Center, kmajumdar@isibang.ac.in
EEG Coherence Measures • Cross-correlation. • Covariance:
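The two measures listed above can be sketched in a few lines of dependency-free Python. The formulas on the original slide did not survive, so the 1/N normalisation, the non-negative integer `lag` parameter and the function names are assumptions made for illustration only.

```python
from math import sqrt


def covariance(x, y):
    """Covariance of two equal-length sequences (1/N normalisation assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def cross_correlation(x, y, lag=0):
    """Normalised cross-correlation, pairing x[i] with y[i + lag].

    The result lies in [-1, 1]. A constant segment has zero variance and
    raises ZeroDivisionError.
    """
    if lag < 0 or lag >= len(x):
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    xs, ys = x[:len(x) - lag], y[lag:]
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Two hypothetical channels: the second is a scaled copy of the first.
channel_a = [1.0, 2.0, 3.0, 4.0]
channel_b = [2.0, 4.0, 6.0, 8.0]
zero_lag = cross_correlation(channel_a, channel_b)
```

Scanning `lag` over a range and taking the maximum is one common way to look for a delayed coupling between two channels.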
EEG Feature Extraction Features of EEG signals can be in myriad different forms, such as: • Amplitude • Phase • Fourier coefficients • Wavelet coefficients, etc.
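As a rough illustration of the first three feature types, the sketch below computes Fourier coefficients of a short signal with a direct DFT and reads off amplitude and phase per frequency bin. The sampling rate, the synthetic 8 Hz test signal and the function names are hypothetical; wavelet coefficients are not covered here.

```python
import cmath
import math


def dft(signal):
    """Discrete Fourier transform coefficients X[k] of a sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def amplitude_and_phase(coeffs):
    """Return (amplitude, phase in radians) for every Fourier coefficient."""
    return [(abs(c), cmath.phase(c)) for c in coeffs]


fs = 64  # hypothetical sampling rate in Hz
# One second of a synthetic 8 Hz cosine standing in for an EEG channel.
x = [math.cos(2 * math.pi * 8 * t / fs) for t in range(fs)]

coeffs = dft(x)
features = amplitude_and_phase(coeffs)
# With one second of data each bin is 1 Hz wide, so the bin index is the frequency.
peak_bin = max(range(1, fs // 2), key=lambda k: features[k][0])
```

For real recordings an FFT routine would replace the quadratic-time `dft` above; the direct sum is used only to keep the example self-contained.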
Two Most Fundamental Aspects of Machine Learning • Differentiation: decomposing the data into features, and • Integration: classification of those features.
Fisher’s Discriminant [Figure omitted; source: Duda, Hart & Stork, 2006]
Fisher’s Discriminant (cont.) There are n d-dimensional data vectors $x_1, \ldots, x_n$, of which $n_1$ belong to a set $D_1$ and $n_2$ belong to another set $D_2$, with $n_1 + n_2 = n$. w is a d-dimensional weight vector such that $\|w\| = 1$. That is, w can apply a rotation only: it fixes a direction, not a scale. The rotation has to be such that $D_1$ and $D_2$ are optimally separable after projection, $y = w^T x$, onto a straight line in the d-dimensional space.
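Fisher’s Discriminant (cont.) • The sample mean $m_i = \frac{1}{n_i}\sum_{x \in D_i} x$ is an unbiased estimate of the population mean, so a clear difference between the sample means $m_1$ and $m_2$ indicates a difference between the underlying populations.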
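Fisher’s Discriminant (cont.) Fisher’s discriminant employs that particular w for which the criterion function $J(w) = \frac{|\tilde{m}_1 - \tilde{m}_2|^2}{\tilde{s}_1^2 + \tilde{s}_2^2}$ is maximized. Here $\tilde{m}_i = w^T m_i$ is the mean of the projected samples of $D_i$ ($m_i$ being the sample mean of $D_i$) and $\tilde{s}_i^2 = \sum_{x \in D_i} (w^T x - \tilde{m}_i)^2$ is their scatter.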
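The criterion can be evaluated numerically for any candidate direction. The sketch below assumes the standard form from Duda et al., squared difference of the projected class means divided by the sum of the projected class scatters; the toy data and function names are hypothetical.

```python
def project(w, x):
    """Scalar projection w^T x of a data vector x on the direction w."""
    return sum(wi * xi for wi, xi in zip(w, x))


def fisher_criterion(w, d1, d2):
    """J(w) = (difference of projected means)^2 / (sum of projected scatters)."""
    y1 = [project(w, x) for x in d1]
    y2 = [project(w, x) for x in d2]
    m1, m2 = sum(y1) / len(y1), sum(y2) / len(y2)
    s1 = sum((y - m1) ** 2 for y in y1)
    s2 = sum((y - m2) ** 2 for y in y2)
    return (m1 - m2) ** 2 / (s1 + s2)


# Toy classes that differ only along the first coordinate.
d1 = [(1.0, 1.0), (2.0, 3.0), (3.0, 5.0)]
d2 = [(4.0, 1.0), (5.0, 3.0), (6.0, 5.0)]

j_along_first_axis = fisher_criterion((1.0, 0.0), d1, d2)
j_along_second_axis = fisher_criterion((0.0, 1.0), d1, d2)
```

Projecting on the first axis separates the two toy classes, while projecting on the second axis makes them coincide, which is what a larger versus smaller J(w) is meant to capture.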
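Fisher’s Discriminant (cont.) Let us define $S_i = \sum_{x \in D_i} (x - m_i)(x - m_i)^T$ and $S_w = S_1 + S_2$, where $m_i$ is the sample mean of $D_i$. Since the scatter of the projected samples of $D_i$ is $\tilde{s}_i^2 = \sum_{x \in D_i} (w^T x - w^T m_i)^2 = w^T S_i w$, we have $\tilde{s}_1^2 + \tilde{s}_2^2 = w^T S_w w$.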
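Fisher’s Discriminant (cont.) Similarly, $(\tilde{m}_1 - \tilde{m}_2)^2 = (w^T m_1 - w^T m_2)^2 = w^T S_B w$, where $S_B = (m_1 - m_2)(m_1 - m_2)^T$. $S_w$ is called the within-class scatter matrix and $S_B$ is called the between-class scatter matrix. The criterion function then becomes $J(w) = \frac{w^T S_B w}{w^T S_w w}$.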
Fisher’s Discriminant (cont.) $J(w) = \frac{w^T S_B w}{w^T S_w w}$ is always a scalar quantity. Writing $J(w) = f(w)$ for a scalar-valued function f of the vector variable w gives $w^T(S_B - f(w)S_w)w = 0$. Clearly, the maximum of f(w) is the maximum of J(w). Let the maximum of f(w) be $\lambda$. Setting the gradient of J(w) to zero at the maximum, we can write $S_B w = \lambda S_w w$, where w is the vector for which J(w) is maximum. $S_B w$ is in the direction of $m_1 - m_2$ (elaborated in the next slide). Also, the scale of w does not matter, only its direction does. So we can write $S_w w = m_1 - m_2$
Fisher’s Discriminant (cont.) or $w = S_w^{-1}(m_1 - m_2)$. Note that $S_B w = (m_1 - m_2)(m_1 - m_2)^T w = (m_1 - m_2)\left[(m_1 - m_2)^T w\right]$. Here all vectors are column vectors by default, unless stated otherwise, so all transpose operations give row vectors. $(m_1 - m_2)^T$ is a row vector and w is a column vector. Therefore the value within the square brackets above is a scalar. That is, $S_B w = (m_1 - m_2)s$, where s is a scalar. This implies $S_B w$ is in the direction of $m_1 - m_2$.
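The closed-form direction, the inverse of the within-class scatter matrix applied to the difference of the class means, can be sketched for two-dimensional data without any libraries. The restriction to d = 2, the explicit 2 × 2 inverse, the toy data and the function names are assumptions chosen to keep the example short.

```python
def mean(vectors):
    """Component-wise sample mean of a list of 2-D vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, m):
    """2 x 2 scatter matrix: sum over samples of (x - m)(x - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(d1, d2):
    """Unit vector along S_w^{-1} (m1 - m2) for two 2-D classes."""
    m1, m2 = mean(d1), mean(d2)
    s1, s2 = scatter(d1, m1), scatter(d2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    # Apply the explicit inverse of the 2 x 2 matrix sw to diff.
    w = [(sw[1][1] * diff[0] - sw[0][1] * diff[1]) / det,
         (sw[0][0] * diff[1] - sw[1][0] * diff[0]) / det]
    norm = (w[0] ** 2 + w[1] ** 2) ** 0.5
    return [w[0] / norm, w[1] / norm]  # only the direction matters


# Toy classes: wide spread along the first axis, narrow along the second.
class_1 = [(0.0, 0.0), (4.0, 0.0), (2.0, 1.0), (2.0, -1.0)]
class_2 = [(4.0, 2.0), (8.0, 2.0), (6.0, 3.0), (6.0, 1.0)]
w = fisher_direction(class_1, class_2)
```

In this toy case the class means differ more along the first axis, yet the resulting direction leans toward the second axis, where the within-class scatter is smaller.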
Dimensionality Reduction by Fisher’s Discriminant • From $S_B w = \lambda S_w w$ we get $(S_w^{-1} S_B - \lambda I)w = 0$, where $I$ is the d × d identity matrix. $S_w$ (assumed invertible) and $S_B$ are d × d square matrices. • For the purpose of classification (or pattern recognition) we only need those eigenvectors of $S_w^{-1} S_B$ whose associated eigenvalues are large enough. The rest of the eigenvectors (and therefore dimensions) can be ignored.
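For the two-class case the eigenproblem can be checked by hand. The worked step below assumes the between-class scatter matrix has the form $S_B = (m_1 - m_2)(m_1 - m_2)^T$ and that $S_w$ is invertible, and substitutes the direction $w = S_w^{-1}(m_1 - m_2)$.

```latex
\begin{aligned}
S_w^{-1} S_B \, w
  &= S_w^{-1} (m_1 - m_2)(m_1 - m_2)^T S_w^{-1} (m_1 - m_2) \\
  &= \underbrace{\left[ (m_1 - m_2)^T S_w^{-1} (m_1 - m_2) \right]}_{\lambda}
     \, S_w^{-1} (m_1 - m_2) \\
  &= \lambda \, w .
\end{aligned}
```

Under these assumptions $S_B$ has rank one, so $S_w^{-1} S_B$ has at most one non-zero eigenvalue and a single eigenvector carries all of the discriminating information between the two classes.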
Logistic Regression (cont.) [Figure omitted: curves of p(y) and 1 - p(y); source: Parra et al., NeuroImage, 22: 342 – 452, 2005]
Logistic Regression vs. Fisher’s Discriminant • It has been shown theoretically that, when the classes are normally distributed with a common covariance matrix, logistic regression is between one half and two thirds as effective as normal discrimination for statistically interesting values of the parameters (B. Efron, The efficiency of logistic regression compared to normal discriminant analysis, JASA (1975) 892-898).
Logistic Regression (cont.) [Equation omitted] is to be maximized, where N is the number of data points.
Logistic Regression (cont.) Note that […] is a monotonically increasing function, so any set […] which increases […] will lead us closer to the optimal value of […]. Even if we take […] and […], the end result for EEG signal separation between target and non-target, or between different targets, will be almost the same as when a convergence technique for […] as described is followed. The two classes of data will be separated by the hyperplane normal to […], and the perpendicular distance of the hyperplane from the origin is […]. In other words, the equation of the hyperplane is […].
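A minimal sketch of logistic regression fitted by gradient ascent on the log-likelihood is given below. The slide's own formulas are not recoverable, so the sigmoid $p(y) = 1/(1 + e^{-y})$ with $y = w^T x + b$, the symbols `w` and `b`, the step size, the iteration count and the toy data are all assumptions rather than the lecture's exact procedure.

```python
import math


def sigmoid(y):
    """Assumed logistic function p(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def activation(w, b, x):
    """Assumed linear activation y = w^T x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def log_likelihood(w, b, xs, labels):
    """Sum over the N data points of log p(y) (label 1) or log(1 - p(y)) (label 0)."""
    total = 0.0
    for x, c in zip(xs, labels):
        p = sigmoid(activation(w, b, x))
        total += math.log(p) if c == 1 else math.log(1.0 - p)
    return total


def fit(xs, labels, rate=0.02, steps=1000):
    """Plain gradient ascent on the log-likelihood (hypothetical settings)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, c in zip(xs, labels):
            err = c - sigmoid(activation(w, b, x))
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi + rate * g for wi, g in zip(w, grad_w)]
        b += rate * grad_b
    return w, b


# Toy two-class data standing in for non-target (0) and target (1) features.
xs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
      (3.0, 3.0), (4.0, 3.0), (3.0, 4.0), (4.0, 4.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = fit(xs, labels)
# Separating hyperplane: w^T x + b = 0, at distance |b| / ||w|| from the origin.
distance = abs(b) / math.sqrt(sum(wi * wi for wi in w))
predictions = [1 if activation(w, b, x) > 0 else 0 for x in xs]
```

Under these assumptions the fitted `w` is the normal of the separating hyperplane, and stopping the ascent early changes the scale of `w` and `b` far more than the position of the boundary.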
Logistic Regression vs. Fisher’s Discriminant • FD projects the multidimensional data onto a line whose orientation is such that the separation of the projected data is maximum on that line. • LR assigns a probability to each data point such that it approaches 1 on one class and 0 on the other exponentially fast with distance from the separating hyperplane. This can make LR a better separator or classifier than FD, particularly when the classes are not normally distributed.
References • R. Q. Quiroga, A. Kraskov, T. Kreuz and P. Grassberger, On performance of different synchronization measures in real data: a case study on EEG signals, Phys. Rev. E, 65(4): 041903, 2002. • R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 4e, John Wiley & Sons, New York, 2007, pp. 117–121.
THANK YOU This lecture is available at http://www.isibang.ac.in/~kaushik