ECE 539 Project
Kalman Filter Based Algorithms for Fast Training of Multilayer Perceptrons: Implementation and Applications
• Dan Li
• Spring, 2000
Introduction
• Multilayer perceptron (MLP)
  • A feedforward neural network model
  • Extensively used in pattern classification
  • Essential issue: the training/learning algorithm
• MLP training algorithms
  • Error backpropagation (EBP)
    • A conventional iterative gradient algorithm
    • Easy to implement
    • Long and uncertain training process
  • S.T. algorithm, proposed by Scalero and Tepedelenlioglu [1] (based on Kalman filter techniques)
  • Layer-by-layer (LBL) algorithm, a modification of the S.T. algorithm proposed by Wang and Chen [2] (based on Kalman filter techniques)
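EBP Algorithm
[Figure: MLP network diagram. Inputs x1 … xM and a bias input 1 feed the hidden-layer sums u1 … uH, which pass through Fh(.) to give the hidden outputs y1 … yH; these and a bias input 1 feed the output-layer sums v1 … vN, which pass through Fo(.) to give the outputs z1 … zN. Panels: for the hidden layer; for the output layer.]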
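For reference, the standard EBP updates can be written with the signal names from the diagram (x, u, y, v, z). This is the textbook gradient rule with momentum, not a transcription of the slide. The symbols t (target vector), W_h and W_o (hidden and output weight matrices), η (learning rate), α (momentum), and n (iteration index) are assumptions introduced here. Bias terms are handled by appending a constant 1 to x and y, and the bias column of W_o is left out when computing δ_h.

```latex
\begin{aligned}
u &= W_h x, \qquad y = F_h(u), \qquad v = W_o y, \qquad z = F_o(v) \\
E &= \tfrac{1}{2}\,\lVert t - z \rVert^2 \\
\delta_o &= (t - z) \odot F_o'(v) \\
\delta_h &= \bigl(W_o^{\top} \delta_o\bigr) \odot F_h'(u) \\
\Delta W_o(n) &= \eta\,\delta_o\,y^{\top} + \alpha\,\Delta W_o(n-1) \\
\Delta W_h(n) &= \eta\,\delta_h\,x^{\top} + \alpha\,\Delta W_h(n-1)
\end{aligned}
```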
S.T. Algorithm
[Figure: the same MLP network, with each target t1 … tN passed through F-1o(.) to give the desired output-layer sums v1* … vN*. Errors e are formed at the summation outputs, between u and u* in the hidden layer and between v and v* in the output layer. Panels: for the hidden layer; for the output layer.]
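The Kalman-filter-style update applied to the linear (summation) part of a layer has the form of a recursive least-squares (RLS) step. Below is a minimal sketch of a generic RLS step with a forgetting factor, assuming a sigmoid output nonlinearity. It is not the exact recursion of [1]. The names `rls_step`, `inv_sigmoid`, `forget`, and `P` are hypothetical and chosen for illustration.

```python
import numpy as np

def inv_sigmoid(t, eps=1e-6):
    """Map targets in (0, 1) back through a sigmoid to desired summation outputs."""
    t = np.clip(t, eps, 1.0 - eps)  # avoid log(0) for targets of exactly 0 or 1
    return np.log(t / (1.0 - t))

def rls_step(W, P, x, d, forget=0.9):
    """One generic RLS update for the linear part of a layer, v = W @ x.

    W: (n_out, n_in) weights; P: (n_in, n_in) inverse correlation estimate
    (assumed symmetric); x: (n_in,) layer input; d: (n_out,) desired
    summation output; forget: forgetting factor (assumed name).
    """
    Px = P @ x
    k = Px / (forget + x @ Px)          # Kalman gain vector
    err = d - W @ x                     # error at the summation output
    W = W + np.outer(err, k)            # correct every row of W along the gain
    P = (P - np.outer(k, Px)) / forget  # update the inverse correlation estimate
    return W, P

# Demo on synthetic, noise-free linear data (made up for illustration).
rng = np.random.default_rng(0)
W_true = rng.normal(size=(2, 3))
W = np.zeros((2, 3))
P = 100.0 * np.eye(3)
for _ in range(200):
    x = rng.normal(size=3)
    W, P = rls_step(W, P, x, W_true @ x)
```

In a network, `d` for the output layer would be `inv_sigmoid(t)` for a target vector `t`, which corresponds to the F-1o(.) blocks in the diagram.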
LBL Algorithm
[Figure: the same MLP network, with each target t1 … tN passed through F-1o() to give the desired output-layer sums v1* … vN*, and desired hidden outputs y1* … yH* passed through F-1h() to give the desired hidden-layer sums u1* … uH*. Errors e are formed at the summation outputs of both layers. Panels: for the hidden layer; for the output layer.]
Experiment #1: 4-4 Encoding/Decoding
[Figure: Learning curve, MSE vs. epoch (0 to 1000), for LBL, EBP, and S.T.]
• MLP Structure: 4-3-4; =0.16
• EBP: =0.3; =0.8;
• S.T.: =0.3; H= o=0.9;
• LBL: =0.15; H= o=0.9;
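A minimal sketch of sequential EBP training on the 4-4 encoding/decoding task with a 4-3-4 MLP is given below. It assumes sigmoid units in both layers and one-hot input patterns that the network must reproduce at its output. The values 0.3 and 0.8 are used as learning rate and momentum, which is an assumption because the parameter symbols are missing from the slide. The function name `train_ebp`, the weight initialization range, and the MSE definition (squared error summed over outputs, averaged over patterns) are also illustrative choices.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_ebp(X, T, n_hidden=3, lr=0.3, momentum=0.8, epochs=1000, seed=0):
    """Sequential (pattern-by-pattern) backpropagation with momentum."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # The extra column in each matrix holds the bias weight (constant input 1).
    Wh = rng.uniform(-0.5, 0.5, (n_hidden, n_in + 1))
    Wo = rng.uniform(-0.5, 0.5, (n_out, n_hidden + 1))
    dWh, dWo = np.zeros_like(Wh), np.zeros_like(Wo)
    history = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in zip(X, T):
            xb = np.append(x, 1.0)               # input plus bias
            y = sigmoid(Wh @ xb)                 # hidden outputs
            yb = np.append(y, 1.0)               # hidden outputs plus bias
            z = sigmoid(Wo @ yb)                 # network outputs
            e = t - z
            sq_err += float(e @ e)
            delta_o = e * z * (1.0 - z)          # output-layer local gradient
            delta_h = (Wo[:, :-1].T @ delta_o) * y * (1.0 - y)  # hidden layer
            dWo = lr * np.outer(delta_o, yb) + momentum * dWo
            dWh = lr * np.outer(delta_h, xb) + momentum * dWh
            Wo += dWo
            Wh += dWh
        history.append(sq_err / len(X))          # per-epoch mean squared error
    return Wh, Wo, history

X = np.eye(4)                  # four one-hot patterns; targets equal inputs
Wh, Wo, history = train_ebp(X, X)
```

The returned `history` gives one MSE value per epoch and can be plotted as a learning curve. Its scale will not necessarily match the curve on the slide, since the slide's MSE definition is not stated.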
Experiment #2: Pattern Classification (IRIS)
• 4 input features
• 3 classes (001, 010, 100)
• 75 training patterns
• 75 testing patterns
• MLP Structure: 4-3-3; =0.01
• EBP: =0.3; =0.8;
• S.T.: =20; H= o=0.9;
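The class coding and the 75/75 split can be sketched as follows. The assignment of classes to the codes 001, 010, 100, the per-class (stratified) split, and the names `encode`, `decode`, and `split_half` are assumptions; the slide does not say how the patterns were divided. The labels here are synthetic stand-ins with 50 patterns per class, as in the IRIS data set.

```python
import numpy as np

# Assumed class-to-code order: class 0 -> 001, class 1 -> 010, class 2 -> 100.
CODES = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])

def encode(labels):
    """Turn integer class labels into 3-bit target codes."""
    return CODES[np.asarray(labels)]

def decode(outputs):
    """Assign each output vector to the class whose 'on' unit is largest."""
    winners = np.argmax(outputs, axis=1)    # index of the largest output unit
    on_unit = np.argmax(CODES, axis=1)      # which unit is 1 for each class
    lookup = np.empty(len(CODES), dtype=int)
    lookup[on_unit] = np.arange(len(CODES)) # output unit index -> class label
    return lookup[winners]

def split_half(labels, seed=0):
    """Split indices into two halves, class by class (an assumed scheme)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        half = len(idx) // 2
        train.extend(idx[:half])
        test.extend(idx[half:])
    return np.array(train), np.array(test)

labels = np.repeat([0, 1, 2], 50)           # synthetic stand-in labels
train_idx, test_idx = split_half(labels)
targets = encode(labels[train_idx])
```

Classification accuracy on the test half would then be the fraction of patterns for which `decode` applied to the network outputs equals the true label.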
Experiment #3: Pattern Classification (wine)
• 13 input features
• 3 classes (001, 010, 100)
• 60 training patterns
• 118 testing patterns
• MLP Structure: 13-15-3;
• EBP: =0.3; =0.8;
• S.T.: =20; H= o=0.9;
• LBL: =0.2; H= o=0.9;
Experiment #4: Image Restoration
[Figure: Learning curve, MSE vs. epoch (0 to 500), for EBP (bat), EBP (seq), LBL (seq), and LBL (bat).]
[Figure: image panels LBL (bat), EBP (seq), EBP (bat), LBL (seq).]
• Raw image: 64 × 64, 8 bit
• MLP structure: 64-16-64
• EBP: =0.3; =0.8;
• S.T.: =0.3; H= o=0.9;
• LBL: =0.15; H= o=0.9;
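Experiment #5: Image Reconstruction (I)
• Original image (256 × 256 × 8 bit)
• Schemes for selecting training subsets (shaded area in the figure)
  • A: 32 input features
  • B: 64 input features
[Figure: two 256 × 256 image grids with the shaded training subset marked; axis marks at 1, 32, 256 for scheme A and at 1, 64, 256 for scheme B.]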
Experiment #5: Image Reconstruction (II)
Scheme A
[Figure: Learning curve, MSE vs. epoch (0 to 200), for EBP (bat), EBP (seq), LBL (seq), and LBL (bat).]
[Figure: image panels Restored: LBL (bat), Restored: LBL (seq), Restored: EBP (seq).]
• MLP structure: 32-16-32
• Convergence threshold: MSE=5
• EBP: =0.3; =0.8;
• LBL: =0.15; H= o=0.9;
Experiment #5: Image Reconstruction (III)
Scheme B
[Figure: Learning curve, MSE vs. epoch (0 to 200), for EBP (bat), EBP (seq), LBL (seq), and LBL (bat).]
[Figure: image panels Restored: EBP (seq), Restored: LBL (bat), Restored: LBL (seq).]
• MLP structure: 64-32-64
• Convergence threshold: MSE=5
• EBP: =0.3; =0.8;
• LBL: =0.15; H= o=0.9;
Experiment #5: Image Reconstruction (IV)
Scheme A, Noisy Image for Training
[Figure: Learning curve, MSE vs. epoch (0 to 100), for LBL (bat), EBP (bat), ST (seq), EBP (seq), and LBL (seq).]
[Figure: image panels Restored: EBP (seq), Restored: S.T. (seq), Restored: LBL (seq).]
• MLP structure: 32-16-32
• Convergence threshold: MSE=5
• EBP: =0.3; =0.8;
• LBL: =0.15; H= o=0.9;
Conclusions
• Compared with the EBP algorithm, the Kalman-filter-based S.T. and LBL algorithms generally reach a lower MSE during training, and do so in significantly fewer epochs.
• However, the CPU time per iteration is longer for the S.T. and LBL algorithms, because each iteration computes the Kalman gain, the inverse of the correlation matrices, and the (pseudo)inverse of the output in each layer. LBL often required even more computation time than S.T.
• The total computation time therefore depends on how good a training result the user requires, that is, on the chosen convergence threshold for the MSE. In our examples, across various applications, the chosen threshold generally led to a shorter overall training time for the Kalman-filter-based methods than for EBP.
• There is no definite answer to the question "which algorithm converges faster, LBL or S.T.?"; it depends on the case. Notably, in the S.T. algorithm the learning rate has a more flexible range and is not confined to [0, 1], in contrast to the EBP algorithm.
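The trade-off between epochs and per-epoch cost comes down to simple arithmetic: total training time is the number of epochs needed to reach the MSE threshold multiplied by the time per epoch. The sketch below illustrates this with made-up learning curves and made-up timings; none of the numbers come from the experiments, and the function names are hypothetical.

```python
def epochs_to_threshold(mse_history, threshold):
    """Return the 1-based epoch at which the MSE first reaches the threshold,
    or None if it never does."""
    for epoch, mse in enumerate(mse_history, start=1):
        if mse <= threshold:
            return epoch
    return None

def total_time(mse_history, threshold, seconds_per_epoch):
    """Total training time = epochs to reach the threshold x time per epoch."""
    n = epochs_to_threshold(mse_history, threshold)
    return None if n is None else n * seconds_per_epoch

# Made-up curves: one method converges slowly with cheap epochs,
# the other converges quickly with epochs that cost four times as much.
slow_cheap = [60.0 * 0.97 ** k for k in range(300)]
fast_costly = [60.0 * 0.80 ** k for k in range(300)]

t_slow = total_time(slow_cheap, threshold=5.0, seconds_per_epoch=1.0)
t_fast = total_time(fast_costly, threshold=5.0, seconds_per_epoch=4.0)
```

With these invented numbers the method with costlier epochs still finishes first, because it needs far fewer of them. A looser or tighter threshold, or a larger per-epoch cost ratio, can reverse the outcome.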
References
[1] Robert S. Scalero and Nazif Tepedelenlioglu, "A fast new algorithm for training feedforward neural networks", IEEE Transactions on Signal Processing, Vol. 40, No. 1, pp. 202-210, 1992.
[2] Gou-Jen Wang and Chih-Cheng Chen, "A fast multilayer neural-network training algorithm based on the layer-by-layer optimizing procedures", IEEE Transactions on Neural Networks, Vol. 7, No. 3, pp. 768-775, 1996.
[3] Brijesh Verma, "Fast training of multilayer perceptrons", IEEE Transactions on Neural Networks, Vol. 8, No. 6, pp. 1314-1320, 1997.
[4] Adriana Dumitras and Vasile Lazarescu, "The influence of the MLP's output dimension on its performance in image restoration", ISCAS '96, Vol. 1, pp. 329-332.