Introduction to Neural Networks


What are connectionist neural networks? • Connectionism refers to a computer modeling approach to computation that is loosely based upon the architecture of the brain • Many different models, but all include: • Multiple, individual “nodes” or “units” that operate at the same time (in parallel) • A network that connects the nodes together • Information is stored in a distributed fashion among the links that connect the nodes • Learning can occur with gradual changes in connection strength

History of Neural Networks (1) • Attempts to mimic the human brain date back to work in the 1930s, 1940s, & 1950s by Alan Turing, Warren McCulloch, Walter Pitts, Donald Hebb and John von Neumann • 1943 McCulloch-Pitts: neuron as computing element • 1948 Wiener: cybernetics • 1949 Hebb: learning rule • 1957 Rosenblatt at Cornell developed Perceptron, a hardware neural net for character recognition • 1959 Widrow and Hoff at Stanford developed Adaline for adaptive control of noise on telephone lines • 1960 Widrow-Hoff: least mean square algorithm

History of Neural Networks (2) • Recession • 1969 Minsky-Papert: limitations of perceptron model (linear separability in perceptrons)

History of Neural Networks (3) • Revival, mathematically tied together many of the ideas from previous research • 1982 Hopfield: recurrent network model • 1982 Kohonen: self-organizing maps • 1986 Rumelhart et al.: backpropagation • Universal approximation • Since then, growth has exploded. Over 80% of “Fortune 500” have neural net R&D programs • Thousands of research papers • Commercial software applications

Applications of Neural Networks • Forecasting/Market Prediction: finance and banking • Manufacturing: quality control, fault diagnosis • Medicine: analysis of electrocardiogram data, RNA & DNA sequencing, drug development without animal testing • Pattern/Image recognition: handwriting recognition, airport bomb detection • Optimization: without Simplex • Control: process, robotics

Comparison of Brains and Traditional Computers
• Size: brain 200 billion neurons, 32 trillion synapses; computer 1 billion bytes RAM but trillions of bytes on disk
• Element size: brain $10^{-6}$ m; computer $10^{-9}$ m
• Energy use: brain 25 W; computer 30~90 W (CPU)
• Processing speed: brain 100 Hz; computer $10^{9}$ Hz
• Architecture: brain parallel, distributed; computer serial, centralized
• Fault tolerance: brain fault tolerant; computer generally not fault tolerant
• Learns: brain yes; computer some
• Intelligent/Conscious: brain usually; computer generally no

Biological Inspiration “My brain: It's my second favorite organ.” - Woody Allen, from the movie Sleeper • Idea: to make the computer more robust, more intelligent, and able to learn, … let’s model our computer software (and/or hardware) after the brain

Neurons in the Brain • Although heterogeneous, at a low level the brain is composed of neurons • A neuron receives input from other neurons (generally thousands) from its synapses • Inputs are approximately summed • When the input exceeds a threshold the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)

Biological Neuron [Figure: neuron model with inputs x1, x2, …, xn, weights w1, w2, …, wn, net input ξ, activation g(ξ), and output y] • 3 major functional units • Dendrites • Cell body • Axon • Synapse (the connection between neurons) • Amount of signal passing through a neuron depends on: • Intensity of signal from feeding neurons • Their synaptic strengths • Threshold of the receiving neuron • Hebb rule (plays key part in learning) • A synapse which repeatedly triggers the activation of a postsynaptic neuron will grow in strength, others will gradually weaken • Learn by adjusting magnitudes of synapses’ strengths

Learning in the Brain • Brains learn • Altering strength between neurons • Creating/deleting connections • Hebb’s Postulate (Hebbian Learning) • When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased • Long Term Potentiation (LTP) • Cellular basis for learning and memory • LTP is the long-lasting strengthening of the connection between two nerve cells in response to stimulation • Discovered in many regions of the cortex

Artificial Neurons (basic computational entities of an ANN) • Analogy between artificial and biological (connection weights represent synapses) • In 1958 Rosenblatt introduced mechanics (perceptron) • Input to output: $y = g(\sum_i w_i x_i)$ • Only when the sum exceeds the threshold limit will the neuron fire • Weights can enhance or inhibit • Collective behaviour of neurons is what’s interesting for intelligent data processing [Figure: artificial neuron with inputs x1, x2, x3, weights w1, w2, w3, weighted sum ∑w·x, and output y = g(·)]
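As a minimal sketch of the input-to-output rule above, the following Python computes a weighted sum and fires only when it exceeds a threshold. The function names `step` and `neuron` and the example weights are illustrative assumptions, not taken from the slides.

```python
def step(xi, threshold=0.0):
    """Threshold activation g: fire (1) only if the net input exceeds the threshold."""
    return 1 if xi > threshold else 0


def neuron(inputs, weights, threshold=0.0):
    """Compute y = g(sum_i w_i * x_i) for one artificial neuron."""
    xi = sum(w * x for w, x in zip(weights, inputs))  # weighted sum (net input)
    return step(xi, threshold)


# Hypothetical weights: two enhancing inputs and one inhibiting input.
print(neuron([1, 1, 0], [1.0, 1.0, -2.0], threshold=1.5))  # sum = 2.0, fires
print(neuron([1, 1, 1], [1.0, 1.0, -2.0], threshold=1.5))  # sum = 0.0, silent
```

The negative third weight shows how a connection can inhibit firing even when the other inputs are active.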

Model of a Neuron

Activation Function

Perceptrons [Figure: perceptron with inputs x_j, weights w_ij, net input ξ, threshold activation g(ξ), and outputs o_i] • Can be trained on a set of examples using a special learning rule (process) • Weights are changed in proportion to the difference (error) between the target output and the perceptron output for each example • Minimize the summed square error function $E = \frac{1}{2} \sum_p \sum_i (o_i^{(p)} - t_i^{(p)})^2$ with respect to the weights • Error is a function of all the weights and forms an irregular multidimensional surface with many peaks, saddle points and minima • Error is minimized by finding the set of weights that corresponds to the global minimum • Done with the gradient descent method: weights are incrementally updated in proportion to $\partial E / \partial w_{ij}$ • Updating reads: $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$, with $\Delta w_{ij} = -\eta \, \partial E / \partial w_{ij}$ • Aim is to produce a true mapping for all patterns

Perceptron Structure

Learning for Perceptron • Initialize $w_{ij}$ with random values • Repeat until $w_{ij}(t+1) \approx w_{ij}(t)$: • Pick pattern p from training set • Feed input to network and calculate the output • Update the weights according to $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$ where $\Delta w_{ij} = -\eta \, \partial E / \partial w_{ij}$ • When no change (within some accuracy) occurs, the weights are frozen and the network is ready to use on data it has never seen
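The loop above can be sketched in Python for a single output unit. This sketch assumes a sigmoid activation so that the gradient of the squared error is defined (a hard threshold has no useful derivative); the names `train_unit` and `predict`, the learning rate, the tolerance, and the OR training set are illustrative choices, not values from the slides.

```python
import math
import random


def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))


def predict(w, x):
    """Output of one unit; w[0] is the bias (threshold) weight."""
    xs = [1.0] + list(x)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, xs)))


def train_unit(patterns, eta=0.5, tol=1e-4, max_epochs=20000, seed=0):
    """Gradient descent on E = 1/2 * sum (o - t)^2, one pattern at a time."""
    rng = random.Random(seed)
    n_inputs = len(patterns[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]  # random start
    for _ in range(max_epochs):
        largest_change = 0.0
        for x, t in patterns:
            xs = [1.0] + list(x)
            o = predict(w, x)
            for i, xi in enumerate(xs):
                grad = (o - t) * o * (1.0 - o) * xi  # dE/dw_i for this pattern
                delta = -eta * grad                  # Delta w = -eta * dE/dw
                w[i] += delta                        # w(t+1) = w(t) + Delta w
                largest_change = max(largest_change, abs(delta))
        if largest_change < tol:  # weights no longer change: freeze them
            break
    return w


OR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights = train_unit(OR_PATTERNS)
print([round(predict(weights, x)) for x, _ in OR_PATTERNS])
```

The stopping test compares the largest weight change in a full pass against `tol`, which is one way to implement "no change within some accuracy".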

AND and OR Example • Perceptron learns these rules easily (i.e., sets appropriate weights and threshold) • For example, w = (w0, w1, w2) = (-1.5, 1.0, 1.0) implements AND and (-0.5, 1.0, 1.0) implements OR, where w0 corresponds to the threshold term
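The two weight vectors can be checked directly. The sketch below assumes binary inputs and a unit that fires when w0 + w1·x1 + w2·x2 is greater than zero; the function name `perceptron` is illustrative.

```python
def perceptron(w, x1, x2):
    """Fire (1) when w0 + w1*x1 + w2*x2 > 0, where w0 is the threshold term."""
    w0, w1, w2 = w
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0


W_AND = (-1.5, 1.0, 1.0)
W_OR = (-0.5, 1.0, 1.0)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", perceptron(W_AND, x1, x2), "OR:", perceptron(W_OR, x1, x2))
```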

Problem & Solution [Figure: two plots in the (x1, x2) plane] • Perceptrons can only perform accurately with linearly separable classes • A linear hyperplane can place one class of objects on one side of the plane and the other class on the other • ANN research put on hold for 20 yrs • Solution: additional (hidden) layers of neurons, MLP architecture • Able to solve non-linear classification problems

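Multilayer Perceptrons (MLPs) [Figure: network with inputs x_k, hidden units h_j, outputs o_i, and weights w_jk and w_ij] • Learning procedure is an extension of the simple perceptron algorithm • Response function: $o_i = g(\sum_j w_{ij} \, g(\sum_k w_{jk} x_k))$, which is non-linear, so the network is able to perform non-linear mappings • Theory tells us that a neural network with at least 1 hidden layer can approximate any continuous function on a bounded domain to arbitrary accuracy, given enough hidden units • Vast number of ANN types exist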
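To illustrate the response function, here is a sketch of a two-layer network that computes XOR, a problem that is not linearly separable. The weights are hand-picked for illustration (one hidden unit acts as OR, the other as AND) rather than learned, and a step activation is assumed for g.

```python
def step(xi):
    return 1 if xi > 0 else 0


# Hand-picked weights (an assumption for illustration); index 0 is the bias term.
W_JK = [[-0.5, 1.0, 1.0],   # hidden unit 1 behaves like OR
        [-1.5, 1.0, 1.0]]   # hidden unit 2 behaves like AND
W_IJ = [-0.5, 1.0, -1.0]    # output fires for "OR and not AND"


def mlp(x1, x2):
    """o = g(sum_j w_ij * g(sum_k w_jk * x_k)) with one output unit."""
    xs = [1, x1, x2]
    hidden = [step(sum(w * v for w, v in zip(row, xs))) for row in W_JK]
    return step(sum(w * v for w, v in zip(W_IJ, [1] + hidden)))


print([mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```

No single threshold unit can produce this truth table, which is why the hidden layer is needed.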

MLP Structure

Geometric Interpretation of Perceptron Learning

Backpropagation ANNs • Most widely used type of network • Feedforward • Supervised (learns mapping from one data space to another using examples) • Error propagated backwards • Versatile. Used for data modelling, classification, forecasting, data and image compression and pattern recognition.

BP Learning Algorithm • Like Perceptron, uses gradient descent to minimize error (generalized to the case with hidden layers) • Each iteration consists of two sweeps: a forward sweep to compute the outputs and a backward sweep to propagate the error • To minimize the error we need $\partial E / \partial w_{ij}$ but also $\partial E / \partial w_{jk}$ (which we get using the chain rule) • Training of an MLP using BP can be thought of as a walk in weight space along an energy surface, trying to find the global minimum and avoiding local minima • Unlike for Perceptron, there is no guarantee that the global minimum will be reached, but in most cases the energy landscape is smooth

Backpropagation Net Structure

BP Learning Algorithm • Initialize $w_{ij}$ and $w_{jk}$ with random values • Repeat until $w_{ij}$ and $w_{jk}$ have converged or the desired performance level is reached: • Pick pattern p from training set • Present input and calculate the output • Update weights according to: $w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}$ and $w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}$, where $\Delta w = -\eta \, \partial E / \partial w$. (…etc…for extra hidden layers)
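The procedure above can be sketched for a network with one hidden layer and a single output unit. Sigmoid activations, four hidden units, the learning rate, the fixed epoch count, and the XOR training set are all assumptions made for this sketch; with a single output, `w_ij` reduces to one weight vector.

```python
import math
import random


def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))


def forward(x, w_jk, w_ij):
    """Forward sweep: returns inputs and hidden activations (with bias) and the output."""
    xs = [1.0] + list(x)
    hs = [1.0] + [sigmoid(sum(w * v for w, v in zip(row, xs))) for row in w_jk]
    o = sigmoid(sum(w * v for w, v in zip(w_ij, hs)))
    return xs, hs, o


def sse(patterns, w_jk, w_ij):
    """E = 1/2 * sum over patterns of (o - t)^2."""
    return 0.5 * sum((forward(x, w_jk, w_ij)[2] - t) ** 2 for x, t in patterns)


def train_bp(patterns, n_hidden=4, eta=0.5, epochs=5000, seed=1):
    rng = random.Random(seed)
    n_in = len(patterns[0][0])
    w_jk = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_ij = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    initial_error = sse(patterns, w_jk, w_ij)
    for _ in range(epochs):
        for x, t in patterns:
            xs, hs, o = forward(x, w_jk, w_ij)
            # Backward sweep: error signal at the output, then chain rule to hidden units.
            delta_o = (o - t) * o * (1.0 - o)
            delta_h = [delta_o * w_ij[j + 1] * hs[j + 1] * (1.0 - hs[j + 1])
                       for j in range(n_hidden)]
            for j in range(n_hidden + 1):
                w_ij[j] += -eta * delta_o * hs[j]           # Delta w_ij = -eta * dE/dw_ij
            for j in range(n_hidden):
                for k in range(n_in + 1):
                    w_jk[j][k] += -eta * delta_h[j] * xs[k]  # Delta w_jk = -eta * dE/dw_jk
    return w_jk, w_ij, initial_error


XOR_PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
w_jk, w_ij, e0 = train_bp(XOR_PATTERNS)
print("error before:", e0, "error after:", sse(XOR_PATTERNS, w_jk, w_ij))
```

Because gradient descent can stall in a local minimum, a different seed may converge more slowly or to a worse solution; the sketch reports the error rather than assuming success.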

Training • Generalization: network’s performance on a set of test patterns it has never seen before (typically lower than on the training set) • Training set used to let ANN capture features in data or mapping • Initial large drop in error is due to learning, but subsequent slow reduction is due to: • Network memorization (too many training cycles used) • Overfitting (too many hidden nodes) (network learns individual training examples and loses generalization ability) [Figure: error (e.g., SSE) versus number of hidden nodes or training cycles, showing training and testing curves with the optimum network marked]
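One common way to act on this picture is early stopping: keep the network from the training cycle where the error on held-out patterns was lowest. The sketch below is an illustrative assumption rather than a method stated in the slides; the function name, the `patience` parameter, and the error values are hypothetical.

```python
def early_stop(test_errors, patience=3):
    """Return (best_cycle, best_error), stopping once the held-out error
    has failed to improve for `patience` consecutive cycles."""
    best_cycle, best_error = 0, float("inf")
    for cycle, err in enumerate(test_errors):
        if err < best_error:
            best_cycle, best_error = cycle, err
        elif cycle - best_cycle >= patience:
            break  # error keeps rising: further training only memorizes
    return best_cycle, best_error


# Hypothetical held-out error per training cycle: falls, then rises again.
errors = [0.90, 0.50, 0.30, 0.25, 0.27, 0.30, 0.35, 0.20]
print(early_stop(errors))  # (3, 0.25)
```

With `patience=3` the loop stops at cycle 6, so the late dip at cycle 7 is never examined; a larger patience trades extra training for a chance to catch such dips.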

Other Popular ANNs • Some applications can be solved using a variety of ANN types, others only with a specific type (depending on the logistics of the problem) • Hopfield networks: optimization. Presented with an incomplete/noisy pattern, the network responds by retrieving the internally stored pattern it most closely resembles • Kohonen networks (self-organizing): trained in an unsupervised manner to form clusters in the data. Used for pattern classification and data compression

Summary of ANN Learning
Artificial Neural Networks
• Feedforward
  • Unsupervised: Kohonen, Hebbian
  • Supervised: MLP, RBF
• Recurrent
  • Unsupervised: ART
  • Supervised: Elman, Jordan, Hopfield

Hopfield Network: Structure and Update Equations • Constraints • Update equations [Figure: network structure]

Hopfield Network: Properties and Purpose • Properties • Recurrent network with feedback • Dynamic network • Purpose • Output the stored pattern closest to the input • Applications • Associative memory • Optimization

Example (1) • Problem: store two pattern vectors • Learning: compute the connection weights from the stored patterns

Example (2) • Recall experiment 1: ability to recover the training data

Example (3) • Ability to recover incomplete data [Figure: hard-limiter activation f_h(x) with output levels +1 and -1, plotted against x]
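The three examples can be sketched in Python. The slides' own pattern vectors and weight formula are not preserved, so this sketch assumes the standard outer-product (Hebbian) rule with zero self-connections, asynchronous updates through a hard limiter, and two hypothetical 8-element bipolar patterns.

```python
def train_hopfield(patterns):
    """Weights w_ij = sum over patterns of p_i * p_j, with w_ii = 0 (assumed rule)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def hard_limiter(x, previous):
    """f_h(x): +1 for positive input, -1 for negative, unchanged at exactly zero."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    return previous


def recall(w, state, max_sweeps=10):
    """Update units one at a time until a full sweep changes nothing."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            net = sum(w[i][j] * s[j] for j in range(len(s)))
            new = hard_limiter(net, s[i])
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s


# Two hypothetical, mutually orthogonal patterns.
P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]
W = train_hopfield([P1, P2])

print(recall(W, P1) == P1)        # stored pattern is recovered unchanged
noisy = [-1] + P1[1:]             # first element corrupted
print(recall(W, noisy) == P1)     # corrupted pattern is restored
```

The patterns are chosen to be orthogonal, which matches the requirement that stored patterns have low mutual similarity.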

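Practical Example and Limitations • The patterns to be recalled must have low similarity to one another • Network capacity: about 15% of the number of nodes • Example: 10 patterns require 70 or more nodes • 5000 or more connections are required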

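Boltzmann Machine • Simulated annealing • At temperature T, the output value is determined • Stochastically, by the Boltzmann distribution • With a carefully designed annealing schedule • Boltzmann distribution • Properties • A neural network that operates stochastically, by means such as simulated annealing • A network capable of global optimization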
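A minimal sketch of the stochastic output rule, assuming the commonly used form in which a unit turns on with probability 1 / (1 + exp(-net / T)). The slides do not preserve the formula or the schedule, so the function names and the geometric cooling schedule are assumptions.

```python
import math
import random


def p_on(net_input, temperature):
    """Probability that a unit outputs 1 at temperature T."""
    return 1.0 / (1.0 + math.exp(-net_input / temperature))


def stochastic_output(net_input, temperature, rng):
    """Sample the unit's output (0 or 1) according to p_on."""
    return 1 if rng.random() < p_on(net_input, temperature) else 0


def geometric_schedule(t_start, t_end, factor):
    """Yield temperatures t_start, t_start*factor, ... while above t_end."""
    t = t_start
    while t > t_end:
        yield t
        t *= factor


rng = random.Random(0)
for T in geometric_schedule(10.0, 1.0, 0.5):
    print(T, round(p_on(2.0, T), 3), stochastic_output(2.0, T, rng))
```

At high temperature the output is close to a coin flip, which lets the network escape local minima; as T falls the unit behaves more and more like a deterministic threshold.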

Energy Curve

Self-Organizing Map • Self-organizing map (SOM) • Unsupervised learning • Preserves the topology of data • Widely used in data visualization or topology-preserving mapping • Selection of winner: • Weight update
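Winner selection and the weight update can be sketched as follows. The slide's formulas are not preserved, so this assumes the usual choices: the winner is the node whose weight vector is nearest to the input in Euclidean distance, and every node moves toward the input scaled by a learning rate and a Gaussian neighborhood function on the map grid. All names and parameter values are illustrative.

```python
import math


def winner(weights, x):
    """Return the grid position of the node whose weights are closest to x."""
    return min(weights,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def update(weights, x, alpha=0.5, sigma=1.0):
    """Move each node toward x by alpha * h, where h decays with grid distance
    from the winner (Gaussian neighborhood, an assumed form)."""
    c = winner(weights, x)
    for node, w in weights.items():
        grid_dist2 = (node[0] - c[0]) ** 2 + (node[1] - c[1]) ** 2
        h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights[node] = [wi + alpha * h * (xi - wi) for wi, xi in zip(w, x)]
    return c


# A hypothetical 2X2 map, keyed by (row, column).
som = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0],
       (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
x = [0.9, 0.1]
print(update(som, x))   # (0, 1) wins
print(som[(0, 1)])      # winner has moved halfway toward x
```

Because neighbors on the grid are also pulled toward the input, nearby nodes end up representing similar inputs, which is how the map preserves topology.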

SOM Structure

SASOM 1. Start with a basic SOM (4X4 map) 2. Train the current network with Kohonen’s algorithm 3. Calibrate the network using known I/O patterns to determine • which nodes should be replaced with a submap of several nodes (2X2 map) • which nodes should be deleted 4. Unless every node represents a unique class, go to step 2

Learning Procedure [Flowchart] Input data → Initialize map as 4X4 → Train with Kohonen’s algorithm → Structure adaptation (find nodes whose hit_ratio is less than 95.0%; split those nodes into 2X2 submaps; train the split nodes with the LVQ algorithm; remove nodes that did not participate in learning) → Stop condition satisfied? (No: loop back; Yes: map generated)

Kohonen’s Learning • Initialization • 4X4 rectangular map using Kohonen’s learning algorithm • Learning • Winner node • Kohonen’s learning algorithm Neighborhood function Learning rate

Dynamic Node-Splitting [Figure: grid of map nodes labeled 0, 1, or “1,0”] • Determining whether a node is to be split or not • Hit ratio • Nodes with a hit ratio below 95.0% are split
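The splitting test can be sketched as below. The slides do not define the hit ratio, so this sketch assumes it is the fraction of training patterns mapped to a node that belong to that node's majority class; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def hit_ratio(labels):
    """Assumed definition: share of a node's patterns in its majority class."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def nodes_to_split(node_labels, threshold=0.95):
    """Return nodes that won at least one pattern and fall below the threshold."""
    return [node for node, labels in node_labels.items()
            if labels and hit_ratio(labels) < threshold]


# Hypothetical class labels of the patterns won by each node.
node_labels = {
    (0, 0): [0, 0, 0, 0],   # pure node, hit ratio 1.0
    (0, 1): [1, 1, 1, 0],   # mixed node, hit ratio 0.75
    (1, 0): [],             # won no patterns (candidate for removal, not splitting)
}
print(nodes_to_split(node_labels))  # [(0, 1)]
```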

Initial Weight of Split Nodes [Equation: initial weight of a child node, expressed in terms of the parent node, the weights of neighbors, and the total number of nodes that participate in weight initialization]

LVQ Learning for Modified Map • , and belong to the same class • , and belong to different classes • Neighborhood function is used to preserve the topological order

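Homework #1 • Explain the learning principle of MLPs based on Information Geometry, and survey methods for improving learning performance. • Survey tips for applying MLPs to real-world problems, organized by network structure, learning algorithm, and training-data preprocessing.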
