
Artificial Neural Networks (Jaringan Syaraf Tiruan, JST)



  1. Artificial Neural Networks (Jaringan Syaraf Tiruan, JST)

  2. The human nervous system

  3. Model of a neuron • A neuron is the information-processing unit that is fundamental to the operation of an ANN. • A neuron model has three basic elements: • A set of synapses, or connecting links, each of which has a weight, or connection strength. • An adder for summing the input signals, weighted by the corresponding synapses of the neuron; the operations described here follow the rules of a linear combiner. • An activation function for limiting the amplitude of each neuron's output.

  4. What is an ANN? • An ANN is one of humankind's attempts to model the way the human nervous system works or functions when carrying out a particular task. • An ANN has a massively parallel, distributed structure and the ability to learn, so it can generalize: it can produce correct outputs for inputs on which it has never been trained.

  5. Model of a neuron

  6. A neuron can have many inputs but only one output, which can in turn become an input to other neurons. • The figure shows a series of input signals x1, x2, …, xp. • Each input signal is multiplied by a weight (wk1, wk2, …, wkp), and the weighted inputs are then summed to obtain the linear-combiner output uk = wk1x1 + wk2x2 + … + wkpxp.

  7. Next, uk is fed into an activation function φ(.) to produce the neuron's output yk. • A threshold θ (or, equivalently, a bias bk = −θ) can be added to adjust the net input to the activation function, giving yk = φ(uk + bk).
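
A minimal Python sketch of this computation (illustrative only; the function names and numbers are not from the slides):

```python
import math

def neuron(x, w, bias, phi):
    """Compute y_k = phi(u_k + b_k), where u_k = sum_j w_kj * x_j."""
    u = sum(wj * xj for wj, xj in zip(w, x))  # adder: linear-combiner output u_k
    return phi(u + bias)                      # activation limits the output amplitude

phi = lambda v: 1.0 / (1.0 + math.exp(-v))    # binary sigmoid, as an example
x = [0.5, -1.0, 2.0]                          # input signals x1..xp
w = [0.4, 0.1, -0.2]                          # synaptic weights wk1..wkp
print(neuron(x, w, bias=0.05, phi=phi))       # the neuron's single output y_k
```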

  8. Model of a neuron (with a threshold)

  9. Model of a neuron (with a bias term)

  10. Activation functions • The activation function, denoted φ(.), defines the output of a neuron at a given activity level based on the output ui of the linear combiner. • Several activation functions are commonly used, including: • Hard limit • Threshold • Symmetric hard limit • Linear (identity) function • Saturating linear function • Binary sigmoid function • Bipolar sigmoid function

  11. Hard limit function • The hard limit function is defined as: y = 0 if x ≤ 0; y = 1 if x > 0.

  12. Hard limit (with threshold) • The hard limit function with threshold θ is defined as: y = 0 if x < θ; y = 1 if x ≥ θ.

  13. Symmetric hard limit • The symmetric hard limit function is defined as: y = −1 if x ≤ 0; y = 1 if x > 0.

  14. Linear (identity) function • The linear function is defined as: y = x.

  15. Saturating linear function • A common formulation of the saturating linear function is: y = 0 if x ≤ −0.5; y = x + 0.5 if −0.5 < x < 0.5; y = 1 if x ≥ 0.5.

  16. Binary sigmoid function • The binary sigmoid function, whose output lies in the range (0, 1), is defined as: y = 1 / (1 + e^(−σx)).

  17. Bipolar sigmoid function • The bipolar sigmoid function, whose output lies in the range (−1, 1), is defined as: y = (1 − e^(−σx)) / (1 + e^(−σx)).
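
The seven functions above can be collected into one Python sketch (σ defaults to 1; the piecewise boundaries follow the formulations given on slides 11-17):

```python
import math

def hard_limit(x):
    return 1 if x > 0 else 0

def threshold(x, theta):
    return 1 if x >= theta else 0

def symmetric_hard_limit(x):
    return 1 if x > 0 else -1

def linear(x):
    return x

def saturating_linear(x):
    if x >= 0.5:
        return 1.0
    if x <= -0.5:
        return 0.0
    return x + 0.5

def binary_sigmoid(x, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-sigma * x))   # output in (0, 1)

def bipolar_sigmoid(x, sigma=1.0):
    return (1.0 - math.exp(-sigma * x)) / (1.0 + math.exp(-sigma * x))  # output in (-1, 1)
```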

  18. Network architectures • The pattern in which the neurons of an ANN are arranged is closely tied to the learning algorithm used to train the network. • In general, network architectures can be divided into four classes: • Single-layer feedforward networks • Multilayer feedforward networks • Recurrent networks • Lattice structures

  19. Single-layer feedforward networks • An ANN is usually organized in the form of layers. • In the simplest form there are only an input layer and an output layer, as in the following figure:

  20. Multilayer feedforward networks • The second class of feedforward neural networks comprises networks with one or more hidden layers.
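
As an illustration (a sketch, not from the slides), a forward pass through one hidden layer is just two matrix-vector products with an activation applied after each:

```python
import numpy as np

def forward(x, w_hidden, w_output, phi):
    """Forward pass: input layer -> hidden layer -> output layer."""
    h = phi(w_hidden @ x)     # hidden-layer activations
    return phi(w_output @ h)  # output-layer activations

phi = lambda u: 1.0 / (1.0 + np.exp(-u))        # binary sigmoid
x = np.array([1.0, 0.0])                        # 2 input signals
w_hidden = np.array([[0.5, -0.4], [0.3, 0.8]])  # 2 hidden neurons
w_output = np.array([[1.0, -1.0]])              # 1 output neuron
print(forward(x, w_hidden, w_output, phi))
```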

  21. Recurrent networks • A recurrent neural network is a network with at least one feedback loop. • For example, a recurrent network may consist of a single layer of neurons, with each neuron feeding its output back as an input to all the other neurons.

  22. Recurrent Networks

  23. Recurrent networks • The following is another kind of recurrent network, one with hidden neurons.
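
A minimal sketch of the feedback idea (illustrative, not from the slides): at each time step, every neuron sees the external input plus the previous outputs of the other neurons.

```python
import numpy as np

def recurrent_step(x, y_prev, w_in, w_fb, phi):
    """One time step: external input plus fed-back outputs from the last step."""
    return phi(w_in @ x + w_fb @ y_prev)

rng = np.random.default_rng(0)
x = np.array([1.0, -0.5])                  # external input signals
y = np.zeros(3)                            # previous outputs of the 3 neurons
w_in = 0.5 * rng.standard_normal((3, 2))   # input weights
w_fb = 0.5 * rng.standard_normal((3, 3))   # feedback-loop weights
np.fill_diagonal(w_fb, 0.0)                # each neuron feeds back only to the others
for _ in range(5):                         # iterate the feedback loop for a few steps
    y = recurrent_step(x, y, w_in, w_fb, np.tanh)
print(y)
```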

  24. Lattice structure • A lattice consists of a one-dimensional, two-dimensional, or higher-dimensional array of neurons. • The following is a one-dimensional lattice of 3 neurons fed from 3 source nodes.

  25. Lattice structure • The following is a two-dimensional lattice of 3-by-3 neurons fed from 3 source nodes.

  26. The learning process • In the context of ANNs, learning can be defined as follows: • Learning is a process by which the free parameters of an ANN are adapted through a continuing process of stimulation by the environment in which the network is embedded. • The discussion that follows covers two learning methods: • Supervised learning • Unsupervised learning

  27. Supervised learning • Supervised learning is a learning process that requires a teacher: something that has knowledge of the environment. • The teacher can be represented as a set of input-output samples. • The teacher builds the network's knowledge by supplying the desired response to the ANN for each input. • The network parameters are adjusted based on the training vector and the error signal, that is, the difference between the ANN's output and the desired response. • The adjustment is carried out iteratively.

  28. Unsupervised learning • Unsupervised learning, or self-organized learning, does not need a teacher to guide the learning process. • In other words, there is no set of input-output samples and no target function for the network to learn. • One example of unsupervised learning is competitive learning, in which neurons compete with one another to become the winner, as sketched below.
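
A winner-take-all sketch of competitive learning (illustrative; the "move the winner toward the input" update is the standard textbook step, not something specified on the slide):

```python
import numpy as np

def competitive_step(weights, x, eta=0.1):
    """The neuron whose weight vector is closest to x wins; only its weights move."""
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    weights[winner] += eta * (x - weights[winner])
    return winner

rng = np.random.default_rng(0)
weights = rng.random((3, 2))       # 3 competing neurons, 2-D inputs
for x in rng.random((100, 2)):     # unlabeled inputs: no teacher involved
    competitive_step(weights, x)
print(weights)                     # each row has drifted toward a region of inputs
```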

  29. INTRODUCTION: Biological Motivation • The human brain is a densely interconnected network of approximately 10^11 neurons, each connected, on average, to 10^4 others. • Neuron activity is excited or inhibited through connections to other neurons. • The fastest known neuron switching times are on the order of 10^-3 sec.

  30. The cell itself includes a nucleus (at the center). • To the right of cell 2, the dendrites provide input signals to the cell. • To the right of cell 1, the axon sends output signals to cell 2 via the axon terminals. These axon terminals merge with the dendrites of cell 2.

  31. Portion of a network: two interconnected cells. • Signals can be transmitted unchanged, or they can be altered by synapses. A synapse can increase or decrease the strength of the connection from one neuron to the next and cause excitation or inhibition of the subsequent neuron. This is where information is stored. • The information processing abilities of biological neural systems must follow from highly parallel processes operating on representations that are distributed over many neurons. One motivation for ANNs is to capture this kind of highly parallel computation based on distributed representations.

  32. 2. NEURAL NETWORK REPRESENTATION • An ANN is composed of processing elements called perceptrons, organized in different ways to form the network’s structure. Processing Elements • An ANN consists of perceptrons. Each perceptron receives inputs, processes them, and delivers a single output. The input can be raw input data or the output of other perceptrons. The output can be the final result (e.g., 1 means yes, 0 means no), or it can serve as input to other perceptrons.

  33. The network • Each ANN is composed of a collection of perceptrons grouped in layers. A typical structure is shown in Fig.2. Note the three layers: input, intermediate (called the hidden layer) and output. Several hidden layers can be placed between the input and output layers.

  34. Appropriate Problems for Neural Networks • ANN learning is well suited to problems in which the training data corresponds to noisy, complex sensor data. It is also applicable to problems for which more symbolic representations are often used. • The backpropagation (BP) algorithm is the most commonly used ANN learning technique. It is appropriate for problems with the following characteristics: • Input is high-dimensional discrete or real-valued (e.g., raw sensor input) • Output is discrete or real-valued • Output is a vector of values • Possibly noisy data • Long training times are acceptable • Fast evaluation of the learned function is required • It is not important for humans to understand the learned weights • Examples: • Speech phoneme recognition • Image classification • Financial prediction

  35. 3. PERCEPTRONS • A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs • a 1 if the result is greater than some threshold • –1 otherwise. • Given real-valued inputs x1 through xn, the output o(x1, …, xn) computed by the perceptron is o(x1, …, xn) = 1 if w0 + w1x1 + … + wnxn > 0, and −1 otherwise, where each wi is a real-valued constant, or weight. • Notice that the quantity (−w0) is a threshold that the weighted combination of inputs w1x1 + … + wnxn must surpass in order for the perceptron to output a 1.

  36. To simplify notation, we imagine an additional constant input x0 = 1, allowing us to write the above inequality as ∑i=0..n wi xi > 0. • Learning a perceptron involves choosing values for the weights w0, w1, …, wn. Figure 3. A perceptron

  37. Representation Power of Perceptrons • We can view the perceptron as representing a hyperplane decision surface in the n-dimensional space of instances (i.e., points). The perceptron outputs a 1 for instances lying on one side of the hyperplane and a –1 for instances lying on the other side, as in Figure 4. The equation for this decision hyperplane is w · x = 0. Some sets of positive and negative examples cannot be separated by any hyperplane. Those that can be separated are called linearly separable sets of examples. Figure 4. Decision surface

  38. A single perceptron can be used to represent many boolean functions. AND function: with w0 = −0.8 and w1 = w2 = 0.5, the decision hyperplane w0 + w1 x1 + w2 x2 = 0 becomes −0.8 + 0.5 x1 + 0.5 x2 = 0, which outputs a 1 only when x1 = x2 = 1 (see the check below).
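
A quick check of these AND weights (a sketch; the helper name is made up):

```python
def perceptron_output(w, x):
    """1 if w0*x0 + w1*x1 + ... + wn*xn > 0, else -1 (x[0] is the constant 1)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1

# Only (1, 1) clears the threshold: 0.5 + 0.5 > 0.8.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), perceptron_output([-0.8, 0.5, 0.5], [1, x1, x2]))
# -> (0, 0) -1, (0, 1) -1, (1, 0) -1, (1, 1) 1
```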

  39. OR function • The two-input perceptron can implement the OR function when we set the weights w0 = −0.3, w1 = w2 = 0.5. Decision hyperplane: w0 + w1 x1 + w2 x2 = 0, i.e., −0.3 + 0.5 x1 + 0.5 x2 = 0.

  40. XOR function: it is impossible to implement the XOR function with a single perceptron, because its positive and negative examples are not linearly separable. A two-layer network of perceptrons can represent the XOR function, as sketched below.
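
One standard two-layer construction (an illustrative sketch using 0/1 outputs rather than the slides' ±1): XOR(x1, x2) = (x1 OR x2) AND NOT (x1 AND x2), built from the thresholded units above.

```python
def step(s):
    return 1 if s > 0 else 0

def xor(x1, x2):
    h_or = step(-0.3 + 0.5 * x1 + 0.5 * x2)       # hidden unit 1: OR
    h_and = step(-0.8 + 0.5 * x1 + 0.5 * x2)      # hidden unit 2: AND
    return step(-0.3 + 0.5 * h_or - 0.5 * h_and)  # output: OR and not AND

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((a, b), xor(a, b))  # -> 0, 1, 1, 0
```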

  41. Perceptron training rule • Although we are interested in learning networks of many interconnected units, let us begin by understanding how to learn the weights for a single perceptron. • Here, learning means determining a weight vector that causes the perceptron to produce the correct +1 or –1 output for each of the given training examples. • Several algorithms are known to solve this learning problem. Here we consider two: the perceptron rule and the delta rule.

  42. One way to learn an acceptable weight vector is to begin with random weights, then iteratively apply the perceptron to each training example, modifying the perceptron weights whenever it misclassifies an example. This process is repeated, iterating through the training examples as many times as needed, until the perceptron classifies all training examples correctly. • Weights are modified at each step according to the perceptron training rule, which revises the weight wi associated with input xi as wi ← wi + Δwi, where Δwi = η(t – o)xi. • Here t is the target output value for the current training example, o is the perceptron output, and η is a small constant (e.g., 0.1) called the learning rate.
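
The rule translates directly into a training loop. This sketch (not from the text; the names are made up) assumes ±1 targets and the constant input x0 = 1 already prepended to each example:

```python
def train_perceptron(examples, n_inputs, eta=0.1, max_epochs=100):
    """Perceptron rule: w_i <- w_i + eta * (t - o) * x_i on each misclassification."""
    w = [0.0] * (n_inputs + 1)             # w0 plus one weight per input
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:              # x = [1, x1, ..., xn], t in {-1, +1}
            o = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if o != t:                     # weights change only on mistakes
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                errors += 1
        if errors == 0:                    # every example classified correctly
            break
    return w

# Learn AND with -1/+1 targets:
data = [([1, 0, 0], -1), ([1, 0, 1], -1), ([1, 1, 0], -1), ([1, 1, 1], 1)]
print(train_perceptron(data, n_inputs=2))
```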

  43. Perceptron training rule (cont.) • The role of the learning rate η is to moderate the degree to which weights are changed at each step. It is usually set to some small value (e.g., 0.1) and is sometimes made to decrease as the number of weight-tuning iterations increases. • We can prove that the algorithm will converge • if the training data is linearly separable • and η is sufficiently small. • If the data is not linearly separable, convergence is not assured.

  44. Gradient Descent and the Delta Rule • Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable. A second training rule, called the delta rule, is designed to overcome this difficulty. • The key idea of the delta rule is to use gradient descent to search the space of possible weight vectors and find the weights that best fit the training examples. This rule is important because it provides the basis for the backpropagation algorithm, which can learn networks with many interconnected units. • The delta training rule considers the task of training an unthresholded perceptron, that is, a linear unit, whose output o is given by o = w0 + w1x1 + ··· + wnxn. (1) • Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.

  45. In order to derive a weight learning rule for linear units, let us specify a measure for the training error of a weight vector, relative to the training examples. The training error can be computed as the following squared error: E(w) = ½ ∑d∈D (td − od)² (2) where D is the set of training examples, td is the target output for training example d, and od is the output of the linear unit for training example d. Here we characterize E as a function of the weight vector w because the linear unit output od depends on this weight vector.
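
For concreteness, Equation (2) in code (an illustrative sketch):

```python
def training_error(w, examples):
    """E(w) = 1/2 * sum over d in D of (t_d - o_d)^2 for a linear unit."""
    return 0.5 * sum(
        (t - sum(wi * xi for wi, xi in zip(w, x))) ** 2  # (t_d - o_d)^2
        for x, t in examples                             # x = [1, x1, ..., xn]
    )
```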

  46. Hypothesis Space • To understand the gradient descent algorithm, it is helpful to visualize the entire space of possible weight vectors and their associated E values, as illustrated in Figure 5. • Here the axes w0 and w1 represent possible values for the two weights of a simple linear unit, so the w0, w1 plane represents the entire hypothesis space. • The vertical axis indicates the error E relative to some fixed set of training examples. The error surface shown in the figure summarizes the desirability of every weight vector in the hypothesis space. • For linear units, this error surface must be parabolic with a single global minimum, and we seek the weight vector at that minimum.

  47. Figure 5. The error surface • How can we calculate the direction of steepest descent along the error surface? This direction can be found by computing the derivative of E with respect to each component of the weight vector w.

  48. Derivation of the Gradient Descent Rule • This vector derivative is called the gradient of E with respect to the vector <w0, …, wn>, written ∇E: ∇E(w) = [∂E/∂w0, ∂E/∂w1, …, ∂E/∂wn]. (3) Notice that ∇E is itself a vector, whose components are the partial derivatives of E with respect to each of the wi. When interpreted as a vector in weight space, the gradient specifies the direction that produces the steepest increase in E; the negative of this vector therefore gives the direction of steepest decrease. Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is w ← w + Δw, where Δw = −η∇E(w). (4)

  49. Here η is a positive constant called the learning rate, which determines the step size in the gradient descent search. The negative sign is present because we want to move the weight vector in the direction that decreases E. This training rule can also be written in its component form wi ← wi + Δwi, where Δwi = −η ∂E/∂wi (5) which makes it clear that steepest descent is achieved by altering each component wi of the weight vector in proportion to ∂E/∂wi. The vector of ∂E/∂wi derivatives that forms the gradient can be obtained by differentiating E from Equation (2), as

  50. ∂E/∂wi = ∑d∈D (td − od)(−xid) (6) where xid denotes the single input component xi for training example d. We now have an equation that gives ∂E/∂wi in terms of the linear unit inputs xid, the output od, and the target value td associated with training example d. Substituting Equation (6) into Equation (5) yields the weight update rule for gradient descent: Δwi = η ∑d∈D (td − od)xid.
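
Putting Equations (4)-(6) together gives batch gradient descent for the linear unit (a sketch; the data and learning rate are illustrative):

```python
def gradient_descent(examples, n_inputs, eta=0.05, epochs=200):
    """Delta rule, batch form: delta_w_i = eta * sum_d (t_d - o_d) * x_id."""
    w = [0.0] * (n_inputs + 1)                        # w0 plus one weight per input
    for _ in range(epochs):
        delta = [0.0] * len(w)
        for x, t in examples:                         # x = [1, x1, ..., xn]
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit, Eq. (1)
            for i, xi in enumerate(x):
                delta[i] += eta * (t - o) * xi        # accumulate Eqs. (5)-(6)
        w = [wi + dwi for wi, dwi in zip(w, delta)]   # Eq. (4)
    return w

# Fit o = 1 + 2*x1 from noiseless samples; w approaches [1.0, 2.0].
data = [([1, 0.0], 1.0), ([1, 1.0], 3.0), ([1, 2.0], 5.0)]
print(gradient_descent(data, n_inputs=1))
```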
