
Learning to Predict by the Methods of TD



Presentation Transcript


  1. Learning to Predict by the Methods of TD Presented by Alp Sardağ

  2. Learning to Predict • Using past experience with an incompletely known system to predict its future behavior. • Examples, learned through experience: • For particular chess positions, whether they will lead to a win. • For particular cloud formations, whether there will be rain. • For particular economic conditions, how much the stock market will rise or fall.

  3. TD methods • Incremental learning procedures specialized for prediction problems. • Conventional prediction-learning methods are driven by the error between predicted and actual outcomes. • TD methods are instead driven by the error between temporally successive predictions.

  4. TD Approach • Suppose a weatherman predicts whether it will rain on Saturday: • The conventional approach is to compare each prediction to the actual outcome, increasing or decreasing it depending on that outcome. • With the TD approach, if a 50% chance of rain is predicted on Monday and a 75% chance on Tuesday, then TD increases the predictions for days similar to Monday.

  5. Advantages of TD • A conventional method must wait until Saturday and then make changes for all days of the week, which requires more storage. • TD methods converge faster and make more efficient use of their experience.

  6. Supervised-learning • Example: • To predict Saturday’s weather, form a pair (measurements taken on Monday, the actual weather observed on Saturday). • Form another pair from (measurements taken on Tuesday, the actual weather observed on Saturday). • This approach ignores the sequential structure of the problem, but because it is easy to understand and analyse it has been widely used.

  7. Single and Multi Step Predictions • Predicting the weather of next Saturday is a multi-step prediction problem. • Predicting the next day’s weather is a single-step prediction problem, assuming no further observations are made between the time of each day’s prediction and its confirmation or refutation on the following day.

  8. Real World Problems • Although TD and supervised methods are not distinct for single-step problems, multi-step prediction problems dominate the real world. • Predictions about next year’s economic performance are not confirmed all at once. • As people hear or see, they constantly update their hypotheses about what they are seeing or hearing.

  9. Widrow-Hoff Rule • Let x1, x2, ..., xm, z be an observation sequence, where xt is a vector of observations at time t and z is the outcome of the sequence. • For each observation sequence, the learner produces a corresponding sequence of predictions P1, P2, ..., Pm. • Each prediction is also based on a vector of modifiable parameters, or weights, w. • Pt is a function P(xt, w) of both xt and w.

  10. Widrow-Hoff Rule • The predictor is a single perceptron-like linear unit: the inputs xt1, ..., xtm are combined with weights w1, ..., wm to produce the prediction P(xt, w) = Σi wi·xti.

  11. Widrow-Hoff Rule • All learning procedures will be expressed as rules for updating w. • Assume that w is updated only once per sequence and thus does not change during a sequence.

  12. Widrow-Hoff Rule • The supervised-learning approach treats each sequence of observations and its outcome as a sequence of observation-outcome pairs: • (x1,z), (x2,z), ..., (xm,z) • The supervised-learning update procedure is: w ← w + Σt Δwt, where Δwt = α(z − Pt)∇wPt and α is a positive learning rate. • In our linear case this reduces to: Δwt = α(z − w·xt)xt.
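A minimal sketch of this supervised (Widrow-Hoff) update for a linear predictor Pt = w·xt; the function name, the learning rate alpha, and the example data are illustrative, not from the slides:

```python
import numpy as np

def widrow_hoff_update(w, xs, z, alpha=0.1):
    """Supervised update: treat each observation as the pair (x_t, z).

    For a linear predictor P_t = w . x_t the gradient is x_t, so each
    per-step increment is alpha * (z - P_t) * x_t; the increments are
    accumulated and applied once, after the whole sequence.
    """
    dw = np.zeros_like(w)
    for x in xs:
        P = w @ x                      # prediction P_t = w . x_t
        dw += alpha * (z - P) * x      # accumulate, w unchanged mid-sequence
    return w + dw

# Example: three observation vectors, outcome z = 1.0
xs = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
w = widrow_hoff_update(np.zeros(2), xs, z=1.0)
```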

  13. TD Approach • With the previous procedure, the update cannot be applied until z becomes known, and all observations and predictions made during the sequence must be remembered. • A TD procedure produces exactly the same result and can be computed incrementally.

  14. TD Approach • The key is to represent the error z − Pt as a sum of changes in successive predictions: z − Pt = Σk=t..m (Pk+1 − Pk), where Pm+1 is defined to be z.
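This telescoping identity can be checked numerically; the prediction values below are arbitrary:

```python
# P_1 .. P_m for an imaginary sequence, and the final outcome z
preds = [0.2, 0.5, 0.4, 0.7]
z = 1.0
seq = preds + [z]                      # define P_{m+1} = z

# z - P_t equals the sum of successive prediction changes from t on:
for t in range(len(preds)):
    lhs = z - preds[t]
    rhs = sum(seq[k + 1] - seq[k] for k in range(t, len(preds)))
    assert abs(lhs - rhs) < 1e-12      # telescoping sum collapses exactly
```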

  15. TD Learning • The final equation: Δwt = α(Pt+1 − Pt) Σk=1..t ∇wPk. • This equation can be computed incrementally, since the sum of past gradients can be kept as a running total. • It is no longer necessary to individually remember all past values. • This is called the TD(1) procedure.
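A sketch of TD(1) for the linear case, keeping a running sum of past gradients (here simply the past observation vectors); names and data are illustrative. By the telescoping identity, the total weight change equals the supervised Widrow-Hoff total:

```python
import numpy as np

def td1_updates(w, xs, z, alpha=0.1):
    """TD(1): after each transition, the increment is
    alpha * (P_{t+1} - P_t) * sum_{k<=t} grad P_k.
    For a linear predictor the gradient at step k is x_k, kept as a
    running sum so past observations need not be stored individually.
    """
    grad_sum = np.zeros_like(w)
    dw = np.zeros_like(w)
    preds = [w @ x for x in xs] + [z]   # P_1..P_m, with P_{m+1} = z
    for t, x in enumerate(xs):
        grad_sum += x                   # running sum of gradients
        dw += alpha * (preds[t + 1] - preds[t]) * grad_sum
    return w + dw
```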

  16. TD() Learning Procedures • Various TD methods are classified according their sensitivity to changes in successive predictions. • For 01:

  17. TD() Learning Procedures • Advantage of exponential weighted form, can be computed incrementally: • For <1, TD() produces weight changes different from those made by any supervised-learning method.

  18. TD() Learning Procedures • Difference is greatest in the case TD(0), in which weight increment is determined only by its effect on the prediction associated with the most recent observation: • Since TD(0) is simple, it is widely used.
