
Learning to Predict by the Methods of TD



Presentation Transcript


  1. Learning to Predict by the Methods of TD Presented by Alp Sardağ

  2. Learning to Predict • Using past experience with an incompletely known system to predict its future behavior. • Examples, learned through experience: • For particular chess positions, whether they will lead to a win. • For particular cloud formations, whether there will be rain. • For particular economic conditions, how much the stock market will rise or fall.

  3. TD methods • Incremental learning procedures specialized for prediction problems. • Conventional prediction-learning methods are driven by the error between predicted and actual outcomes. • TD methods are instead driven by the error between temporally successive predictions.

  4. TD Approach • Suppose a weatherman predicts whether it will rain on Saturday: • The conventional approach is to compare each prediction to the actual outcome, increasing or decreasing it depending on that outcome. • With the TD approach, if a 50% chance of rain is predicted on Monday and a 75% chance on Tuesday, then TD increases the predictions for days similar to Monday.

  5. Advantages of TD • A conventional method must wait until Saturday and then make changes for all days of the week, which requires more storage. • TD methods converge faster and make more efficient use of their experience.

  6. Supervised-learning • Example: • To predict Saturday’s weather, form a pair (measurements taken on Monday, the actual weather observed on Saturday). • Form another pair from (measurements taken on Tuesday, the actual weather observed on Saturday). • This approach ignores the sequential structure of the problem, but because it is easy to understand and analyse it has been widely used.

  7. Single and Multi Step Predictions • Predicting the weather of next Saturday is a multi-step prediction problem. • Predicting the next day’s weather is a single-step prediction problem, assuming no further observations are made between the time of each day’s prediction and its confirmation or refutation on the following day.

  8. Real World Problems • Although TD and supervised methods are not distinct for single-step problems, multi-step prediction problems dominate the real world. • Predictions about next year’s economic performance are not confirmed all at once. • As people hear or see, they constantly update their hypotheses about what they are seeing or hearing.

  9. Widrow-Hoff Rule • Let x1, x2, ..., xm, z be an observation sequence, where xt is a vector of observations at time t and z is the outcome of the sequence. • For each observation sequence, the learner produces a corresponding sequence of predictions P1, P2, ..., Pm. • Each prediction is also based on a vector of modifiable parameters, or weights, w. • Pt is a function P(xt, w) of both xt and w.

  10. Widrow-Hoff Rule • The predictor is a single perceptron-like linear unit: the inputs xt1, ..., xtm are combined with weights w1, ..., wm to produce the prediction P(xt, w) = Σi wi·xti.

  11. Widrow-Hoff Rule • All learning procedures will be expressed as rules for updating w. • Assume that w is updated only once per sequence and thus does not change during a sequence.

  12. Widrow-Hoff Rule • The supervised-learning approach treats each sequence of observations and its outcome as a sequence of observation-outcome pairs: • (x1,z), (x2,z), ..., (xm,z) • The supervised-learning update procedure is: w ← w + Σt Δwt, where Δwt = α(z − Pt)∇wPt and α is a positive learning rate. • In our linear case this reduces to: Δwt = α(z − w·xt)xt.
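A minimal sketch of this supervised (Widrow-Hoff) update for a linear predictor Pt = w·xt; the function name, the learning rate alpha, and the example data are illustrative, not from the slides:

```python
import numpy as np

def widrow_hoff_update(w, xs, z, alpha=0.1):
    """Supervised update: treat each observation as the pair (x_t, z).

    For a linear predictor P_t = w . x_t the gradient is x_t, so each
    per-step increment is alpha * (z - P_t) * x_t; the increments are
    accumulated and applied once, after the whole sequence.
    """
    dw = np.zeros_like(w)
    for x in xs:
        P = w @ x                      # prediction P_t = w . x_t
        dw += alpha * (z - P) * x      # accumulate, w unchanged mid-sequence
    return w + dw

# Example: three observation vectors, outcome z = 1.0
xs = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
w = widrow_hoff_update(np.zeros(2), xs, z=1.0)
```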

  13. TD Approach • With the previous procedure, the update cannot be applied until z becomes known, and all observations and predictions made during the sequence must be remembered. • A TD procedure produces exactly the same result and can be computed incrementally.

  14. TD Approach • The key is to represent the error z − Pt as a sum of changes in successive predictions: z − Pt = Σk=t..m (Pk+1 − Pk), where Pm+1 is defined to be z.
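This telescoping identity can be checked numerically; the prediction values below are arbitrary:

```python
# P_1 .. P_m for an imaginary sequence, and the final outcome z
preds = [0.2, 0.5, 0.4, 0.7]
z = 1.0
seq = preds + [z]                      # define P_{m+1} = z

# z - P_t equals the sum of successive prediction changes from t on:
for t in range(len(preds)):
    lhs = z - preds[t]
    rhs = sum(seq[k + 1] - seq[k] for k in range(t, len(preds)))
    assert abs(lhs - rhs) < 1e-12      # telescoping sum collapses exactly
```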

  15. TD Learning • The final equation: Δwt = α(Pt+1 − Pt) Σk=1..t ∇wPk. • This equation can be computed incrementally, since the sum of past gradients can be kept as a running total. • It is no longer necessary to individually remember all past values. • This is called the TD(1) procedure.
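A sketch of TD(1) for the linear case, keeping a running sum of past gradients (here simply the past observation vectors); names and data are illustrative. By the telescoping identity, the total weight change equals the supervised Widrow-Hoff total:

```python
import numpy as np

def td1_updates(w, xs, z, alpha=0.1):
    """TD(1): after each transition, the increment is
    alpha * (P_{t+1} - P_t) * sum_{k<=t} grad P_k.
    For a linear predictor the gradient at step k is x_k, kept as a
    running sum so past observations need not be stored individually.
    """
    grad_sum = np.zeros_like(w)
    dw = np.zeros_like(w)
    preds = [w @ x for x in xs] + [z]   # P_1..P_m, with P_{m+1} = z
    for t, x in enumerate(xs):
        grad_sum += x                   # running sum of gradients
        dw += alpha * (preds[t + 1] - preds[t]) * grad_sum
    return w + dw
```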

  16. TD() Learning Procedures • Various TD methods are classified according their sensitivity to changes in successive predictions. • For 01:

  17. TD() Learning Procedures • Advantage of exponential weighted form, can be computed incrementally: • For <1, TD() produces weight changes different from those made by any supervised-learning method.

  18. TD() Learning Procedures • Difference is greatest in the case TD(0), in which weight increment is determined only by its effect on the prediction associated with the most recent observation: • Since TD(0) is simple, it is widely used.
