360 likes | 785 Views
Adaptive Routing. Rienforcement Learning Approaches. Contents. Routing Protocols Reinforcement Learning Q-Routing PQ-Routing Ant Routing Summary. Routing Classification. Distributed. Centralized. A Main controller updates all node’s routing tables. Fault Tollerent.
E N D
Adaptive Routing Rienforcement Learning Approaches
Contents • Routing Protocols • Reinforcement Learning • Q-Routing • PQ-Routing • Ant Routing • Summary
Routing Classification Distributed Centralized • A Main controller updates all node’s routing tables. • Fault Tollerent. • Suitable for small networks. • Route computation shared among nodes by exchanging. • Widley used.
Routing Classification… Adaptive Static • Routing based only on source and destination. • Current network state - Ignored • Adapt policy to time and trafic. • More attractive. • Ossilations in path.
Routing Classification Based on Optimization. Non-Minimal Minimal Optimal Shortest Path LS DV
Short falls of Static Routing • Dynamic networks are subjected to the following changes. • Topologies changes, as nodes are added and removed • Traffic patterns change cyclically • Overall network load changes • So, routing algorithms that assume that the network is static don’t work in this setting
Tackling Dynamic Networks • Periodic Updates? • Routing traffic? • When to update? Adaptive Routing’s the Answer?
Reinforcement Learning Agent Playing against a player- Chess and Tic-Tac-Toe Learning a Value Function
Learning Value Function • Temporal Difference • V(e) = V(e) + K [ V(g) – V(e) ] For K = 0.4 We have V(e) = 0.5 + 0.2 = 0.7 • Exploration Vs Exploitation • e and e* 0.5 1
Rienfocement Learning -Networks S V(s) = V(s) + K [ V(s’) – V(s) ] R R R R R R D
Q-Routing • Qx(d, y) is the time that node x estimates it will take to deliver a packet to node d through its neighbor y • When y receives the packet, it sends back a message (to node x), containing its (i.e. y’s) best estimate of the time remaining to get the packet to d, i.e. • t = min(Qy(d, z)) over all z neighbors( y ) • x then updates Qx(d, y) by: • [Qx(d, y)]NEW = [Qx(d, y)]OLD + K.(s+q+t - [Qx(d,y)]OLD) Where • s = RTT from x to y • q = Time spent in queue at x • T = new estimate by y DQ
message message to d x y w min(Qy(d, zi)) = 13; RTT = s = 11 [Qx(d, y)] += (0.25).[(11+17) - 20] 22 Q-Routing… to d Qy(d, z1) = 25 Qy(d, z2) = 17 estimated RTT = 3 Qy(d, ze) = 70
Short falls • Shortest path algorithm – better than Q Routing under low load. • Failure to converge back to shortest paths when network load decreases again. Failure to explore new shortcuts
message message to d x y w Short falls… to d Qy(d, z1) = 25 Qy(d, z2) = 17 Qy(d, ze) = 70 20 Even if route via y reduces later, It never gets used untill route via W gets cunjusted
Predictive Q-Routing • DQ = s+q+t - [Qx(d,y)]OLD • [Qx(d, y)]NEW = [Qx(d, y)]OLD + K.DQ • Bx(d,y) = MIN[Bx(d,y), Qx(d,y)] • If(DQ < 0) //Path is improving • DR = DQ/(currentTime – lastUpdatedTime) • Rx(d,y) = Rx(d,y) + B.DR //Decrease in R • Else • Rx(d,y) = G.Rx(d,y) //Increase of R • End If • lastUpdatedTime = currentTime
PQ-Routing Policy… Finding neighbour y • For each neighbour y of x • Dt = currentTime – lastUpdatedTime • Qx-pred(d,y) = Qx(d,y) + Dt.Rx(d,y) • Choose y with MIN[Qx-pred(d,y)]
PQ-Routing Results • Performs better than Q-Routing under low, high and varying network loads. • Adapts faster if “probing inactive paths” for shortcuts introduced. • Under high loads, behaves like Q-Routing. • Uses more memory than Q-Routing.
Ant- RoutingStigmergy - Inspirations From Nature… • Sorts brood and food items • Explore particular areas for food, and preferentially exploits the richest available food source • Cooperates in carrying large items • Leaves pheromones on their way back • Always finds the shortest paths to their nests or food source • Are blind, can not foresee future, and has very limited memory
Ants • Each router x in the network maintains for each destination node d a list of the form: • <d, <y1, p1>, <y2, p2>, …, <ye, pe>>, • where y1, y2, …, ye are the neighbors of x, and • p1 + p2 + …+ pe = 1 • This is a parallel (multi-path) routing scheme • This also multiplies the number of degrees of freedom the system has by a factor of |E|
Ants… • Every destination host hd periodically generates an “ant” to a random source host hs • An “ant” is a 3-tuple of the form: • < hd, hs, cost> • cost is a counter of the cost of the path the ant has covered so far
Ant Routing Example 0 1 2 3 4 < 4,0,cost > Routing Table for 1
1+ p 1+ p normalizing sum of probabilities to 1 Ants: Updation When a router x receives an ant < hd, hs, cost> from neighbor yi, it: • Updates cost by the cost of traversing the link from xtoyi (i.e. the cost of the link in reverse) • Updates entry for host (<hd, <y1, p1>, <y2, p2>, …, <ye, pe>>) p = k / cost, for some k pi = pi+ p for j i, pj = pj
Ants: Propagation • Two sub-species of ant: • Regular Ants: P( ant sent to yi ) = pi • Uniform Ants: P( ant sent to yi ) = 1 / e • Regular ants use learned tables to route ants • Uniform ants explore randomly
Q-Routing vs. Ants • Q-Routing only changes its currently selected route when the cost of that route increases, not when the cost of an alternate route decreases • Q-Routing involves overhead linear in the volume of traffic in the network; ants are effectively free in moderate traffic • Q-Routing cannot route messages by parallel paths; uniform ants can
Ants with Evoperation • Evaporation is a real life scenario - Where pheromone laid by real ants evaporates. • Link usage statistics are used to evaporate (E(x)). • It is the proportion of number of ants from node x over the total ants received by the current node.
Summary • Routing algorithms that assume a static network don’t work well in real-world networks, which are dynamic • Adaptive routing algorithms avoid these problems, at the cost of a linear increase in the size of the routing tables • Q-Routing is a straightforward application of Q-Learning to the routing problem • Routing with ants is more flexible than Q-Routing
Reference • Boyan, J., & Littman, M. (1994). Packet routing in dinamically changing networks: A rein-forcement learning approach. In Advances in Neural Information Processing Systems 6 (NIPS6), pp. 671-678. San Francisco, CA:Morgan Kaufmann. • Di Caro, G., & Dorigo, M. (1998). Two ant colony algorithms for best-eort routing in datagram networks. In Proceedings of the Tenth IASTED International Conference on Parallel and Distributed Computing and Systems (PDCS'98), pp. 541-546. IASTED/ACTA Press. • Choi, S., & Yeung, D.-Y. (1996). Predictive Q-routing: A memory-based reinforcement learning approach to adaptive trac control. In Advances in Neural Information Processing Systems 8 (NIPS8), pp. 945-951. MIT Press. • Dorigo, M., Maniezzo, V., & Colorni, A. (1996). The ant system: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics-Part B, 26 (1), 29-41.