Operations Research

Prepared by: Abed Alhameed Mohammed Alfarra Supervised by: Dr. Sana’a Wafa Al-Sayegh 2nd Semester 2008-2009 Operations Research University of Palestine ITGD4207

ITGD4207 Operations Research Chapter 14 Markov Decision Processes

Outline • Introduction to MDPs • Definition MDP • Solution • MDP Basics and Terminology • Markov Assumption • A prototype Example 1 • Example 2

Introduction to MDPs • A Markov Decision Process (MDP) is a discrete-time stochastic control process characterized by a set of states; in each state there are several actions from which the decision maker must choose. • For a state s and an action a, a state transition function P_a(s) determines the transition probabilities to the next state. The decision maker earns a reward for each state transition. • MDPs have their roots in operations research • They are also used in economics, communications engineering, ecology, and performance modeling

Definition MDP • Defined formally as a tuple <S, A, T, R> • S: set of states • A: set of actions • T: transition function, given as a table P(s' | s, a): the probability of s' given action a in state s • R: reward function, R(s, a) = cost or reward of taking action a in state s • P(s' | s, a) = Pr(s_{t+1} = s' | s_t = s, a_t = a) is the probability that action a in state s at time t will lead to state s' at time t + 1.

Definition MDP • The goal is to maximize some cumulative function of the rewards, typically the expected discounted sum over a potentially infinite horizon: Σ_{t=0}^{∞} γ^t R(s_t, a_t), where γ is a discount factor with 0 ≤ γ < 1.
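As a minimal sketch of the discounted sum, the Python function below weights the reward received at step t by γ^t and adds the results for a finite reward sequence. The reward values and the discount factor γ = 0.5 are made up for illustration and do not come from the slides.

```python
def discounted_return(rewards, gamma):
    """Return the sum over t of gamma**t * rewards[t] for a finite sequence."""
    total = 0.0
    for t, reward in enumerate(rewards):
        # A reward received t steps in the future is scaled by gamma**t.
        total += (gamma ** t) * reward
    return total


# Hypothetical reward sequence and discount factor, for illustration only.
rewards = [1.0, 1.0, 1.0]
gamma = 0.5
value = discounted_return(rewards, gamma)  # 1 + 0.5 + 0.25 = 1.75
```

With γ below 1, later rewards count for less, which is what keeps the sum finite over an infinite horizon when rewards are bounded.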

Solution • The solution to a Markov Decision Process can be expressed as a policy π, a function from states to actions. Note that once a Markov decision process is combined with a policy in this way, this fixes the action for each state and the resulting combination behaves like a Markov Chain.
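To illustrate how a policy turns an MDP into a Markov chain, the sketch below uses a hypothetical two-state, two-action MDP (the state names, action names, and probabilities are invented for illustration). Fixing the action in each state leaves a single transition row per state, which is a Markov chain transition matrix.

```python
# Hypothetical MDP transition function; all names and numbers are illustrative.
# T[s][a] maps each next state s' to P(s' | s, a).
T = {
    "s0": {"stay": {"s0": 0.9, "s1": 0.1}, "go": {"s0": 0.2, "s1": 0.8}},
    "s1": {"stay": {"s0": 0.0, "s1": 1.0}, "go": {"s0": 0.7, "s1": 0.3}},
}

# A policy is a function from states to actions; here it is a plain dict.
policy = {"s0": "go", "s1": "stay"}


def induced_chain(T, policy):
    """Fix the action in every state, leaving one transition row per state."""
    return {s: dict(T[s][policy[s]]) for s in T}


chain = induced_chain(T, policy)
# chain["s0"] is the row for action "go"; chain["s1"] is the row for "stay".
```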

MDP Basics and Terminology • Goal is to choose a sequence of actions for optimality • Defined as <S, A, T, R> • MDP models: • Finite horizon: Maximize the expected reward for the next n steps • Infinite horizon: Maximize the expected discounted reward. • Transition model: Maximize average expected reward per transition. • Goal state: maximize expected reward (minimize expected cost) to some target state G.

Markov Assumption • Markov Assumption: Transition probabilities (and rewards) from any given state depend only on the state and not on previous history • Where you end up after action depends only on current state • Choose a sequence of actions (not just one decision or one action) • Utility based on a sequence of decisions

A prototype Example 1 A manufacturer has one key machine at the core of one of its production processes. Because of heavy use, the machine deteriorates rapidly in both quality and output. Therefore, at the end of each week, a thorough inspection is done that results in classifying the condition of the machine into one of four possible states:

The following matrix shows the relative frequency (probability) of each possible transition from the state in one week (a row of the matrix) to the state in the following week (a column of the matrix).

The expected costs per week from this source are as follows: Total cost when the machine enters state 3 = $6,000. Find the expected average cost per unit time:

Solution

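Steady-state equations:
π0 = π3
π1 = 7/8 π0 + 3/4 π1
π2 = 1/16 π0 + 1/8 π1 + 1/2 π2
π3 = 1/16 π0 + 1/8 π1 + 1/2 π2
1 = π0 + π1 + π2 + π3
From the equation for π1:
π1 - 3/4 π1 = 7/8 π0
0.25 π1 = 7/8 π0
π1 = 3.5 π0
From the equation for π2:
π2 - 1/2 π2 = 1/16 π0 + 1/8 π1
0.5 π2 = 1/16 π0 + 1/8 π1
π2 = 0.125 π0 + 0.25 π1
From the equations for π3 and π0:
π3 = 1/16 π0 + 1/8 π1 + 1/2 π2 = π0
Substituting into the normalization equation:
1 = π0 + 3.5 π0 + 0.125 π0 + 0.25 (3.5 π0) + π0
1 = (1 + 3.5 + 0.125 + 0.875 + 1) π0
1 = 6.5 π0
π0 = 2/13 ≈ 0.154
Therefore:
π1 = 3.5 (2/13) = 7/13
π2 = 0.125 (2/13) + 0.25 (7/13) = 2/13
π3 = π0 = 2/13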
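The same steady-state probabilities can be checked numerically. The sketch below assumes the transition matrix implied by the balance equations in the solution (the coefficients of π0, π1, and π2 in each equation, with state 3 returning to state 0 with probability 1) and solves π = πP together with π0 + π1 + π2 + π3 = 1 using exact fractions. The function name steady_state is hypothetical.

```python
from fractions import Fraction

# Transition matrix inferred from the balance equations in the solution
# (row = current state, column = next state). This is an assumption:
# the matrix itself is not reproduced in the text.
P = [
    [Fraction(0), Fraction(7, 8), Fraction(1, 16), Fraction(1, 16)],
    [Fraction(0), Fraction(3, 4), Fraction(1, 8), Fraction(1, 8)],
    [Fraction(0), Fraction(0), Fraction(1, 2), Fraction(1, 2)],
    [Fraction(1), Fraction(0), Fraction(0), Fraction(0)],
]


def steady_state(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination."""
    n = len(P)
    # Balance equation for state j: sum_i pi_i * P[i][j] - pi_j = 0.
    rows = [
        [P[i][j] - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
        for j in range(n)
    ]
    # One balance equation is redundant; replace the last with normalization.
    rows[-1] = [Fraction(1)] * n + [Fraction(1)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]


pi = steady_state(P)
# pi holds Fraction(2, 13), Fraction(7, 13), Fraction(2, 13), Fraction(2, 13)
```

Exact fractions are used instead of floating point so the result can be compared directly with the hand calculation.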

Example 2 • Assume there are 3 brands of household detergent • Ariel, Tide, Omo • They compete to attract customers • A broad study of the market found the current market shares of the three brands to be: • Ariel = 40% • Tide = 35% • Omo = 25%

The study estimated the changes in demand for all three brands over a regular 6-week period. The rates at which customers switched from one brand to another during the study period were as in the following table:

Find the market share of each detergent in the next period, based on the current share estimates and the transition probability matrix.

Solution • Market share for Tide = (0.40*0.05 + 0.35*0.8 + 0.25*0.15) = 0.3375 • Market share for Ariel = (0.40*0.9 + 0.35*0.1 + 0.25*0.1) = 0.42 • Market share for Omo = (0.40*0.05 + 0.35*0.1 + 0.25*0.75) = 0.2425

Comparing the new shares with the current ones, we find: - Ariel's share of the local market increases by 2 percentage points (40% to 42%) - Tide's share declines by 1.25 percentage points (35% to 33.75%) - Omo's share declines by 0.75 percentage points (25% to 24.25%)
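The one-period calculation can be sketched in Python as a vector-matrix product. The switching probabilities below are read off from the coefficients used in the solution (the original table is not reproduced here), so treat them as an assumption; the names shares, switch, and next_shares are illustrative.

```python
# Current market shares, as given in the example.
shares = {"Ariel": 0.40, "Tide": 0.35, "Omo": 0.25}

# Assumed switching probabilities, inferred from the solution's coefficients:
# switch[src][dst] = probability that a customer of src buys dst next period.
switch = {
    "Ariel": {"Ariel": 0.90, "Tide": 0.05, "Omo": 0.05},
    "Tide": {"Ariel": 0.10, "Tide": 0.80, "Omo": 0.10},
    "Omo": {"Ariel": 0.10, "Tide": 0.15, "Omo": 0.75},
}


def next_shares(shares, switch):
    """Multiply the share vector by the transition matrix for one period."""
    return {
        dst: sum(shares[src] * switch[src][dst] for src in shares)
        for dst in shares
    }


new_shares = next_shares(shares, switch)

# Change in share for each brand (positive means a gain).
change = {brand: new_shares[brand] - shares[brand] for brand in shares}

for brand in shares:
    print(f"{brand}: {new_shares[brand]:.4f} ({change[brand]:+.4f})")
```

Calling next_shares again on new_shares would project one further period ahead under the same assumed matrix.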

Thank You . . .
