ARl711S - SECONDOPPORTUNITYEXAMINATION JULY2023
QUESTION 5: MDP
[20)
The following problems take place in various scenarios of the gridworld MDP (as in
Assignment P3). In all cases, A is the start state and double-rectangle states are exit states.
From an exit state, the only action available is Exit, which results in the listed reward and
ends the game (by moving into a terminal state X, not shown).
From non-exit states, the agent can choose either Left or Right actions, which move the
agent in the corresponding direction. There are no living rewards; the only non-zero
rewards come from exiting the grid.
Throughout this problem, assume that value iteration begins with initial values V0(s) = 0 for
all states s.
First, consider the following mini-grid. For now, the discount is Y = 1 and legal movement
actions will always succeed (and so the state transition function is deterministic).
IEJI
5.1 What is the optimal value V*(A)?
(1)
5.2 When running value iteration, remember that we start with Vo(s)= 0 for alls. What (2)
is the first iteration k for which Vk(A)will be non-zero?
5.3 What will Vk(A)be when it is first non-zero?
( 1)
5.4 After how many iterations k will we have Vk(A)= V*(A)? If they will never become
(2)
equal, write never.
Now the situation is as before, but the discount is less than 1.
5.5 If Y = 0.5, what is the optimal value V*(A)?
(3)
5.6 For what range of values Yof the discount will it be optimal to go Right from A?
(4)
Remember that 0 :5 Y:5 1. Write all or none if all or no legal values of have this
property.
Let's kick it up a notch! The Left and Right movement actions are now stochastic and fail with
probability f. When an action fails, the agent moves up or down with probability f/2 each. When
there is no square to move up or down into (as in the one-dimensional case), the agent stays in
place. The Exit action does not fail.
For the following mini-grid, the failure probability is f = 0.5 and the discount is Y = 1
A IEJI
5.7 What is the optimal value V*(A)?
(1)
5.8 When running value iteration, what is the smallest value of k for which Vk(A)will (1)
be non-zero?
5.9 What will Vk(A)be when it is first non-zero?
(2)
5.10 After how many iterations k will we have Vk(A)= V*(A)? If they will never become (3)
equal, write never. Motivate your answer.
7