ARI711S
Third Exam (continued)
July 2022
Question 1 [25 points]
(a) [15] Consider an air cargo transport problem that involves loading and unloading cargo
and flying it from place to place. We use three actions in this problem: load, unload and fly.
We use two predicates to define the actions: in(x, y), which means that cargo x is inside
plane y; and at(z, x), which means that object z (either cargo or plane) is at airport x. Note
that once inside a plane, a cargo is no longer considered to be at an airport. Additionally,
the predicate cargo(x) means that x is a cargo, the predicate airport(y) means that y is
an airport, and the predicate plane(z) means that z is a plane.
Initially we have three planes: P1, P2 and P3. We also have two cargos, C3 and C4, and
three airports: Loc1, Loc4 and Loc5. C3 is at Loc1 and C4 is at Loc4. As well, P1 is at
Loc1, P2 is at Loc4 and P3 is at Loc5.
Using the STRIPS notation and first-order logic, define the actions and the initial
knowledge base.
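As an illustration of the STRIPS representation only, the following minimal Python sketch encodes a ground action as precondition, add and delete lists over sets of facts. The names Action, load, applicable and apply_action are hypothetical helpers introduced here, and the load schema shown is one possible reading of the problem statement (with the plane and airport names read as P1 and Loc1), not a model answer.

```python
from typing import FrozenSet, NamedTuple, Tuple

Fact = Tuple[str, ...]  # e.g. ("at", "C3", "Loc1")


class Action(NamedTuple):
    """A ground STRIPS action: preconditions, add list, delete list."""
    name: str
    precond: FrozenSet[Fact]
    add: FrozenSet[Fact]
    delete: FrozenSet[Fact]


def load(c: str, p: str, a: str) -> Action:
    # Assumed schema: cargo c and plane p must both be at airport a.
    # Loading puts c in p and removes at(c, a), since a cargo inside
    # a plane is no longer considered to be at an airport.
    return Action(
        name=f"load({c}, {p}, {a})",
        precond=frozenset({
            ("cargo", c), ("plane", p), ("airport", a),
            ("at", c, a), ("at", p, a),
        }),
        add=frozenset({("in", c, p)}),
        delete=frozenset({("at", c, a)}),
    )


def applicable(state: FrozenSet[Fact], action: Action) -> bool:
    # An action is applicable when all its preconditions hold in the state.
    return action.precond <= state


def apply_action(state: FrozenSet[Fact], action: Action) -> FrozenSet[Fact]:
    # STRIPS update: remove the delete list, then add the add list.
    if not applicable(state, action):
        raise ValueError(f"{action.name} is not applicable")
    return (state - action.delete) | action.add


# A fragment of the initial knowledge base, enough to load C3 into P1.
state = frozenset({
    ("cargo", "C3"), ("plane", "P1"), ("airport", "Loc1"),
    ("at", "C3", "Loc1"), ("at", "P1", "Loc1"),
})
after = apply_action(state, load("C3", "P1", "Loc1"))
```

After the call, the state contains in(C3, P1) and no longer contains at(C3, Loc1), while at(P1, Loc1) is unchanged.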
(b) [10] Consider the goal of moving C3 to Loc4 and C4 to Loc1. Update the partial plan
{unload(C3, P1, Loc4)} to satisfy the goal. Each step of the update must be discussed
and justified.
Question 2 [20 points]
(a) [7] The Millionaire is your favourite TV show. It is a ten-round game. Except for the first
round, the player can choose to play or quit at each round. When the player quits, the
game ends, and s/he can collect the rewards that s/he has earned so far. When the
player plays, s/he can succeed and move to the next round, or fail, which ends
the game. Note that if s/he loses, all the rewards s/he has accumulated so far are lost.
Note also that when the player reaches the last round, whether s/he plays or not, the
game ends with the appropriate reward.
Table 1: Millionaire - Rewards and success probability
Round   Success Probability   Reward
1       0.99                  10
2       0.9                   50
3       0.8                   100
4       0.7                   500
5       0.6                   1000
6       0.5                   5000
7       0.4                   10000
8       0.3                   50000
9       0.2                   100000
10      0.1                   500000
Model this problem as a Markov decision process and evaluate the following policy:
π = {round1 ↦ play, round2 ↦ play, round3 ↦ quit}. Use a discount factor of 0.95.
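The policy evaluation can be sketched in Python under one possible reading of the rules. The modelling choices below are assumptions, not part of the question: the reward of a round is banked when that round is won, banked rewards add up, quitting pays the banked total immediately, failing pays nothing, and the discount applies to each transition to the next round. The function name evaluate and the dictionary names are hypothetical.

```python
GAMMA = 0.95

# Rows of Table 1 needed for the policy (rounds 1 to 3).
SUCCESS = {1: 0.99, 2: 0.9, 3: 0.8}
REWARD = {1: 10, 2: 50, 3: 100}
POLICY = {1: "play", 2: "play", 3: "quit"}


def evaluate(policy, success, reward, gamma):
    """Evaluate a fixed policy by working backwards from the last round it covers.

    Assumed model: V(quit at round i) is the sum of the rewards of rounds 1..i-1;
    V(play at round i) is p_i * gamma * V(round i+1), since failure is worth 0.
    """
    values = {}
    for rnd in sorted(policy, reverse=True):
        if policy[rnd] == "quit":
            # Collect everything banked in the rounds already won.
            values[rnd] = float(sum(reward[r] for r in range(1, rnd)))
        else:
            if rnd + 1 not in values:
                raise ValueError("policy must specify an action for the next round")
            values[rnd] = success[rnd] * gamma * values[rnd + 1]
    return values


values = evaluate(POLICY, SUCCESS, REWARD, GAMMA)
for rnd in sorted(values):
    print(f"V(round{rnd}) = {values[rnd]:.5f}")
```

Under these assumptions the value of round3 is the banked total 10 + 50 = 60, and the values of round2 and round1 follow by multiplying by the success probability and the discount factor. A different reward model (for example, treating each table entry as the total prize at that round) would give different numbers.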