ARI711S - ARTIFICIAL INTELLIGENCE - 2ND OPP - JULY 2023 :: NUST past examination papers between 2020 and 2024

Expand document

Collapse document

ARI711S - ARTIFICIAL INTELLIGENCE - 2ND OPP - JULY 2023

1 Page 1

2 Page 2

3 Page 3

4 Page 4

5 Page 5

6 Page 6

7 Page 7

8 Page 8

ARI711S - ARTIFICIAL INTELLIGENCE - 2ND OPP - JULY 2023

1 Page 1

▲back to top

nAm I BIA un IVE RSITY

OF SCIEnCE Ano TECHn OLOGY

Faculty of Computing and Informatics

Department of Computer Science

QUALIFICATION: Bachelor of Computer Science

QUALIFICATION CODE: 07BACS

COURSE: Artificial Intelligence

DATE: JULY2023

DURATION: 3 HOURS

LEVEL: 7

COURSE CODE: ARl711S

SESSION: 2

MARKS: 100

SUPPLEMENTARY/ SECOND OPPORTUNITY EXAMINATION QUESTION PAPER

EXAMINER

MR STANTIN SIEBRITZ

MODERATOR:

MS PAULINA SHIFUGULA

THIS EXAM PAPER CONSISTS OF 8 PAGES

{Including this front page)

INSTRUCTIONS

1. This paper contains 6 questions.

2. Answer ALL questions on the examination booklet provided.

3. Marks/scores are indicated at the right end of each question.

4. Calculators are permitted.

5. NUST examination rules and regulations apply.

2 Page 2

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATIONJULY2023

QUESTION 1: SHORT ANSWERS

[20]

State whether the following statements are either TRUEor FALSE.

1.1 Uniform-cost search will never expand more nodes than A*-search.

(2)

1.2 Depth-first search will always expand more nodes than breadth-first search.

(2)

1.3 The most-constrained variable heuristic provides a way to select the next variable to (2)

assign in a backtracking search for solving a CSP.

1.4 By using the most-constrained variable heuristic and the least-constraining value

(2)

heuristic we can solve every CSPin time linear in the number of variables.

1.5 When using alpha-beta pruning, it is possible to get an incorrect value at the root

(2)

node by choosing a bad ordering when expanding children.

1.6 When using alpha-beta pruning, the computational savings are independent of the (2)

order in which children are expanded.

1.7 For an MDP (S; A; T; Y; R) if we only change the reward function Rthe optimal policy (2)

is guaranteed to remain the same.

1.8 Policies found by value iteration are superior to policies found by policy iteration.

(2)

1.9 Q-learning can learn the optimal Q-function Q* without ever executing the optimal (2)

policy.

1.10 If an MDP has a transition model T that assigns non-zero probability for all T(s; a; s0) (2)

then Q-learning will fail.

3 Page 3

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATIONJULY2023

QUESTION 2: SEARCH

[20)

For each of the following graph search strategies, work out the order in which states are

expanded, as well as the path returned by graph search. In all cases, assume ties resolve in

such a way that states with earlier alphabetical order are expanded first. The start and goal

state are Sand G, respectively.

2.1 Depth-First Search

(4)

2.2 Breadth-First Search

(4)

2.3 Uniform Cost Search

(4)

2.4 Greedy search with the heuristic h shown on the graph.

(4)

2.5 A* search with the same heuristic.

(4)

4 Page 4

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATION JULY2023

QUESTION 3: CSP

[6]

You are in charge of scheduling for computer science classes that meet Mondays,

Wednesdays and Fridays. There are 5 classes that meet on these days and 3 professors who

will be teaching these classes. You are constrained by the fact that each professor can only

teach one class at a time.

The classes are:

• Class 1- Intro to Programming: meets from 8:00-9:00am

• Class 2 - Intro to Artificial Intelligence: meets from 8:30-9:30am

• Class 3 - Natural Language Processing: meets from 9:00-10:00am

• Class4 - Computer Vision: meets from 9:00-10:00am

• Class 5 - Machine Learning: meets from 10:30-11:30am

The professors are:

• Professor A, who is qualified to teach Classes 1, 2, and 5.

• Professor B, who is qualified to teach Classes 3, 4, and 5.

• Professor C, who is qualified to teach Classes 1, 3, and 4.

3.1 Formulate this problem as a CSPproblem in which there is one variable per class,

(2)

stating the domains, and constraints. Constraints should be specified formally and

precisely, but may be implicit rather than explicit.

3.2 Draw the constraint graph associated with your CSP.

(2)

3.3 Provide a possible solution to the CSPscheduling problem. All variables should be

(2)

assigned and no constraints broken.

5 Page 5

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATIONJULY2023

QUESTION 4: CSP

[14]

After years of struggling through mazes, Pacman has finally made peace with the ghosts,

Blinky, Pinky, Inky, and Clyde, and invited them to live with him and Ms. Pacman. The move

has forced Pacman to change the rooming assignments in his house, which has 6 rooms.

He has decided to figure out the new assignments with a CSPin which the variables are

Pacman (P), Ms. Pacman (M), Blinky (B), Pinky (K),Inky (I), and Clyde (C), the values are

which room they will stay in, from 1-6, and the constraints are:

i) No two agents can stay in the same room vi) Bis even

ii) P > 3

vii) I is not 1 or 6

iii) K is less than P

viii) (1-C)= 1

iv) M is either 5 or 6

ix) (P-B) = 2

v) P> M

4.1 Unary constraints: On the grid below cross out the values from each domain that are (4)

eliminated by enforcing unary constraints.

p 1 2 :3 4 5 6

B 1234 56

C 123456

K 123456

I 1 2 :3 4 5 6

M123456

4.2 MRV:According to the Minimum Remaining Value (MRV) heuristic, which variable

(2)

should be assigned to first?

4.3 Forward Checking: For the purposes of decoupling this problem from your solution

(4)

to the previous problem, assume we choose to assign P first, and assign it the value

6. What are the resulting domains after enforcing unary constraints (from part 4.1)

and running forward checking for this assignment?

B 123456

C 123456

K 1234 56

I 1 2 :3 4 5 6

M12 3 4 56

4.4 Iterative Improvement: Instead of running backtracking search, you decide to start (4)

over and run iterative improvement with the min-conflicts heuristic for value

selection. Starting with the following assignment: P:6, B:4, C:3, K:2, 1:1,M:5

First, for each variable write down how many constraints it violates in the table

below. Then, in the table on the right, for all variables that could be selected for

assignment, put an [X] in any box that corresponds to a possible value that could be

assigned to that variable according to min-conflicts. When marking next values a

variable could take on, only mark values different from the current one.

6 Page 6

▲back to top

ARl711S- SECOND OPPORTUNITY EXAMINATION JULY 2023

Variable

# violated

1 2 :3 4 5 ('i

l'vI

7 Page 7

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATION JULY2023

QUESTION 5: MDP

[20)

The following problems take place in various scenarios of the gridworld MDP (as in

Assignment P3). In all cases, A is the start state and double-rectangle states are exit states.

From an exit state, the only action available is Exit, which results in the listed reward and

ends the game (by moving into a terminal state X, not shown).

From non-exit states, the agent can choose either Left or Right actions, which move the

agent in the corresponding direction. There are no living rewards; the only non-zero

rewards come from exiting the grid.

Throughout this problem, assume that value iteration begins with initial values V0(s) = 0 for

all states s.

First, consider the following mini-grid. For now, the discount is Y = 1 and legal movement

actions will always succeed (and so the state transition function is deterministic).

IEJI

5.1 What is the optimal value V*(A)?

(1)

5.2 When running value iteration, remember that we start with Vo(s)= 0 for alls. What (2)

is the first iteration k for which Vk(A)will be non-zero?

5.3 What will Vk(A)be when it is first non-zero?

( 1)

5.4 After how many iterations k will we have Vk(A)= V*(A)? If they will never become

(2)

equal, write never.

Now the situation is as before, but the discount is less than 1.

5.5 If Y = 0.5, what is the optimal value V*(A)?

(3)

5.6 For what range of values Yof the discount will it be optimal to go Right from A?

(4)

Remember that 0 :5 Y:5 1. Write all or none if all or no legal values of have this

property.

Let's kick it up a notch! The Left and Right movement actions are now stochastic and fail with

probability f. When an action fails, the agent moves up or down with probability f/2 each. When

there is no square to move up or down into (as in the one-dimensional case), the agent stays in

place. The Exit action does not fail.

For the following mini-grid, the failure probability is f = 0.5 and the discount is Y = 1

A IEJI

5.7 What is the optimal value V*(A)?

(1)

5.8 When running value iteration, what is the smallest value of k for which Vk(A)will (1)

be non-zero?

5.9 What will Vk(A)be when it is first non-zero?

(2)

5.10 After how many iterations k will we have Vk(A)= V*(A)? If they will never become (3)

equal, write never. Motivate your answer.

8 Page 8

▲back to top

ARl711S - SECONDOPPORTUNITYEXAMINATIONJULY2023

QUESTION 6: REINFORCEMENT LEARNING

[10]

Assume we have an MOP with state space 5, action space A, reward function R(s; a; s'), and

discount Y.

Our eventual goal is to learn a policy that can be used by a robot in the real world.

However, we only have access to simulation software, not the robot directly. We know that

the simulation software is built using the transition model T5;m(s;a; s') which is

unfortunately different than the transition model that governs our real robot, T,eai(s;a; s').

Without changing the simulation software, we want to use the samples drawn from the

simulator to learn Q-values for our real robot.

Recall the Q-learning update rule. Given a sample (s; a; s'; r}, it performs the following

update:

6.1 Assuming the samples are drawn from the simulator, which new update rule will

(6)

learn the correct Q-value functions for the real-world robot? Provide the correct

update rule and explain.

6.2 Now consider the case where we haven real robots with transition models

(4)

T1,eai(s;a; s'}, ... , T",eai(sa; ;s') and still only one simulator. Is there a way to learn

policies for all n robots simultaneously by using the same samples from the

simulator? If yes, explain how. If no, explain why not. (1-2 sentences)

END OF EXAMINATION PAPER