DSA821S - DATA SCIENCE AND ANALYTICS - 1ST OPP - NOV 2025 :: NUST past examination papers between 2021 and 2025


NAMIBIA UNIVERSITY OF SCIENCE AND TECHNOLOGY

FACULTY OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATICS

QUALIFICATIONS: Bachelor of Informatics Honours

QUALIFICATION CODE: 08BIFH, 08BIHB

LEVEL: 8

COURSE CODE: DSA821S

COURSE: Data Science and Analytics

DATE: November 2025

DURATION: 3 Hours

SESSION: 1

MARKS: 100

FIRST OPPORTUNITY EXAMINATION QUESTION PAPER

EXAMINERS:

Prof. Stephen Fashoto

MODERATOR(S):

Ms. Emilia Shikeenga

THIS EXAMINATION PAPER CONSISTS OF 4 PAGES

(INCLUDING THIS FRONT PAGE)

INSTRUCTIONS FOR THE CANDIDATE

1. Answer any four QUESTIONS.

2. When writing, take the following into account: the style should inform rather than impress; it should be formal and in the third person, with paragraphs set out according to ideas or issues and flowing in a logical order.

3. Information should be brief and accurate.

Please ensure that your writing is legible, neat and presentable


QUESTION ONE

[25 marks]

a) The Python environment has two modes. Name them. (2 marks)

b) What is another name for k-means clustering? (1 mark)

c) Differentiate between inter-cluster and intra-cluster with the support of a diagram. (3 marks)

d) List three limitations of k-means clustering. (3 marks)

e) List two methods for choosing the optimal value of k in k-means clustering. (2 marks)

f) List four clustering evaluation metrics. (4 marks)

g) Given the data points in the table below, initialize the k-means clustering algorithm with two cluster centers c1 = (2, 10) and c2 = (5, 8) using Manhattan distance. What are the values of c1 and c2 after one iteration of k-means clustering? What are the values of c1 and c2 after the second iteration of k-means clustering? (10 marks)

Manhattan distance formula: $d(x, y) = \sum_{i} |x_i - y_i|$

Point    Coordinates
         (2, 10)
         (2, 5)
         (8, 4)
         (5, 8)
         (7, 5)
         (6, 4)
         (1, 2)
         (4, 9)
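The procedure described in part g) can be sketched in a few lines of Python. This is a minimal illustration, not a model answer. It assumes that points are assigned to the nearest center by Manhattan distance and that each center is then updated as the coordinate-wise mean of its points (a median update, as in k-medians, would be another reasonable reading). The function names `manhattan` and `kmeans_step` are hypothetical.

```python
# Data points from the table above (point labels are not used here).
points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]


def manhattan(p, q):
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def kmeans_step(points, centers):
    """One iteration: assign each point to its nearest center, then
    recompute each center as the mean of its assigned points (assumption)."""
    clusters = [[] for _ in centers]
    for p in points:
        distances = [manhattan(p, c) for c in centers]
        clusters[distances.index(min(distances))].append(p)
    new_centers = []
    for old, members in zip(centers, clusters):
        if not members:  # keep the center of an empty cluster unchanged
            new_centers.append(old)
            continue
        n = len(members)
        new_centers.append(tuple(sum(coord) / n for coord in zip(*members)))
    return new_centers, clusters


centers = [(2, 10), (5, 8)]  # initial c1 and c2 from the question
history = []
for iteration in (1, 2):
    centers, clusters = kmeans_step(points, centers)
    history.append(centers)
    print(f"iteration {iteration}: c1={centers[0]}, c2={centers[1]}")
```

Under these assumptions the centers move to roughly (1.67, 5.67) and (6, 6) after the first pass, and the second pass leaves the assignments, and therefore the centers, unchanged.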

QUESTION TWO

[25 marks]

a) List and explain three challenges of supervised learning models. (6 marks)

b) Consider the binary classification problem in the Table below and calculate the following:

i) Label the confusion matrix table appropriately first

ii) Accuracy

ii) Precision

iii) Recall

iv) F1-score (2 marks)

v) Specificity (2 marks)

vi) Interpret the results based on the precision and recall obtained from the calculations (2 marks)

Predicted: spam

Predicted: not spam

Actual: spam

Actual: not spam

110
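The metrics listed in part b) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The counts passed in are hypothetical and chosen only for illustration; they are not the values of the exam table. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common metrics from confusion-matrix counts.

    tp: actual spam predicted spam        fn: actual spam predicted not spam
    fp: actual not spam predicted spam    tn: actual not spam predicted not spam
    """
    total = tp + fn + fp + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts for illustration only (NOT taken from the exam table).
metrics = classification_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision answers "of the messages flagged as spam, how many really were spam?", while recall answers "of the real spam messages, how many were caught?".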

c) What will happen if you deploy an AI model without evaluating its performance with known test set data? Support your answer with only three reasons. (6 marks)

QUESTION THREE

[25 marks]

a) Differentiate between the following:

i) Overfitting and underfitting (2 marks)

ii) Supervised and unsupervised learning (2 marks)

b) Perform 5-fold cross-validation on any 10 data points. (6 marks)
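The splitting step of 5-fold cross-validation on 10 points can be sketched as follows. The data values are hypothetical, the folds are taken in order without shuffling (an assumption; shuffling first is also common), and the helper name `k_fold_indices` is made up for this illustration.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


data = [12, 7, 3, 18, 5, 9, 14, 1, 20, 11]  # hypothetical data points
folds = k_fold_indices(len(data), k=5)

splits = []
for i, test_idx in enumerate(folds, start=1):
    # Each fold is held out once; the remaining four folds form the training set.
    train_idx = [j for j in range(len(data)) if j not in test_idx]
    splits.append((train_idx, test_idx))
    print(f"fold {i}: train={[data[j] for j in train_idx]} "
          f"test={[data[j] for j in test_idx]}")
```

In a full evaluation, a model would be trained on each training set and scored on the corresponding held-out fold, and the five scores would then be averaged.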

c) Write short notes on the steps involved in CRISP-DM. (7 marks)

d) Assume gender is the target variable in the Table below. What will be the implication when you carry out an exploratory data analysis on it? Explain three ways it can be resolved from a data quality perspective. (8 marks)

Student_number  Programme  Age  Religion  gender
001  Informatics  23
002  Computer science
Christianity
Muslim
Male
003  Cybersecurity  32  Christianity  Male
004  Informatics  30
005  Informatics  22
Christianity
Female
Male
006  Software engineering  Christianity  Male
007  Informatics  26
008  Computer science
009  Informatics  22
010  Informatics  23
Muslim
Christianity
Male
Muslim
Christianity
Female
Male
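An exploratory look at a target column usually starts by counting missing labels and checking how the remaining classes are distributed. The sketch below does this in plain Python. The records are hypothetical and are not a transcription of the table above; dropping the unlabelled rows is only one of several possible remedies.

```python
from collections import Counter

# Hypothetical records for illustration; None marks a missing gender label.
records = [
    {"student": "A", "gender": "Male"},
    {"student": "B", "gender": None},
    {"student": "C", "gender": "Male"},
    {"student": "D", "gender": "Female"},
    {"student": "E", "gender": None},
    {"student": "F", "gender": "Male"},
]

# How many rows have no target label at all?
missing = sum(1 for r in records if r["gender"] is None)

# One possible remedy: keep only rows whose target is known.
labelled = [r for r in records if r["gender"] is not None]

# Class distribution among the labelled rows (reveals any imbalance).
class_counts = Counter(r["gender"] for r in labelled)

print(f"missing target labels: {missing} of {len(records)}")
print(f"class distribution: {dict(class_counts)}")
```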

QUESTION FOUR

[25 marks]

a) Explain three key skills required in Data Science with the support of a Venn diagram. (9 marks)

b) Write short notes on any five challenges of Data Science. (5 marks)

c) List and explain any five data quality problems that can affect classification model performance. (5 marks)

d) Explain any three issues of data science methodologies. (6 marks)

QUESTION FIVE

[25 marks]

a) A database has five transactions as shown in the Table below. Apply the Apriori algorithm to the transaction data using a 40% minimum support threshold and a 60% minimum confidence threshold. You are expected to stop at the 3-itemset. (9 marks)

Transaction ID    Items
                  B, M
                  B, D, E, G
                  C, D, G, M
                  B, D, G, M
                  B, C, D, M

b) Write out the output of using TransactionEncoder() on the Table above. (5 marks)

c) Prove whether or not the association rules below are commutative.

i) {B,D} ➔ {G}

ii) {G} ➔ {B,D}

1 mark

2 marks
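The encoding and the level-wise search described in parts a) and b) can be sketched in plain Python. This is an illustration under stated assumptions, not a reference solution: the boolean matrix is assumed to mirror what `TransactionEncoder()` from mlxtend would produce (one column per item, in sorted order), and the Apriori loop is a simplified version that stops at 3-itemsets. Transaction IDs are omitted.

```python
from itertools import combinations

# The five transactions from the table above.
transactions = [
    {"B", "M"},
    {"B", "D", "E", "G"},
    {"C", "D", "G", "M"},
    {"B", "D", "G", "M"},
    {"B", "C", "D", "M"},
]
MIN_SUPPORT = 0.4  # 40% of 5 transactions, i.e. at least 2 occurrences

# One-hot encoding: one boolean column per item, columns in sorted order.
items = sorted(set().union(*transactions))
encoded = [[item in t for item in items] for t in transactions]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


frequent = {}  # itemset size -> list of frequent itemsets
level = [frozenset([i]) for i in items]
for size in (1, 2, 3):
    # Keep only candidates that meet the minimum support.
    level = [c for c in level if support(c) >= MIN_SUPPORT]
    frequent[size] = sorted(level, key=sorted)
    # Join step: merge frequent itemsets into candidates one item larger.
    prev = set(level)
    joined = {a | b for a in level for b in level if len(a | b) == size + 1}
    # Prune step: every subset of the current size must itself be frequent.
    level = [c for c in joined
             if all(frozenset(s) in prev for s in combinations(c, size))]

print(items)
for row in encoded:
    print(row)
for size, sets in frequent.items():
    print(size, [sorted(s) for s in sets])
```

Association rules would then be generated from the frequent itemsets and filtered against the 60% minimum confidence threshold.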

d) Write short notes on the following:

i) Lift

ii) Conviction

(2 marks)
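Support, confidence, lift and conviction for a rule A ➔ C can be computed directly from the transactions. The sketch below uses the usual definitions: confidence = supp(A ∪ C) / supp(A), lift = confidence / supp(C), and conviction = (1 − supp(C)) / (1 − confidence). It evaluates the two rules from part c) in both directions. The helper name `rule_metrics` is hypothetical.

```python
# The five transactions from the Question Five table.
transactions = [
    {"B", "M"},
    {"B", "D", "E", "G"},
    {"C", "D", "G", "M"},
    {"B", "D", "G", "M"},
    {"B", "C", "D", "M"},
]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Support, confidence, lift and conviction of antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    supp_rule = support(a | c)
    confidence = supp_rule / support(a)
    lift = confidence / support(c)
    # Conviction is unbounded when the rule never fails (confidence of 1).
    if confidence == 1:
        conviction = float("inf")
    else:
        conviction = (1 - support(c)) / (1 - confidence)
    return {"support": supp_rule, "confidence": confidence,
            "lift": lift, "conviction": conviction}


forward = rule_metrics({"B", "D"}, {"G"})   # {B,D} -> {G}
backward = rule_metrics({"G"}, {"B", "D"})  # {G} -> {B,D}
print("forward: ", forward)
print("backward:", backward)
```

Lift is symmetric in antecedent and consequent by definition, whereas confidence and conviction depend on the direction of the rule in general, so both directions need to be evaluated separately.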

e) Differentiate between itemset and frequent itemset. (2 marks)