DSA821S - DATA SCIENCE AND ANALYTICS - 1ST OPP - NOV 2025


DSA821S - DATA SCIENCE AND ANALYTICS - 1ST OPP - NOV 2025



1 Page 1

▲back to top


n Am I BI A u n IVE Rs ITY
OF SCIEnCE Ano TECHnOLOGY
FACULTY OF COMPUTING AND INFORMATICS
DEPARTMENT OF INFORMATICS
QUALIFICATIONS: Bachelor of Informatics Honours
QUALIFICATION CODE: 08BIFH,
08BIHB
LEVEL: 8
COURSE CODE: DSA821S
COURSE: Data Science and Analytics
DATE: November 2025
DURATION: 3 Hours
SESSION: 1
MARKS: 100
FIRST OPPORTUNITY EXAMINATION QUESTION PAPER
EXAMINERS:
Prof. Stephen Fashoto
MODERATOR(S):
Ms. Emilia Shikeenga
THIS EXAMINATION PAPER CONSISTS OF 4 PAGES
(INCLUDING THIS FRONT PAGE)
INSTRUCTIONS FOR THE CANDIDATE
1. Answer any four QUESTIONS.
2. When writing, take into account: The style should inform than impress, it should be
formal, in third person, paragraphs set out according to ideas or issues, and the
paragraphs flowing in a logical order.
3. Information should be brief and accurate.
Please ensure that your writing is legible, neat and presentable

2 Page 2

▲back to top


QUESTION ONE
[25 marks]
a) Python environment is in two modes. Name them
2marks
b) What is another name for !<-means clustering?
lmark
c) Differentiate between inter-cluster and Intra-cluster with the support of a diagram
d) List three limitations of !<-means clustering
e) List two methods for choosing optimal va lu e of I< in !<-means Clustering
f) List four clustering evaluation metrics
3marks
3marks
2marks
4marks
g) Given the data point in the table below, initi alize the k-means clustering algorithm with
two cluster centers cl =(2,10) and c2=(5,8) using Manhattan distance. What are the
values of cl and c2 after one iteration of k-means clustering? What are the values of
cl, and c2 after the second iteration of k-means clustering?
Manhattan distance formula d(x,y)=Ilxi - yd
lOmarks
Point
Coordinates
Xl
(2, 10)
X2
(2,5)
X3
(8,4)
X4
(5,8)
XS
(7,5)
X6
(6,4)
X7
(1,2)
X8
(4,9)
QUESTION TWO
[25Marks]
a) List and explain three cha llenges of supervised learning models
6marks
b) Consider the binary classification problem in the Table below to calculate the
following
i)Label the confusion matrix table appropriately first
2

3 Page 3

▲back to top


ii) Accuracy
ii)Precision
iii)Recall
iv) Fl-score
2marks
2marks
2marks
2marks
v) specificity
2marks
vi) Interpret the results based on the findings on precision and recall from the
calcu latio ns
2marks
Predicted :spam
Predicted:Not spam
Actual:spam
75
30
Actual:Not spam
15
110
c) What will happen if you deploy an Al model without evaluating its performance with
known test set data? Support your answer with only three reasons
6marks
QUESTION THREE
[25Marks]
a) Differentiate between the following
i) Overfitting and underfitting
2marks
ii) Supervised and unsupervised learning
2marks
b) I would like you to perform 5-fold cross-va lidation on any 10 data points
6marks
c) Write short notes on the steps involved in CRISP-OM
7marks
d) Assuming gender is the target variable in the Table below. what will be its implication
when you carry out an exploratory data analysis on it and explain three ways it can be
resolved from data quality perspective?
8marks
Student_num Programme Age
ber
Religion
gender
001
Informatics 23
002
Computer
21
science
Christianity
Muslim
Male
Male
003
Cybersecurity 32
Christianity Male
004
Informatics 30
005
Informatics 22
Christianity
Christianity
Female
Male
006
Software
25
engineering
Christianity Male
3

4 Page 4

▲back to top


007
Informatics 26
008
Computer
27
science
009
Informatics 22
010
informatics 23
Muslim
Christianity
Male
Male
Muslim
Christianity
Female
Male
QUESTION FOUR
[25Marks]
a) Explain three key skills required in Data Science with the support of a venn diagram
b) Write short note on any five cha llenges of Data Science
9marks
Smarks
c) List and explain any five data quality problems that can affect classification model
performance
Smarks
d) Explain any three issues of data science methodologies
6marks
QUESTION FIVE
[25Marks]
a) A database has five transactions as shown in the Table be low, Apply Apriori algorithm
on the Transaction data using 40% minimum support threshold and 60% of minimum
confidence threshold. You are expected to stop at 3-itemset.
Transaction ID
Tl
T2
T3
T4
TS
Items
B,M
B,D,E,G
C,D,G,M
B,D,G,M
B,C,D,M
b) Write out the output of using TransactionEncoder() on the Table above
c) Proof that the association rules below are commutative or not
i)
{B,D} ➔ {G}
ii)
{G} ➔ {B,D}
9marks
Sm arks
1mark
2marks
2marks
d) Write short note on the following
i) Lift
ii) Conviction
2marks
2marks
e) Differentiate between hemset and frequent itemset
2marks
4