DSA821S - DATA SCIENCE AND ANALYTICS - 2ND OPP - DEC 2025




NAMIBIA UNIVERSITY
OF SCIENCE AND TECHNOLOGY
FACULTY OF COMPUTING AND INFORMATICS
DEPARTMENT OF INFORMATICS
QUALIFICATIONS: Bachelor of Informatics Honours
QUALIFICATION CODE: 08BIFH, 08BIHB
LEVEL: 8
COURSE CODE: DSA821S
COURSE: Data Science and Analytics
DATE: December 2025
DURATION: 3 Hours
SESSION: 1
MARKS: 100
SUPPLEMENTARY/SECOND OPPORTUNITY EXAMINATION QUESTION PAPER
EXAMINERS:
Prof. Stephen Fashoto
MODERATOR(S):
Ms. Emilia Shikeenga
THIS EXAMINATION PAPER CONSISTS OF 4 PAGES
(INCLUDING THIS FRONT PAGE)
INSTRUCTIONS FOR THE CANDIDATE
1. Answer any four QUESTIONS.
2. When writing, take the following into account: the style should inform rather than impress; it should be
formal and in the third person, with paragraphs set out according to ideas or issues and
flowing in a logical order.
3. Information should be brief and accurate.
Please ensure that your writing is legible, neat and presentable.



QUESTION ONE [25 Marks]
a) Why are the 5 V's, rather than the 3 V's, regarded as the standard characteristics of big data technologies? Explain. (10 marks)
b) Show how to represent the binary class labels below using a NumPy array in Python. (5 marks)

Index   Actual    Predicted
1       Dog       Dog
2       Not dog   Dog
3       Dog       Not dog
4       Dog       Dog
5       Not dog   Not dog
6       Dog       Not dog
7       Dog       Dog
8       Dog       Dog
9       Not dog   Not dog
10      Dog       Not dog
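As a hedged illustration of part (b), the sketch below encodes the labels as integers, with 1 for Dog and 0 for Not dog. The encoding and the variable names `actual` and `predicted` are assumptions, since the paper does not prescribe them, and the label values are transcribed from the table and should be checked against it.

```python
import numpy as np

# Assumed encoding: 1 = Dog, 0 = Not dog (index 1 to 10, left to right)
actual = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])
predicted = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 0])

# Confusion-matrix counts derived from the two arrays
tp = int(np.sum((actual == 1) & (predicted == 1)))  # Dog predicted as Dog
tn = int(np.sum((actual == 0) & (predicted == 0)))  # Not dog predicted as Not dog
fp = int(np.sum((actual == 0) & (predicted == 1)))  # Not dog predicted as Dog
fn = int(np.sum((actual == 1) & (predicted == 0)))  # Dog predicted as Not dog

print(tp, tn, fp, fn)
```

An equally valid representation keeps the labels as strings, for example `np.array(["Dog", "Not dog", ...])`; the integer form is used here only because it makes counting simpler.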
c) Write a short note on how you would apply each of the following, drawing on your data science knowledge:
i) Normalization (2 marks)
ii) Discretization (2 marks)
iii) Feature selection (2 marks)
iv) Feature importance (2 marks)
v) Standardization (2 marks)
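For orientation only, three of the techniques in part (c) can be sketched with NumPy. The feature values in `x` are hypothetical, and min-max scaling, z-scores and equal-width binning are one common reading of normalization, standardization and discretization respectively; other definitions exist.

```python
import numpy as np

# Hypothetical feature values, chosen only for illustration
x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Normalization (min-max): rescale the values to the range [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())

# Standardization (z-score): zero mean and unit standard deviation
x_std = (x - x.mean()) / x.std()

# Discretization: three equal-width bins, labelled 0, 1 and 2
edges = np.linspace(x.min(), x.max(), 4)  # 4 edges delimit 3 bins
x_binned = np.digitize(x, edges[1:-1])    # compare against the inner edges only

print(x_norm, x_std, x_binned)
```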
QUESTION TWO [25 Marks]
a) Differentiate between binary and multiclass classification in supervised learning. (2 marks)
b) A set of 1100 pens contains 700 pens of the Parker brand; the remaining pens are of
other brands. A binary classifier correctly identified the 700 Parker pens and incorrectly
identified 100 non-Parker pens as Parker.
(i) How many non-Parker pens were correctly identified? (2 marks)
(ii) Construct the confusion matrix of the classifier. (2 marks)
(iii) Calculate the following based on the confusion matrix in (b)(ii):
1) Accuracy (2 marks)
2) Recall (2 marks)
3) Precision (2 marks)
4) F1-score (2 marks)
5) Specificity (2 marks)
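The calculations in (iii) can be checked with a short sketch. It assumes Parker is the positive class and uses only the counts stated in the question; the variable names are illustrative.

```python
# Counts taken from the question, with Parker as the (assumed) positive class
total, parker = 1100, 700
tp = 700                    # Parker pens identified as Parker
fp = 100                    # non-Parker pens identified as Parker
fn = parker - tp            # Parker pens missed
tn = (total - parker) - fp  # non-Parker pens correctly identified

accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)

print(tn, accuracy, recall, precision, f1, specificity)
```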
c) Write a short note on the key components of reinforcement learning, supported by a
diagram. (9 marks)
QUESTION THREE [25 Marks]
a) Write out the algorithm for implementing k-means clustering. (5 marks)
b) List and explain five reasons why data quality is important in big data technologies. (10 marks)
c) Given the data points in the table below, initialize the k-means clustering algorithm with
two cluster centers c1 = (2,10) and c2 = (8,4), using the squared Euclidean distance. What
are the values of c1 and c2 after one iteration of k-means clustering? What are the
values of c1 and c2 after the second iteration of k-means clustering? (10 marks)


Squared Euclidean distance formula: $d(x, y) = \sum_{i=1}^{n} (x_i - y_i)^2$

Point   Coordinates
X1      (2,10)
X2      (2,5)
X3      (8,4)
X4      (5,8)
X5      (7,5)
X6      (6,4)
X7      (1,2)
X8      (4,9)
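One way to verify a hand calculation for part (c) is the minimal sketch below. It runs two assignment-and-update iterations from the given starting centers using the squared Euclidean distance. The function name `kmeans_step` is illustrative, and the sketch assumes no cluster becomes empty, which holds for this data.

```python
import numpy as np

# Points X1..X8 and the initial centers c1, c2 from the question
points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centers = np.array([[2, 10], [8, 4]], dtype=float)

def kmeans_step(points, centers):
    """One k-means iteration: assign each point, then recompute the means."""
    # Squared Euclidean distance from every point to every center
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)  # index of the nearest center per point
    # New center = mean of the points assigned to it (assumes none is empty)
    new_centers = np.array([points[labels == k].mean(axis=0)
                            for k in range(len(centers))])
    return labels, new_centers

labels1, centers1 = kmeans_step(points, centers)
labels2, centers2 = kmeans_step(points, centers1)

print(centers1)
print(centers2)
```

With this data the first iteration assigns X1, X2, X4 and X8 to c1 and the remaining points to c2, and the second iteration leaves the assignments unchanged.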
QUESTION FOUR [25 Marks]
a) List and explain the three key skills of a data scientist, supported by a diagram. (7 marks)
b) Explain how SEMMA, as a data science methodology, can be applied in research. (10 marks)
c) Write a short note on each of the association rule terms listed below:
i. Antecedent (2 marks)
ii. Consequent (2 marks)
iii. Support (2 marks)
iv. Confidence (2 marks)
QUESTION FIVE [25 Marks]
Consider the dataset in the table below. Apply the Apriori algorithm with a minimum
support threshold of 55% and a minimum confidence threshold of 60%.

Transaction ID   Items bought
T1               Bread, butter, milk
T2               Bread, butter
T3               Bread, milk
T4               Butter, milk
T5               Bread, milk


i) Find all the frequent itemsets. (10 marks)
ii) Generate the association rules. (10 marks)
iii) Calculate the lift of each rule and interpret the results. (5 marks)
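The three parts can be cross-checked with a small level-wise sketch in plain Python. It is a simplified illustration of the Apriori idea, not a library implementation, and the helper `support` and the variable names are assumptions. Lift is computed as confidence divided by the support of the consequent.

```python
from itertools import combinations

# Transactions T1..T5 from the table
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
]
MIN_SUPPORT, MIN_CONFIDENCE = 0.55, 0.60

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level-wise search: size-k candidates are built from frequent (k-1)-itemsets
items = sorted(set().union(*transactions))
level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
frequent = {}
while level:
    frequent.update({s: support(s) for s in level})
    k = len(level[0]) + 1
    candidates = {a | b for a in level for b in level if len(a | b) == k}
    # Apriori pruning: every (k-1)-subset of a candidate must be frequent
    level = [c for c in candidates
             if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
             and support(c) >= MIN_SUPPORT]

# Rules antecedent -> consequent from each frequent itemset of size >= 2
rules = []
for itemset, sup in frequent.items():
    for r in range(1, len(itemset)):
        for ante in map(frozenset, combinations(itemset, r)):
            cons = itemset - ante
            conf = sup / frequent[ante]
            if conf >= MIN_CONFIDENCE:
                lift = conf / frequent[cons]
                rules.append((set(ante), set(cons), sup, conf, lift))

for rule in rules:
    print(rule)
```

A lift below 1 is usually read as the antecedent and consequent occurring together slightly less often than would be expected if they were independent; a lift above 1 indicates the opposite.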