Question 1: Regression Analysis [15 marks]
Answer the following questions:
(a) Explain the difference between simple linear regression and multiple regression. Provide an example
for each. (4 marks)
(b) What is the purpose of the R-squared (R²) statistic in regression analysis, and how is it interpreted?
(3 marks)
(c) What is overfitting in regression analysis, and how can it be prevented? (3 marks)
(d) Suppose you have the following regression equation to predict a student's exam score based on
hours of study (Hours) and their attendance (Attendance, measured in days):
ExamScore = 50 + 3 × Hours + 2 × Attendance
(i) Interpret the coefficients for Hours and Attendance. (3 marks)
(ii) Predict the exam score for a student who studied for 10 hours and attended 20 days of classes.
(2 marks)
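A minimal sketch of how the equation in part (d) can be evaluated, added for illustration only. The function name `predict_exam_score` and the sample inputs are assumptions, and the inputs are deliberately not those of part (ii).

```python
def predict_exam_score(hours: float, attendance_days: float) -> float:
    """Evaluate ExamScore = 50 + 3 x Hours + 2 x Attendance."""
    intercept = 50.0        # constant term of the equation
    hours_coef = 3.0        # coefficient on Hours
    attendance_coef = 2.0   # coefficient on Attendance (days)
    return intercept + hours_coef * hours + attendance_coef * attendance_days


# Hypothetical student (not the one in part (ii)): 5 hours of study, 10 days attended.
example_score = predict_exam_score(5, 10)
print(example_score)  # 85.0
```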
Question 2: Association Analysis [15 marks]
An online retail store has collected data on user purchases across two product categories: Electronics
and Clothing. The following incomplete table summarises the data:
Columns: Purchased | ~ Purchased | Total
Electronics: 300, 550
~ Electronics: 150, 100
Clothing: 400, 500
~ Clothing: 200, 50, 250
Total: 500, 800
(a) Complete the table. (4 marks)
(b) Calculate the support and confidence for the association rule 'Electronics → Clothing'. Does the
rule meet the thresholds of 10% minimum support and 50% minimum confidence? (6 marks)
(c) Calculate the lift for the association rule 'Electronics → Clothing' and interpret the result. (5 marks)
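As a reminder of the standard definitions used in parts (b) and (c), the sketch below computes support, confidence, and lift for a generic rule A → B from raw counts. The function name `rule_metrics` and the counts in the example are hypothetical and are not taken from the table above.

```python
def rule_metrics(n_total: int, n_a: int, n_b: int, n_both: int):
    """Return (support, confidence, lift) for the rule A -> B.

    n_total: number of transactions
    n_a:     transactions containing A
    n_b:     transactions containing B
    n_both:  transactions containing both A and B
    """
    support = n_both / n_total            # P(A and B)
    confidence = n_both / n_a             # P(B | A)
    lift = confidence / (n_b / n_total)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical counts, chosen only to illustrate the arithmetic.
support, confidence, lift = rule_metrics(n_total=1000, n_a=400, n_b=500, n_both=250)
print(support, confidence, lift)
```

A lift above 1 suggests A and B occur together more often than expected under independence, a lift below 1 suggests less often, and a lift of 1 is consistent with independence.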
Question 3: Machine Learning [15 marks]
Consider a dataset containing customer transaction history at an e-commerce company. You are tasked
with using machine learning techniques to predict whether a customer will make a purchase in the next
30 days. The available features include customer demographics, browsing history, previous purchases,
and time spent on the website.
(a) Explain the difference between supervised and unsupervised learning. Which type of learning
would you use for this problem, and why? (4 marks)
(b) Define overfitting in the context of machine learning. What strategies can you use to prevent
overfitting in this customer purchase prediction model? (3 marks)
(c) You decide to use logistic regression to solve this problem. What are the key assumptions made
by logistic regression? Does this model have any limitations for predicting customer purchases?
(3 marks)
(d) You have a dataset of 10,000 customers and decide to split the data into 80% training and 20%
testing sets. Explain the purpose of this data split and how you would evaluate the model's
performance. (3 marks)
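The split described in part (d) can be sketched with the Python standard library alone. This is an illustrative sketch, not a prescribed method: the helper names `train_test_split_ids` and `accuracy`, the fixed seed, and the use of integer customer IDs are all assumptions.

```python
import random


def train_test_split_ids(ids, test_fraction=0.2, seed=0):
    """Shuffle the IDs and split them into (train, test) lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# 10,000 hypothetical customer IDs split 80% / 20%.
train_ids, test_ids = train_test_split_ids(range(10_000))
print(len(train_ids), len(test_ids))  # 8000 2000
```

The model would be fitted on `train_ids` only and scored on `test_ids`. Accuracy is just one possible metric; if purchasers are rare, precision, recall, or ROC AUC may be more informative.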