Question 1: Exploratory Data Analysis (EDA) [15 marks]
A dataset contains information on the sales of a retail store for a year, including columns for Date,
Product, Quantity Sold, and Revenue. The first few rows of the dataset are as follows:
Date          Product    Quantity Sold    Revenue
2023-01-01    A          30               600
2023-01-01    B          20               400
2023-01-02    A          25               500
2023-01-02    B          15               300
2023-01-03    A          35               700
(a) Calculate the total revenue generated from each product for the first three days of sales. (3 marks)
(b) Determine the average quantity sold per day for each product. (4 marks)
(c) Identify the day with the highest total revenue and specify the revenue amount. (4 marks)
(d) Discuss two potential insights you can draw from the EDA of this dataset. (2 marks)
(e) Propose one data cleaning step that could improve the quality of this dataset. (2 marks)
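Note: the arithmetic behind parts (a) to (c) can be checked with a short script. The following is a minimal sketch in plain Python, assuming only the five sample rows shown in the question. The variable names are illustrative, and the average in part (b) is taken over the days on which a product has a recorded sale, which is one possible reading of "per day".

```python
from collections import defaultdict

# Sample rows transcribed from the question:
# (date, product, quantity_sold, revenue)
rows = [
    ("2023-01-01", "A", 30, 600),
    ("2023-01-01", "B", 20, 400),
    ("2023-01-02", "A", 25, 500),
    ("2023-01-02", "B", 15, 300),
    ("2023-01-03", "A", 35, 700),
]

revenue_by_product = defaultdict(int)   # part (a)
quantity_by_product = defaultdict(int)  # part (b), numerator
days_by_product = defaultdict(set)      # part (b), denominator
revenue_by_day = defaultdict(int)       # part (c)

for date, product, quantity, revenue in rows:
    revenue_by_product[product] += revenue
    quantity_by_product[product] += quantity
    days_by_product[product].add(date)
    revenue_by_day[date] += revenue

# Assumption: average over the days on which the product appears in the data.
avg_quantity_per_day = {
    product: quantity_by_product[product] / len(days_by_product[product])
    for product in quantity_by_product
}

# Day with the highest total revenue across all products.
best_day = max(revenue_by_day, key=revenue_by_day.get)
best_day_revenue = revenue_by_day[best_day]

print("Revenue by product:", dict(revenue_by_product))
print("Average quantity per day:", avg_quantity_per_day)
print("Best day:", best_day, "with revenue", best_day_revenue)
```

If the average is instead taken over all three calendar days, divide each product's total quantity by 3, which changes the figure for any product with no recorded sale on some day.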
Question 2: Hypothesis Testing [15 marks]
A retail company wants to know if there is a significant association between the type of advertisement
used and the purchase decision made by customers. The data collected is summarised in the following
contingency table:
Advertisement Type    Purchase (Yes)    Purchase (No)
Online                40                10
TV                    30                20
Print                 20                30
(a) State the null and alternative hypotheses for the above-mentioned market research. (2 marks)
(b) Calculate the expected frequencies for each cell in the contingency table. Show your calculations.
(5 marks)
(c) Perform the Chi-square test. Calculate the Chi-square statistic and determine the p-value. What
can you conclude? (5 marks)
Use the following formula for chi-square statistic calculation:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O = observed frequencies, E = expected frequencies, and the sum runs over all cells of the table.
(d) Discuss one assumption of the Chi-square test that should be considered. (2 marks)
(e) Suggest a follow-up analysis if a significant association is found. (1 mark)
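Note: the calculations in parts (b) and (c) can be verified with a short script. The following is a minimal sketch in plain Python, assuming the contingency table as given in the question. The variable names are illustrative. Each expected frequency is computed as (row total × column total) / grand total, and the p-value uses the fact that for 2 degrees of freedom the chi-square survival function reduces to exp(-x/2). For other table sizes a statistics library would be needed instead.

```python
import math

# Observed frequencies transcribed from the contingency table in the question.
observed = {
    "Online": {"Yes": 40, "No": 10},
    "TV":     {"Yes": 30, "No": 20},
    "Print":  {"Yes": 20, "No": 30},
}
outcomes = ["Yes", "No"]

row_totals = {ad: sum(counts.values()) for ad, counts in observed.items()}
col_totals = {o: sum(observed[ad][o] for ad in observed) for o in outcomes}
grand_total = sum(row_totals.values())

# Expected frequency for each cell: row total * column total / grand total.
expected = {
    ad: {o: row_totals[ad] * col_totals[o] / grand_total for o in outcomes}
    for ad in observed
}

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_square = sum(
    (observed[ad][o] - expected[ad][o]) ** 2 / expected[ad][o]
    for ad in observed
    for o in outcomes
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(outcomes) - 1)

# The closed form below is valid only when df == 2.
if df != 2:
    raise ValueError("closed-form p-value used here requires df == 2")
p_value = math.exp(-chi_square / 2)

print("Expected frequencies:", expected)
print("Chi-square statistic:", round(chi_square, 3))
print("Degrees of freedom:", df)
print("p-value:", p_value)
```

Comparing the printed p-value with the chosen significance level (commonly 0.05) gives the conclusion asked for in part (c).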