Question 1
1.1 A statistical model is described by the following equation and is used to predict
numerical values:
y = a + bx
Name the above model and briefly describe the meaning of each variable in the
above equation.
(5 marks)
1.2 Data visualisation is a powerful approach for presenting complex analytics results to
an audience. Consider five basic types of chart: pie chart, bar chart, line chart,
histogram, and scatterplot.
Select the most suitable type of chart for visualising each of the following types of data:
(3 marks)
a. Correlation
b. Time series
c. Components (i.e. showing parts of a whole)
1.3 Label each data source below according to its structure type (structured, semi-structured,
quasi-structured, or unstructured).
(5 marks)
a. Web clickstream data
b. Relational database files
c. Textual documents
d. XML files
e. JSON files
1.4 List the six (6) phases in the Data Mining Process in chronological order (i.e. from the
initial phase to the final phase).
(3 marks)
1.5 There are various key algorithms for classification. Identify any three (3) of these algorithms
and briefly describe their main contributions to a data science project.
(6 marks)
1.6 For each of the following business problems, decide whether it should be solved
using a supervised model or an unsupervised model.
a. Predicting which product to cross-sell to an individual customer based on
historical sales data.
b. Identifying different segments of customers according to similarity of their
demographics, purchasing patterns, etc.
c. Discovering product items that are normally bought together with other items
based on supermarket transaction data.
(3 marks)