DTA621S - DATA ANALYTICS - 1ST OPP - NOV 2022 :: NUST past examination papers between 2020 and 2024

Expand document

Collapse document

DTA621S - DATA ANALYTICS - 1ST OPP - NOV 2022

1 Pages 1-10

1.1 Page 1

1.2 Page 2

1.3 Page 3

1.4 Page 4

1.5 Page 5

1.6 Page 6

1.7 Page 7

1.8 Page 8

1.9 Page 9

1.10 Page 10

2 Pages 11-20

2.1 Page 11

2.2 Page 12

2.3 Page 13

2.4 Page 14

2.5 Page 15

2.6 Page 16

DTA621S - DATA ANALYTICS - 1ST OPP - NOV 2022

1 Pages 1-10

▲back to top

1.1 Page 1

▲back to top

n Am I B I A u n IVE Rs I TY

OF SCIEnCE Ano TECHnOLOGY

FACULTY OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATICS

QUALIFICATION: BACHELOR OF INFORMATICS, BACHELOR OF COMPUTER SCIENCE

QUALIFICATION CODE: 07BAIT, 07BCMS LEVEL: 6

COURSE: DATA ANALYTICS

COURSE CODE: DTA621S

DATE: NOVEMBER 2022

SESSION: 1

DURATION: 3 HOURS

MARKS: 100

FIRST OPPORTUNITY EXAMINATION QUESTION PAPER

EXAMINER(S) MRS RUUSA IPINGE

MODERATOR: Dr. JACOB ONGALA

THIS QUESTION PAPER CONSISTS OF 8 PAGES

(Including this front page)

INSTRUCTIONS

• Answer ALL questions in Question1, Question2, Question3, Question 4 and Question 5

• NUST examinations rules apply

• DO NOT open this examination cover until you are instructed to do so.

• DO NOT FORGET to write down your student number at the designated places in the

examination page

1.2 Page 2

▲back to top

1.3 Page 3

▲back to top

Question1: MULTIPLE QUESTIONS (20 MARKS MAXIMUM 1 MARK FOR EACH

CORRECT ANSWER)

Answer all questions. Select ONLY ONE BEST ASWER to each questions.

1. Who has overall accountability for compliance with the GDPR?

a) Data subject,

b) The Data Controller,

c) Data processor,

d) ICO

2. The process of quantifying data is referred to a

a) Decoding

b) Structure

c) Enumeration

d) Coding

3. Are used when we want to visually examine the relationship between two

quantitative variables.

a. Bar graph

b. Scatterplot

c. Line graph

d. Pie chart

4. A graph that uses vertical bars to represent data is called a __ .

a) Bar graph

b) Line graph

c) Scatterplot

d) All of the mentioned above

s. What is the name given by the GDPR for the deletion of all personal data?

a) The right to be forgotten

b) The right to withdraw consent

c) The right to access

d) The right to objection

1.4 Page 4

▲back to top

1.5 Page 5

▲back to top

6. This is the process of removal of noise from the dataset and helping in

knowing the features that are important of to build a machine learning

a) Smoothing

b) Data aggregation

c) Discretization

d) Normalisation

7. This is the type of research that It answers key questions such as "how many,

"how much" and "how often".

a) Quantitative

b) Qualitative

c) Nominal

d) Category

8. This is not an examples of continuous data:

a) The amount of time required to complete a project.

b) The weight of children.

c) The square footage of a two-bedroom house.

d) The number of injections or vaccine you received in your lie

9. Which statements is true about ordinal data

a) You cannot do arithmetic with ordinal numbers because they only show

sequence.

b) Ordinal variables are considered as "in between" qualitative and quantitative

data

c) The ordinal data is qualitative data for which the values are ordered.

d) All of the above

10. To predict a quantity value. use_

a) Regression

b) Clustering

c) Classification

d) Dimensionality reduction

1.6 Page 6

▲back to top

1.7 Page 7

▲back to top

11. How do machine learning algorithms make more precise predictions?

a) The algorithms are typically run more powerful servers.

b) The algorithms are better at seeing patterns in the data.

c) Machine learning servers can host larger databases.

d) The algorithms can run on unstructured data

12. What does this image illustrate?

a) A decision tree

b) Reinforcement learning

c) K-nearest neighbour

d) A clear trend line

13. When must high risk data security breaches be reported to the ICO?

a) Within 24 hours

b) Within 72 hours

c) Within 36 hours

d) A week

14. Once the GDPR takes effect, serious breaches could result in fines of what

percentage of the company's global annual turnover?

a) 20 %,

b) 10%,

c) 2%

d) 4%

1.8 Page 8

▲back to top

1.9 Page 9

▲back to top

15. In this method, the representation of the data is made smaller by reducing the

volume

a) Numerosity Reduction

b) Dimensionality reduction

c) Data compression

d) Data reduction

16. This is created by placing the sequence inside the square brackets

a) List

b) String

c) Dictionary

d) Tuples

17. Which one is not true about Standard Deviation

a) Most commonly used measure of variation

b) Shows variation about the mean

c) Can be used to compare two or more sets of data measured in different units

d) Is the square root of the variance?

18. You want to identify global weather patterns that may have been affected by

climate change. To do so, you want to use machine learning algorithms to find

patterns that would otherwise be imperceptible to a human meteorologist.

What is the place to start?

a) Find labelled data of sunny days so that the machine will learn to identify bad

weather.

b) Use unsupervised learning have the machine look for anomalies in a massive

weather database.

c) Create a training set of unusual patterns and ask the machine learning

algorithms to classify them.

d) Create a training set of normal weather and have the machine look for similar

patterns.

1.10 Page 10

▲back to top

2 Pages 11-20

▲back to top

2.1 Page 11

▲back to top

19. Why is naive Bayes called naive?

a) It naively assumes that you will have no data.

b) It does not even try to create accurate predictions.

c) It naively assumes that the predictors are independent from one another.

d) It naively assumes that all the predictors depend on one another.

20. What is one reason not to use the same data for both your training set and

your testing set?

a. You will almost certainly underfit the model.

b. You will pick the wrong algorithm.

c. You might not have enough data for both.

d. You will almost certainly overfitt the model.

2.2 Page 12

▲back to top

2.3 Page 13

▲back to top

Question 2

[10 Marks]

Look at the following boxplots about Liina's performance in physics and chemistry and answer

the following questions.

--rn

--------<[I]--

0 10 20 30 40 50 60 70

Physics

Chen1istry

80 90 100 Scale

(a) State the median mark for each subject.

(2)

(b) Find the range of marks in each subject.

(2)

(3)

(d) What are the three benefits of using a box plot during data cleaning in a normal distribution

data?

(3)

Question 3

[32 Marks]

a) What does Skewed and Symmetric distribution on a data set means

(4)

b) List and explain the methods of data transformation.

(8)

c) Pre-processing of data is mainly to check the data quality. Explain 5 of the elements of

pre-processing

(10)

d) Using Examples, explain the process of Data Science

(10)

Question 4

[18 Marks]

a) What is Overfitting, and How can you avoid It?

(4)

b) What is the difference between Supervised and Unsupervised Machine Learning? (4)

c) What is Bias and Variance in a Machine Learning Model

(4)

d) Name 6 key pieces of information to be contained in privacy notices based on the

GDPR (6)

2.4 Page 14

▲back to top

2.5 Page 15

▲back to top

Question 5

4.1 Based on the Jupiter notebook, explain what the following command means

a) df. dtypes

b) df.shape

c) df.drop duplicates()

d) df.samples(frac=0.5)

e) df.tail(10)

[20 Marks)

(2)

4.2 write a command in Jupiter notebook that will allow you to perform the following tasks

a) Count number of rows with each unique values of

(2)

b) Sum the values of each object in a data frame.

(2)

c) Combine two columns named df1 and df2

(2)

d) Sort the index of the dataframe

(2)

e) Select row 10 to 20

(2)

THE END OF EXAM

2.6 Page 16

▲back to top