DTA621S - DATA ANALYTICS - 2ND OPP - JAN 2024 :: NUST past examination papers between 2020 and 2024


NAMIBIA UNIVERSITY OF SCIENCE AND TECHNOLOGY

FACULTY OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATICS

QUALIFICATIONS: BACHELOR OF COMPUTER SCIENCE; BACHELOR OF INFORMATICS

QUALIFICATION CODE: 07BCMS; 07BAIT

LEVEL: 6

COURSE CODE: DTA621S

COURSE: DATA ANALYTICS

DATE: JANUARY 2024

SESSION: 1

DURATION: 3 HOURS

MARKS: 70

SUPPLEMENTARY/SECOND OPPORTUNITY EXAMINATION QUESTION PAPER

EXAMINERS:

Mrs Ruusa Ipinge

MODERATOR(S):

Dr Jacob Ongala

THIS EXAMINATION PAPER CONSISTS OF 9 PAGES

(INCLUDING THIS FRONT PAGE)

INSTRUCTIONS FOR THE CANDIDATE

1. Answer all questions.

2. When writing, consider the following: The style should be to inform rather than impress.

3. Information should be brief and accurate.

4. Please ensure that your writing is legible, neat and presentable.


PART 1: MULTIPLE CHOICE QUESTIONS (20 MARKS MAXIMUM; 1 MARK FOR EACH CORRECT ANSWER)

Answer all questions. Select ONLY ONE BEST ANSWER to each question.

1. ___ is a category of methods, also called supervised machine learning methods, in which the data is split into two parts.

a) Classification

b) Clustering

c) Data mining

d) None of the mentioned above

2. An advantage of using computer programs for qualitative data is that they ___.

A. Can reduce time required to analyse data.

B. Help in storing and organizing data.

C. Make many procedures available that are rarely done by hand due to time constraints.

D. All of the mentioned above

3. Logistic regression is used to find the probability of event = Success and event = ___.

a) Failure

b) Success

c) Both A and B

d) None of the mentioned above

4. This is the process of reorganising data and cleaning data by removing

redundant and unstructured data and making the data look similar across all

records

a) Smoothing

b) Data aggregation

c) Discretization

d) Normalisation


5. This is the type of research that answers key questions such as "how many", "what" and "why".

a) Quantitative

b) Qualitative

c) Nominal

d) Category

6. _ are used when we want to visually examine the relationship between two

quantitative variables.

a. Bar graph

b. Scatterplot

c. Line graph

d. Pie chart

7. A graph that uses vertical bars to represent data is called a __ .

A. Bar graph

B. Line graph

C. Scatterplot

D. All the mentioned above

8. Data Analytics uses ___ to get insights from data.

a) Statistical figures

b) Numerical aspects

c) Statistical methods

d) None of the mentioned above

9. The Least Squares Method uses ___.

a) Linear polynomial

b) Linear regression

c) Linear sequence

d) None of the mentioned above


10. Take a look at the confusion matrix above containing 263 observations. What is the

accuracy of the predictions?

[Figure: confusion matrix, with the true classes on the "Truth" axis]

A. The accuracy is equal to (165 + 51)/263 (82.1%).

B. The accuracy is equal to (165 + 8)/263 (65.8%).

C. The accuracy is equal to (51)/263 (19.4%).

D. The accuracy is equal to (39 + 8)/263 (17.9%)

11. What is Machine learning?

a) The autonomous acquisition of knowledge using manual programs.

b) The selective acquisition of knowledge using manual programs.

c) The autonomous acquisition of knowledge using computer programs.

d) The selective acquisition of knowledge using computer programs.

12. Machine Learning is a field of AI consisting of learning algorithms that ___.

a. At executing some task

b. Over time with experience

c. Improve their performance.

d. All mentioned above.


13. Which of the following is not a supervised learning method?

a. PCA

b. Naive Bayesian

c. Linear Regression

d. Decision Tree

14. Which machine learning technique helps in detecting outliers in data?

a) Clustering

b) Classification

c) Anomaly Detection

d) All the above

15. Which answer best describes standard deviation?

a) Standard deviation is a measure of the spread of a dataset.

b) Standard deviation indicates how much individual values vary from the mean.

c) Standard deviation helps scientists summarize how much variation there is in a

dataset or population.

d) All the above

16. If the mean score for two different datasets is the same, the standard deviation

will necessarily be the same.

a) True

b) False

17. If an experiment is repeated correctly several times, it should yield

a) a distribution of measurements around some central value.

b) a single value that is obtained each and every time.

c) widely and randomly varying results.

d) Unpredictable results

18. In Python, what is the result of the following operation: '1'+'2'?

a. '2'

b. '3'

c. 3

d. '12'


19. In Python, if you executed name = 'Lizz', what would be the output of print(name[0:2])?

a. Lizz

b. L

c. Li

d. Liz

20. What is the output of the following lines of code:

x=1
if(x!=1):
    print('Hello')
else:
    print('Hi')
print('Mike')

a) Mike

b) Hello Mike

c) The Mike

d) Hi Mike
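Question 10 relies on the usual definition of accuracy: the correctly classified observations (the main diagonal of the confusion matrix) divided by the total number of observations. The following is a minimal sketch of that formula. The function name `accuracy` and the 2x2 matrix are hypothetical and are not the values from the exam figure, whose cell layout is not recoverable here.

```python
def accuracy(confusion):
    """Return the share of observations on the main diagonal."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


# Hypothetical 2x2 matrix (rows: predicted class, columns: true class).
example = [[50, 10],
           [5, 35]]

print(f"{accuracy(example):.1%}")  # 85.0%
```

The same function applies to matrices with more than two classes, provided rows and columns list the classes in the same order.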


PART 2: STRUCTURED QUESTIONS

ANSWER ALL QUESTIONS

Question 1

1. Explain the difference between the following terms. [10]

Supervised and Unsupervised machine learning.

Training and Test data sets

Logistic and Polynomial Regression

Tuple and List

Variance and Standard Deviation

Question 2

a) A class contains 39 children. The following children were chosen at random, and their weights were recorded in cm: 38, 51, 46, 79, and 57. Calculate the standard deviation of their weights.

[6]

b) Why is standard deviation often used more than variance?

[2]
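The calculation in Question 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not a marking guide: the paper does not say whether the sample formula (dividing by n - 1) or the population formula (dividing by n) is expected, so both are shown. The variable names are hypothetical.

```python
import math
import statistics

weights = [38, 51, 46, 79, 57]  # values listed in Question 2(a)

mean = sum(weights) / len(weights)
squared_deviations = [(w - mean) ** 2 for w in weights]

# Assumption: the five children are a sample of the class, so divide by n - 1.
sample_sd = math.sqrt(sum(squared_deviations) / (len(weights) - 1))

# Alternative: treat the five values as the whole population and divide by n.
population_sd = math.sqrt(sum(squared_deviations) / len(weights))

# Cross-check the manual steps against the standard library.
assert math.isclose(sample_sd, statistics.stdev(weights))
assert math.isclose(population_sd, statistics.pstdev(weights))

print(round(mean, 1))           # 54.2
print(round(sample_sd, 2))      # 15.51
print(round(population_sd, 2))  # 13.88
```

The variance is the squared quantity computed before taking the square root, so it is expressed in squared units, while the standard deviation is in the same units as the data.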

Question 3

1. Explain the output of the following code snippets written in the Python programming language. [10]

a) a = 2

b = 330

print("A") if a > b else print("B")

b) Gemuse = ["apple", "banana", "cherry"]

print(len(Gemuse))

c) Gemuse1 = ("apple", "banana", "cherry")

print(type(Gemuse1))

d) import pandas as pd

df = pd.read_csv('data.csv')

df.fillna(130, inplace = True)


e) i = 1
while i < 6:
    print(i)
    i += 1

PART 3: APPLICATION OF MACHINE LEARNING

Question 4

a) Identify and explain the types of neural network algorithm presented in the pictures below.

[4]

[Figure: neural network diagram with the labels Input layer, Hidden layer 1, Hidden layer 2 and Output layer]

b) Look at the following diagram of a Neural Network (NN). Given Input 1 and Input 2 that are independent of Input 2 and Input 3. The output is denoted by S, and the bias is 5 in both cases.

Calculate the Activation Function given the threshold below and state

what will be the output. Show your work. [7]

Threshold

0 = S > 10

1 = S ≤ 10


[Figure: neural network diagram. Input1 = 3, Input2 = 1, Input3 = 1, Input4 = 0.5, Bias = 5, output (S)]
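A threshold (step) activation of the kind described in Question 4 can be sketched as below. The sketch assumes that S is the weighted sum of the inputs plus the bias. The connection weights in the diagram are not recoverable from this copy, so the weights of 1.0 used here are purely hypothetical, as are the function names; only the threshold rule and the bias of 5 come from the question.

```python
def weighted_sum(inputs, weights, bias):
    """Return S: the sum of input * weight over all inputs, plus the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def threshold_activation(s, threshold=10):
    """Apply the rule printed in the question: 0 if S > 10, 1 if S <= 10."""
    return 0 if s > threshold else 1


inputs = [3, 1]        # Input1 and Input2 as printed in the figure
weights = [1.0, 1.0]   # ASSUMPTION: hypothetical weights, not from the paper
s = weighted_sum(inputs, weights, bias=5)

print(s, threshold_activation(s))  # 9.0 1
```

With the actual weights from the diagram substituted for the hypothetical ones, the same two steps (compute S, then compare it with 10) give the neuron's output.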

PART 4: DATA PROTECTION

Question 5

Under the GDPR, organisations must meet six data protection principles whenever they process personal data. Explain the principles of the General Data Protection Regulation (GDPR). [10]

END OF QUESTION PAPER