DTA621S - DATA ANALYTICS - 1ST OPP - NOV 2025 :: NUST past examination papers between 2021 and 2025

NAMIBIA UNIVERSITY OF SCIENCE AND TECHNOLOGY

FACULTY OF COMPUTING AND INFORMATICS

DEPARTMENT OF INFORMATICS

QUALIFICATIONS: Bachelor of Informatics

QUALIFICATION CODE: 06BENT/06BAIF LEVEL: 6

COURSE CODE: DTA621S

COURSE: Data Analytics

DATE: October 2025

DURATION: 3 Hours

SESSION: 1

MARKS: 100

FIRST OPPORTUNITY EXAMINATION QUESTION PAPER

EXAMINERS:

Dr Clopas Kwenda

MODERATOR(S):

Professor Stephen Fashoto

THIS EXAMINATION PAPER CONSISTS OF 7 PAGES

(INCLUDING THIS FRONT PAGE)

INSTRUCTIONS FOR THE CANDIDATE

1. Answer ALL QUESTIONS.

2. When writing, take into account: the style should inform rather than impress; it should be formal, in third person, with paragraphs set out according to ideas or issues and flowing in a logical order.

3. Information should be brief and accurate.

Please ensure that your writing is legible, neat and presentable

Question One

1. What is the primary purpose of the .plot.scatter() function in Pandas?

a) To show the distribution of a single numerical variable.

b) To visualize the relationship between two numerical variables.

c) To compare the proportion of different categories in a dataset.

d) To identify missing values in a DataFrame.

2. The code data2.isnull().sum() is used to:

a) Remove all missing values from the DataFrame data2.

b) Replace all missing values in data2 with the value 0.

c) Count and display the number of missing values in each column of data2.

d) Create a histogram showing the frequency of missing values.

3. According to the notes, why is it generally "not advisable" to use dropna() to remove all

observations with missing values?

a) Because the dropna() function has syntax errors and doesn't work.

b) Because it can only be used on categorical variables, not numerical ones.

c) Because it can significantly reduce the dataset size and lead to less effective or biased

analysis.

d) Because it is slower than imputing the values with the mean.

4. The process of filling in missing values with estimated ones (like the mean, median, or a

specified value) is known as:

a) Deletion

b) Validation

c) Imputation

d) Visualization

5. Look at the following line of code:

loan['LoanAmount'].fillna(loan['LoanAmount'].median(), inplace=True)

What is the result of executing this code?

a) It creates a median value for the 'LoanAmount' column.

b) It deletes all rows where the 'LoanAmount' value is missing.

c) It replaces all missing values in the 'LoanAmount' column with the median value of that

column.

d) It counts how many missing values are in the 'LoanAmount' column.

6. The equation $Y = b_0 + b_1 x + b_2 x^2$ is an example of which type of regression?

a) Linear Regression

b) Polynomial Regression

c) Logistic Regression

d) Ridge Regression

7. Predicting the price of a house based on its features like size and location is a classic

example of a:

a) Regression problem

b) Classification problem

c) Clustering problem

d) Dimensionality reduction problem

8. A dataset where each row represents a person and columns include 'Age' (numerical),

'Purchased' (Yes/No), and 'Country' (text) is an example of:

a) A tree-like dataset

b) A tabular dataset

c) A JSON dataset

d) An unstructured dataset

9. An agent that learns to perform a task by receiving rewards for good actions and penalties

for bad ones is an example of:

a) Supervised Learning

b) Unsupervised Learning

c) Reinforcement Learning

d) Regression Analysis

10. In the machine learning life cycle, which step involves cleaning the data and handling

missing values?

a) Gathering Data

b) Data Analysis

c) Data Wrangling

d) Deployment

11. A model that uses curves, jumps, or twists in math to fit messy, real-world data (like

predicting viral social media posts) is called a:

a) Linear Model

b) Non-Linear Model

c) Descriptive Model

d) Deterministic Model

12. Which modeling principle involves updating probabilities as new evidence becomes

available, much like a spam filter?

a) Occam's Razor

b) Seek Consensus

c) Use Bayesian Reasoning

d) Update Forecasts Dynamically

13. A weather forecast that says "70% chance of rain" is an example of a:

a) Deterministic Model

b) Stochastic Model

c) First-Principles Model

d) Linear Model

14. Which of Nate Silver's principles involves combining multiple models or data sources to

improve reliability?

a) Think Probabilistically

b) Update Forecasts Dynamically

c) Seek Consensus

d) Use Bayesian Reasoning

15. A model that provides a single, exact prediction with no element of probability is called a:

a) Stochastic Model

b) Deterministic Model

c) Probabilistic Model

d) Data-Driven Model

16. Descriptive analytics mainly answers the question:

a) What will happen?

b) Why did it happen?

c) What should we do?

d) What happened?

17. A sales report showing monthly revenue trends is an example of:

a) Diagnostic analytics

b) Descriptive analytics

c) Predictive analytics

d) Prescriptive analytics

18. Which source is an example of internal data?

a) Public datasets

b) Web scraping

c) CRM software

d) Surveys

19. Which statement best describes semi-structured data?

a) Data always stored in SQL databases

b) Has some organizational tags but no strict schema

c) Has no format or organization at all

d) Randomized raw data

20. Which method is NOT a valid way to add a new key-value pair to a dictionary?

a) dict['new_key'] = 'new_value'

b) dict.update({'new_key': 'new_value'})

c) dict.add('new_key', 'new_value')

d) dict.setdefault('new_key', 'new_value')
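
The dictionary calls named in item 20 can be tried in a short Python session. The sketch below is illustrative only: the `record` dictionary and its keys are hypothetical and are not taken from the paper.

```python
# Hypothetical dictionary used only for illustration.
record = {"name": "Ada"}

record["course"] = "DTA621S"          # subscript assignment adds a new key
record.update({"level": 6})           # update() merges another mapping in
record.setdefault("session", 1)       # setdefault() adds the key only if it is absent
record.setdefault("name", "ignored")  # an existing key keeps its current value

# Check whether the built-in dict type exposes an add() method.
has_add = hasattr(record, "add")
print(record, has_add)
```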

Question two

1. Elements of a tuple can be accessed using indexing and slicing, similar to lists.

True/ False

2. The sorted() function modifies the original tuple when it sorts it.

True/ False

3. You can create a tuple from a list using the tuple() function.

True/ False

4. You can use the append() method to add a new element to an existing tuple.

True/ False

5. The main goal of the bias-variance trade-off is to maximize both bias and variance for

the most robust model.

True/ False

6. A model that performs well on training data but poorly on new, unseen data is likely

overfitting.

True/ False

7. Relational operators always give the same type of output when applied to both 1-D

and multi-dimensional arrays.

True/ False

8. The transpose() function changes the orientation of a multidimensional array, while

the flatten() function collapses it into a 1-D array.

True/ False

9. The zeros() function creates an array filled with zeros, while the ones() function creates

an array filled with ones.

True/ False

10. The arange() function is used to create arrays with regularly spaced values within a

specified interval.

True/ False
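
The tuple behaviour described in items 1 to 4 of this question can be explored with a few lines of standard Python. This is a minimal sketch; the `scores` tuple is a hypothetical example, not data from the paper.

```python
# Hypothetical tuple used only for illustration.
scores = (7, 3, 9, 1)

first = scores[0]                 # indexing reads a single element
middle = scores[1:3]              # slicing returns a new tuple
ordered = sorted(scores)          # sorted() returns a new list
from_list = tuple([1, 2, 3])      # tuple() builds a tuple from a list

# Check whether tuples expose an append() method.
can_append = hasattr(scores, "append")
print(first, middle, ordered, scores, from_list, can_append)
```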

Question three

a) With regard to supervised learning, explain the following terms:

i. Regression task

(2 marks)

ii. Classification task

(2 marks)

iii. Multicollinearity

iv. Overfitting

(2 marks)

b) Differentiate between the two types of "missing values" mentioned in the notes: NA and NaN. Provide an example of how a NaN value might be generated. (5 marks)

c) What does the dropna(inplace=True) method do to a DataFrame? (2 marks)

d) Define the term imputation in the context of data preprocessing. (2 marks)

e) Give two examples each for

(8 marks)

i. Structured data

ii. Semi-structured data

iii. Unstructured data

iv. Ordinal data
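
Parts b) to d) above concern NaN values, dropping incomplete observations, and imputation. The sketch below uses plain Python lists as a stand-in for a pandas DataFrame column, so it does not call `dropna()` or `fillna()` themselves. The `loan_amounts` values are hypothetical.

```python
import math
from statistics import median

# One way a NaN can arise: an undefined floating-point operation.
nan_value = float("inf") - float("inf")
is_nan = math.isnan(nan_value)
equals_itself = nan_value == nan_value   # NaN never compares equal to itself

# Hypothetical column with one missing entry, represented as NaN.
loan_amounts = [120.0, float("nan"), 100.0, 150.0]

# List analogue of dropping rows with missing values.
observed = [x for x in loan_amounts if not math.isnan(x)]

# Median imputation: replace each missing entry with the median of the observed ones.
fill = median(observed)
imputed = [fill if math.isnan(x) else x for x in loan_amounts]
print(observed, imputed)
```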

Question four

a) The manager of "The Daily Grind" coffee shop wants to analyze the consistency of their

morning sales. The sales revenue (in dollars) for the past week (Monday to Saturday)

was recorded as follows:

Sales: 102, 115, 98, 107, 110, 108

Calculate the following statistics to summarize the spread of this data:

I. The range.

II. The sample standard deviation.

III. The sample variance.

IV. The coefficient of variation (CV).

(2 marks)

(5 marks)

(2 marks)
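
The four spread statistics in part a) can be cross-checked with Python's standard `statistics` module. This is a minimal sketch, assuming the CV is expressed as a percentage of the mean; the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Sales figures as listed in the question.
sales = [102, 115, 98, 107, 110, 108]

data_range = max(sales) - min(sales)          # largest minus smallest value
sample_variance = variance(sales)             # sample variance (divides by n - 1)
sample_std = stdev(sales)                     # square root of the sample variance
cv_percent = sample_std / mean(sales) * 100   # standard deviation relative to the mean

print(data_range, round(sample_variance, 2), round(sample_std, 2), round(cv_percent, 2))
```

`variance()` and `stdev()` use the sample (n - 1) formulas; the population versions are `pvariance()` and `pstdev()`.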

b) The students in Mr. Smith's Period 1 math class were asked how many hours they spent

studying for their last test. Their responses, listed in order from least to greatest, are as

follows:

5, 6, 6, 7, 8, 9, 10, 10, 11, 12, 15

1. Calculate the following statistics needed to create a box and whisker plot:

i. Minimum

(1 mark)

ii. First quartile (Q1)

(1 mark)

iii. Median

(1 mark)

iv. Third quartile (Q3)

(1 mark)

v. Maximum

(1 mark)

2. Draw a box and whisker plot that represents the data above. (Hint: use a number line from 0 to 18.)

marks)

3. Use your box plot to answer the following questions:

i. What is the interquartile range (IQR)?

(2 marks)

ii. Would you describe the distribution of study times as symmetrical, skewed left, or skewed right? Explain your reasoning based on the shape of the box plot.

(2 marks)
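
The five-number summary and IQR for part b) can be sketched with the standard `statistics` module. The quartile convention is an assumption here: `quantiles()` defaults to the "exclusive" method, and other conventions, such as the linear interpolation used by default in some libraries, can give different values for Q1 and Q3.

```python
from statistics import quantiles

# Study hours as listed in the question, already sorted.
hours = [5, 6, 6, 7, 8, 9, 10, 10, 11, 12, 15]

# n=4 splits the data into quartiles and returns the three cut points.
q1, med, q3 = quantiles(hours, n=4)   # default method="exclusive"

five_number = (min(hours), q1, med, q3, max(hours))
iqr = q3 - q1                         # width of the box in a box plot
print(five_number, iqr)
```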

Question five

a) A survey was conducted to evaluate a student's performance in five skills:

Skill             | Communication | Teamwork | Problem-Solving | Creativity | Technical Skills
Score (out of 10) | 8             | 7        | 9               | 6          | 8

Task:

I. Create a radar chart (spider chart) to visualize the student's performance across

the five skills.

(10 marks)
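
One way to produce such a radar chart in software is sketched below, assuming the scores in the table read as 8, 7, 9, 6 and 8. The helper names `radar_points` and `draw_radar` and the output file name are hypothetical. `draw_radar` requires matplotlib and is defined but not called here.

```python
import math

skills = ["Communication", "Teamwork", "Problem-Solving", "Creativity", "Technical Skills"]
scores = [8, 7, 9, 6, 8]  # assumed reading of the score table


def radar_points(values):
    """Return (angles, values) with the first point repeated to close the polygon."""
    step = 2 * math.pi / len(values)
    angles = [i * step for i in range(len(values))]
    return angles + angles[:1], list(values) + list(values[:1])


def draw_radar(labels, values, path="radar.png"):
    """Plot the closed polygon on polar axes and save it to a file."""
    import matplotlib.pyplot as plt  # imported lazily so radar_points works without it

    angles, closed = radar_points(values)
    fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
    ax.plot(angles, closed)              # outline of the skill profile
    ax.fill(angles, closed, alpha=0.25)  # shaded interior
    ax.set_xticks(angles[:-1])           # one spoke per skill
    ax.set_xticklabels(labels)
    ax.set_ylim(0, 10)                   # scores are out of 10
    fig.savefig(path)


angles, closed = radar_points(scores)
```

Calling `draw_radar(skills, scores)` would write the chart to `radar.png`. Each skill sits on its own spoke, and the first point is repeated so the outline closes.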

b) Data wrangling involves cleaning and transforming raw data into a usable format. List and explain four techniques commonly used in the data wrangling process. (6 marks)

c) Supervised learning models can be categorized into regression and classification. Explain

the difference between these two types of models.

(4 marks)