Question One
1. What is the primary purpose of the .plot.scatter() function in Pandas?
a) To show the distribution of a single numerical variable.
b) To visualize the relationship between two numerical variables.
c) To compare the proportion of different categories in a dataset.
d) To identify missing values in a DataFrame.
2. The code data2.isnull().sum() is used to:
a) Remove all missing values from the DataFrame data2.
b) Replace all missing values in data2 with the value 0.
c) Count and display the number of missing values in each column of data2.
d) Create a histogram showing the frequency of missing values.
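To check the behaviour of this expression by running it, a minimal sketch follows. The contents of data2 are not given in the question, so the small DataFrame and its column names here are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data2; the real dataset is not shown in the quiz.
data2 = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "income": [50000.0, 42000.0, np.nan, 61000.0],
    "city": ["A", "B", "C", "D"],
})

# isnull() marks each cell True/False; sum() then adds the True values per column.
missing_per_column = data2.isnull().sum()
print(missing_per_column)
```

The result is a Series indexed by column name, so a single column's count can be read with missing_per_column["age"].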
3. According to the notes, why is it generally "not advisable" to use dropna() to remove all
observations with missing values?
a) Because the dropna() function has syntax errors and doesn't work.
b) Because it can only be used on categorical variables, not numerical ones.
c) Because it can significantly reduce the dataset size and lead to less effective or biased
analysis.
d) Because it is slower than imputing the values with the mean.
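The effect of dropna() on dataset size can be seen on a toy example. The sketch below uses an invented six-row DataFrame (not data from the notes) in which only 3 of 18 cells are missing, each in a different row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing cell in each of three different rows.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0, 5.0, 6.0],
    "b": [np.nan, 2.0, 3.0, 4.0, 5.0, 6.0],
    "c": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0],
})

rows_before = len(df)
missing_cells = int(df.isnull().sum().sum())  # total missing cells in the frame
rows_after = len(df.dropna())                 # dropna() removes any row with a missing value

print(rows_before, missing_cells, rows_after)
```

In this made-up case, a small share of missing cells removes half of the rows, because dropna() with default arguments discards a row if any of its columns is missing.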
4. The process of filling in missing values with estimated ones (like the mean, median, or a
specified value) is known as:
a) Deletion
b) Validation
c) Imputation
d) Visualization
5. Look at the following line of code:
loan['LoanAmount'].fillna(loan['LoanAmount'].median(), inplace=True)
What is the result of executing this code?
a) It creates a median value for the 'LoanAmount' column.
b) It deletes all rows where the 'LoanAmount' value is missing.
c) It replaces all missing values in the 'LoanAmount' column with the median value of that
column.
d) It counts how many missing values are in the 'LoanAmount' column.
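The line of code in this question can be tried on a small example. The sketch below assumes a made-up loan DataFrame with only a 'LoanAmount' column, and it assigns the result back to the column instead of using inplace=True, since recent pandas versions may warn when inplace=True is used on a selected column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loan dataset used in the notes.
loan = pd.DataFrame({"LoanAmount": [100.0, np.nan, 140.0, 120.0, np.nan]})

# median() ignores missing values by default.
median_amount = loan["LoanAmount"].median()

# Fill the gaps with the median and assign the result back to the column.
loan["LoanAmount"] = loan["LoanAmount"].fillna(median_amount)

remaining_missing = int(loan["LoanAmount"].isnull().sum())
print(median_amount, remaining_missing)
```

No rows are removed by this operation; only the missing entries change.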
6. The equation Y = b0 + b1x + b2x^2 is an example of which type of regression?
a) Linear Regression
b) Polynomial Regression