Question

  1. A data analyst is cleaning data in preparation for training a machine learning model. Whilst cleaning the data, she has observed that there are missing values in the data. Which of the following lines of code can she write to find the percentage of missing values in each column?

     i. data.isnull().sum(axis = 1) / len(data) * 100
     ii. data.isnull().sum(axis = 0) / len(data) * 100
     iii. data.isnull().mean(axis = 1) * 100
     iv. data.isnull().mean(axis = 0) * 100

     Options:

  • ii. and iv.
  • i. and iii.
  • i. only
  • ii. only
  • iii. only
  • iv. only
  • None of the above

Solution

The correct answer is "ii. and iv." These lines of code will calculate the percentage of missing values in each column of the data.

Here's why:

  • ii. data.isnull().sum(axis = 0) / len(data) * 100 : This line of code first identifies the null or missing values in the data (data.isnull()), then sums these up for each column (sum(axis = 0)). This sum is then divided by the total number of rows (len(data)) to get the proportion of missing values, which is then multiplied by 100 to convert it into a percentage.

  • iv. data.isnull().mean(axis = 0) * 100 : This line of code works similarly, but instead of summing the missing values, it calculates the mean. isnull() returns True for missing values and False for non-missing values, and pandas treats these as 1 and 0 when averaging, so the mean of each column is the proportion of missing values in that column. Multiplying by 100 converts it into a percentage.
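As a quick check, the two column-wise expressions can be run on a small DataFrame. This is a minimal sketch: the column names and values below are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns, with a few missing entries.
data = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],                  # 2 of 4 missing
    "city": ["Harare", "Bulawayo", None, "Mutare"],   # 1 of 4 missing
    "income": [1200.0, 950.0, 1100.0, 1300.0],        # none missing
})

# Option ii: count the missing values per column, divide by the number of rows.
pct_sum = data.isnull().sum(axis=0) / len(data) * 100

# Option iv: average the True/False mask per column.
pct_mean = data.isnull().mean(axis=0) * 100

print(pct_sum)
print(pct_mean)
```

Both Series are indexed by column name and hold the same values here (50% for age, 25% for city, 0% for income), which is why options ii. and iv. are interchangeable.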

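The other lines of code (i. and iii.) work across rows rather than columns, because they use axis = 1. Option iii. gives the percentage of missing values in each row. Option i. counts the missing values in each row but divides by the number of rows (len(data)) instead of the number of columns, so it is not a true percentage unless the data happens to have as many rows as columns. Neither gives the percentage of missing values per column.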
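The row-wise behaviour of options i. and iii. can be sketched the same way. The DataFrame below is a hypothetical example with 3 rows and 4 columns, chosen so that the number of rows and the number of columns differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 rows, 4 columns.
data = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [3.0, 6.0, 9.0],
    "d": [4.0, 7.0, np.nan],
})

# Option iii: share of missing values in each row (one value per row).
row_pct = data.isnull().mean(axis=1) * 100

# Option i: missing count per row divided by the number of ROWS,
# which only matches row_pct when rows and columns are equal in number.
option_i = data.isnull().sum(axis=1) / len(data) * 100

print(row_pct)
print(option_i)
```

The last row has 3 of its 4 values missing, so option iii. reports 75%, while option i. reports 100% because it divides by the 3 rows. Both results are indexed by row label, not by column name, so neither answers the question.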
