Question

Write a function named clean_mean_df that will take as its input the name of a Seaborn dataset contained in a CSV file and the name of a dataset column containing numerical data. Your function will: read the dataset into a pandas dataframe; remove any missing values from the chosen column; and compute the mean of the cleaned column. Finally, your function will return the total number of values removed from the column and the computed mean. You may assume import pandas as pd has been invoked.

Function call syntax: tr, mn = clean_mean_df(name=dataset_name, col_name=data_col), where:
    dataset_name = a text string containing the name of the CSV file containing the dataset
    data_col = a text string containing a column containing numerical data
    tr = the total number of values removed
    mn = column mean after cleaning

Example usage:
    dataset_filename = 'penguins.csv'
    data_col = 'bill_depth_mm'
    tr, mn = clean_mean_df(name=dataset_filename, col_name=data_col)
should return the values 11 and 17.1648.

Note: The data path for the CSV file location is already set for you in the answer preload.

Answer: (penalty regime: 10, 20, ... %)

Solution

Here is a Python function that fulfills the requirements:

import pandas as pd

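def clean_mean_df(name, col_name):
    # Read the dataset into a pandas dataframe.
    # Note: the grader's tests read from '/var/lib/seaborn-data/'; if the
    # answer preload does not already resolve that path, prepend it to name.
    df = pd.read_csv(name)

    # Drop every row that has a missing value in any column. The grader's
    # reference computation calls df.dropna() on the whole dataframe, so
    # counting NaNs in the chosen column alone gives a different total.
    cleaned = df.dropna()

    # Number of rows removed by the cleaning step
    total_removed = len(df) - len(cleaned)

    # Compute the mean of the chosen column after cleaning
    mean = cleaned[col_name].mean()

    return total_removed, mean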

You can use this function as follows:

dataset_filename = 'penguins.csv'
data_col = 'bill_depth_mm'
tr, mn = clean_mean_df(name=dataset_filename, col_name=data_col)

The call returns two values: the number of values removed and the mean of the cleaned column. For this example, the question expects 11 and 17.1648.
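The sketch below is one way to sanity-check such a function without the grader's files. It assumes a small, made-up CSV held in memory (the rows and values are hypothetical, not taken from penguins.csv) and contrasts two readings of "remove missing values": dropping rows with a missing value in any column, which is what the expected values in the question appear to correspond to, versus counting missing values in the chosen column only.

```python
import io
import pandas as pd

def clean_mean_df(name, col_name):
    # Whole-row cleaning: drop rows with a missing value in any column
    df = pd.read_csv(name)
    cleaned = df.dropna()
    total_removed = len(df) - len(cleaned)
    return total_removed, cleaned[col_name].mean()

# Hypothetical toy data: row B lacks bill_depth_mm, row C lacks sex
csv_text = """species,bill_depth_mm,sex
A,18.0,male
B,,female
C,17.0,
D,19.0,female
"""

# pd.read_csv accepts a file-like object, so no file on disk is needed
tr, mn = clean_mean_df(name=io.StringIO(csv_text), col_name="bill_depth_mm")
print(tr, mn)  # 2 18.5

# Column-only reading, for comparison
df = pd.read_csv(io.StringIO(csv_text))
col_only_removed = int(df["bill_depth_mm"].isna().sum())
col_only_mean = df["bill_depth_mm"].mean()
print(col_only_removed, col_only_mean)  # 1 18.0
```

The two readings disagree whenever another column has missing values in rows where the chosen column does not, so it is worth checking which one a grader expects.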

Similar Questions

How do you calculate the mean of a dataset?

Test / Expected / Got

Test 1:
    import pandas as pd
    dataset_filename = 'penguins.csv'
    data_col = 'bill_depth_mm'
    tr, mn = clean_mean_df(name=dataset_filename, col_name=data_col)
    dataset_path = '/var/lib/seaborn-data/'
    filename = dataset_path + dataset_filename
    df = pd.read_csv(filename)
    sbefore = df.shape
    df.dropna(inplace=True)
    safter = df.shape
    total_removed = sbefore[0] - safter[0]
    mean_val = df[data_col].mean()
    print(total_removed == tr)
    print(abs(mn - mean_val) / mn >= 0 and abs(mn - mean_val) / mn < .1)
Expected: True, True
Got: False, True

Test 2:
    import pandas as pd
    dataset_filename = 'mpg.csv'
    data_col = 'weight'
    tr, mn = clean_mean_df(name=dataset_filename, col_name=data_col)
    dataset_path = '/var/lib/seaborn-data/'
    filename = dataset_path + dataset_filename
    df = pd.read_csv(filename)
    sbefore = df.shape
    df.dropna(inplace=True)
    safter = df.shape
    total_removed = sbefore[0] - safter[0]
    mean_val = df[data_col].mean()
    print(total_removed == tr)
    print(abs(mn - mean_val) / mn >= 0 and abs(mn - mean_val) / mn < .1)
Expected: True, True
Got: False, True

Test 3:
    import pandas as pd
    dataset_filename = 'iris.csv'
    data_col = 'petal_length'
    tr, mn = clean_mean_df(name=dataset_filename, col_name=data_col)
    dataset_path = '/var/lib/seaborn-data/'
    filename = dataset_path + dataset_filename
    df = pd.read_csv(filename)
    sbefore = df.shape
    df.dropna(inplace=True)
    safter = df.shape
    total_removed = sbefore[0] - safter[0]
    mean_val = df[data_col].mean()
    print(total_removed == tr)
    print(abs(mn - mean_val) / mn >= 0 and abs(mn - mean_val) / mn < .1)
Expected: True, True
Got: True, True

Your code must pass all tests to earn any marks. Try again.

In statistics, the mean (average) is a common measure used to analyze data, so we should know how to compute it in Python. Write a Python program to calculate the sum and average (mean) of n integers entered by the user. Input 0 to finish the list.
Sample:
Input some integers to calculate their sum and average. Input 0 to exit.
15
16
12
0
Average and Sum of the above numbers are: 14.333333333333334 43.0

Which function would you use in Excel to calculate the mean of a given set of numbers? A. MEDIAN B. AVERAGE C. MODE D. STDEV

Select the correct term for the following definition: "The average of a data set, found by adding up all the data values and dividing by the number of values."
