Perform medical data analysis by creating statistical tests on a given data set.

Medical data analysis

In this assignment, you will perform medical data analysis by creating statistical tests on a given data set. You will check which variables are potentially a cause of a patient's death.

  1. For categorical variables, you should perform the chi-squared test of independence between each categorical variable and the death variable. Treat variables as categorical if dtype is int64.

  2. For numerical variables, perform two Shapiro–Wilk tests: one for each sample that was created by splitting the data by the death variable.

     2.1. If the p-values from the Shapiro–Wilk tests indicate that both samples have a normal distribution (p-values greater than 0.05), perform the unpaired t-test with the parameter equal_var = False.

     2.2. Otherwise, perform the Mann-Whitney U test.

Requirements

Implement a function perform_tests which accepts one argument:

  data: a pandas DataFrame consisting of the following columns: death (an indicator of whether a patient died less than a year after the operation) and 17 other variables (either categorical or numerical) describing health condition after operation and taken medicaments.

The function returns a dictionary with the following four keys:

  mann_whitney, ttest, chi_square: each of these consists of a list of tuples with (variable name, p-value from the corresponding test). For chi_square, these should be categorical variables; for mann_whitney, numerical variables that don't have a normal distribution; and for ttest, numerical variables with a normal distribution.

  shapiro_wilk: a list of tuples with (variable name, (p-value for sample with deaths=0, p-value for sample with deaths=1)). These should be all numerical variables.

Round all p-values in the output to four decimal places.

Example

With data limited to the following columns:

    example_data = data[["death", "Na+", "DBP", "PLT", "ivabradine", "MRA"]]

so that example_data.head() looks as follows:

       death    Na+    DBP    PLT  ivabradine  MRA
    0      0  136.0  126.0  196.0           1    0
    1      0  147.0  108.0  245.0           0    1
    2      0  133.0  109.0  219.0           0    1
    3      0  150.0  114.0  294.0           1    0
    4      0  151.0   95.0  293.0           0    1

the function perform_tests(example_data) will return:

    {'mann_whitney': [('Na+', 0.2143)],
     'ttest': [('DBP', 0.0), ('PLT', 0.4739)],
     'chi_square': [('ivabradine', 0.0144), ('MRA', 0.2884)],
     'shapiro_wilk': [('Na+', (0.0, 0.0071)), ('PLT', (0.2361, 0.6935)), ('DBP', (0.5272, 0.3715))]}

Hints

Use the scipy.stats package to perform all tests.

In addition to the Python 3.8 standard library you can use SciPy 1.5.2.

If you would like to access CSV data sets locally you can download zipped files.


Solution

This assignment asks you to run statistical tests on a medical data set. The goal is to identify which variables are statistically associated with a patient's death and are therefore potential causes.

Here's a step-by-step guide on how to approach this:

  1. Import the necessary libraries: pandas for data manipulation and scipy.stats for statistical tests.

  2. Load your data into a pandas DataFrame.

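  3. Identify the categorical variables in your data. These are the columns with dtype int64, excluding death itself (which is also an integer indicator). For each of these variables, perform a chi-squared test of independence against the death variable. You can use the chi2_contingency function from scipy.stats for this; it expects the contingency table of the two columns, which pandas.crosstab can build.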

  4. Identify the numerical variables in your data: all remaining columns, whose dtype is not int64. For each of these variables, split the data by the death variable and perform two Shapiro–Wilk tests, one for each sample. You can use the shapiro function from scipy.stats for this.

  5. Based on the p-values from the Shapiro–Wilk tests, decide which test to perform next. If both p-values are greater than 0.05, treat both samples as normally distributed and perform an unpaired t-test with the parameter equal_var = False. You can use the ttest_ind function from scipy.stats for this. Otherwise (at least one p-value is 0.05 or lower), perform a Mann-Whitney U test. You can use the mannwhitneyu function from scipy.stats for this.

  6. Implement a function called perform_tests that accepts a pandas DataFrame as an argument and returns a dictionary with the p-values from the tests performed. The dictionary should have four keys: mann_whitney, ttest, chi_square, and shapiro_wilk. For mann_whitney, ttest, and chi_square, each value is a list of tuples of the form (variable name, p-value). For shapiro_wilk, each tuple has the form (variable name, (p-value for death = 0, p-value for death = 1)) and the list covers all numerical variables.

  7. Round all p-values in the output to four decimal places.
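
Step 3 can be sketched as follows. This is a minimal illustration, not a reference solution: the helper name chi_square_pvalue and the toy data are made up for the example, and it assumes the columns contain no missing values. The only library calls used are pandas.crosstab and scipy.stats.chi2_contingency.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_pvalue(data: pd.DataFrame, column: str, target: str = "death") -> float:
    """Return the p-value (rounded to 4 places) of a chi-squared independence test.

    Hypothetical helper for illustration; `target` defaults to the
    assignment's death column.
    """
    # Contingency table: rows are levels of `column`, columns are death = 0 / 1.
    observed = pd.crosstab(data[column], data[target])
    # chi2_contingency returns (statistic, p-value, degrees of freedom, expected).
    p_value = chi2_contingency(observed)[1]
    return round(float(p_value), 4)


# Made-up toy data, NOT the assignment's data set.
toy = pd.DataFrame({
    "death":      [0, 0, 0, 0, 1, 1, 1, 1] * 5,
    "ivabradine": [0, 1, 0, 1, 0, 1, 0, 1] * 5,
})
print(chi_square_pvalue(toy, "ivabradine"))
```

For 2x2 tables, chi2_contingency applies Yates' continuity correction by default. The assignment does not say whether to disable it, so the sketch keeps the default.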

Remember to check the requirements and the example provided in the assignment to ensure your function is working as expected.
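
Putting the steps together, one possible implementation of perform_tests is sketched below. It rests on several assumptions that the assignment does not confirm: there are no missing values, results are listed in column order, and the Mann-Whitney U test is two-sided. In SciPy 1.5.2, the default alternative of mannwhitneyu is not two-sided, so the expected p-values may have been produced with a different setting. The demo data at the end is synthetic.

```python
import numpy as np
import pandas as pd
from scipy import stats


def perform_tests(data: pd.DataFrame) -> dict:
    """Sketch of the assignment's function; see the stated assumptions."""
    results = {"mann_whitney": [], "ttest": [], "chi_square": [], "shapiro_wilk": []}
    died = data["death"] == 1

    for column in data.columns.drop("death"):
        if data[column].dtype == "int64":
            # Categorical variable: chi-squared test of independence vs. death.
            observed = pd.crosstab(data[column], data["death"])
            p = stats.chi2_contingency(observed)[1]
            results["chi_square"].append((column, round(float(p), 4)))
            continue

        # Numerical variable: split by death and test each sample for normality.
        sample_0 = data.loc[~died, column]
        sample_1 = data.loc[died, column]
        p_0 = float(stats.shapiro(sample_0)[1])
        p_1 = float(stats.shapiro(sample_1)[1])
        results["shapiro_wilk"].append((column, (round(p_0, 4), round(p_1, 4))))

        if p_0 > 0.05 and p_1 > 0.05:
            # Both samples look normal: Welch's unpaired t-test.
            p = stats.ttest_ind(sample_0, sample_1, equal_var=False)[1]
            results["ttest"].append((column, round(float(p), 4)))
        else:
            # Assumption: two-sided alternative (not stated in the assignment).
            p = stats.mannwhitneyu(sample_0, sample_1, alternative="two-sided")[1]
            results["mann_whitney"].append((column, round(float(p), 4)))

    return results


# Synthetic demo data (hypothetical column names, not the real data set).
n = 100
normal_like = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # normal quantiles
skewed = np.exp(np.linspace(0.0, 5.0, n))                      # right-skewed values
demo = pd.DataFrame({
    "death": np.repeat([0, 1], n).astype("int64"),
    "normal_like": np.concatenate([normal_like, normal_like + 1.0]),
    "skewed": np.concatenate([skewed, skewed * 2.0]),
    "treated": np.tile([0, 1], n).astype("int64"),
})
result = perform_tests(demo)
print(result)
```

The normality decision uses the unrounded Shapiro–Wilk p-values; rounding is applied only to what is stored in the output.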
