Text cell <DWCaVVK_9e6U>
# %% [markdown]
## using modules

Code cell <Y1vg9vhu8cJo>
# %% [code]
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from google.colab import drive
drive.mount('/content/drive')

Execution output from 4 Nov 2023 17:12
0KB
Stream
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

Text cell <6cY0JG6H9tSC>
# %% [markdown]
## reading CSV file from my drive

Code cell <rOrjEm6X8lGA>
# %% [code]
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/CustomerChurn_dataset.csv')

Text cell <Xw-OKlOY904G>
# %% [markdown]
## Creating dataframe and showing it

Code cell <syN2an5l9_4s>
# %% [code]
dataframe = pd.DataFrame(df)
dataframe

Execution output from 4 Nov 2023 17:28
18KB
text/plain
      customerID  gender  SeniorCitizen Partner Dependents  tenure  \
0     7590-VHVEG  Female              0     Yes         No       1
1     5575-GNVDE    Male              0      No         No      34
2     3668-QPYBK    Male              0      No         No       2
3     7795-CFOCW    Male              0      No         No      45
4     9237-HQITU  Female              0      No         No       2
...          ...     ...            ...     ...        ...     ...
7038  6840-RESVB    Male              0     Yes        Yes      24
7039  2234-XADUH  Female              0     Yes        Yes      72
7040  4801-JZAZL  Female              0     Yes        Yes      11
7041  8361-LTMKD    Male              1     Yes         No       4
7042  3186-AJIEK    Male              0      No         No      66

     PhoneService     MultipleLines InternetService OnlineSecurity  ...  \
0              No  No phone service             DSL             No  ...
1             Yes                No             DSL            Yes  ...
2             Yes                No             DSL            Yes  ...
3              No  No phone service             DSL            Yes  ...
4             Yes                No     Fiber optic             No  ...
...           ...               ...             ...            ...  ...
7038          Yes               Yes             DSL            Yes  ...
7039          Yes               Yes     Fiber optic             No  ...
7040           No  No phone service             DSL            Yes  ...
7041          Yes               Yes     Fiber optic             No  ...
7042          Yes                No     Fiber optic            Yes  ...

     DeviceProtection TechSupport StreamingTV StreamingMovies        Contract  \
0                  No          No          No              No  Month-to-month
1                 Yes          No          No              No        One year
2                  No          No          No              No  Month-to-month
3                 Yes         Yes          No              No        One year
4                  No          No          No              No  Month-to-month
...               ...         ...         ...             ...             ...
7038              Yes         Yes         Yes             Yes        One year
7039              Yes          No         Yes             Yes        One year
7040               No          No          No              No  Month-to-month
7041               No          No          No              No  Month-to-month
7042              Yes         Yes         Yes             Yes        Two year

     PaperlessBilling              PaymentMethod MonthlyCharges  TotalCharges  \
0                 Yes           Electronic check          29.85         29.85
1                  No               Mailed check          56.95        1889.5
2                 Yes               Mailed check          53.85        108.15
3                  No  Bank transfer (automatic)          42.30       1840.75
4                 Yes           Electronic check          70.70        151.65
...               ...                        ...            ...           ...
7038              Yes               Mailed check          84.80        1990.5
7039              Yes    Credit card (automatic)         103.20        7362.9
7040              Yes           Electronic check          29.60        346.45
7041              Yes               Mailed check          74.40         306.6
7042              Yes  Bank transfer (automatic)         105.65        6844.5

     Churn
0       No
1       No
2      Yes
3       No
4      Yes
...    ...
7038    No
7039    No
7040    No
7041   Yes
7042    No

[7043 rows x 21 columns]

Code cell <dmrSkSJt-G0l>
# %% [code]
dataframe.head()

Execution output from 4 Nov 2023 17:28
13KB
text/plain
   customerID  gender  SeniorCitizen Partner Dependents  tenure PhoneService  \
0  7590-VHVEG  Female              0     Yes         No       1           No
1  5575-GNVDE    Male              0      No         No      34          Yes
2  3668-QPYBK    Male              0      No         No       2          Yes
3  7795-CFOCW    Male              0      No         No      45           No
4  9237-HQITU  Female              0      No         No       2          Yes

      MultipleLines InternetService OnlineSecurity  ... DeviceProtection  \
0  No phone service             DSL             No  ...               No
1                No             DSL            Yes  ...              Yes
2                No             DSL            Yes  ...               No
3  No phone service             DSL            Yes  ...              Yes
4                No     Fiber optic             No  ...               No

  TechSupport StreamingTV StreamingMovies        Contract PaperlessBilling  \
0          No          No              No  Month-to-month              Yes
1          No          No              No        One year               No
2          No          No              No  Month-to-month              Yes
3         Yes          No              No        One year               No
4          No          No              No  Month-to-month              Yes

               PaymentMethod MonthlyCharges  TotalCharges Churn
0           Electronic check          29.85         29.85    No
1               Mailed check          56.95        1889.5    No
2               Mailed check          53.85        108.15   Yes
3  Bank transfer (automatic)          42.30       1840.75    No
4           Electronic check          70.70        151.65   Yes

[5 rows x 21 columns]

Code cell <R-8ABeVP_11C>
# %% [code]
dataframe.info()

Execution output from 4 Nov 2023 17:28
1KB
Stream
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   customerID        7043 non-null   object
 1   gender            7043 non-null   object
 2   SeniorCitizen     7043 non-null   int64
 3   Partner           7043 non-null   object
 4   Dependents        7043 non-null   object
 5   tenure            7043 non-null   int64
 6   PhoneService      7043 non-null   object
 7   MultipleLines     7043 non-null   object
 8   InternetService   7043 non-null   object
 9   OnlineSecurity    7043 non-null   object
 10  OnlineBackup      7043 non-null   object
 11  DeviceProtection  7043 non-null   object
 12  TechSupport       7043 non-null   object
 13  StreamingTV       7043 non-null   object
 14  StreamingMovies   7043 non-null   object
 15  Contract          7043 non-null   object
 16  PaperlessBilling  7043 non-null   object
 17  PaymentMethod     7043 non-null   object
 18  MonthlyCharges    7043 non-null   float64
 19  TotalCharges      7043 non-null   object
 20  Churn             7043 non-null   object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

Text cell <tcXJ1mNcB4Jt>
# %% [markdown]
## DATA CLEANING
'TotalCharges' holds numerical data but is listed as object, so let's convert it.

Code cell <jVv9I52_Bdz7>
# %% [code]
# Convert 'TotalCharges' to a numeric type; values that cannot be parsed become NaN
dataframe['TotalCharges'] = pd.to_numeric(dataframe['TotalCharges'], errors='coerce')

Code cell <5mzpv67J_-dS>
# %% [code]
# drop rows with null values
dataframe = dataframe.dropna(axis=0, how='any')
# drop customer id
dataframe = dataframe.drop(['customerID'],
                           axis=1)

Code cell <RF8BnQZiB_B4>
# %% [code]
dataframe.info()

Execution output from 4 Nov 2023 17:28
1KB
Stream
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7032 entries, 0 to 7042
Data columns (total 20 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   gender            7032 non-null   object
 1   SeniorCitizen     7032 non-null   int64
 2   Partner           7032 non-null   object
 3   Dependents        7032 non-null   object
 4   tenure            7032 non-null   int64
 5   PhoneService      7032 non-null   object
 6   MultipleLines     7032 non-null   object
 7   InternetService   7032 non-null   object
 8   OnlineSecurity    7032 non-null   object
 9   OnlineBackup      7032 non-null   object
 10  DeviceProtection  7032 non-null   object
 11  TechSupport       7032 non-null   object
 12  StreamingTV       7032 non-null   object
 13  StreamingMovies   7032 non-null   object
 14  Contract          7032 non-null   object
 15  PaperlessBilling  7032 non-null   object
 16  PaymentMethod     7032 non-null   object
 17  MonthlyCharges    7032 non-null   float64
 18  TotalCharges      7032 non-null   float64
 19  Churn             7032 non-null   object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB

Code cell <75l5FaB5Ek99>
# %% [code]
# Check unique values in all columns
for col in dataframe.columns:
    unique_values = dataframe[col].unique()
    print(col, "\n", unique_values, "\n")

Execution output from 4 Nov 2023 17:28
1KB
Stream
gender
 ['Female' 'Male']

SeniorCitizen
 [0 1]

Partner
 ['Yes' 'No']

Dependents
 ['No' 'Yes']

tenure
 [ 1 34  2 45  8 22 10 28 62 13 16 58 49 25 69 52 71 21 12 30 47 72 17 27
  5 46 11 70 63 43 15 60 18 66  9  3 31 50 64 56  7 42 35 48 29 65 38 68
 32 55 37 36 41  6  4 33 67 23 57 61 14 20 53 40 59 24 44 19 54 51 26 39]

PhoneService
 ['No' 'Yes']

MultipleLines
 ['No phone service' 'No' 'Yes']

InternetService
 ['DSL' 'Fiber optic' 'No']

OnlineSecurity
 ['No' 'Yes' 'No internet service']

OnlineBackup
 ['Yes' 'No' 'No internet service']

DeviceProtection
 ['No' 'Yes' 'No internet service']

TechSupport
 ['No' 'Yes' 'No internet service']

StreamingTV
 ['No' 'Yes' 'No internet service']

StreamingMovies
 ['No' 'Yes' 'No internet service']

Contract
 ['Month-to-month' 'One year' 'Two year']

PaperlessBilling
 ['Yes' 'No']

PaymentMethod
 ['Electronic check' 'Mailed check' 'Bank transfer (automatic)'
 'Credit card (automatic)']

MonthlyCharges
 [29.85 56.95 53.85 ... 63.1  44.2  78.7 ]

TotalCharges
 [  29.85 1889.5   108.15 ...  346.45  306.6  6844.5 ]

Churn
 ['No' 'Yes']

Text cell <heRn8gcWIfqu>
# %% [markdown]
## EXPLORATORY DATA ANALYSIS (EDA)
essential function

Code cell <rY_244tLVKAb>
# %% [code]
# Make sure the column name is 'Churn' and not 'churn' (case-sensitive)
if 'Churn' in dataframe.columns:
    # Display statistical summary of the numeric columns
    print(dataframe.describe())

Execution output from 4 Nov 2023 17:28
1KB
Stream
       SeniorCitizen       tenure  MonthlyCharges  TotalCharges
count    7032.000000  7032.000000     7032.000000   7032.000000
mean        0.162400    32.421786       64.798208   2283.300441
std         0.368844    24.545260       30.085974   2266.771362
min         0.000000     1.000000       18.250000     18.800000
25%         0.000000     9.000000       35.587500    401.450000
50%         0.000000    29.000000       70.350000   1397.475000
75%         0.000000    55.000000       89.862500   3794.737500
max         1.000000    72.000000      118.750000   8684.800000

Code cell <WR-uVJ6tG6-r>
# %% [code]
# Display statistical summary of the numeric columns
print(dataframe.describe())

# Confirm the 'Churn' column is loaded
if 'Churn' in dataframe.columns:
    # EDA
    # Visualize the distribution of Churn
    plt.figure(figsize=(6, 4))
    sns.countplot(data=dataframe, x='Churn')
    plt.title('Churn Distribution')
    plt.show()

    # Explore the relationships between numerical features and Churn
    numeric_features = ['tenure', 'MonthlyCharges', 'TotalCharges']
    for feature in numeric_features:
        plt.figure(figsize=(6, 4))
        sns.boxplot(data=dataframe, x='Churn', y=feature)
        plt.title(f'{feature} vs. Churn')
        plt.show()

    # Explore the relationships between categorical features and Churn
    categorical_features = ['gender', 'SeniorCitizen', 'Partner', 'Dependents',
                            'PhoneService', 'MultipleLines', 'InternetService',
                            'OnlineSecurity', 'OnlineBackup', 'DeviceProtection',
                            'TechSupport', 'StreamingTV', 'StreamingMovies',
                            'Contract', 'PaperlessBilling', 'PaymentMethod']
    for feature in categorical_features:
        plt.figure(figsize=(8, 4))
        sns.countplot(data=dataframe, x=feature, hue='Churn')
        plt.title(f'{feature} vs. Churn')
        plt.xticks(rotation=45)
        plt.show()
else:
    print("The 'Churn' column is not present in the dataframe. Please check your data loading steps.")

Execution output from 4 Nov 2023 18:08
523KB
Stream
       SeniorCitizen       tenure  MonthlyCharges  TotalCharges
count    7032.000000  7032.000000     7032.000000   7032.000000
mean        0.162400    32.421786       64.798208   2283.300441
std         0.368844    24.545260       30.085974   2266.771362
min         0.000000     1.000000       18.250000     18.800000
25%         0.000000     9.000000       35.587500    401.450000
50%         0.000000    29.000000       70.350000   1397.475000
75%         0.000000    55.000000       89.862500   3794.737500
max         1.000000    72.000000      118.750000   8684.800000
text/plain
<Figure size 600x400 with 1 Axes>
<Figure size 600x400 with 1 Axes>
<Figure size 600x400 with 1 Axes>
<Figure size 600x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>
<Figure size 800x400 with 1 Axes>

Code cell <wA0WcLpSO-Fp>
# %% [code]
import matplotlib.pyplot as plt

# Count the number of customers in each Churn category
churn_counts = dataframe['Churn'].value_counts()

# Create a pie
# chart
plt.figure(figsize=(6, 6))
plt.pie(churn_counts, labels=churn_counts.index, autopct='%1.1f%%', startangle=90)
plt.title('Churn Distribution')
plt.show()

Execution output from 4 Nov 2023 18:08
22KB
text/plain
<Figure size 600x600 with 1 Axes>

Code cell <kUsPDK0sWWK->
# %% [code]
print(dataframe.columns)

Execution output from 4 Nov 2023 18:10
0KB
Stream
Index(['gender', 'SeniorCitizen', 'Partner', 'Dependents', 'tenure',
       'PhoneService', 'MultipleLines', 'InternetService', 'OnlineSecurity',
       'OnlineBackup', 'DeviceProtection', 'TechSupport', 'StreamingTV',
       'StreamingMovies', 'Contract', 'PaperlessBilling', 'PaymentMethod',
       'MonthlyCharges', 'TotalCharges', 'Churn'],
      dtype='object')

Code cell <61Q0pPp1PrVH>
# %% [code]

Code cell <KZ10D9ifUnAt>
# %% [code]
do the feature selection

ChatGPT

It seems like you've performed data cleaning and exploratory data analysis (EDA) on your dataset. The next step is feature selection. Here's a brief overview of how you can proceed with feature selection:

1. Remove Uninformative Features: Remove any features that are not informative or redundant. For example, you might remove the "customerID" column as it doesn't provide predictive information.
2. Handle Categorical Features: You'll need to convert categorical features into numerical representations. You can use techniques like one-hot encoding or label encoding for this purpose.
3. Select Features: Based on your EDA, select the features that are most relevant for predicting churn. You can consider using techniques like correlation analysis to identify features that are highly correlated with the target variable (Churn). You might also perform feature importance analysis using machine learning models.
4. Train-Test Split: Before training a machine learning model, split your dataset into training and testing sets. This allows you to evaluate the model's performance on unseen data.
5. Scaling: Scale your numerical features. This is important for some machine learning algorithms, especially those based on distances or gradients (e.g., SVM, K-Means).
6. Model Building: Build and train a machine learning model using your selected features. You can start with a simple model like Logistic Regression and then try more complex models like Random Forest or Gradient Boosting.
7. Evaluate the Model: Assess the model's performance using metrics like accuracy, precision, recall, F1-score, and ROC-AUC. Make sure to evaluate it on the test set to ensure it generalizes well.
8. Iterate: Depending on your model's performance, you may need to iterate through feature selection, model tuning, and data preprocessing steps to improve results.

Without the specific code, it's challenging to provide the exact implementation. If you have any specific questions or need help with a particular aspect of feature selection or model building, please let me know.
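
The encoding, train-test split, scaling, model building and evaluation steps described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `demo` is a small synthetic frame with randomly generated values that borrows a few column names from the churn dataset, not the real data, and logistic regression is used simply because the answer suggests it as a starting point. Since the values are random, the printed scores carry no meaning beyond showing that the pipeline runs.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the cleaned dataframe (random values, not real data).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "tenure": rng.integers(1, 73, size=n),
    "MonthlyCharges": rng.uniform(18.0, 120.0, size=n).round(2),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "PaperlessBilling": rng.choice(["Yes", "No"], size=n),
    "Churn": rng.choice(["Yes", "No"], size=n, p=[0.27, 0.73]),
})

# Encode the target as 0/1.
y = demo["Churn"].map({"No": 0, "Yes": 1})

# One-hot encode the categorical predictors; drop_first avoids redundant columns.
X = pd.get_dummies(demo.drop(columns=["Churn"]), drop_first=True).astype(float)

# Split before scaling so the test set does not leak into the scaler statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# Scale only the numeric columns, fitting the scaler on the training set.
numeric_cols = ["tenure", "MonthlyCharges"]
scaler = StandardScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

# Fit a simple baseline model and evaluate it on the held-out test set.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

print("accuracy:", accuracy_score(y_test, pred))
print("ROC-AUC:", roc_auc_score(y_test, proba))
```

With the real data, `demo` would be replaced by the cleaned `dataframe`, and `numeric_cols` would presumably also include `TotalCharges`.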
