In machine learning, logistic regression remains a popular first choice for binary classification problems. It is a simple yet efficient algorithm that often produces accurate models. In its basic form, it uses the logistic function to compute a probability score, which is then used to assign the binary dependent variable to its class. Logistic regression can be viewed as linear regression whose output is transformed by the logistic function. In this post I walk through the end-to-end steps of a classification problem using logistic regression and analyze the model output in detail with several performance metrics.
This post is mostly a practical exercise in Python, so if you want to brush up on the theory behind logistic regression, please refer to my post on logistic regression using the link below.
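To make the role of the logistic function concrete, here is a minimal sketch that uses only the standard library. The weights, bias and feature values are made-up numbers chosen for illustration; they are not fitted to the banking data.

```python
import math

def sigmoid(z):
    """Map a real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Linear combination of the features passed through the logistic function."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and one hypothetical observation
weights = [0.8, -0.5]
bias = -0.2
features = [2.0, 1.0]

p = predict_proba(features, weights, bias)
label = 1 if p >= 0.5 else 0  # 0.5 is a common default cut-off, not a fixed rule
print(round(p, 3), label)
```

The linear part is the same as in linear regression; only the final squashing step turns it into a probability that can be thresholded into a class.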
Key Takeaways from this Post:
Problem: Predict whether a client will subscribe to a term deposit or not
- Exploratory Data Analysis on Banking Data
- Data Preprocessing:
- Handling Categorical Variable
- Oversampling using SMOTE
- Recursive Feature Elimination (RFE)
- Model Building
- Understanding Model Output
- Confusion Matrix
- ROC curve
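As a preview of the last two items, the sketch below shows what a confusion matrix and a single ROC point actually count. It is written in plain Python on made-up labels and scores so the arithmetic is visible; in practice `sklearn.metrics.confusion_matrix` and `sklearn.metrics.roc_curve` would normally do this work. The helper names `confusion_counts` and `roc_point` are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for binary labels and predicted probabilities."""
    tn = fp = fn = tp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def roc_point(y_true, y_prob, threshold):
    """One (false positive rate, true positive rate) point for a given cut-off."""
    tn, fp, fn, tp = confusion_counts(y_true, y_prob, threshold)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    return fpr, tpr

# Made-up labels and predicted probabilities, for illustration only
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.3]

tn, fp, fn, tp = confusion_counts(y_true, y_prob)
print("TN, FP, FN, TP:", tn, fp, fn, tp)

# Sweeping the threshold traces out points on the ROC curve
curve = [roc_point(y_true, y_prob, t) for t in (0.25, 0.5, 0.75)]
print(curve)
```

Lowering the threshold catches more true positives at the cost of more false positives, which is the trade-off the ROC curve visualizes.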
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

sns.set(style="white")
sns.set(style="whitegrid", color_codes=True)
data = pd.read_csv("banking.csv")
Exploratory Data Analysis
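A typical first look at a freshly loaded dataset could be sketched as follows. Because the contents of banking.csv are not shown here, the example builds a tiny stand-in table; its column names (`age`, `job`, `y`) and values are assumptions made for illustration only.

```python
import io
import pandas as pd

# Tiny stand-in for banking.csv; column names and values are hypothetical
sample_csv = io.StringIO(
    "age,job,y\n"
    "34,admin,0\n"
    "51,technician,0\n"
    "29,student,1\n"
    "45,admin,0\n"
)
sample = pd.read_csv(sample_csv)

print(sample.shape)           # (rows, columns)
print(sample.dtypes)          # numeric vs. categorical columns
print(sample.isnull().sum())  # missing values per column

# Share of each class in the assumed target column; a skewed split
# is the kind of imbalance that oversampling is meant to address
class_share = sample["y"].value_counts(normalize=True)
print(class_share)
```

The same calls can be run on the real `data` frame once the actual name of the target column is known.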
That is all for now. I hope you enjoyed reading.
Thank you, and happy learning!