Credit Card Fraud Detection Using Machine Learning


For years, fraud has been a major issue in sectors such as banking, healthcare, and insurance. As online transactions have grown across payment options such as credit/debit cards, PhonePe, Gpay, and Paytm, fraudulent activity has grown with them. Fraudsters have also become very skilled at finding loopholes to exploit. Since no system is perfect, building a secure system that authenticates users and protects customers from fraud is a challenging task, which is why fraud detection algorithms are so valuable.

This is where machine learning comes in: it can be used to build fraud detection algorithms that help solve these real-world problems.

Table of Contents

  1. Types of Internet Fraud
  2. Machine Learning in Credit Card Fraud Detection
  3. Challenges in Credit Card Fraud Detection
  4. Machine Learning Implementation

Types of Internet Fraud

1. Email Phishing

In this type of fraud, attackers send users emails containing fake messages and links to fake websites. These emails look so legitimate that recipients can easily be fooled into entering sensitive data, which puts them at risk.

How to Prevent Email Phishing?

Verify Before You Trust: Do not enter sensitive data through links in these emails until you have verified the sender.

Ignore Suspicious Emails: If an email or message that appears on your screen seems suspicious, the best practice is to ignore it.

2. Credit Card Fraud

Payment fraud is a common issue in modern banking card systems. Fraudsters use methods such as stealing cards, creating counterfeit cards, or obtaining card details to commit fraud.

How Payment Fraud Happens

Once fraudsters steal confidential data, they can:

  • Make purchases.
  • Apply for loans.
  • Exploit the victim’s financial information in various ways.

3. Identity Theft

Identity Theft occurs when attackers or cybercriminals hack into a victim’s account and access sensitive credentials, such as:

  • Name
  • Bank account details
  • Email address
  • Passwords

Identity theft can cause significant harm to victims and is a growing threat in the digital era.

Machine Learning in Credit Card Fraud Detection

Say you are using your credit card to buy products from an online platform or to book movie tickets. Now imagine that someone steals your credit card information and uses it to buy products you did not approve. This is credit card fraud, and it is one of the biggest problems that customers and banks face.

With the help of machine learning, banks can catch these frauds before they do serious damage. A model is trained on historical transactions labelled as genuine or fraudulent, learns the patterns that separate the two, and then flags new transactions that match the fraud pattern. To build such a model, we use transaction data provided by banks. This data is typically anonymised before release, so it contains no PII (Personally Identifiable Information) and the features do not reveal anyone's identity.

Challenges in Credit Card Fraud Detection

Fraudulent transactions are rare compared with genuine ones, especially now that UPI and other online transactions are at their peak. This makes building a model that predicts fraudulent transactions challenging. The major challenges are listed below.

  1. A huge volume of transactions is processed every day, and the model must be fast enough to respond in real time.
  2. As mentioned above, the vast majority of transactions are genuine. This class imbalance makes fraud hard to detect, because the model sees far more examples of genuine transactions than of fraudulent ones.
  3. Labels can be wrong: not every fraudulent transaction is detected or reported, so some fraud may be recorded as genuine and the model learns from misclassified data.

Finally, just as every lock has a key, fraudsters will adapt their techniques to get past even a well-optimised model, so the model needs to be monitored and retrained as fraud patterns change.
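The second challenge has a practical consequence: accuracy is a poor yardstick when fraud is rare. A minimal sketch, using hypothetical counts (100,000 transactions with 170 frauds, not figures from this article's dataset), of a "model" that never flags anything:

```python
import numpy as np

# Hypothetical example: 100,000 transactions, of which 170 (0.17%) are fraud.
# The counts are illustrative, not taken from the article's dataset.
n_total = 100_000
n_fraud = 170
y_true = np.zeros(n_total, dtype=int)
y_true[:n_fraud] = 1  # 1 = fraud, 0 = genuine

# A useless "model" that labels every transaction as genuine.
y_pred = np.zeros(n_total, dtype=int)

accuracy = (y_pred == y_true).mean()
# Recall on the fraud class: the fraction of actual frauds that were caught.
fraud_recall = (y_pred[y_true == 1] == 1).mean()

print(f"Accuracy: {accuracy:.4f}")          # Accuracy: 0.9983
print(f"Fraud recall: {fraud_recall:.4f}")  # Fraud recall: 0.0000
```

The model looks 99.83% accurate while catching no fraud at all, which is why per-class precision and recall matter more than accuracy here.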

Machine Learning Implementation

1. Import the necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

2. Load the data

df = pd.read_csv('creditcard.csv')

df.head()

3. Clean Null Values

print(df.isnull().sum().sum())

df.dropna(inplace = True)
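The grand total above tells you whether anything is missing, but not where. A small sketch of checking missing values per column before dropping rows, using a hypothetical toy DataFrame in place of `creditcard.csv`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-in for creditcard.csv; column names mirror this article.
toy = pd.DataFrame({
    "Amount": [10.0, np.nan, 25.5, 3.2],
    "V1": [0.1, -1.2, np.nan, 0.4],
    "Class": [0, 0, 1, 0],
})

# Missing values per column, not just the overall total.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Drop every row that contains at least one missing value.
cleaned = toy.dropna()
print(f"Rows before: {len(toy)}, after: {len(cleaned)}")
```

If one column accounts for most of the gaps, dropping or imputing that column may lose less data than dropping whole rows.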

4. Exploring Data

print(df.shape)

df.info()

df.describe()

5. Calculate the number of genuine and fraud transactions

genuine_transactions = df[df['Class'] == 0]  # assuming 'Class' indicates fraud (1) or genuine (0)
fraud_transactions = df[df['Class'] == 1]

num_genuine = len(genuine_transactions)
num_fraud = len(fraud_transactions)

fraud_percentage = (num_fraud / len(df)) * 100
print(f"Number of genuine transactions: {num_genuine}")
print(f"Number of fraud transactions: {num_fraud}")
print(f"Percentage of fraud transactions: {fraud_percentage:.3f}%")
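pandas can produce the same counts and percentages in one call with `value_counts`; passing `normalize=True` returns proportions instead of counts. A sketch on hypothetical toy labels (one fraud in ten rows) standing in for `df['Class']`:

```python
import pandas as pd

# Hypothetical toy labels standing in for df['Class'] (1 = fraud, 0 = genuine).
toy = pd.DataFrame({"Class": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]})

# Number of rows in each class.
counts = toy["Class"].value_counts()
# normalize=True returns proportions; multiply by 100 for percentages.
percentages = toy["Class"].value_counts(normalize=True) * 100

print(counts)
print(percentages.round(3))
```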

6. Correlation map

plt.figure(figsize=(20, 6))

# 'number' selects every numeric column, whatever its integer or float width
numData = df.select_dtypes(include='number')

corrMat = numData.corr()

sns.heatmap(corrMat, cmap='Blues')

plt.show()
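With many features, the full heatmap is hard to read. Often the more useful view is each feature's correlation with the label, sorted by strength. A sketch on hypothetical synthetic data, where `V1` is deliberately shifted for fraud rows and `V2` is pure noise:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1_000
# Hypothetical synthetic data: about 5% fraud; V1 depends on the label, V2 does not.
cls = (rng.random(n) < 0.05).astype(int)
toy = pd.DataFrame({
    "V1": rng.normal(size=n) - 3 * cls,
    "V2": rng.normal(size=n),
    "Class": cls,
})

# Correlation of every numeric feature with 'Class', strongest (by magnitude) first.
corr_with_class = (
    toy.select_dtypes(include="number")
    .corr()["Class"]
    .drop("Class")
    .sort_values(key=np.abs, ascending=False)
)
print(corr_with_class)
```

Note that correlation only captures linear relationships, so a weakly correlated feature can still be useful to a tree-based model.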

7. Standardizing the Data

from sklearn.preprocessing import StandardScaler

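scaler = StandardScaler()

# Scale 'Amount' to zero mean and unit variance, then drop the raw column
# so the same information does not enter the model twice
df['NormalizedAmount'] = scaler.fit_transform(df[['Amount']])

df = df.drop(['Amount'], axis=1)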
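`StandardScaler` computes z = (x − mean) / std, using the population standard deviation. Fitting it on the whole DataFrame before splitting lets test-set statistics leak into training; a stricter approach fits on the training split only and reuses those statistics for the test split. A sketch with hypothetical toy amounts:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy transaction amounts (one feature column).
train_amounts = np.array([[10.0], [20.0], [30.0], [40.0]])
test_amounts = np.array([[25.0], [100.0]])

scaler = StandardScaler()
# Learn mean and std from the training data only...
train_scaled = scaler.fit_transform(train_amounts)
# ...and apply those same statistics to the test data.
test_scaled = scaler.transform(test_amounts)

# Manual check: z = (x - mean) / std, with NumPy's default ddof=0 (population std).
mean = train_amounts.mean()
std = train_amounts.std()
manual = (test_amounts - mean) / std

print(scaler.mean_, scaler.scale_)
print(test_scaled.ravel())
```

Tree-based models such as Random Forest split on thresholds and are unaffected by this kind of scaling, so the step matters mostly if you later compare against distance- or gradient-based models.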

8. Split the dataset into training and testing sets

from sklearn.model_selection import train_test_split

X = df.drop(['Class'], axis=1)

y = df['Class'] 

# stratify=y keeps the same (tiny) fraud ratio in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)
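Because fraud is so rare, a purely random split can leave the test set with noticeably more or fewer fraud cases than the training set. Passing `stratify=y` to `train_test_split` preserves the class ratio in both splits. A check on hypothetical toy labels (1,000 rows, 1% fraud):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy labels: 10 frauds among 1,000 rows.
y = np.array([1] * 10 + [0] * 990)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the 1% fraud ratio in both the training and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(y_train.sum(), y_test.sum())
print(y_train.mean(), y_test.mean())
```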

9. Performing Random Forest

from sklearn.ensemble import RandomForestClassifier

rf_model = RandomForestClassifier(n_estimators=100, random_state=42)  # random_state makes runs reproducible

rf_model.fit(X_train, y_train)

rf_pred = rf_model.predict(X_test)

print("Random Forest Predictions:", rf_pred)

# score() returns accuracy, which looks high on imbalanced data even for weak models
random_forest_score = rf_model.score(X_test, y_test) * 100

print("Random Forest Score: ", random_forest_score)
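One common option for imbalanced data, not used in the code above, is `class_weight="balanced"`, which weights each class inversely to its frequency so that errors on the rare fraud class cost more during training. A sketch on hypothetical synthetic data from `make_classification` (about 2% positives); whether it helps on the real dataset should be checked on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Hypothetical synthetic, imbalanced stand-in for the credit card data.
X, y = make_classification(
    n_samples=5_000, n_features=10, weights=[0.98], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# class_weight="balanced" scales each class's weight by
# n_samples / (n_classes * count_of_that_class).
rf_balanced = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42, n_jobs=-1
)
rf_balanced.fit(X_train, y_train)
pred = rf_balanced.predict(X_test)

# Recall on the minority (positive) class: share of positives that were caught.
fraud_recall = recall_score(y_test, pred)
print(f"Recall on the minority class: {fraud_recall:.3f}")
```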

10. Check the performance metrics

from sklearn.metrics import classification_report

print("Random Forest Performance Metrics:\n", classification_report(y_test, rf_pred))
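The precision and recall in the report come straight from the confusion matrix. A sketch on ten hypothetical labels and predictions, computing both by hand and checking them against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions (1 = fraud).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])

# For binary labels, ravel() returns TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the flagged transactions, how many were fraud
recall = tp / (tp + fn)     # of the actual frauds, how many were flagged

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```

For fraud detection the two pull in different directions: low precision means many genuine customers are blocked, while low recall means fraud slips through.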

11. ROC Curve

from sklearn.metrics import roc_curve, auc

rf_probs = rf_model.predict_proba(X_test)[:, 1]

rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_probs)

rf_auc = auc(rf_fpr, rf_tpr)

plt.plot(rf_fpr, rf_tpr, label=f"Random Forest (AUC = {rf_auc:.2f})")

plt.plot([0, 1], [0, 1], 'k--') 

plt.title("ROC Curve")

plt.xlabel("False Positive Rate")

plt.ylabel("True Positive Rate")

plt.legend()

plt.show()
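`auc` integrates any curve given as x/y points; for ROC curves scikit-learn also offers `roc_auc_score`, which computes the same area directly from labels and scores. The check below uses four hypothetical scores, not model output. Keep in mind that with so few fraud cases, ROC AUC can look very high even when many flagged transactions are false alarms, because the false positive rate is divided by the huge number of genuine transactions; the precision-recall curve is usually more informative here.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical labels and predicted fraud probabilities.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Area under the ROC curve, computed two ways.
fpr, tpr, _ = roc_curve(y_true, scores)
area_from_curve = auc(fpr, tpr)
area_direct = roc_auc_score(y_true, scores)

print(area_from_curve, area_direct)
```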

12. Precision Recall Curve

from sklearn.metrics import precision_recall_curve

rf_precision, rf_recall, _ = precision_recall_curve(y_test, rf_probs)

plt.plot(rf_recall, rf_precision, label="Random Forest")

plt.title("Precision-Recall Curve")

plt.xlabel("Recall")

plt.ylabel("Precision")

plt.legend()

plt.show()
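The curve can also be summarised as a single number with `average_precision_score`, and used to pick a decision threshold instead of relying on `predict`, which for two classes amounts to a 0.5 probability cutoff. A sketch that chooses the threshold with the best F1 score, on four hypothetical scores; with the model above, `scores` would be `rf_probs`, and the threshold should ideally be chosen on a validation split rather than the test set:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

# Hypothetical labels and predicted fraud probabilities (1 = fraud).
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# Average precision summarises the precision-recall curve in one number.
ap = average_precision_score(y_true, scores)

precision, recall, thresholds = precision_recall_curve(y_true, scores)
# precision and recall have one more entry than thresholds (the final
# point, precision = 1 and recall = 0, has no threshold), so drop it.
p, r = precision[:-1], recall[:-1]
denom = p + r
# F1 = 2PR / (P + R); points where P + R == 0 get F1 = 0 instead of NaN.
f1 = np.divide(2 * p * r, denom, out=np.zeros_like(denom), where=denom > 0)

best_threshold = thresholds[np.argmax(f1)]
print(f"Average precision: {ap:.3f}")
print(f"Threshold with the best F1: {best_threshold}")
```

Predictions at the chosen cutoff would then be `(rf_probs >= best_threshold).astype(int)`.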

Notebook link: CreditCardFraud.ipynb

Dataset link: creditcard.csv

Conclusion

Machine learning algorithms have become a crucial tool for catching fraudulent credit card transactions, allowing banks to flag unusual transactions as they happen. They help keep customers' money safe and can be retrained to adapt to the new techniques fraudsters use. These models will never be perfect, but they are a big step toward making online transactions safer and more secure.

About the Author

Principal Data Scientist

Meet Akash, a Principal Data Scientist with expertise in advanced analytics, machine learning, and AI-driven solutions. With a master’s degree from IIT Kanpur, Aakash combines technical knowledge with industry insights to deliver impactful, scalable models for complex business challenges.
