For years, fraud has been a major problem in sectors such as banking, healthcare, and insurance. As online transactions through payment options such as credit/debit cards, PhonePe, Gpay, and Paytm have increased, so have fraudulent activities. Fraudsters have also become skilled at finding loopholes that they can exploit. Because no system is perfect, building a secure system that authenticates customers and protects them from fraud is a challenging task. This is where fraud detection algorithms are useful.
Machine learning can be used to build a fraud detection algorithm that helps solve these real-world problems.
Table of Contents
- Types of Internet Fraud
- Machine Learning in Credit Card Fraud Detection
- Challenges in Credit Card Fraud Detection
- Machine Learning Implementation
Types of Internet Fraud
1. Email Phishing
This is a type of cybercrime in which attackers send links to fake sites and fraudulent messages to users via email. These emails look so legitimate and authentic that anyone can misjudge them and enter sensitive data, which puts them at risk.
How to Prevent Email Phishing?
Verify Before You Trust: Do not enter sensitive data in response to an email until you have verified the sender's credentials.
Ignore Suspicious Emails: The best practice is to ignore emails or messages that flash on your screen if they seem suspicious.
2. Credit Card Fraud
Payment fraud is a common problem in modern card-based banking systems. Fraudsters use methods such as stealing cards, creating counterfeit cards, or obtaining card IDs to commit fraud.
How Payment Fraud Happens
Once fraudsters steal confidential data, they can:
- Make purchases.
- Apply for loans.
- Exploit the victim’s financial information in various ways.
3. Identity Theft
Identity Theft occurs when attackers or cybercriminals hack into a victim’s account and access sensitive credentials, such as:
- Name
- Bank account details
- Email address
- Passwords
Identity theft can cause significant harm to victims and is a growing threat in the digital era.
Machine Learning in Credit Card Fraud Detection
Suppose you use your credit card to buy products from an online platform or to book movie tickets. Now imagine that someone steals your credit card information and tries to make purchases that you did not approve. This is credit card fraud, and it is one of the major problems that people and banks face.
With the help of machine learning, banks can catch these frauds before they do serious damage. To build such a model, we use the transaction data that banks provide. Because raw transaction records contain PII (Personally Identifiable Information), the features are typically anonymized or transformed before the data is shared, so that they do not reveal anyone's identity.
Challenges in Credit Card Fraud Detection
Credit card fraud is rare compared with the total number of transactions, especially now that UPI and other online transactions are at their peak. This makes building a model that predicts fraudulent transactions challenging. The major challenges are listed below.
- A large amount of data is processed every day, so the model must be fast enough to respond in time.
- As mentioned above, most transactions are not fraudulent. This class imbalance makes fraud hard to detect, because there are far more non-fraud samples than fraud samples.
- Data can be mislabeled, because not every fraudulent transaction is caught and reported as fraud.
Every lock has its key. Similarly, even if we build an optimised model, scammers will adapt their techniques to get around it.
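To see why the class imbalance described above makes plain accuracy misleading, here is a minimal sketch on made-up labels. The 1,000-transaction sample and its 0.2% fraud rate are assumptions for illustration, not figures from the dataset. A "model" that labels every transaction as genuine scores very high accuracy while catching no fraud at all.

```python
# Hypothetical labels: 1,000 transactions, 2 of them fraud (1), the rest genuine (0).
y_true = [0] * 998 + [1] * 2

# A trivial baseline that predicts "genuine" for every transaction.
y_pred = [0] * len(y_true)

# Accuracy: share of predictions that match the true label.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall on the fraud class: share of real frauds that were caught.
frauds_caught = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = frauds_caught / sum(y_true)

print(f"Accuracy: {accuracy:.3f}")    # 0.998
print(f"Fraud recall: {recall:.3f}")  # 0.000
```

This is why the evaluation later in the article looks at precision and recall rather than accuracy alone.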
Machine Learning Implementation
1. Import the necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
2. Load the data
df = pd.read_csv('creditcard.csv')
df.head()
3. Clean Null Values
# Total number of missing values across all columns
print(df.isnull().sum().sum())
# Drop any rows that contain missing values
df.dropna(inplace=True)
4. Exploring Data
print(df.shape)
df.info()
df.describe()
5. Calculate the number of genuine and fraud transactions
genuine_transactions = df[df['Class'] == 0] # assuming 'Class' indicates fraud (1) or genuine (0)
fraud_transactions = df[df['Class'] == 1]
num_genuine = len(genuine_transactions)
num_fraud = len(fraud_transactions)
fraud_percentage = (num_fraud / len(df)) * 100
print(f"Number of genuine transactions: {num_genuine}")
print(f"Number of fraud transactions: {num_fraud}")
print(f"Percentage of fraud transactions: {fraud_percentage}%")
6. Correlation map
plt.figure(figsize=(20, 6))
# Keep only the numeric columns, since correlation is undefined for the rest
numData = df.select_dtypes(include='number')
corrMat = numData.corr()
sns.heatmap(corrMat, cmap='Blues')
plt.show()
7. Standardizing the Data
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df['NormalizedAmount'] = scaler.fit_transform(df[['Amount']])
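One caveat with the step above: fitting the scaler on the full dataset lets statistics from the future test rows influence training. A minimal sketch of the alternative, fitting on the training split only, is shown here on a small made-up `Amount` column. The values and the names `amounts`, `train_amt`, and `test_amt` are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up transaction amounts, used only for illustration.
amounts = np.array([[5.0], [12.5], [20.0], [7.5], [150.0], [33.0], [9.0], [61.0]])

# Split first, so the scaler never sees the test rows.
train_amt, test_amt = train_test_split(amounts, test_size=0.25, random_state=42)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only...
train_scaled = scaler.fit_transform(train_amt)
# ...then reuse those statistics for the test rows.
test_scaled = scaler.transform(test_amt)

print("Train mean after scaling:", round(float(train_scaled.mean()), 6))
print("Train std after scaling:", round(float(train_scaled.std()), 6))
```

Tree-based models such as Random Forest are largely insensitive to feature scaling, so this matters more if the same pipeline is later reused with a scale-sensitive model.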
8. Split the dataset into training and testing sets
from sklearn.model_selection import train_test_split
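# Use every column except the label and the raw 'Amount' (its scaled copy is kept)
X = df.drop(['Class', 'Amount'], axis=1)
y = df['Class']
# stratify=y keeps the fraud ratio the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)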
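The `stratify` argument of `train_test_split` keeps the share of fraud cases the same in the training and test sets, which matters when fraud is this rare. A minimal sketch on made-up labels follows. The 1% fraud rate and the names `X_demo` and `y_demo` are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 1,000 rows, 1% labelled as fraud.
X_demo = np.arange(1000).reshape(-1, 1)
y_demo = np.array([1] * 10 + [0] * 990)

# stratify=y_demo preserves the class proportions in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print("Fraud cases in train:", int(y_tr.sum()), "of", len(y_tr))
print("Fraud cases in test:", int(y_te.sum()), "of", len(y_te))
```

Without stratification, a random split of such rare cases can leave the test set with too few frauds to evaluate the model reliably.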
9. Train the Random Forest model
from sklearn.ensemble import RandomForestClassifier
# Fix random_state so that the results are reproducible
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X_train, y_train)
rf_pred = rf_model.predict(X_test)
print("Random Forest Predictions:", rf_pred)
10. Evaluate the model
# Accuracy on the test set (can be misleading on imbalanced data)
random_forest_score = rf_model.score(X_test, y_test) * 100
print("Random Forest Score: ", random_forest_score)
from sklearn.metrics import classification_report
print("Random Forest Performance Metrics:\n", classification_report(y_test, rf_pred))
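The precision and recall figures in the report come from the counts in a confusion matrix. As a hedged illustration on made-up labels (`y_true_demo` and `y_pred_demo` are hypothetical, not outputs of the model above), scikit-learn's `confusion_matrix` exposes those counts directly:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = fraud, 0 = genuine.
y_true_demo = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # of the flagged transactions, how many were fraud
recall = tp / (tp + fn)     # of the real frauds, how many were flagged

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"Precision={precision:.2f} Recall={recall:.2f}")
```

In fraud detection, a false negative (a missed fraud) is usually costlier than a false positive, so recall on the fraud class deserves particular attention.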
11. ROC Curve
from sklearn.metrics import roc_curve, auc
rf_probs = rf_model.predict_proba(X_test)[:, 1]
rf_fpr, rf_tpr, _ = roc_curve(y_test, rf_probs)
rf_auc = auc(rf_fpr, rf_tpr)
plt.plot(rf_fpr, rf_tpr, label=f"Random Forest (AUC = {rf_auc:.2f})")
plt.plot([0, 1], [0, 1], 'k--')
plt.title("ROC Curve")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
12. Precision Recall Curve
from sklearn.metrics import precision_recall_curve
rf_precision, rf_recall, _ = precision_recall_curve(y_test, rf_probs)
plt.plot(rf_recall, rf_precision, label="Random Forest")
plt.title("Precision-Recall Curve")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.legend()
plt.show()
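The precision-recall curve can also guide the choice of a decision threshold instead of the default 0.5. A minimal sketch, assuming made-up fraud probabilities (`probs_demo`) and a hypothetical target recall of 0.75: pick the highest threshold whose recall still meets the target.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up labels and predicted fraud probabilities, for illustration only.
y_demo = np.array([0, 0, 0, 0, 1, 0, 1, 1, 0, 1])
probs_demo = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90])

precision, recall, thresholds = precision_recall_curve(y_demo, probs_demo)

# precision and recall have one more entry than thresholds; drop the last
# point (recall = 0) so that the three arrays line up.
target_recall = 0.75  # hypothetical business requirement
meets_target = recall[:-1] >= target_recall

# Thresholds are sorted in increasing order, so the last match is the
# highest threshold that still reaches the target recall.
best_idx = np.where(meets_target)[0][-1]
chosen_threshold = thresholds[best_idx]

print("Chosen threshold:", chosen_threshold)
print("Precision at threshold:", precision[best_idx])
print("Recall at threshold:", recall[best_idx])
```

With a real model, the same idea would apply to `rf_probs` and `y_test`, ideally on a separate validation split so that the test set stays untouched.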
Notebook link: CreditCardFraud.ipynb
Dataset Link : creditcard.csv
Conclusion
Machine learning algorithms have become a crucial tool for catching fraudulent credit card transactions, allowing banks to flag unusual transactions as they happen. This helps keep our money safe and lets detection systems adapt to the new techniques that fraudsters use. No model will be perfect every time, but it is a big step toward making online transactions safer and more secure.