Fraud Detection with ML Algorithms


Vidyavardhini’s College of Engineering & Technology

Department of Artificial Intelligence & Data Science

Experiment No. 6
Aim: Fraud detection using machine learning algorithms.
Objective: To accurately identify fraudulent activities or transactions within a dataset
using machine learning algorithms.
Theory: Fraud detection is a critical application of machine learning in industries such as
banking, insurance, e-commerce, and healthcare. Machine learning techniques can identify
patterns and anomalies in data, which can help detect fraudulent activity more effectively
than traditional rule-based systems. A general framework for implementing fraud detection
with machine learning is as follows:

1. Data Collection and Preprocessing:
• Collect relevant data sources such as transaction logs, user profiles, device
information, etc.
• Preprocess the data to handle missing values, outliers, and inconsistencies.
• Feature engineering: Create relevant features from the raw data that can help
in distinguishing between normal and fraudulent transactions. This might
include transaction amount, frequency, location, time of day, user behavior
patterns, etc.
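The feature-engineering bullet above can be illustrated with a minimal pandas sketch. The column names (`user_id`, `amount`, `timestamp`), the sample rows, and the helper `add_features` are hypothetical, chosen only to show how time-of-day, frequency, and user-behaviour features might be derived; a real dataset would need its own schema.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative only.
transactions = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "amount": [20.0, 25.0, 900.0, 50.0, 55.0],
    "timestamp": pd.to_datetime([
        "2024-01-01 09:15", "2024-01-02 10:30", "2024-01-03 03:05",
        "2024-01-01 14:00", "2024-01-04 15:45",
    ]),
})

def add_features(df):
    """Return a copy of df with simple time-of-day and per-user features."""
    out = df.sort_values(["user_id", "timestamp"]).copy()
    # Time of day: hour of the transaction and a night-time flag (00:00-05:59).
    out["hour"] = out["timestamp"].dt.hour
    out["is_night"] = out["hour"].lt(6).astype(int)
    # User behaviour: ratio of the amount to the user's own average amount.
    # The average is taken over the whole log here for brevity.
    user_mean = out.groupby("user_id")["amount"].transform("mean")
    out["amount_to_user_mean"] = out["amount"] / user_mean
    # Frequency: number of transactions by this user so far (including this one).
    out["txn_count_so_far"] = out.groupby("user_id").cumcount() + 1
    return out

features = add_features(transactions)
print(features[["user_id", "amount", "hour", "is_night",
                "amount_to_user_mean", "txn_count_so_far"]])
```

In this toy log, the 900.0 transaction at 03:05 stands out on both the night flag and the amount ratio, which is the kind of signal a classifier could pick up.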
2. Data Splitting:
• Split the dataset into training, validation, and testing sets. The training set is
used to train the model, the validation set is used for hyperparameter tuning,
and the testing set is used for evaluating the model's performance.
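One way to obtain the three sets is to call scikit-learn's `train_test_split` twice. The sketch below uses synthetic data and a 60/20/20 ratio, both of which are assumptions for illustration. The `stratify` argument is also an addition not mentioned in the text; it keeps the fraud rate roughly equal across the sets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: 1000 samples, 5% labelled as fraud.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))
y = np.zeros(1000, dtype=int)
y[:50] = 1

# First carve out the test set (20% of the data), then split the remainder
# into training (60% of the total) and validation (20% of the total).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=42)

for name, labels in [("train", y_train), ("validation", y_val), ("test", y_test)]:
    print(f"{name}: {len(labels)} samples, fraud rate {labels.mean():.3f}")
```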
3. Model Selection:
• Choose appropriate machine learning algorithms based on the nature of the
problem and the characteristics of the dataset. Commonly used algorithms
for fraud detection include:
• Logistic Regression
• Random Forest
• Support Vector Machines (SVM)
• Neural Networks
4. Model Training:
• Train the selected machine learning models using the training data.
• Fine-tune hyperparameters using the validation set to optimize the model's
performance.
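Model selection and tuning on a validation set could look roughly like the following sketch. The synthetic data, the two model families, the small hyperparameter grid, and the choice of average precision as the selection score are all assumptions made for illustration, not settings prescribed by this experiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for engineered transaction features.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Candidate models and hyperparameter settings (an arbitrary small grid).
candidates = {
    "logreg C=0.1": LogisticRegression(C=0.1, max_iter=1000),
    "logreg C=1.0": LogisticRegression(C=1.0, max_iter=1000),
    "forest n=50": RandomForestClassifier(n_estimators=50, random_state=0),
    "forest n=200": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Fit each candidate on the training set and score it on the validation set.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_prob = model.predict_proba(X_val)[:, 1]
    scores[name] = average_precision_score(y_val, val_prob)
    print(f"{name}: validation average precision = {scores[name]:.3f}")

# Keep the setting with the best validation score.
best_name = max(scores, key=scores.get)
print("Selected:", best_name)
```

The test set is deliberately left out of this loop so that it stays untouched until the final evaluation.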
5. Model Evaluation:
• Evaluate the trained models using the testing set.
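• Useful evaluation metrics include precision, recall, F1-score, ROC-AUC, and lift
curves. Accuracy can also be reported, but it can be misleading on its own because
fraudulent transactions are usually a small minority of the data.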
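The metrics named above can be computed with `sklearn.metrics`. The sketch below uses ten hand-made labels and scores (hypothetical values, not results from this experiment) and an arbitrary 0.5 decision threshold.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical ground truth (1 = fraud) and model scores for 10 transactions.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.15, 0.30, 0.60, 0.10, 0.05, 0.80, 0.40]

# Turn scores into hard labels with a 0.5 threshold (an arbitrary choice).
y_pred = [int(p >= 0.5) for p in y_prob]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
# ROC-AUC is computed from the scores, not from the thresholded labels.
print("roc_auc  :", roc_auc_score(y_true, y_prob))
```

On this toy data the accuracy is 0.8 even though only one of the two fraud cases is caught (recall 0.5), which shows why accuracy by itself says little when fraud is rare.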
6. Model Deployment:
• Deploy the trained model into the production environment where it can be
used to detect fraud in real-time or batch processing.

CSDOL8011-AI for financial & Banking application Lab



• Implement monitoring mechanisms to track the model's performance over
time and retrain it periodically to adapt to changing patterns of fraudulent
behavior.
7. Continuous Improvement:
• Regularly update and improve the model based on new data, feedback, and
evolving fraud patterns.
• Incorporate domain knowledge and feedback from fraud analysts to enhance
the model's effectiveness.

Program
Program: Fraudulent Email Detection using Random Forest Algorithm
import numpy as np
import pandas as pd
import joblib
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score

# Load the dataset
df = pd.read_csv(r"D:\VCET\SEM 8\AIFB\fraud_email_.csv")

# Drop rows with NaN values
df = df.dropna()

# Download NLTK stopwords
import nltk
nltk.download('stopwords')

# Define stopwords as a list
stopset = stopwords.words("english")

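# Extract the feature column 'Text' and the target column 'Class'
texts = df.Text
y = df.Class

# Split the raw text into train and test sets before vectorizing,
# so the TF-IDF vocabulary and weights are learned from training data only
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, y, test_size=0.20, random_state=42)

# Initialize TfidfVectorizer with the stop words list
vectorizer = TfidfVectorizer(stop_words=stopset, binary=True)

# Fit the vectorizer on the training text, then transform both sets
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)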

# Initialize RandomForestClassifier
clf = RandomForestClassifier(n_estimators=15)

# Train the classifier
clf.fit(X_train, y_train)

# Save the trained model to a file
joblib.dump(clf, 'fraud_detection_model.pkl')

# Load the saved model
loaded_model = joblib.load('fraud_detection_model.pkl')

# Test the model with a given email input
def test_email(input_email):
    # Vectorize the input email with the already fitted vectorizer
    input_email_vectorized = vectorizer.transform([input_email])
    # Predict the probability of fraud (second column of predict_proba)
    fraud_probability = loaded_model.predict_proba(input_email_vectorized)[:, 1]
    return fraud_probability

# Example usage:
input_email = (
    "MR. CHEUNG PUIHANG SENG BANK LTD.DES VOEUX RD. "
    "BRANCH,CENTRAL HONG KONG,HONG KONG. Let me start by introducing "
    "myself. I am Mr.Cheung Pui director of operations of the Hang Seng Bank Ltd. I have "
    "an urgent business suggestion for you.I honestly apologize and hope I do not cause you "
    "much embarrassment by contacting you through this means for a transaction of this "
    "magnitude,but this is due to confidentiality and prompt access reposed on this medium."
)

fraud_probability = test_email(input_email)
print("Probability of fraud:", fraud_probability)
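The program imports `average_precision_score` and creates a test split, but never scores the model on it. A minimal sketch of how the held-out emails might be evaluated follows. To stay self-contained it replaces the CSV with a tiny made-up corpus and uses scikit-learn's built-in `stop_words="english"` list instead of NLTK. Both are assumptions for illustration, not the lab's data or setup, and the resulting numbers say nothing about real performance.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split

# Tiny made-up corpus standing in for the 'Text' and 'Class' columns (1 = fraud).
fraud = ["urgent business proposal transfer funds to your bank account",
         "confidential transaction claim your inheritance funds now",
         "send bank details to receive lottery winnings urgently",
         "director of bank needs your help to transfer millions"]
normal = ["meeting moved to thursday please update your calendar",
          "attached is the report you asked for last week",
          "lunch tomorrow at noon with the project team",
          "please review the draft and send comments by friday"]
texts = (fraud + normal) * 5
labels = ([1] * 4 + [0] * 4) * 5

texts_train, texts_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.20, stratify=labels, random_state=42)

# Fit the vectorizer on training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(stop_words="english", binary=True)
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

clf = RandomForestClassifier(n_estimators=15, random_state=42)
clf.fit(X_train, y_train)

# Score the held-out emails: ranking quality plus per-class precision/recall.
test_prob = clf.predict_proba(X_test)[:, 1]
ap = average_precision_score(y_test, test_prob)
print("Average precision on test set:", ap)
print(classification_report(y_test, clf.predict(X_test)))
```

With the real dataset, the same three evaluation lines could be applied to the program's own `X_test` and `y_test`.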

Output

Conclusion
