Adversarial Autoencoder Data Synthesis for Enhancing Machine Learning-Based Phishing Detection Algorithms

NCJ Number

308071

Journal

IEEE Transactions on Services Computing Volume: 16 Issue: 4 Dated: July-Aug 2023 Pages: 2411-2422

Author(s)

Hossein Shirazi; Shashika R. Muramudalige; Indrakshi Ray; Anura P. Jayasumana; Haonan Wang

Date Published

July 2023

Length

12 pages

Annotation

This study demonstrates how to implement classifiers with higher performance and more resistance to adversarial attacks using two Generative Adversarial Network (GAN) based approaches that synthesize phishing and legitimate samples to mimic real-world websites.

Abstract

Supervised machine learning is often used to detect phishing websites, but two problems make it less effective for this task. First, the scarcity of phishing data for training limits classifier performance. Second, machine learning algorithms are prone to adversarial attacks: small perturbations to attack data can bypass the classifier. Using both real and synthesized data, the authors of this study demonstrate how to implement classifiers with higher performance and more resistance to adversarial attacks. The authors propose two Generative Adversarial Network (GAN) based approaches, an AAE (Adversarial Autoencoder) and a WGAN (Wasserstein GAN), that synthesize phishing and legitimate samples to mimic real-world websites. The authors propose a set of hypotheses and validate them through experiments to demonstrate: (i) indistinguishability of synthesized samples from actual ones, (ii) susceptibility of classifiers to adversarial attacks, (iii) mitigation of adversarial attacks by training on larger datasets that include correctly labeled synthesized samples, and (iv) better performance of classifiers trained on large datasets. Information about real-world data is obtained from ten publicly available phishing datasets, which the AAE and WGAN use to generate synthetic data. Because the AAE and WGAN have been trained on this wide range of datasets, the authors are optimistic about their widespread applicability. (Published Abstract Provided)
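The adversarial-attack claim in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the linear classifier, its weights, the feature values, and the `epsilon` budget are all hypothetical, and the perturbation is a simple sign-based step chosen only to show how a small change to each feature of a phishing sample can flip a classifier's decision.

```python
# Toy illustration only: weights, bias, and features are hypothetical,
# not taken from the paper or its datasets.

def score(weights, bias, features):
    """Linear decision score; a positive value means 'phishing'."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict(weights, bias, features):
    """Map the decision score to a class label."""
    return "phishing" if score(weights, bias, features) > 0 else "legitimate"

def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def perturb(weights, features, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score."""
    return [x - epsilon * sign(w) for w, x in zip(weights, features)]

WEIGHTS = [0.8, -0.5, 0.6]          # hypothetical learned weights
BIAS = -0.2                         # hypothetical learned bias
phishing_sample = [0.5, 0.4, 0.3]   # hypothetical normalized website features

original_label = predict(WEIGHTS, BIAS, phishing_sample)
adversarial_sample = perturb(WEIGHTS, phishing_sample, epsilon=0.15)
adversarial_label = predict(WEIGHTS, BIAS, adversarial_sample)

print(original_label, "->", adversarial_label)
```

In this toy setting no feature moves by more than 0.15, yet the predicted label changes. The abstract's proposed mitigation, training on larger datasets that include correctly labeled synthesized samples, is not shown here.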

Date Published: July 1, 2023
