Please use this identifier to cite or link to this item: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/2542
Title: A Reliable Click-Fraud Detection System for the Investigation of Fraudulent Publishers in Online Advertising
Authors: Singh, Lokesh
Sisodia, Deepti
Shashvat, Kumar
Kaur, Arshpreet
Sharma, Prakash Chandra
Keywords: Pay-per-click (PPC)
Online advertising
Fraudulent publishers
Click-fraud detection (CFD)
Gradient tree boosting (GTB)
Issue Date: 2023
Publisher: CRC Press
Citation: Chapter 13; pp. 221-254
Abstract: In the pay-per-click (PPC) model of online advertising, an advertiser pays the publisher for every click generated on a published advertisement, an arrangement that invites click fraud: deliberate clicking on the advert by the publisher itself. The highly skewed class distribution of click data makes identifying fraudsters challenging for current machine learning methods. This work therefore proposes a reliable click-fraud detection (CFD) system for the efficient investigation of fraudulent publishers. The proposed CFD system has several novel features. First, the class-imbalance problem is addressed using the synthetic minority oversampling technique (SMOTE) and random under-sampling boosting (RUSBoost). Second, a novel Hybrid-Manifold Feature Subset Selection (H-MFSS) method is proposed to obtain an optimal set of informative features. Third, a gradient tree boosting (GTB) model addresses the challenges of investigating and classifying the behavior of fraudsters from the balanced, optimally reduced user-click data. Experiments are conducted on the FDMA2012 mobile-advertising user-click data in dual mode: with all features (original data and data balanced by the sampling methods) and with selected features (original data and data balanced by the sampling methods). Classification bias towards the majority class is avoided by evaluating the models with average precision (AP), recall (SE), specificity (SP), and G-mean (GM) rather than accuracy. The efficacy of the proposed GTB model is further assessed by comparing its performance with 12 conventional machine learning models. The empirical results show that GTB generalizes well, achieving AP scores of 64.86% without sampling, 65.25% with RUSBoost, and 66.78% with SMOTE on the selected features; the sampling methods and optimal feature selection together yield a significant improvement in classification performance.
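The pipeline the abstract describes (balance the skewed classes, train a gradient tree boosting model, then score it with imbalance-aware metrics) can be sketched with scikit-learn. The FDMA2012 data and the H-MFSS selector are not publicly reproduced here, so the synthetic dataset, the minimal interpolation-based oversampler, and all parameter values below are illustrative assumptions, not the chapter's actual configuration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import average_precision_score, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the highly skewed user-click data (~5% fraud class).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def smote_like(X, y, k=5):
    """Minimal SMOTE-style oversampling (illustrative): interpolate between a
    minority sample and one of its k nearest minority neighbours until the
    classes balance."""
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - (y == 1).sum())
    synth = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        j = rng.choice(np.argsort(d)[1:k + 1])
        synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.vstack([X, synth]), np.concatenate([y, np.ones(n_new, dtype=int)])

X_bal, y_bal = smote_like(X_tr, y_tr)

# Gradient tree boosting on the balanced training data.
clf = GradientBoostingClassifier(random_state=0).fit(X_bal, y_bal)
scores = clf.predict_proba(X_te)[:, 1]

# Imbalance-aware metrics from the abstract, instead of raw accuracy.
tn, fp, fn, tp = confusion_matrix(y_te, scores >= 0.5).ravel()
se = tp / (tp + fn)      # recall / sensitivity (SE)
sp = tn / (tn + fp)      # specificity (SP)
gm = np.sqrt(se * sp)    # G-mean (GM)
ap = average_precision_score(y_te, scores)
print(f"AP={ap:.3f}  SE={se:.3f}  SP={sp:.3f}  GM={gm:.3f}")
```

In a real reproduction, the hand-rolled oversampler would typically be replaced by `imblearn.over_sampling.SMOTE` or `imblearn.ensemble.RUSBoostClassifier` from the imbalanced-learn library.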
© 2023 selection and editorial matter, Sulabh Bansal, Prakash Chandra Sharma, Abhishek Sharma and Jieh-Ren Chang; individual chapters, the contributors.
URI: https://doi.org/10.1201/9781003415466-13
ISBN: 9781000917918
9781032392769
Appears in Collections: Book/Book Chapters


