Stacked Generalization Architecture for Predicting Publisher Behaviour from Highly Imbalanced User-Click Data Set for Click Fraud Detection

Sisodia, Deepti; Sisodia, Dilip Singh

Please use this identifier to cite or link to this item: https://gnanaganga.inflibnet.ac.in:8443/jspui/handle/123456789/2025

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sisodia, Deepti	-
dc.contributor.author	Sisodia, Dilip Singh	-
dc.date.accessioned	2023-11-09T09:12:07Z	-
dc.date.available	2023-11-09T09:12:07Z	-
dc.date.issued	2023-05-29	-
dc.identifier.issn	1882-7055	-
dc.identifier.issn	0288-3635	-
dc.identifier.uri	https://doi.org/10.1007/s00354-023-00218-1	-
dc.identifier.uri	http://gnanaganga.inflibnet.ac.in:8080/jspui/handle/123456789/2025	-
dc.description.abstract	In online advertising, a change in a publisher’s actual status label with every generated click indicates suspicious behaviour by that publisher. Furthermore, only a small proportion of the clicks generated by publishers are invalid, which skews the class distribution of the dataset and poses a challenge for conventional classification methods, as they become biased towards the majority class. This suspicious publisher behaviour, combined with the uneven class distribution, adversely affects classifier performance and increases model complexity. Developing machine-learning methods that produce effective predictive models for detecting fraudulent publishers is therefore pivotal. This paper proposes a novel stacked generalization framework comprising two stacked generalization architectures, one for resampling and one for classification. The framework uses generalizers to improve the learning model’s performance in two steps: first, it reduces the error rate of the algorithms in order to reduce the bias in the learning set; second, the outputs of the level-0 generalizers are fed as input to the level-1 generalizer, which combines their predictions to improve predictive performance. Extensive experiments are conducted on the FDMA 2012 user-click dataset using ten-fold cross-validation. The generalizability of the proposed architecture is assessed through experiments on eight other highly imbalanced benchmark datasets, and performance is measured using average precision, recall, and F1-score. The results empirically demonstrate the superiority of the proposed architecture in predicting publisher behaviour and classifying publishers as legitimate or illegitimate.	en_US
dc.language.iso	en	en_US
dc.publisher	New Generation Computing	en_US
dc.subject	Click fraud	en_US
dc.subject	Class imbalance	en_US
dc.subject	Data sampling algorithms	en_US
dc.subject	Base models	en_US
dc.subject	Meta-model	en_US
dc.subject	Stacked generalization	en_US
dc.title	Stacked Generalization Architecture for Predicting Publisher Behaviour from Highly Imbalanced User-Click Data Set for Click Fraud Detection	en_US
dc.type	Article	en_US
Appears in Collections:	Journal Articles
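
The two-level scheme summarised in the abstract (level-0 generalizers whose outputs feed a level-1 generalizer, evaluated with precision, recall, and F1-score) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy data, the single-feature nearest-mean base learners, the lookup-table meta-learner, and every function name below are assumptions chosen only to keep the example self-contained. The resampling stage of the framework is not shown, and the metrics are computed for the minority class only rather than averaged.

```python
import random
from collections import Counter, defaultdict


def make_toy_clicks(n=300, fraud_rate=0.1, seed=7):
    """Hypothetical imbalanced data: label 1 (fraudulent) is the minority class."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = 1 if rng.random() < fraud_rate else 0
        features = (rng.gauss(2.0 * y, 1.0), rng.gauss(1.5 * y, 1.0))
        data.append((features, y))
    return data


def fit_centroid(train, feature):
    """Level-0 generalizer: assign the class whose mean on one feature is nearest."""
    sums, counts = defaultdict(float), Counter()
    for x, y in train:
        sums[y] += x[feature]
        counts[y] += 1
    means = {y: sums[y] / counts[y] for y in counts}
    return lambda x: min(means, key=lambda y: abs(x[feature] - means[y]))


def out_of_fold(data, fitters, k=10):
    """Level-0 outputs for each row, from models that never saw that row (k >= 2)."""
    meta = [None] * len(data)
    for fold in range(k):
        train = [d for i, d in enumerate(data) if i % k != fold]
        models = [fit(train) for fit in fitters]
        for i in range(fold, len(data), k):
            meta[i] = tuple(m(data[i][0]) for m in models)
    return meta


def fit_meta(meta_rows, labels):
    """Level-1 generalizer: majority label for each pattern of level-0 outputs."""
    votes = defaultdict(Counter)
    for row, y in zip(meta_rows, labels):
        votes[row][y] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {row: c.most_common(1)[0][0] for row, c in votes.items()}
    return lambda row: table.get(row, default)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


data = make_toy_clicks()
labels = [y for _, y in data]
# One base learner per feature; f=f binds the feature index at definition time.
fitters = [lambda train, f=f: fit_centroid(train, f) for f in (0, 1)]

meta_rows = out_of_fold(data, fitters, k=10)   # level-0 -> level-1 training input
meta_model = fit_meta(meta_rows, labels)
base_models = [fit(data) for fit in fitters]   # refit level-0 on all rows


def predict(x):
    """Stacked prediction: level-0 outputs are combined by the level-1 model."""
    return meta_model(tuple(m(x) for m in base_models))


# Out-of-fold scores: each row is judged on level-0 outputs it did not train.
oof_pred = [meta_model(row) for row in meta_rows]
print(precision_recall_f1(labels, oof_pred))
```

Training the level-1 model on out-of-fold predictions, rather than on predictions the level-0 models make for their own training rows, is the usual way to keep the meta-learner from simply rewarding base learners that overfit.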

Alliance University, Bengaluru

Institutional Repository