Scraping and Preprocessing Commercial Auction Data for Fraud Classification

Ahmad Alzahrani; Samira Sadaoui

arxiv: 1806.00656 · v2 · pith:RA6L4EOYnew · submitted 2018-06-02 · 💻 cs.LG · stat.ML

keywords auctions data fraud auction dataset bidding classification commercial

Over the last three decades, trading goods and services through online auctions has increased significantly. However, this business has created an attractive environment for malicious moneymakers who can commit different types of fraudulent activities, such as Shill Bidding (SB). SB is prevalent across many auctions, yet it is difficult to detect because it resembles normal bidding behaviour. The unavailability of SB datasets makes the development of SB detection and classification models burdensome. Furthermore, to implement efficient SB detection models, SB data should be produced from actual auctions on commercial sites. In this study, we first scraped a large number of eBay auctions of a popular product. After preprocessing the raw auction data, we built a high-quality SB dataset based on the most reliable SB strategies. The aim of our research is to share the preprocessed auction dataset as well as the SB training (unlabelled) dataset, so that researchers can apply various machine learning techniques using authentic auction and fraud data.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Knowledge Cascade: Reverse Knowledge Distillation on Nonparametric Multivariate Functional Estimation
stat.ME 2026-06 unverdicted novelty 7.0

KCas transfers student-selected smoothing parameters to full-sample teacher models via asymptotic scaling laws in smoothing splines and kernel methods, cutting computation while retaining performance guarantees.